[feature](Nereids) Pull up join from union all #28682

Merged: 2 commits merged into apache:master on Dec 22, 2023

Conversation

@xzj7019 (Contributor) commented Dec 19, 2023

Proposed changes

Pull up join from union all rule.
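
A minimal sketch of the kind of rewrite this rule targets (table and column names here are illustrative, not taken from the rule's actual test cases): when every branch of a UNION ALL joins the same table on the same key, the join can be pulled up above the union so it only has to be executed once.

```sql
-- Before: each UNION ALL branch joins dim separately
SELECT f1.k, d.name FROM fact1 f1 JOIN dim d ON f1.k = d.k
UNION ALL
SELECT f2.k, d.name FROM fact2 f2 JOIN dim d ON f2.k = d.k;

-- After (conceptually): the common join is pulled above the union,
-- so dim is joined only once
SELECT u.k, d.name
FROM (
    SELECT k FROM fact1
    UNION ALL
    SELECT k FROM fact2
) u
JOIN dim d ON u.k = d.k;
```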

Further comments

If this is a relatively large or complex change, kick off the discussion at [email protected] by explaining why you chose the solution you did and what alternatives you considered, etc...

@xzj7019 (Contributor, Author) commented Dec 19, 2023

run buildall

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Tpch sf100 test result on commit 2ce8853ce95ff748ab15d8f3762e7faf44a4deb0, data reload: false

run tpch-sf100 query with default conf and session variables
Query	Cold run (ms)	Hot run 1 (ms)	Hot run 2 (ms)	Best hot (ms)
q1	4675	4375	4408	4375
q2	359	155	159	155
q3	1471	1254	1248	1248
q4	1118	947	868	868
q5	3123	3156	3161	3156
q6	245	127	129	127
q7	975	492	498	492
q8	2172	2202	2185	2185
q9	6680	6649	6650	6649
q10	3220	3282	3271	3271
q11	309	179	177	177
q12	355	216	209	209
q13	4559	3851	3798	3798
q14	245	212	217	212
q15	572	522	523	522
q16	434	390	389	389
q17	994	579	576	576
q18	7121	6824	6935	6824
q19	1520	1393	1388	1388
q20	502	300	280	280
q21	3127	2663	2587	2587
q22	343	281	276	276
Total cold run time: 44119 ms
Total hot run time: 39764 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
Query	Cold run (ms)	Hot run 1 (ms)	Hot run 2 (ms)	Best hot (ms)
q1	4313	4346	4315	4315
q2	267	169	174	169
q3	3533	3512	3506	3506
q4	2379	2360	2370	2360
q5	5736	5730	5727	5727
q6	243	123	123	123
q7	2380	1873	1904	1873
q8	3526	3504	3523	3504
q9	9037	8984	9005	8984
q10	3901	4007	3991	3991
q11	485	366	364	364
q12	761	617	598	598
q13	4331	3544	3557	3544
q14	278	261	263	261
q15	573	509	526	509
q16	509	430	439	430
q17	1887	1864	1831	1831
q18	8675	8177	8216	8177
q19	1715	1768	1739	1739
q20	2249	1942	1936	1936
q21	6534	6161	6171	6161
q22	486	409	431	409
Total cold run time: 63798 ms
Total hot run time: 60511 ms

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 43.83 seconds
stream load tsv: 569 seconds loaded 74807831229 Bytes, about 125 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.6 seconds inserted 10000000 Rows, about 349K ops/s
storage size: 17184002416 Bytes

@xzj7019 (Contributor, Author) commented Dec 20, 2023

run buildall

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Tpch sf100 test result on commit 27d93ee82e9ef0677ad647da6bdb2f22e7466940, data reload: false

run tpch-sf100 query with default conf and session variables
Query	Cold run (ms)	Hot run 1 (ms)	Hot run 2 (ms)	Best hot (ms)
q1	4732	4467	4478	4467
q2	363	152	199	152
q3	1480	1246	1225	1225
q4	1122	898	913	898
q5	3179	3151	3168	3151
q6	248	128	131	128
q7	1026	500	491	491
q8	2204	2211	2180	2180
q9	6703	6670	6672	6670
q10	3259	3280	3313	3280
q11	312	190	195	190
q12	359	216	213	213
q13	4563	3862	3860	3860
q14	250	210	219	210
q15	576	524	523	523
q16	448	399	400	399
q17	1006	577	572	572
q18	7290	7001	6892	6892
q19	1513	1378	1446	1378
q20	557	310	322	310
q21	3107	2683	2666	2666
q22	362	291	298	291
Total cold run time: 44659 ms
Total hot run time: 40146 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
Query	Cold run (ms)	Hot run 1 (ms)	Hot run 2 (ms)	Best hot (ms)
q1	4394	4357	4363	4357
q2	275	179	175	175
q3	3577	3526	3517	3517
q4	2396	2396	2396	2396
q5	5742	5738	5758	5738
q6	241	119	122	119
q7	2399	1887	1900	1887
q8	3517	3527	3520	3520
q9	8981	9058	9054	9054
q10	3911	4003	4006	4003
q11	497	364	381	364
q12	775	610	624	610
q13	4315	3568	3561	3561
q14	302	259	259	259
q15	572	523	516	516
q16	512	468	468	468
q17	1896	1870	1860	1860
q18	8645	8158	8332	8158
q19	1729	1770	1736	1736
q20	2284	1962	1936	1936
q21	6562	6245	6181	6181
q22	524	433	444	433
Total cold run time: 64046 ms
Total hot run time: 60848 ms

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.42 seconds
stream load tsv: 571 seconds loaded 74807831229 Bytes, about 124 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.9 seconds inserted 10000000 Rows, about 346K ops/s
storage size: 17184328179 Bytes

@xzj7019 (Contributor, Author) commented Dec 20, 2023

run buildall

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 43.95 seconds
stream load tsv: 564 seconds loaded 74807831229 Bytes, about 126 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 67 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.6 seconds inserted 10000000 Rows, about 349K ops/s
storage size: 17188462824 Bytes

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Tpch sf100 test result on commit 1825a99c7d512c8d324b0d8fbfbb3ec74a4e1737, data reload: false

run tpch-sf100 query with default conf and session variables
Query	Cold run (ms)	Hot run 1 (ms)	Hot run 2 (ms)	Best hot (ms)
q1	4676	4391	4439	4391
q2	358	167	158	158
q3	1467	1270	1238	1238
q4	1115	892	883	883
q5	3135	3152	3147	3147
q6	250	130	132	130
q7	988	491	476	476
q8	2213	2203	2174	2174
q9	6677	6694	6670	6670
q10	3211	3256	3277	3256
q11	306	187	185	185
q12	360	205	200	200
q13	4563	3823	3828	3823
q14	240	215	215	215
q15	551	529	519	519
q16	439	386	384	384
q17	1012	561	564	561
q18	7203	6843	7005	6843
q19	1521	1379	1383	1379
q20	495	304	304	304
q21	3082	2618	2710	2618
q22	349	278	283	278
Total cold run time: 44211 ms
Total hot run time: 39832 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
Query	Cold run (ms)	Hot run 1 (ms)	Hot run 2 (ms)	Best hot (ms)
q1	4333	4320	4312	4312
q2	268	165	172	165
q3	3553	3546	3526	3526
q4	2389	2372	2382	2372
q5	5733	5746	5745	5745
q6	240	120	123	120
q7	2390	1878	1883	1878
q8	3524	3520	3532	3520
q9	8977	9041	9022	9022
q10	3915	4003	3987	3987
q11	490	369	361	361
q12	789	587	608	587
q13	4284	3585	3576	3576
q14	283	275	259	259
q15	570	517	522	517
q16	507	456	458	456
q17	1895	1886	1849	1849
q18	8696	8295	8140	8140
q19	1742	1766	1766	1766
q20	2261	1947	1934	1934
q21	6514	6206	6184	6184
q22	510	426	425	425
Total cold run time: 63863 ms
Total hot run time: 60701 ms

@xzj7019 (Contributor, Author) commented Dec 20, 2023

run buildall

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 43.62 seconds
stream load tsv: 566 seconds loaded 74807831229 Bytes, about 126 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 67 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.6 seconds inserted 10000000 Rows, about 349K ops/s
storage size: 17183453855 Bytes

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Tpch sf100 test result on commit cb5aa1e76f1bdb8c1708b487b2b9e0e12da730f3, data reload: false

run tpch-sf100 query with default conf and session variables
Query	Cold run (ms)	Hot run 1 (ms)	Hot run 2 (ms)	Best hot (ms)
q1	4703	4420	4428	4420
q2	364	150	158	150
q3	1460	1235	1239	1235
q4	1116	938	926	926
q5	3194	3149	3201	3149
q6	249	130	126	126
q7	1008	493	487	487
q8	2187	2211	2161	2161
q9	6688	6677	6665	6665
q10	3255	3289	3254	3254
q11	302	189	184	184
q12	348	212	212	212
q13	4529	3806	3805	3805
q14	236	215	213	213
q15	570	525	530	525
q16	446	397	387	387
q17	998	609	539	539
q18	7150	7052	7031	7031
q19	1513	1365	1363	1363
q20	510	311	326	311
q21	3102	2634	2692	2634
q22	341	282	277	277
Total cold run time: 44269 ms
Total hot run time: 40054 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
Query	Cold run (ms)	Hot run 1 (ms)	Hot run 2 (ms)	Best hot (ms)
q1	4339	4363	4328	4328
q2	277	169	174	169
q3	3557	3537	3549	3537
q4	2408	2391	2396	2391
q5	5778	5750	5773	5750
q6	242	123	124	123
q7	2382	1891	1856	1856
q8	3543	3517	3531	3517
q9	9057	9016	8980	8980
q10	3902	3992	3994	3992
q11	491	380	378	378
q12	778	609	605	605
q13	4287	3595	3577	3577
q14	291	254	261	254
q15	566	516	520	516
q16	519	482	461	461
q17	1883	1861	1845	1845
q18	8741	8164	8093	8093
q19	1731	1753	1716	1716
q20	2273	1964	1951	1951
q21	6554	6206	6156	6156
q22	504	422	450	422
Total cold run time: 64103 ms
Total hot run time: 60617 ms

@xzj7019 (Contributor, Author) commented Dec 20, 2023

run buildall

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Tpch sf100 test result on commit 93fb58ac85f7f8644f3285e4e95bd3abad271899, data reload: false

run tpch-sf100 query with default conf and session variables
Query	Cold run (ms)	Hot run 1 (ms)	Hot run 2 (ms)	Best hot (ms)
q1	4705	4452	4460	4452
q2	368	146	167	146
q3	1489	1235	1218	1218
q4	1124	910	892	892
q5	3157	3162	3193	3162
q6	246	131	131	131
q7	983	501	490	490
q8	2218	2211	2209	2209
q9	6707	6666	6697	6666
q10	3257	3301	3293	3293
q11	309	185	186	185
q12	362	218	207	207
q13	4558	3809	3812	3809
q14	245	211	220	211
q15	570	533	524	524
q16	450	388	393	388
q17	1008	600	570	570
q18	7101	6849	6942	6849
q19	1526	1323	1455	1323
q20	547	303	298	298
q21	3064	2648	2677	2648
q22	360	292	300	292
Total cold run time: 44354 ms
Total hot run time: 39963 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
Query	Cold run (ms)	Hot run 1 (ms)	Hot run 2 (ms)	Best hot (ms)
q1	4390	4373	4357	4357
q2	273	166	173	166
q3	3546	3529	3543	3529
q4	2393	2382	2382	2382
q5	5746	5751	5754	5751
q6	242	123	125	123
q7	2384	1926	1901	1901
q8	3517	3539	3510	3510
q9	9019	9012	8988	8988
q10	3929	4005	4009	4005
q11	493	388	386	386
q12	764	624	597	597
q13	4296	3559	3560	3559
q14	297	257	256	256
q15	569	532	527	527
q16	501	445	473	445
q17	1912	1864	1857	1857
q18	8763	8206	8342	8206
q19	1720	1736	1775	1736
q20	2267	1958	1949	1949
q21	6632	6278	6196	6196
q22	509	432	435	432
Total cold run time: 64162 ms
Total hot run time: 60858 ms

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.25 seconds
stream load tsv: 574 seconds loaded 74807831229 Bytes, about 124 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 67 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 33 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 29.3 seconds inserted 10000000 Rows, about 341K ops/s
storage size: 17189067036 Bytes

@xzj7019 (Contributor, Author) commented Dec 21, 2023

run buildall

@xzj7019 (Contributor, Author) commented Dec 21, 2023

run buildall

2 similar comments
@xzj7019 (Contributor, Author) commented Dec 21, 2023

run buildall

@xzj7019 (Contributor, Author) commented Dec 21, 2023

run buildall

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.71 seconds
stream load tsv: 568 seconds loaded 74807831229 Bytes, about 125 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 33 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 28.7 seconds inserted 10000000 Rows, about 348K ops/s
storage size: 17183804086 Bytes

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Tpch sf100 test result on commit a2473e07646f77a79d6531547c3dd0847da96ece, data reload: false

run tpch-sf100 query with default conf and session variables
Query	Cold run (ms)	Hot run 1 (ms)	Hot run 2 (ms)	Best hot (ms)
q1	4699	4396	4430	4396
q2	368	187	158	158
q3	1495	1260	1210	1210
q4	1109	924	958	924
q5	3170	3160	3163	3160
q6	247	124	124	124
q7	1002	488	481	481
q8	2207	2203	2186	2186
q9	6686	6683	6667	6667
q10	3219	3270	3291	3270
q11	304	195	197	195
q12	348	207	211	207
q13	4569	3826	3825	3825
q14	244	212	214	212
q15	565	523	517	517
q16	437	380	384	380
q17	1016	592	561	561
q18	7167	6969	6892	6892
q19	1510	1417	1410	1410
q20	509	299	313	299
q21	3085	2625	2674	2625
q22	344	280	287	280
Total cold run time: 44300 ms
Total hot run time: 39979 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
Query	Cold run (ms)	Hot run 1 (ms)	Hot run 2 (ms)	Best hot (ms)
q1	4373	4317	4320	4317
q2	266	163	173	163
q3	3512	3522	3507	3507
q4	2401	2382	2385	2382
q5	5716	5724	5739	5724
q6	241	122	122	122
q7	2415	1893	1888	1888
q8	3528	3514	3515	3514
q9	8983	8989	8974	8974
q10	3916	4000	4004	4000
q11	489	354	365	354
q12	759	603	612	603
q13	4284	3542	3538	3538
q14	299	251	255	251
q15	573	528	524	524
q16	504	482	499	482
q17	1863	1883	1889	1883
q18	8694	8220	8183	8183
q19	1736	1772	1735	1735
q20	2235	1949	1946	1946
q21	6523	6182	6181	6181
q22	487	424	407	407
Total cold run time: 63797 ms
Total hot run time: 60678 ms

@xzj7019 (Contributor, Author) commented Dec 21, 2023

run buildall

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.13 seconds
stream load tsv: 569 seconds loaded 74807831229 Bytes, about 125 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.6 seconds inserted 10000000 Rows, about 349K ops/s
storage size: 17183744476 Bytes

@xzj7019 (Contributor, Author) commented Dec 21, 2023

run buildall

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Tpch sf100 test result on commit 46a6b1d35bf482f8d8cb6e710b3371dd590ac2cc, data reload: false

run tpch-sf100 query with default conf and session variables
Query	Cold run (ms)	Hot run 1 (ms)	Hot run 2 (ms)	Best hot (ms)
q1	4687	4454	4409	4409
q2	395	160	160	160
q3	1464	1260	1256	1256
q4	1113	909	912	909
q5	3186	3162	3161	3161
q6	249	131	128	128
q7	1019	505	489	489
q8	2175	2210	2168	2168
q9	6716	6682	6655	6655
q10	3225	3269	3269	3269
q11	310	188	189	188
q12	351	211	214	211
q13	4562	3799	3794	3794
q14	239	217	212	212
q15	572	525	518	518
q16	449	397	383	383
q17	1003	580	553	553
q18	7140	6950	6983	6950
q19	1504	1378	1406	1378
q20	511	337	292	292
q21	3056	2652	2663	2652
q22	348	274	282	274
Total cold run time: 44274 ms
Total hot run time: 40009 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
Query	Cold run (ms)	Hot run 1 (ms)	Hot run 2 (ms)	Best hot (ms)
q1	4375	4349	4350	4349
q2	269	166	171	166
q3	3528	3535	3516	3516
q4	2408	2380	2372	2372
q5	5747	5734	5744	5734
q6	243	125	126	125
q7	2377	1869	1905	1869
q8	3521	3539	3514	3514
q9	9049	9007	9026	9007
q10	3929	4022	4006	4006
q11	484	377	366	366
q12	776	606	612	606
q13	4319	3585	3540	3540
q14	289	263	253	253
q15	570	529	515	515
q16	511	450	450	450
q17	1844	1861	1831	1831
q18	8619	8272	8231	8231
q19	1740	1776	1770	1770
q20	2238	1954	1930	1930
q21	6512	6148	6169	6148
q22	508	414	421	414
Total cold run time: 63856 ms
Total hot run time: 60712 ms

@github-actions bot added the 'approved' label (indicates a PR has been approved by one committer) on Dec 21, 2023

PR approved by at least one committer and no changes requested.


PR approved by anyone and no changes requested.

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Tpch sf100 test result on commit c10723c0bb6f0cfee2085431bd1cf7c640eebf52, data reload: true

run tpch-sf100 query with default conf and session variables
Query	Cold run (ms)	Hot run 1 (ms)	Hot run 2 (ms)	Best hot (ms)
q1	4537	4328	4355	4328
q2	380	122	113	113
q3	1460	1191	1208	1191
q4	1136	944	831	831
q5	3174	3174	3201	3174
q6	243	123	123	123
q7	980	485	472	472
q8	2167	2184	2182	2182
q9	6766	6710	6676	6676
q10	3254	3251	3279	3251
q11	302	175	185	175
q12	357	195	200	195
q13	4532	3833	3827	3827
q14	228	209	206	206
q15	559	507	507	507
q16	449	386	379	379
q17	1003	600	543	543
q18	7438	6979	7746	6979
q19	1557	1307	1426	1307
q20	789	310	308	308
q21	3104	2710	2672	2672
q22	338	273	273	273
Total cold run time: 44753 ms
Total hot run time: 39712 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
Query	Cold run (ms)	Hot run 1 (ms)	Hot run 2 (ms)	Best hot (ms)
q1	4333	4339	4331	4331
q2	270	165	167	165
q3	3533	3542	3532	3532
q4	2387	2376	2378	2376
q5	5742	5736	5766	5736
q6	238	119	118	118
q7	2432	1854	1880	1854
q8	3547	3536	3514	3514
q9	9091	9033	9089	9033
q10	3894	4015	4007	4007
q11	484	375	371	371
q12	775	607	598	598
q13	4277	3519	3589	3519
q14	283	247	257	247
q15	567	516	518	516
q16	508	452	449	449
q17	1889	1862	1853	1853
q18	8773	8388	8369	8369
q19	1757	1760	1760	1760
q20	2267	1937	1945	1937
q21	6563	6206	6170	6170
q22	505	424	404	404
Total cold run time: 64115 ms
Total hot run time: 60859 ms

@github-actions bot removed the 'approved' label (indicates a PR has been approved by one committer) on Dec 22, 2023
@xzj7019 (Contributor, Author) commented Dec 22, 2023

run buildall

@xzj7019 (Contributor, Author) commented Dec 22, 2023

run buildall

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.42 seconds
stream load tsv: 566 seconds loaded 74807831229 Bytes, about 126 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 33 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 28.1 seconds inserted 10000000 Rows, about 355K ops/s
storage size: 17188017219 Bytes

@xzj7019 (Contributor, Author) commented Dec 22, 2023

run buildall

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.88 seconds
stream load tsv: 572 seconds loaded 74807831229 Bytes, about 124 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.8 seconds inserted 10000000 Rows, about 347K ops/s
storage size: 17183891309 Bytes

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Tpch sf100 test result on commit f6ce5d21fe39c08082f0ae66bdd806d4cac52a74, data reload: false

run tpch-sf100 query with default conf and session variables
Query	Cold run (ms)	Hot run 1 (ms)	Hot run 2 (ms)	Best hot (ms)
q1	4690	4414	4429	4414
q2	362	154	158	154
q3	1446	1276	1232	1232
q4	1127	933	929	929
q5	3147	3162	3185	3162
q6	247	129	128	128
q7	1032	487	493	487
q8	2175	2191	2173	2173
q9	6693	6690	6638	6638
q10	3214	3264	3269	3264
q11	308	199	192	192
q12	352	205	206	205
q13	4565	3845	3810	3810
q14	243	209	221	209
q15	575	541	514	514
q16	452	385	381	381
q17	1005	543	512	512
q18	7035	6768	6876	6768
q19	1528	1395	1447	1395
q20	561	333	303	303
q21	3076	2627	2666	2627
q22	351	275	281	275
Total cold run time: 44184 ms
Total hot run time: 39772 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
Query	Cold run (ms)	Hot run 1 (ms)	Hot run 2 (ms)	Best hot (ms)
q1	4373	4367	4343	4343
q2	270	167	167	167
q3	3494	3492	3487	3487
q4	2405	2375	2370	2370
q5	5702	5706	5710	5706
q6	242	124	127	124
q7	2387	1851	1836	1836
q8	3514	3525	3516	3516
q9	9101	9043	8975	8975
q10	3939	4017	3993	3993
q11	487	385	370	370
q12	765	596	594	594
q13	4301	3568	3546	3546
q14	296	252	261	252
q15	584	522	521	521
q16	500	454	462	454
q17	1880	1851	1852	1851
q18	8477	8124	8213	8124
q19	1736	1752	1732	1732
q20	2257	1952	1940	1940
q21	6513	6155	6132	6132
q22	508	421	406	406
Total cold run time: 63731 ms
Total hot run time: 60439 ms

@github-actions bot added the 'approved' label (indicates a PR has been approved by one committer) on Dec 22, 2023

PR approved by at least one committer and no changes requested.

@starocean999 merged commit b2b209e into apache:master on Dec 22, 2023
27 of 28 checks passed
CalvinKirs added a commit to CalvinKirs/incubator-doris that referenced this pull request Dec 27, 2023
* [Performance](point query)Opimize partition prune for point query (apache#28150)

* [Performance](point query)Opimize partition prune for point query

* [fix](stacktrace) ignore stacktrace for error code INVALID_ARGUMENT INVERTED_INDEX_NOT_IMPLEMENTED (apache#27898) (apache#28598)

* ignore stacktrace for error INVALID_ARGUMENT INVERTED_INDEX_NOT_IMPLEMENTED

* AndBlockColumnPredicate::evaluate

* [bugfix](topn) fix coredump in copy_column_data_to_block when nullable mismatch (apache#28597)

* [bugfix](topn) fix coredump in copy_column_data_to_block when nullable mismatch

Return RuntimeError if copy_column_data_to_block hits a nullable mismatch, to avoid a coredump in input_col_ptr->filter_by_selector(sel_rowid_idx, select_size, raw_res_ptr).

The problem was reported by a Doris user but I cannot reproduce it, so no test case is added currently.

* clang format

* [feature](Nereids): eliminate semi join (apache#28588)

Eliminate Semi/Anti Join which is FALSE or TRUE.

* [feature](Nereids) support datev1 and datetimev1 (apache#28581)

* [improvement](http) add show_table_data http api (apache#28380)

In some cases, users need to get the data size of a single replica of a table and evaluate certain actions based on it, such as estimating the precise backup size.

Signed-off-by: nextdreamblue <[email protected]>

* [Enhance](regression)enhance jdbc case to adapt to use case concurrency (apache#28565)

enhance jdbc case to adapt to use case concurrency

* [opt](Nereids)when both Nereids and old parsers report errors, prompt error messages for the Nereids (apache#28580)

* [Improvement](regression) change compound predicate regression case name to make it more clear (apache#28612)

* [regression](memtable) add case for memtable flush error handle (apache#28285)

Co-authored-by: ziyang zhang <[email protected]>

* [regression-test](memtable)  test memtable flush is high priority for vtable writerV1 (apache#28502)

* enhance performance for broken tablet checking under multi-core scenario with a coarse-grained read lock (apache#28552)

* [fix](load) fix memtracking orphan too large (apache#28600)

* [fix](memtable-limiter) do not block write if load mem usage is low (apache#28602)


Co-authored-by: Yongqiang YANG <[email protected]>

* [refactor](renamefile) rename some files according to the class names (apache#28606)

* [feature](mtmv)create mtmv support refresh_partition_num (apache#28566)

- create/alter mtmv supports refresh_partition_num
- mtmv tasks execute refresh tasks in batches according to refresh_partition_num
- `tasks` tvf adds the columns `CompletedPartitions` and `progress`
- fix: mtmv could not `show temp partition` or `drop temp partition`
- fix: a task could not get the error msg when insert overwrite failed
- fix: when the partition field is capitalized, verification of creating an mtmv did not pass

* [fix](Nereids) stats estimation of lessThan apache#28444

* [fix](regression)Change analyze_timeout to global.  (apache#28587)

Fix hive statistics regression case. analyze_timeout is a global session variable.

* [chore] Add bvar for meta operations of BE (apache#28374)

* [Bug](cooldown) Fix problem that followers may never completely cooldown (apache#28561)

* [Improve](tvf)jni-avro support split file (apache#27933)

* [feature](mtmv)after creating a materialized view, if other operations fail, roll back  (apache#28621)

after creating an mtmv, if creating the job fails, the mtmv needs to be dropped

* [fix](meta) update killed query state (#) (apache#25917)

* [refactor](profile&names) using dst_id in pipelinex profile to be same as non pipeline; rename some function names (apache#28626)

Co-authored-by: yiguolei <[email protected]>

* [feature][executor]support workload schedule policy (apache#28443)

* [feature](expr) Support kill query by query_id (apache#28530)

Issue Number: open apache#28517

* [chore](user) Add user property `parallel_fragment_exec_instance_num` (apache#28447)

* [feature](inverted index) add ignore_above property to prevent long s… (apache#28585)

When a string is too long, clucene will throw an error, and such a string is too long to analyze anyway. So by default we skip the string in the indexing process when it is longer than 256 bytes.
We add a property `ignore_above` for users to customize this threshold.
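
A hedged sketch of how the property might be set on an inverted index; the property name `ignore_above` and the 256-byte default come from the description above, while the table name, index name, and the rest of the DDL are illustrative:

```sql
CREATE TABLE t_msg (
    id  BIGINT,
    msg STRING,
    INDEX idx_msg (msg) USING INVERTED PROPERTIES("ignore_above" = "256")
) DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES("replication_num" = "1");
-- values of msg longer than 256 bytes are skipped during index building
-- instead of being analyzed (and erroring) in clucene
```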

* [Enhance](broker) add inputstream expire scheduled checker to avoid memory leak for broker scan (apache#28589)

This PR introduces 2 broker confs:

1. enable_input_stream_expire_check: whether to enable the inputStream expire check.
2. input_stream_expire_seconds: the timeout in seconds for an inputStream since its last update.

* [fix](hive) add support for `quoteChar` and `seperatorChar` for hive (apache#28613)

add support for quoteChar and seperatorChar .

* [Fix](transactional-hive) Fix hive transactional table return empty result. (apache#28518)

* [Fix](memtable) fix `shrink_memtable_by_agg` without duplicated keys (apache#28660)

remove duplicated logic:
```
vectorized::Block in_block = _input_mutable_block.to_block();
_put_into_output(in_block);
```
`_input_mutable_block.to_block()` will move `_input_mutable_block`, and lead to `flush` with empty block

* [enhance](partition_id) check partition id before store meta (apache#28055)

* [opt](task-assignment) use consistent hash as default task assigner and cache the consistent hash ring (apache#28522)

1. Use the consistent hash algo as the default assigner for the file query scan node.
    A consistent assignment can better utilize the page cache of BE nodes.

2. Cache the consistent hash ring.
    Initializing a consistent hash ring is time-consuming because thousands of virtual nodes need to be added,
    so cache it for better performance.

* [Fix](Job)Fixed job scheduling missing certain time window schedules (apache#28659)

Since scheduling itself consumes a certain amount of time, the start time of the time window should not be the current time, but the end time of the last schedule.

* [fix](test) fix ccr test cases (apache#28664)

* [fix](regression) fix test_set_replica_status due to force_olap_table_replication_num=3 (apache#28573)

* [fix](regression) restore reserve num replicas (apache#28541)

* [fix](regression) fix test_alter_colocate_table due to force_olap_table_replication_num=3 (apache#28575)

* [improvement](transaction) reduce publish txn log (apache#28277)

* [fix](stream_load)fix bug for stream (apache#27752)

1. forbid the stream_load without content-length or chunked Transfer Encoding
2. forbid the stream_load with both content-length and chunked Transfer Encoding

Co-authored-by: xingying01 <[email protected]>

* [Revert](partial update) Revert "Fix missing rowsets during doing alignment when flushing memtable due to compaction (apache#28062)" (apache#28674)

This reverts commit 485d7db.

* [feat](Nereids) support outer join and aggregate bitmap rewrite by mv (apache#28596)

- Support left outer join rewrite by materialized view
- Support bitmap_union roll up to implement count(distinct)
- Support partition materialized view rewrite
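
As a hedged illustration of the bitmap_union roll-up (the table, column, and mv names are made up for this sketch): a materialized view that pre-aggregates user ids into a bitmap at a fine granularity can be rolled up to answer a coarser count(distinct) query.

```sql
-- mv definition (pre-aggregated bitmap per dt, city), conceptually:
--   SELECT dt, city, bitmap_union(to_bitmap(user_id)) AS uv_bitmap
--   FROM visits GROUP BY dt, city
-- user query at a coarser granularity:
SELECT dt, count(DISTINCT user_id) FROM visits GROUP BY dt;
-- can be rewritten to roll the stored bitmaps up:
SELECT dt, bitmap_union_count(uv_bitmap) FROM mv_uv GROUP BY dt;
```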

* [refactor](runtimefilter) do not use QueryContext in runtime filter (apache#28559)

* [fix](planner)should bind expr using no slot to correct tuple (apache#28656)

* [fix](planner) ctas update datev1 to datev2 should use equals (apache#28641)

* [fix](Nereids) should only do bind relation in view analyzer (apache#28637)

* [code](pipelineX) refine some pipelineX code  (apache#28570)

* [fix](partial update) report error directly if missing rowsets during doing alignment when flushing memtable due to compaction  (apache#28677)

* [opt](inverted index) Add null document interface to optimize empty string indexing (apache#28661)

* [feature](invert index) match_regexp feature added (apache#28257)

* [Feature](datatype) update be ut codes and fix bugs for IPv4/v6 (apache#28670)

* [test](Nereids): remove shape plan project and distribute in eager test (apache#28701)

* [fix](statistics)Fix drop stats fail silently bug. (apache#28635)

Drop stats uses an IN predicate to filter the column stats to delete. The default max length of an IN predicate is 1024, so dropping the stats of a table with more than 1024 columns may fail.
This PR splits the delete SQL based on the IN predicate length limit.
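
A hedged sketch of the splitting idea (the statistics table and column names here are illustrative, not the actual internal schema): one oversized IN list is broken into several DELETE statements, each staying under the 1024-element limit.

```sql
-- instead of a single DELETE whose IN list holds e.g. 1500 column ids:
--   DELETE FROM column_statistics WHERE col_id IN ('c1', 'c2', ..., 'c1500');
-- issue several DELETEs, each with at most 1024 elements:
DELETE FROM column_statistics WHERE col_id IN ('c1', 'c2' /* ... 'c1024' */);
DELETE FROM column_statistics WHERE col_id IN ('c1025' /* ... 'c1500' */);
```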

* [fix](mtmv) fix insert overwrite getExecTimeout error (apache#28700)

should use InsertTimeout instead of QueryTimeout

* [Enhancement](auto-partition) change the behaviour when insert overwrite an auto partition table apache#28683

If we specify target partition(s) when insert-overwriting an auto partition table:
before: new partitions could be created
now: it behaves just like a non-auto partition table
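
A hedged illustration of the changed behaviour (table and partition names are made up):

```sql
-- t_auto is an auto-partition table; p_20231222 is an existing partition
INSERT OVERWRITE TABLE t_auto PARTITION (p_20231222)
SELECT * FROM staging;
-- before: rows outside p_20231222 could trigger creation of new partitions;
-- now: only the named partition is overwritten, as for a non-auto partition table
```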

* [FIX](regresstest) fix case with double value apache#28668

the double value in the case has 27 digits after the decimal point, which makes the output unstable

* [test](Nereids): add test for scalar agg (apache#28712)

* [fix](meta_scanner) fix meta_scanner process ColumnNullable (apache#28711)

* [fix](nereids) Fix data wrong using mv rewrite and ignore case when getting mv related partition table (apache#28699)

1. Fix wrong data when using mv rewrite
2. Ignore case when getting the mv related partition table
3. Enable inferring expression column names without an alias when creating an mv

* [test](regression-test) order by decs should only make effect on its nearest column apache#28728

* [refine](pipelineX)Make the 'set ready' logic of SenderQueue in pipelineX the same as that in the pipeline (apache#28488)

* [fix](Nereids) delete partition failed (apache#28717)

1. the parser's partitionSpec was changed unexpectedly by PR apache#26492
2. delete without `using` should support non-equality expressions

* [fix](tvf)Fixed the avro-scanner projection pushdown failing to query on multiple BEs (apache#28709)

* [fix](block) fix nullptr in MutableBlock::allocated_bytes (apache#28738)

* [fix](mtmv)fix thread local reference to checkpoint's Env, causing Env to be unable to be reclaimed, resulting in excessive memory usage by FE (apache#28723)

When replaying the addTaskResult log, a ConnectContext is created and Env.getCurrentEnv is set, then this ctx is stored in ConnectContext.threadLocalInfo. Since threadLocalInfo is static, this ctx cannot be recycled, so the Env of the replay thread cannot be recycled either.

* [Fix](json reader) fix json reader crash due to `fmt::format_to` (apache#28737)

```
4# __gnu_cxx::__verbose_terminate_handler() [clone .cold] at ../../../../libstdc++-v3/libsupc++/vterminate.cc:75
5# __cxxabiv1::__terminate(void (*)()) at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:48
6# 0x00005622F33D22B1 in /mnt/ssd01/pipline/OpenSourceDoris/clusterEnv/P0/Cluster0/be/lib/doris_be
7# 0x00005622F33D2404 in /mnt/ssd01/pipline/OpenSourceDoris/clusterEnv/P0/Cluster0/be/lib/doris_be
8# fmt::v7::detail::error_handler::on_error(char const*) in /mnt/ssd01/pipline/OpenSourceDoris/clusterEnv/P0/Cluster0/be/lib/doris_be
9# char const* fmt::v7::detail::parse_replacement_field<char, fmt::v7::detail::format_handler<fmt::v7::detail::buffer_appender<char>, char, fmt::v7::basic_format_context<fmt::v7::detail::buffer_appender<char>, char> >&>(char const*, char const*, fmt::v7::detail::format_handler<fmt::v7::detail::buffer_appender<char>, char, fmt::v7::basic_format_context<fmt::v7::detail::buffer_appender<char>, char> >&) in /mnt/ssd01/pipline/OpenSourceDoris/clusterEnv/P0/Cluster0/be/lib/doris_be
10# void fmt::v7::detail::vformat_to<char>(fmt::v7::detail::buffer<char>&, fmt::v7::basic_string_view<char>, fmt::v7::basic_format_args<fmt::v7::basic_format_context<fmt::v7::detail::buffer_appender<fmt::v7::type_identity<char>::type>, fmt::v7::type_identity<char>::type> >, fmt::v7::detail::locale_ref) in /mnt/ssd01/pipline/OpenSourceDoris/clusterEnv/P0/Cluster0/be/lib/doris_be
11# doris::vectorized::NewJsonReader::_append_error_msg(rapidjson::GenericValue<rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool*) at /root/doris/be/src/vec/exec/format/json/new_json_reader.cpp:924
12# doris::vectorized::NewJsonReader::_set_column_value
```

* [bug](coredump) Fix coredump in aggregation node's destruction(apache#28684)

fix coredump in aggregation node's destruction

* [pipelineX](improvement) Support global runtime filter (apache#28692)

* [FIX](explode)fix explode array decimal (apache#28744)

* fix explode with array<decimal> that has a specific precision in the old planner

* [refactor](broadcastbuffer) using a queue to remove ref and unref codes (apache#28698)

Co-authored-by: yiguolei <[email protected]>
Add a new class broadcastbufferholderqueue to manage holders.
Use shared ptr to manage holders instead of ref and unref, which is too difficult to maintain.

* [enhance](blacklist) seperate blacklist conf from heartbeat (apache#28638)

There is a circuit breaker in grpc lasting for 2 minutes, so if a BE goes down and comes up again, sending fragments to that BE keeps failing for 2 minutes.

* [fix](stmt):fix CreateTableStmt toSql placed comment in wrong place (apache#27504)

Issue Number: close apache#27474
Co-authored-by: tongyang.han <[email protected]>

* [Improve](compile) add `__AVX2__` macro for JsonbParser (apache#28754)

* [Improve](compile) add `__AVX2__` macro for JsonbParser

* throw exception instead of CHECK

* [Feature](materialized-view) support match logicalAggregate(logicalProject(logicalFilter(logicalOlapScan())) without agg (apache#28747)

support match logicalAggregate(logicalProject(logicalFilter(logicalOlapScan())) without agg

* [Fix] (schema change) forbid adding time type column (apache#28751)

* [fix](group_commit) fix group commit cancel stuck (apache#28749)

* [fix](scanner) fix concurrency bugs when scanner is stopped or finished (apache#28650)

`ScannerContext` would schedule scanners even after being stopped, and `_is_finished` was confused with `_should_stop`.
Only fixes the concurrency bugs when the scanner is stopped or finished, as reported in apache#28384.

* [fix](regression) fix stream load properties case fail (apache#28680)

* [fix](fe ut) fix PropertyConverterTest (apache#28722)

* [improve](rpc) Log channel state before shutdown backend service client (apache#28667)

* [fix](publish version) fix publish fail but return ok (apache#28425)

* [test](Nereids): remove shape plan project and distribute in eager test (apache#28724)

* [refactor](nereids) make NormalizeAggregate rule more clear and readable (apache#28607)

* [FIX](type) fix matchExactType for complex type (apache#28233)

The FE matchExactType function should call type.matchTypes for its own logic instead of switch-casing to do special logic; otherwise we may hit a core in BE like this:
 ```
F20231208 18:54:39.359673 680131 block.h:535] Check failed: _data_types[i]->is_nullable()  target type: Struct(l_info:Nullable(Array(Nullable(String)))) src type: Struct(col:Nullable(Array(Nullable(UInt8))))
*** Check failure stack trace: ***
    @     0x5584e952b926  google::LogMessage::SendToLog()
    @     0x5584e9527ef0  google::LogMessage::Flush()
    @     0x5584e952c169  google::LogMessageFatal::~LogMessageFatal()
    @     0x5584cf17201e  doris::vectorized::MutableBlock::merge_impl<>()
    @     0x5584ceac4b1d  doris::vectorized::MutableBlock::merge<>()
    @     0x5584d4dd7de3  doris::vectorized::VUnionNode::get_next_const()
    @     0x5584d4dd9a45  doris::vectorized::VUnionNode::get_next()
    @     0x5584bce469bd  std::__invoke_impl<>()
    @     0x5584bce466d0  std::__invoke<>()
    @     0x5584bce465c7  _ZNSt5_BindIFMN5doris8ExecNodeEFNS0_6StatusEPNS0_12RuntimeStateEPNS0_10vectorized5BlockEPbEPS1_St12_PlaceholderILi1EESC_ILi2EESC_ILi3EEEE6__callIS2_JOS4_OS7_OS8_EJLm0ELm1ELm2ELm3EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
    @     0x5584bce46358  std::_Bind<>::operator()<>()
    @     0x5584bce46208  std::__invoke_impl<>()
    @     0x5584bce46178  _ZSt10__invoke_rIN5doris6StatusERSt5_BindIFMNS0_8ExecNodeEFS1_PNS0_12RuntimeStateEPNS0_10vectorized5BlockEPbEPS3_St12_PlaceholderILi1EESD_ILi2EESD_ILi3EEEEJS5_S8_S9_EENSt9enable_ifIX16is_invocable_r_vIT_T0_DpT1_EESL_E4typeEOSM_DpOSN_
    @     0x5584bce45c18  std::_Function_handler<>::_M_invoke()
    @     0x5584bce6412f  std::function<>::operator()()
    @     0x5584bce56382  doris::ExecNode::get_next_after_projects()
    @     0x5584bce26218  doris::PlanFragmentExecutor::get_vectorized_internal()
    @     0x5584bce2431b  doris::PlanFragmentExecutor::open_vectorized_internal()
    @     0x5584bce22a96  doris::PlanFragmentExecutor::open()
    @     0x5584bce27c9d  doris::PlanFragmentExecutor::execute()
    @     0x5584bcbdb3f8  doris::FragmentMgr::_exec_actual()
    @     0x5584bcbf982f  doris::FragmentMgr::exec_plan_fragment()::$_0::operator()()
    @     0x5584bcbf9715  std::__invoke_impl<>()
    @     0x5584bcbf96b5  _ZSt10__invoke_rIvRZN5doris11FragmentMgr18exec_plan_fragmentERKNS0_23TExecPlanFragmentParamsERKSt8functionIFvPNS0_12RuntimeStateEPNS0_6StatusEEEE3$_0JEENSt9enable_ifIX16is_invocable_r_vIT_T0_DpT1_EESH_E4typeEOSI_DpOSJ_
    @     0x5584bcbf942d  std::_Function_handler<>::_M_invoke()
    @     0x5584b9dfd883  std::function<>::operator()()
    @     0x5584bd6e3929  doris::FunctionRunnable::run()
    @     0x5584bd6cf8ce  doris::ThreadPool::dispatch_thread()
```

* [test](Nereids): add assert debug log for TopnToMaxTest (apache#28755)

* [improve](move-memtable) limit task num in load stream flush token (apache#28748)

* [feature](Nereids) elimite inner join by foreign key (apache#28486)

* [enhancement](backup/restore) support alter s3 repo info about ak/sk/token (apache#27027)

In some cases:

s3.session_token/AWS_TOKEN expires after a few hours, and the upload snapshot job may fail if the data is very large;
for the same reason, the repo also expires when RepositoryMgr executes the repo ping,
so we need to support altering the s3 repo's ak/sk/token properties and updating the upload snapshot job's properties so that the backup can continue.

Signed-off-by: nextdreamblue <[email protected]>

* [improve](transaction) extend abort transaction time (apache#28662)

* [regression](p2) fix test cases result (apache#28768)

regression-test/data/external_table_p2/hive/test_hive_hudi.out
regression-test/data/external_table_p2/hive/test_hive_to_array.out
regression-test/suites/external_table_p2/tvf/test_local_tvf_compression.groovy
regression-test/suites/external_table_p2/tvf/test_path_partition_keys.groovy
regression-test/data/external_table_p2/hive/test_hive_text_complex_type.out

* (fix)[meta][export] fix replay export NPE issue (apache#28752)

The ConnectionContext does not exist in replay thread

* [Bug fix][metrics] correct fe collector type for jvm_gc (apache#28784)

Co-authored-by: 胥剑旭 <[email protected]>

* [Chore](decimal) set check_overflow_for_decimal to true when alter table(apache#28777)

set check_overflow_for_decimal to true when alter table

* [pipelineX](fix) Fix TPCH Q2 (apache#28783)

* (topN)runtime_predicate is only triggered when the column name is obtained (apache#28419)

Issue Number: close apache#27485

* [fix](planner)fix bug of bound conjunct to wrong tuple (apache#28811)

this fixes a bug introduced by apache#28656

* [bug](pipelineX) Fix pipelineX bug on multiple BE (apache#28792)

* [opt](inverted index) ignore_above only affects untokenized strings (apache#28819)

* [improve](load) reduce lock scope in MemTableWriter active consumption (apache#28790)

* [opt](Nereids) improve Shape check cases (apache#28124)

* tmplate update
* log tpcds stats when check shape

* [fix](group commit)Fix the issue of duplicate addition of wal path when encouter exception (apache#28691)

* [fix](compaction) compaction should catch exception when vertical block reader read next block (apache#28625)

* [fix](Nereids) join order is not right after sql parsing (apache#28721)

for sql
```
t1, t2 join t3
```

we should generate plan like:
```
t1 join (t2 join t3)
```

but we generate:
```
(t1 join t2) join t3
```
to follow legancy planner.

* [fix](metrics) fix bvar memtable_input_block_allocated_size (apache#28725)

* [enhancement](audit-log) add status code and error msg to audit log for proxy stmt (apache#27607)

* [improvement](external catalog)Change log level to debug for getRowCountFromExternalSource. (apache#28801)

* [fix](partial update) only report error when in strict mode partial update when finding missing rowsets during flushing memtable (apache#28764)

related pr: apache#28062, apache#28674, apache#28677
fix apache#28677

* [fix](function) Fix from_second functions overflow and wrong result (apache#28685)

* [Enhancement](load) Limit the number of incorrect data drops and add documents (apache#27727)

In the load process, if there are problems with the original data, we store the error data in an error_log file on the disk for subsequent debugging. However, if there is a lot of error data, it occupies a lot of disk space, so we want to limit the amount of error data that is saved to the disk.

Be familiar with the usage of Doris' import function and its internal implementation process.
Add a new BE configuration item load_error_log_limit_bytes (default value 200MB).
Use the newly added threshold to limit the amount of data that RuntimeState::append_error_msg_to_file writes to disk.
Write regression cases for testing and verification.

Co-authored-by: xy720 <[email protected]>

* [test](partial update) add complex type regression cases for partial update (apache#28758)

NOTE: There is an issue with the MAP type working with row store, so this PR has no cases for the MAP type.
Support for the MAP type will be added in the future.

* [fix](test)fix test_create_table test case for nereids (apache#28693)

* [improve](config) set mutable and masterOnly in FE config stream_load_default_memtable_on_sink_node (apache#28835)

* [fix](group-commit) check if wal need recovery is abnormal (apache#28769)

* [Fix](Variant) fix variant predicate rewrite OrToIn with wrong plan (apache#28695)

Using the name without path info will lead to a wrong In plan, e.g.
```
where cast(v:a as text) = 'hello' or cast(v:b as text) = 'world'
```
will be rewritten to:
```
where cast(v as text) in ('hello', 'world')
```
This is wrong, because they are different slots.

* [refactor](buffer) remove download buffer since it is no longer useful (apache#28832)

remove download buffer since it is no longer useful

* [Feature](Variant) Implement variant new sub column access method (apache#28484)

* [Feature](Variant) Implement variant new sub column access method

The query SELECT v["a"]["b"] from simple_var WHERE cast(v["a"]["b"] as int) = 1 encompasses three primary testing scenarios:

```
1. A basic test involving the variant data type.
2. A scenario dealing with GitHub event data in the context of a variant.
3. A case related to the TPC-H benchmark using a variant.
```

* [fix](memory) Add thread asynchronous purge jemalloc dirty pages (apache#28655)

jemallctl purging all arena dirty pages may take several seconds, which will block memory GC and cause OOM.
So purge asynchronously in a thread.

* [fix](regression) fix regression error of test_compress_type (apache#28826)

* [improvement](executor) Add tvf and regression test for Workload Scheduler (apache#28733)

1 Add select workload schedule policy tvf
2 Add reg test

* [chore](error msg) print type info when colocate with ddl failed due to type mismatch (apache#28773)

* [opt](query cancel) optimization for query cancel apache#28778

* [Fix](multi-catalog) skip hms events if hms table is not supported. (apache#28644)

Co-authored-by: wangxiangyu <[email protected]>

* [Enhancement](job) No need to query some backends which are not alive. (apache#28608)

No need to execute some jobs if backend is not alive

* [fix](paimon)fix type convert for paimon (apache#28774)

fix type convert for paimon

* [fix](stream load)add test case and doc for arrow type of stream load (apache#28098)

add test case and doc for arrow type of stream load

* [feature](mtmv)mtmv partition refresh case (apache#28787)

* (enhance)(InternalQuery) Support to collect profile for intenal query (apache#28762)

* [optimize](count) optimize pk exact query without reading data (apache#28494)

* [opt](sessionVar)show changed sessoin var first apache#28840

The “show variables” command lists changed vars before unchanged vars.

* check  stats and log memo for ds46 (apache#28396)

* [bug](sharedscan) Fix shared scan bug (apache#28841)

* [fix](hash join) fix stack overflow caused by evaluate case expr on huge build block (apache#28851)

* [refactor](executor)remove scan group apache#28847

* [pipelineX](refactor) rename functions (apache#28846)

* [Feature](inverted index) add lowercase option for inverted index analyzer (apache#28704)

* [feature](Nereids) Pull up join from union all (apache#28682)

* [fix](ci) tpch pipeline should not re-load data (apache#28874)

* [fix](ci) tpch pipeline should not re-load data

* 2

---------

Co-authored-by: stephen <[email protected]>

* [exec](compress) use FragmentTransmissionCompressionCodec control the exchange compress behavior (apache#28818)

* [improve](move-memtable) tweak load stream flush token num and max tasks (apache#28884)

* [improve](load) remove extra layer of heavy work pool in tablet_writer_add_block (apache#28550)

* [improve](load) limit delta writer flush task parallelism (apache#28883)

* [fix](multi-catalog)filter impala generated path (apache#28786)

filter the impala generated dir _imapala_insert_staging

* [enhancement](udf) add prepare function for java-udf (apache#28750)

* [refactor](test)Refactor workload group/schedule policy test apache#28888

* [improve](move-memtable) avoid using heavy work pool during append data (apache#28745)

* [enhancement](broker-load) fix-move-memtable-session-var-for-s3 (apache#28894)

* [exec](load) change default parallel num from 1 to 8 in no pipeline exec engine (apache#28864)

* [fix](segcompaction) disable segcompaction by default (apache#28906)

Signed-off-by: freemandealer <[email protected]>

* [fix](pipelineX) fix cannot runtime obtain profile on pipelineX apache#28795

* [fix](mtmv) fix failed to specify the number of buckets when bucket auto (apache#28854)

Issue Number: close #xxx

- fix failure to specify the number of buckets when using auto bucket
- delete unused SessionVariable
- if the mtmv uses an external table, check `isMaterializedViewRewriteEnableContainForeignTable`

* [bugfix](scannercore) scanner will core in deconstructor during collect profile (apache#28727)

* [FIX](map)fix map with rowstore table (apache#28877)

* [feature](mtmv)MTMV pause and resume (apache#28887)

- PAUSE MATERIALIZED VIEW JOB ON mv1
- RESUME MATERIALIZED VIEW JOB ON mv1
- fix: when dropping a db, its job was not dropped
- add a lock so that one materialized view can only run one task at a time

* [feature](mtmv)add more test case1 (apache#28910)

* [fix](block) add block columns size dcheck (apache#28539)

* [fix](chore) update dcheck to avoid core during stress test (apache#28895)

* [fix](doc) Add the usage example of bos to the documentation of s3 tvf (apache#28899)

* [fix](mtmv)fix can not create mtmv all use default value (apache#28922)

* [fix](parquet) the end offset of column chunk may be wrong in parquet metadata (apache#28891)

* [fix](paimon)fix `like` predicate (apache#28803)

fix the `like` predicate

* [fix](merge-on-write) migration may cause duplicate keys for mow table (apache#28923)

* [fix](mtmv) Related partition exclude null generate column when increment build materialized view (apache#28855)

Infer the partition column from the materialized view's partition column, excluding columns from the null-generating side of a join when incrementally building the materialized view

* [nereids] fix join fd computing bug (apache#28849)

* [Fix](statistics) Fix partition name NPE and sample for all table during auto analyze (apache#28916)

Fix a partition name NPE and sample all tables during auto analyze.
Sample all tables because getData may have latency, which could cause a full analyze of a huge table and use too much resource; sampling all tables avoids this. Will improve the strategy later.

* [fix](nereids) Fix query mv rewrite fail when mv cache build quickly (apache#28876)

* [optimize](zonemap) skip zonemap if predicate does not support_zonemap (apache#28595)

* [optimize](zonemap) skip zonemap if predicate does not support_zonemap apache#27608 (apache#28506)

* Revert "[bugfix](scannercore) scanner will core in deconstructor during collect profile (apache#28727)" (apache#28931)

This reverts commit 4066de3.

* [fix] (nereids) Catch exception when mv fail and fix the npe (apache#28932)

* [fix](regression-test) test_partial_update_native_insert_stmt_complex is flaky (apache#28927)

* [improvement](group commit) make get column function more reliable when replaying wal (apache#28900)

* [opt](nereids) convert or to inpredicate and optimize inpredicate partition prune (apache#28316)
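
A hedged illustration of what such a rewrite enables (the schema is made up): an OR chain over the partition column is normalized into an IN predicate, which the partition pruner can use directly.

```sql
-- orders is partitioned by dt; the OR chain is rewritten to an IN predicate
SELECT count(*) FROM orders
WHERE dt = '2023-12-20' OR dt = '2023-12-21' OR dt = '2023-12-22';
-- equivalent form after the rewrite, pruning to the three matching partitions:
-- WHERE dt IN ('2023-12-20', '2023-12-21', '2023-12-22')
```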

* [enhancement](bulk-load) cancel loading tasks directly without retrying when timeout exceeded (apache#28666)

* [chore](test) Add testing util sync point (apache#28924)

* [chore](prompt) Optimize tablet and replica prompt by pointing out what the numbers mean (apache#28925)

* [fix](join) incorrect result of left semi/anti join with empty build side (apache#28898)

* [docs] (DebugPoints) Update docs about Debug Points (apache#28347)


---------

Co-authored-by: qinhao <[email protected]>

* [feature](api) add profile text api (apache#28697)

* [opt](invert index) Empty strings are not written to the index in the case of TOKENIZED (apache#28822)

* [feature](pipelineX) control exchange sink by memory usage (apache#28814)

* [fix](pipelineX) fix use global rf when there no shared_scans (apache#28869)

* [refactor](pipelineX)do not override dependency() function in pipelineX (apache#28848)

* [fix](log) regularise some BE error type and fix a load task check apache#28729

* [chore](test) correct create table statement (apache#28863)

* [fix](nereids)subquery unnest need handle subquery in Not expr correnctly (apache#28713)

* [feature](nereids)support decimalv2 (apache#28726)

* [fix](nereids)group by expr may be bound twice in bind agg slot (apache#28771)

* [doc](insert) Add group commit docs (apache#25949)

* [performance](variant) support topn 2phase read for variant column (apache#28318)

 [performance](variant) support topn 2phase read for variant column

* return residual expr of join (apache#28760)

* [fix](load) fix nullptr when getting memtable flush running count (apache#28942)

* [fix](load) fix nullptr when getting memtable flush running count

* style

* (enhance)(regression) Support `force_olap_table_replication_num=3` run test_insert_random_distribution_table case (apache#28903)

* [Test](Job)Add test case (apache#28481)

* [fix](doc) typo fix in auto-partition page (apache#28512)

* [feature](load) enable memtable on sink node by default (apache#28963)

* [Chore](Job)print log before task execute (apache#28962)

* [fix](hdfs) Fix HdfsFileSystem::exists_impl crash (apache#28952)

Calling hdfsGetLastExceptionRootCause without initializing ThreadLocalState
will crash. This PR modifies the condition for determining the existence of
an hdfs file: because hdfsExists sets errno to ENOENT when the file does
not exist, we can use this condition to check whether a file exists rather
than checking the existence of the root cause.

* [fix](block) fix be core while mutable block merge may cause different row size between columns in origin block (apache#27943)

* [fix](doc) typo fix in dynamic-partition page (apache#28511)

* [fix](hash join) fix column ref DCHECK failure of hash join node block mem reuse (apache#28991)

Introduced by apache#28851: after evaluating the build side expr, some columns in the resulting block may be referenced more than once in the same block.

e.g. coalesce(col_a, 'string') where col_a is nullable but actually contains no null values; in this case the function coalesce will insert a new nullable column which references the original col_a.

* [opt](assert_num_rows) support filter in AssertNumRows operator and fix some explain (apache#28935)

* NEED

* Update pipeline x

* fix pipelinex compile

* [fix](parquet_reader) misused bool pointer (apache#28986)

Signed-off-by: pengyu <[email protected]>

* [doc](multi-catalog)add krb and some ertificates FAQs (apache#28858)

add some security docs

* [fix](planner)should save original select list item before analyze (apache#28187)

* [fix](planner)should save original select list item before analyze

* fix test case

* fix failed case

* [bug](storage) Fix gc rowset bug (apache#28979)

* [fix](mtmv)add log for resolve pending task (apache#28999)

* add lock for resolve pending task

* [improvement](nereids) Get partition related table disable nullable field and complete agg matched pattern mv rules. (apache#28973)

* [improvement] (nereids) Get partition related table disable nullable field and modify regression test, complete agg mv rules.

* make the field not null to create a partition mv

* [chore](config) modify `tablet_schema_cache_recycle_interval` from 24h to 1h (apache#28980)

To prevent too many tablet schema cache entries in memory, which leads to performance issues when holding the lock
to erase items.

* [opt](scanner) optimize the number of threads of scanners (apache#28640)

1. Remove `doris_max_remote_scanner_thread_pool_thread_num`, use `doris_scanner_thread_pool_thread_num` only.
2. Set the default value `doris_scanner_thread_pool_thread_num` as `std::max(48, CpuInfo::num_cores() * 4)`

* [chore](config) modify `variant_ratio_of_defaults_as_sparse_column` from 0.95 to 1 (apache#28984)

since sparse column is not stable at present

* [feature](nereids)support partition property in nereids (apache#28982)

* [feature](Nereids) support table valued function http_stream (apache#29004)

* [feature](Load)(step2)support nereids load job schedule (apache#26356)

We will integrate the new load job manager into the new job scheduling framework so that the insert-into task can be scheduled after the broker load SQL is converted to an insert-into-TVF (table value function) SQL statement (a sketch of this conversion is given after the TODO list below).

issue: apache#24221

Now support:
1. load data by tvf insert into sql, but just for simple load(columns need to be defined in the table)
2. show load stmt
- job id, label name, job state, time info
- simple progress
3. cancel load from db
4. support that enable new load through Config.enable_nereids_load
5. can replay job after restarting doris

TODO:
- support partition insert job
- support show statistics from BE
- support multiple task and collect task statistic
- support transactional task
- need add ut case
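
A hedged sketch of the conversion described above (the bucket, path, table name, and S3 TVF property names are illustrative and may not match the exact parameters):

```sql
-- a broker/S3 load such as
--   LOAD LABEL demo.label_1 (DATA INFILE("s3://bucket/path/*.csv") INTO TABLE dst ...)
-- is scheduled internally as an insert-into-TVF statement roughly like:
INSERT INTO dst
SELECT * FROM S3(
    "uri"           = "s3://bucket/path/*.csv",
    "format"        = "csv",
    "s3.access_key" = "ak",
    "s3.secret_key" = "sk"
);
```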

* [enhance](tablet) Reduce log in tablet meta (apache#28719)

* [fix](case) Add sync to test case (apache#29034)

* [opt](compound) Optimize by deleting the compound expr after obtaining the final result (apache#28934)

* [hotfix](jdbc catalog) fix load table and column names npe (apache#28865)

This fix is for the npe situation that occurs when FE of non-Master nodes initializes Jdbc Catalog metadata.

* [opt](Nereids) support cast bewteen numeric and  boolean in FE (apache#29006)

* [improvement](jdbc catalog) Optimize connection pool caching logic (apache#28859)

In the old caching logic, we only used jdbcurl, user, and password as cache keys. This may cause the old link to be still used when replacing the jar package, so we should concatenate all the parameters required for the connection pool as the key.

* [fix](Nereids) runtime filter push down failed (apache#28997)

The project child is not always a NamedExpression.

Failure msg:
```
org.apache.doris.common.AnalysisException: errCode = 2, detailMessage = class org.apache.doris.nereids.trees.expressions.literal.VarcharLiteral cannot be cast to class org.apache.doris.nereids.trees.expressions.NamedExpression (org.apache.doris.nereids.trees.expressions.literal.VarcharLiteral and org.apache.doris.nereids.trees.expressions.NamedExpression are in unnamed module of loader 'app')
    at org.apache.doris.qe.StmtExecutor.executeByNereids(StmtExecutor.java:623) ~[classes/:?]
    at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:478) ~[classes/:?]
    at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:457) ~[classes/:?]
    at org.apache.doris.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:245) ~[classes/:?]
    at org.apache.doris.qe.MysqlConnectProcessor.handleQuery(MysqlConnectProcessor.java:166) ~[classes/:?]
    at org.apache.doris.qe.MysqlConnectProcessor.dispatch(MysqlConnectProcessor.java:193) ~[classes/:?]
    at org.apache.doris.qe.MysqlConnectProcessor.processOnce(MysqlConnectProcessor.java:246) ~[classes/:?]
    at org.apache.doris.mysql.ReadListener.lambda$handleEvent$0(ReadListener.java:52) ~[classes/:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
    at java.lang.Thread.run(Thread.java:829) ~[?:?]
Caused by: java.lang.ClassCastException: class org.apache.doris.nereids.trees.expressions.literal.VarcharLiteral cannot be cast to class org.apache.doris.nereids.trees.expressions.NamedExpression (org.apache.doris.nereids.trees.expressions.literal.VarcharLiteral and org.apache.doris.nereids.trees.expressions.NamedExpression are in unnamed module of loader 'app')
    at org.apache.doris.nereids.trees.plans.physical.PhysicalSetOperation.pushDownRuntimeFilter(PhysicalSetOperation.java:178) ~[classes/:?]
    at org.apache.doris.nereids.trees.plans.physical.PhysicalHashJoin.pushDownRuntimeFilter(PhysicalHashJoin.java:229) ~[classes/:?]
    at org.apache.doris.nereids.processor.post.RuntimeFilterGenerator.pushDownRuntimeFilterCommon(RuntimeFilterGenerator.java:386) ~[classes/:?]
```
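
A hedged sketch with made-up table names: a join on top of a union all whose branches project string literals, so the set operation's output list is not made up of slots only when the runtime filter is pushed down into it.

```sql
-- Illustrative only: the runtime filter generated for d.id = u.id is pushed
-- down into the set operation, whose projected output includes a literal
-- column rather than only NamedExpressions.
SELECT *
FROM dim d
JOIN (
    SELECT id, 'new' AS tag FROM fact_new
    UNION ALL
    SELECT id, 'old' AS tag FROM fact_old
) u ON d.id = u.id;
```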

* [fix](Nereids) generating function should not folding to NullLiteral (apache#29003)

A table generating function should not be folded to NULL during constant folding. Instead, the Generate node should be removed and replaced by a project in a later rewrite.
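
For example (hypothetical table and column names), a lateral view over a generating function with constant arguments should keep its Generate node rather than having the function folded to NULL:

```sql
-- Hypothetical table t(k INT): EXPLODE has a constant argument, but constant
-- folding must not replace the generating function itself with NULL; the
-- Generate node is kept and may be replaced by a project in a later rewrite.
SELECT k, v
FROM t
LATERAL VIEW EXPLODE(ARRAY(1, 2, 3)) tmp AS v;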

* [improve](load) add profile for WaitFlushLimitTime (apache#29013)

* [opt](Nereids) let inverted index work with top opt (apache#29000)

* [feature](Nereids) support values inline table in query (apache#28972)
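
A minimal sketch, assuming a standard inline VALUES form (the alias and column names are illustrative, and the exact accepted syntax may differ):

```sql
-- Inline values table used directly in a query; alias and column names are
-- illustrative assumptions.
SELECT * FROM (VALUES (1, 'a'), (2, 'b')) AS v(id, name);
```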

* [enhancement](config) change default memtable size & loadStreamPerNode & default load parallelism (apache#28977)

We change the memtable size from 200MB to 100MB to achieve smoother flush
performance. We change loadStreamPerNode from 20 to 60 to avoid the stream
RPC becoming the bottleneck when memtable_on_sink_node is enabled. We change
the default S3 & broker load parallelism to make the most of the CPUs on
modern multi-core systems.

Signed-off-by: freemandealer <[email protected]>

* [improve](move-memtable) increase load_stream_flush_token_max_tasks (apache#29011)

* [feature](scan) Implement parallel scanning by dividing the tablets based on the row range (apache#28967)

* [feature](scan) parallel scan on dup/mow mode

* fix bugs

* [refactor](create tablet) default create tablet round robin  (apache#28911)

* [Bug] optimize the collection import Lists dependency package apache#28523 (apache#28579)

* [fix](regression) Fix flaky test test_partial_update_2pc_schema_change (apache#29037)

* [Bug](security) BE download_files function logs sensitive messages apache#28592 (apache#28594)

* Revert "[feature](load) enable memtable on sink node by default (apache#28963)" (apache#29090)

This reverts commit 17917a0.

* [improvement](statistics)Remove retry load when load stats cache fail (apache#28904)

Remove the retry load when loading the stats cache fails. This usually happens when a BE is down or OOM; retrying does not help in these cases and may increase the BE workload.

* [fix](pipeline) sort_merge should throw exception in has_next_block if got failed status (apache#29076)

The test at regression-test/suites/datatype_p0/decimalv3/test_decimalv3_overflow.groovy::249 sometimes fails when there are multiple BEs and the FE processes the report status slowly for some reason.

```
explain select k1, k2, k1 * k2 from test_decimal128_overflow2 order by 1,2,3
--------------

+----------------------------------------------------------------------------------------------------------------------------+
| Explain String(Nereids Planner)                                                                                            |
+----------------------------------------------------------------------------------------------------------------------------+
| PLAN FRAGMENT 0                                                                                                            |
|   OUTPUT EXPRS:                                                                                                            |
|     k1[#5]                                                                                                                 |
|     k2[#6]                                                                                                                 |
|     (k1 * k2)[#7]                                                                                                          |
|   PARTITION: UNPARTITIONED                                                                                                 |
|                                                                                                                            |
|   HAS_COLO_PLAN_NODE: false                                                                                                |
|                                                                                                                            |
|   VRESULT SINK                                                                                                             |
|      MYSQL_PROTOCAL                                                                                                        |
|                                                                                                                            |
|   111:VMERGING-EXCHANGE                                                                                                    |
|      offset: 0                                                                                                             |
|                                                                                                                            |
| PLAN FRAGMENT 1                                                                                                            |
|                                                                                                                            |
|   PARTITION: HASH_PARTITIONED: k1[#0], k2[#1]                                                                              |
|                                                                                                                            |
|   HAS_COLO_PLAN_NODE: false                                                                                                |
|                                                                                                                            |
|   STREAM DATA SINK                                                                                                         |
|     EXCHANGE ID: 111                                                                                                       |
|     UNPARTITIONED                                                                                                          |
|                                                                                                                            |
|   108:VSORT                                                                                                                |
|   |  order by: k1[#5] ASC, k2[#6] ASC, (k1 * k2)[#7] ASC                                                                   |
|   |  offset: 0                                                                                                             |
|   |                                                                                                                        |
|   102:VOlapScanNode                                                                                                        |
|      TABLE: regression_test_datatype_p0_decimalv3.test_decimal128_overflow2(test_decimal128_overflow2), PREAGGREGATION: ON |
|      partitions=1/1 (test_decimal128_overflow2), tablets=8/8, tabletList=22841,22843,22845 ...                             |
|      cardinality=6, avgRowSize=0.0, numNodes=1                                                                             |
|      pushAggOp=NONE                                                                                                        |
|      projections: k1[#0], k2[#1], (k1[#0] * k2[#1])                                                                        |
|      project output tuple id: 1                                                                                            |
+----------------------------------------------------------------------------------------------------------------------------+
36 rows in set (0.03 sec)
```

Why it failed:

- There are multiple BEs.
- Fragments 0 and 1 must be on different BEs.
- The pipeline task of the VOlapScanNode that executes k1*k2 fails and sets the query status to cancelled.
- The pipeline task of VSort calls try-close and sends the Cancelled status to VMergeExchange.
- The sort cursor did not throw an exception when it met the error.

* [feature](Nereids): support infer join when comparing mv (apache#28988)

* [fix](nereids) "not is null" stats estimation fix (apache#28860)

* fix stats estimation for `not ... is null` predicates
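
The predicate shape in question, on a hypothetical table and column; presumably the selectivity is now derived from the column's null fraction rather than a default guess:

```sql
-- Hypothetical table/column: stats estimation for a negated IS NULL predicate.
SELECT count(*) FROM orders WHERE NOT (o_comment IS NULL);
```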

* [opt](nereids)expr normalize after filter pushdown (apache#28743)

normalize expression after filter push down

* [Enhancement](Wal)Support dynamic wal space limit (apache#27726)

* [fix](doc) spell errors fixes query profile docs (apache#28437)

* [opt](Nereids) add infer props to expression (apache#28953)

* [fix](read) remove the logic of estimating the count of rows to read in the segment iterator, to avoid wrong results on unique key tables. (apache#29109)

* [fix](paimon) read batches using Doris' batch size (apache#29039)

* [fix](mtmv) add logging for resolving pending tasks (apache#29078)

* [Refact](inverted index) refactor inverted index writer init (apache#29072)

* [fix](paimon)Remove the static attribute of the source for paimon (apache#29032)

* [fix](planner) Fix delete statement on MOR unique table (apache#28968)

add back the conditions for MOR tables that were removed in apache#26776
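
A hedged example of the affected statement shape, on a hypothetical merge-on-read unique-key table:

```sql
-- Hypothetical merge-on-read unique-key table affected by the planner fix.
DELETE FROM uniq_mor_tbl WHERE user_id = 10086;
```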

* [fix][compile]unused variable (apache#28992)

* [Fix](WorkFlow) Auto replay does not work

* [Fix](WorkFlow) Auto replay does not work

* [Fix](WorkFlow) Auto replay does not work

---------

Signed-off-by: nextdreamblue <[email protected]>
Signed-off-by: freemandealer <[email protected]>
Signed-off-by: pengyu <[email protected]>
Co-authored-by: lihangyu <[email protected]>
Co-authored-by: Kang <[email protected]>
Co-authored-by: jakevin <[email protected]>
Co-authored-by: morrySnow <[email protected]>
Co-authored-by: xueweizhang <[email protected]>
Co-authored-by: zhangguoqiang <[email protected]>
Co-authored-by: zhangdong <[email protected]>
Co-authored-by: airborne12 <[email protected]>
Co-authored-by: Ma1oneZhang <[email protected]>
Co-authored-by: ziyang zhang <[email protected]>
Co-authored-by: Siyang Tang <[email protected]>
Co-authored-by: Kaijie Chen <[email protected]>
Co-authored-by: Yongqiang YANG <[email protected]>
Co-authored-by: yiguolei <[email protected]>
Co-authored-by: minghong <[email protected]>
Co-authored-by: Jibing-Li <[email protected]>
Co-authored-by: Gavin Chou <[email protected]>
Co-authored-by: plat1ko <[email protected]>
Co-authored-by: wudongliang <[email protected]>
Co-authored-by: Nitin-Kashyap <[email protected]>
Co-authored-by: yiguolei <[email protected]>
Co-authored-by: wangbo <[email protected]>
Co-authored-by: nanfeng <[email protected]>
Co-authored-by: Xinyi Zou <[email protected]>
Co-authored-by: qiye <[email protected]>
Co-authored-by: DuRipeng <[email protected]>
Co-authored-by: wuwenchi <[email protected]>
Co-authored-by: Qi Chen <[email protected]>
Co-authored-by: Mingyu Chen <[email protected]>
Co-authored-by: Xin Liao <[email protected]>
Co-authored-by: yujun <[email protected]>
Co-authored-by: walter <[email protected]>
Co-authored-by: xy <[email protected]>
Co-authored-by: xingying01 <[email protected]>
Co-authored-by: seawinde <[email protected]>
Co-authored-by: Mryange <[email protected]>
Co-authored-by: starocean999 <[email protected]>
Co-authored-by: bobhan1 <[email protected]>
Co-authored-by: zzzxl <[email protected]>
Co-authored-by: yangshijie <[email protected]>
Co-authored-by: zclllyybb <[email protected]>
Co-authored-by: amory <[email protected]>
Co-authored-by: zhiqiang <[email protected]>
Co-authored-by: xy720 <[email protected]>
Co-authored-by: Gabriel <[email protected]>
Co-authored-by: htyoung <[email protected]>
Co-authored-by: Pxl <[email protected]>
Co-authored-by: Luwei <[email protected]>
Co-authored-by: meiyi <[email protected]>
Co-authored-by: Ashin Gau <[email protected]>
Co-authored-by: HHoflittlefish777 <[email protected]>
Co-authored-by: 谢健 <[email protected]>
Co-authored-by: XuJianxu <[email protected]>
Co-authored-by: 胥剑旭 <[email protected]>
Co-authored-by: Yoko <[email protected]>
Co-authored-by: huanghaibin <[email protected]>
Co-authored-by: lw112 <[email protected]>
Co-authored-by: zhannngchen <[email protected]>
Co-authored-by: Xiangyu Wang <[email protected]>
Co-authored-by: wangxiangyu <[email protected]>
Co-authored-by: AlexYue <[email protected]>
Co-authored-by: TengJianPing <[email protected]>
Co-authored-by: xzj7019 <[email protected]>
Co-authored-by: Dongyang Li <[email protected]>
Co-authored-by: stephen <[email protected]>
Co-authored-by: HappenLee <[email protected]>
Co-authored-by: slothever <[email protected]>
Co-authored-by: zhangstar333 <[email protected]>
Co-authored-by: zhengyu <[email protected]>
Co-authored-by: Jerry Hu <[email protected]>
Co-authored-by: HowardQin <[email protected]>
Co-authored-by: qinhao <[email protected]>
Co-authored-by: deardeng <[email protected]>
Co-authored-by: caiconghui <[email protected]>
Co-authored-by: py023 <[email protected]>
Co-authored-by: zy-kkk <[email protected]>
Co-authored-by: Guangming Lu <[email protected]>
Co-authored-by: abmdocrt <[email protected]>