[Refactor](inverted index) add analyzer for inverted index to unify analysis process #40758

Merged: 2 commits, Sep 13, 2024

airborne12
Member

Proposed changes

The analyzer logic is currently distributed across the read, write, and no-index match phases of the inverted index. In this PR, we introduce an inverted index analyzer to unify these processes.
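
As a rough illustration of the refactor described above (not the PR's actual code), the sketch below shows one way a single analyzer entry point could be shared by the write path, the read path, and the no-index match path. Every name here (`AnalyzerConfig`, `analyze`, `index_document`, `search`, `match_without_index`) is hypothetical, and the tokenizer is a simple lowercase, split-on-non-alphanumeric routine standing in for whatever analyzer the index really uses.

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical analyzer settings; not taken from the Doris code base.
struct AnalyzerConfig {
    bool lowercase = true;
};

// The single tokenization routine that every phase calls.
// Splits on non-alphanumeric characters and optionally lowercases.
std::vector<std::string> analyze(const std::string& text, const AnalyzerConfig& cfg) {
    std::vector<std::string> tokens;
    std::string current;
    for (unsigned char c : text) {
        if (std::isalnum(c)) {
            current.push_back(cfg.lowercase ? static_cast<char>(std::tolower(c))
                                            : static_cast<char>(c));
        } else if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}

// Toy inverted index: term -> set of document ids.
using PostingMap = std::map<std::string, std::set<int>>;

// Write phase: tokens produced by analyze() become index terms.
void index_document(PostingMap& index, int doc_id, const std::string& text,
                    const AnalyzerConfig& cfg) {
    for (const auto& term : analyze(text, cfg)) index[term].insert(doc_id);
}

// Read phase: the query goes through the same analyze() call, then the
// posting lists of all query terms are intersected.
std::set<int> search(const PostingMap& index, const std::string& query,
                     const AnalyzerConfig& cfg) {
    std::set<int> result;
    bool first = true;
    for (const auto& term : analyze(query, cfg)) {
        auto it = index.find(term);
        if (it == index.end()) return {};
        if (first) {
            result = it->second;
            first = false;
        } else {
            std::set<int> tmp;
            std::set_intersection(result.begin(), result.end(),
                                  it->second.begin(), it->second.end(),
                                  std::inserter(tmp, tmp.begin()));
            result = std::move(tmp);
        }
    }
    return result;
}

// No-index match phase: both the row text and the query are analyzed on the
// fly with the same routine, so results agree with the indexed path.
bool match_without_index(const std::string& text, const std::string& query,
                         const AnalyzerConfig& cfg) {
    const auto doc_tokens = analyze(text, cfg);
    const std::set<std::string> doc_terms(doc_tokens.begin(), doc_tokens.end());
    const auto query_tokens = analyze(query, cfg);
    if (query_tokens.empty()) return false;
    for (const auto& term : query_tokens) {
        if (doc_terms.count(term) == 0) return false;
    }
    return true;
}
```

Because all three functions call `analyze` with the same `AnalyzerConfig`, a change to tokenization rules is made in one place, which is the kind of consistency the description is aiming for.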

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@airborne12
Member Author

run buildall

@airborne12
Member Author

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.91% (9464/25642)
Line Coverage: 28.27% (77843/275314)
Region Coverage: 27.68% (40193/145218)
Branch Coverage: 24.29% (20423/84090)
Coverage Report: http://coverage.selectdb-in.cc/coverage/a0b94ba0e693a405368dab9549e69c7f6874f375_a0b94ba0e693a405368dab9549e69c7f6874f375/report/index.html

@doris-robot

TPC-H: Total hot run time: 43082 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c314c82fdf311a4d7d31eb6616d5f0aa14d081e0, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	13243	7627	7224	7224
q2	1089	186	185	185
q3	4196	1311	1355	1311
q4	3570	1063	1034	1034
q5	3310	3173	3171	3171
q6	241	149	149	149
q7	1043	626	633	626
q8	3434	2012	2040	2012
q9	6385	6356	6311	6311
q10	3892	2590	2531	2531
q11	419	258	259	258
q12	400	219	227	219
q13	7760	3022	3051	3022
q14	289	254	261	254
q15	585	550	527	527
q16	522	424	435	424
q17	977	958	949	949
q18	7737	6862	6765	6765
q19	1295	1254	1239	1239
q20	597	338	328	328
q21	3905	3571	3554	3554
q22	1086	989	1036	989
Total cold run time: 65975 ms
Total hot run time: 43082 ms
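
The columns in the table above appear to be the cold run, two hot runs, and the faster of the two hot runs: summing the first numeric column reproduces the reported cold total, and summing the last column reproduces the reported hot total. The sketch below spells out that arithmetic; the struct and function names are hypothetical and are not part of the benchmark scripts.

```cpp
#include <algorithm>
#include <vector>

// One table row: a cold run and two hot runs, all in milliseconds.
struct QueryTiming {
    long cold_ms;
    long hot1_ms;
    long hot2_ms;
};

// Per-query hot time is taken as the faster of the two hot runs.
long best_hot_ms(const QueryTiming& t) {
    return std::min(t.hot1_ms, t.hot2_ms);
}

// Total cold run time: sum of the cold column.
long total_cold_ms(const std::vector<QueryTiming>& rows) {
    long total = 0;
    for (const auto& r : rows) total += r.cold_ms;
    return total;
}

// Total hot run time: sum of the per-query best hot times.
long total_hot_ms(const std::vector<QueryTiming>& rows) {
    long total = 0;
    for (const auto& r : rows) total += best_hot_ms(r);
    return total;
}
```

Feeding in only the q1 to q3 rows of Round 1, `total_cold_ms` gives 13243 + 1089 + 4196 = 18528 and `total_hot_ms` gives 7224 + 185 + 1311 = 8720; extending the input to all 22 rows yields the totals printed above.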

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	7182	7153	7135	7135
q2	337	236	229	229
q3	2970	2943	2934	2934
q4	1986	1983	1990	1983
q5	5689	5461	5455	5455
q6	233	143	144	143
q7	2088	1692	1702	1692
q8	3260	3342	3350	3342
q9	8497	8475	8435	8435
q10	3412	3488	3476	3476
q11	590	495	470	470
q12	806	560	575	560
q13	4096	3038	3008	3008
q14	313	264	267	264
q15	568	528	517	517
q16	488	443	453	443
q17	1780	1751	1719	1719
q18	8107	7756	7628	7628
q19	1728	1713	1710	1710
q20	2022	1793	1805	1793
q21	5682	5486	5410	5410
q22	1108	1017	988	988
Total cold run time: 62942 ms
Total hot run time: 59334 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.91% (9464/25642)
Line Coverage: 28.26% (77810/275314)
Region Coverage: 27.67% (40186/145217)
Branch Coverage: 24.28% (20414/84090)
Coverage Report: http://coverage.selectdb-in.cc/coverage/c314c82fdf311a4d7d31eb6616d5f0aa14d081e0_c314c82fdf311a4d7d31eb6616d5f0aa14d081e0/report/index.html

@doris-robot

TPC-DS: Total hot run time: 195784 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c314c82fdf311a4d7d31eb6616d5f0aa14d081e0, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	917	396	395	395
query2	6495	1749	1819	1749
query3	6668	212	220	212
query4	26949	23861	23856	23856
query5	4915	554	548	548
query6	255	172	163	163
query7	4590	307	300	300
query8	283	213	224	213
query9	8578	2583	2584	2583
query10	470	301	292	292
query11	16017	15649	15509	15509
query12	161	101	100	100
query13	1685	408	393	393
query14	11679	6993	7223	6993
query15	224	174	179	174
query16	7625	475	465	465
query17	1552	598	577	577
query18	2037	298	293	293
query19	203	149	154	149
query20	133	112	113	112
query21	213	106	107	106
query22	4550	4416	4156	4156
query23	34606	33981	33931	33931
query24	9619	3169	3172	3169
query25	679	419	427	419
query26	973	164	161	161
query27	2172	289	291	289
query28	6364	2102	2076	2076
query29	963	432	445	432
query30	300	162	166	162
query31	1014	782	819	782
query32	107	59	62	59
query33	721	319	316	316
query34	929	496	482	482
query35	881	776	718	718
query36	1081	927	915	915
query37	150	88	84	84
query38	4013	3972	3862	3862
query39	1494	1394	1413	1394
query40	215	123	124	123
query41	51	49	50	49
query42	119	98	101	98
query43	490	451	461	451
query44	1312	820	791	791
query45	207	176	175	175
query46	1119	877	858	858
query47	1891	1778	1780	1778
query48	382	296	305	296
query49	1131	495	465	465
query50	955	455	448	448
query51	7156	7123	6850	6850
query52	103	91	91	91
query53	273	195	191	191
query54	744	481	483	481
query55	84	77	78	77
query56	307	281	284	281
query57	1218	1078	1072	1072
query58	250	258	260	258
query59	2781	2453	2734	2453
query60	327	303	291	291
query61	123	238	105	105
query62	925	673	665	665
query63	225	194	187	187
query64	4203	673	663	663
query65	3276	3204	3202	3202
query66	1019	295	301	295
query67	16065	15480	15492	15480
query68	3218	900	883	883
query69	451	332	342	332
query70	1151	1155	1100	1100
query71	364	355	360	355
query72	6026	3426	3359	3359
query73	613	607	597	597
query74	9315	9171	9029	9029
query75	3187	2986	3038	2986
query76	1944	912	907	907
query77	474	425	418	418
query78	9412	9281	9271	9271
query79	939	909	917	909
query80	880	860	848	848
query81	460	272	271	271
query82	270	266	270	266
query83	198	197	195	195
query84	238	119	109	109
query85	659	423	407	407
query86	311	333	318	318
query87	4326	4312	4457	4312
query88	4275	4229	4206	4206
query89	388	382	378	378
query90	1397	346	330	330
query91	130	129	124	124
query92	78	78	80	78
query93	1074	1076	1059	1059
query94	623	375	407	375
query95	485	443	434	434
query96	492	488	488	488
query97	3134	3175	3147	3147
query98	230	237	230	230
query99	1564	1306	1300	1300
Total cold run time: 277522 ms
Total hot run time: 195784 ms

@doris-robot

ClickBench: Total hot run time: 31.44 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit c314c82fdf311a4d7d31eb6616d5f0aa14d081e0, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.05	0.04	0.04
query2	0.07	0.04	0.04
query3	0.22	0.05	0.06
query4	1.69	0.06	0.07
query5	0.50	0.50	0.50
query6	1.14	0.73	0.73
query7	0.02	0.01	0.01
query8	0.05	0.04	0.05
query9	0.57	0.52	0.51
query10	0.57	0.60	0.56
query11	0.16	0.12	0.12
query12	0.15	0.13	0.13
query13	0.63	0.61	0.61
query14	1.45	1.47	1.44
query15	0.91	0.86	0.90
query16	0.37	0.37	0.37
query17	1.07	1.03	1.09
query18	0.23	0.20	0.20
query19	1.98	1.80	1.77
query20	0.01	0.01	0.01
query21	15.40	0.69	0.68
query22	4.07	7.83	1.59
query23	17.88	1.35	1.31
query24	2.32	0.22	0.22
query25	0.18	0.08	0.08
query26	0.28	0.18	0.17
query27	0.08	0.07	0.07
query28	13.17	1.13	1.09
query29	12.53	3.40	3.37
query30	0.24	0.06	0.06
query31	2.86	0.43	0.41
query32	3.25	0.49	0.49
query33	3.04	3.05	3.04
query34	15.43	4.34	4.34
query35	4.34	4.34	4.34
query36	0.69	0.49	0.48
query37	0.19	0.16	0.16
query38	0.16	0.15	0.15
query39	0.05	0.04	0.04
query40	0.16	0.13	0.14
query41	0.09	0.05	0.05
query42	0.06	0.06	0.05
query43	0.05	0.04	0.04
Total cold run time: 108.36 s
Total hot run time: 31.44 s

Contributor

@csun5285 csun5285 left a comment

LGTM

Contributor

PR approved by anyone and no changes requested.

Contributor

@zzzxl1993 zzzxl1993 left a comment

LGTM

Member

@eldenmoon eldenmoon left a comment

LGTM

@airborne12 airborne12 merged commit 3e65fab into apache:master Sep 13, 2024
25 of 30 checks passed
@airborne12 airborne12 deleted the refactor branch September 13, 2024 06:03
dataroaring pushed a commit that referenced this pull request Oct 9, 2024
…nalysis process (#40758)
