[branch-2.0](migrate disk) fix migrate disk lost data during publish version #29887 #30546

Merged


yujun777
Collaborator

pick: #29887

Migrating a tablet to another disk may lose data. For example, suppose the old tablet is at version 10 and is in the middle of publishing a txn for version 11, and a disk migration task arrives at that moment. If the task runs after the old tablet has finished publishing txn 11 but before it has added the rowset (version = 11), the migration task sees no pending txn on the old tablet, so the new tablet copies the old tablet's data while the old tablet is still at version 10. After copying, the task deletes the old tablet and replaces it with the new tablet.
As a result, the data for version 11 is lost.
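
The interleaving described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the code changed in this PR: `ToyTablet`, `publish_finish_txn`, `publish_add_rowset`, `migrate_unsafe`, and `migrate_guarded` are all hypothetical names, the txn is identified by its version for simplicity, and the "guard" is just one possible way to close the window.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <set>

// Toy model of a tablet. All names are hypothetical, not Doris APIs.
struct ToyTablet {
    int64_t max_version = 10;        // highest version with a visible rowset
    std::set<int64_t> pending_txns;  // txns that have not been published yet
    int publishing = 0;              // publishes that finished their txn but
                                     // have not added the rowset yet
};

// Publish, step 1: the txn leaves the pending set.
void publish_finish_txn(ToyTablet& t, int64_t version) {
    t.pending_txns.erase(version);
    ++t.publishing;
}

// Publish, step 2: the rowset for `version` becomes visible.
void publish_add_rowset(ToyTablet& t, int64_t version) {
    t.max_version = version;
    --t.publishing;
}

// Migration that only checks for pending txns. Returns the version the
// new tablet would copy, or nullopt if the migration is refused.
std::optional<int64_t> migrate_unsafe(const ToyTablet& old_tablet) {
    if (!old_tablet.pending_txns.empty()) return std::nullopt;
    return old_tablet.max_version;
}

// Migration that also refuses while a publish is between its two steps.
std::optional<int64_t> migrate_guarded(const ToyTablet& old_tablet) {
    if (!old_tablet.pending_txns.empty() || old_tablet.publishing > 0) {
        return std::nullopt;
    }
    return old_tablet.max_version;
}
```

In this model, calling `publish_finish_txn(t, 11)` and then `migrate_unsafe(t)` before `publish_add_rowset(t, 11)` yields version 10, which is the lost-data case; `migrate_guarded(t)` refuses at the same point and only succeeds, with version 11, once the rowset has been added.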

Proposed changes

Issue Number: close #xxx


@yujun777
Collaborator Author

run buildall

Contributor

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

@@ -19,9 +19,11 @@

#include <atomic>
#include <boost/lexical_cast.hpp>

warning: 'boost/lexical_cast.hpp' file not found [clang-diagnostic-error]

#include <boost/lexical_cast.hpp>
         ^

@yujun777
Collaborator Author

run buildall

@doris-robot

TPC-H: Total hot run time: 50228 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c193663895f68a4541eb40250797bdd245b78422, data reload: false

------ Round 1 ----------------------------------
q1	17611	4516	4418	4418
q2	2064	150	139	139
q3	10349	1921	1943	1921
q4	10312	1249	1329	1249
q5	8684	3965	4021	3965
q6	232	124	121	121
q7	2026	1605	1642	1605
q8	9293	2734	2736	2734
q9	10892	10627	10504	10504
q10	8612	3566	3534	3534
q11	421	235	249	235
q12	455	294	305	294
q13	18353	3966	4056	3966
q14	353	316	327	316
q15	500	463	463	463
q16	704	612	609	609
q17	1139	956	978	956
q18	7363	6889	6866	6866
q19	1712	1561	1517	1517
q20	516	308	307	307
q21	4453	4191	4112	4112
q22	517	402	397	397
Total cold run time: 116561 ms
Total hot run time: 50228 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4353	4326	4300	4300
q2	318	223	217	217
q3	4173	4302	4166	4166
q4	2764	2765	2749	2749
q5	7307	7229	7234	7229
q6	239	120	116	116
q7	3203	2906	2825	2825
q8	4390	4477	4495	4477
q9	17060	17083	16963	16963
q10	4218	4276	4211	4211
q11	752	668	690	668
q12	1017	856	848	848
q13	6887	3739	3723	3723
q14	451	419	414	414
q15	504	454	450	450
q16	756	713	690	690
q17	3925	3888	3834	3834
q18	8849	8739	8792	8739
q19	1735	1715	1636	1636
q20	2399	2155	2114	2114
q21	8627	8561	8610	8561
q22	1040	919	930	919
Total cold run time: 84967 ms
Total hot run time: 79849 ms

@doris-robot

TPC-DS: Total hot run time: 238100 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c193663895f68a4541eb40250797bdd245b78422, data reload: false

query1	942	400	385	385
query2	6515	2062	2068	2062
query3	6939	205	208	205
query4	20229	17970	17983	17970
query5	19711	6252	6241	6241
query6	272	220	236	220
query7	4144	301	308	301
query8	240	234	244	234
query9	3045	2636	2564	2564
query10	422	296	295	295
query11	11426	10603	10662	10603
query12	125	71	71	71
query13	5636	638	632	632
query14	18077	13148	13172	13148
query15	362	240	233	233
query16	6659	257	270	257
query17	3236	889	872	872
query18	2268	417	405	405
query19	200	146	149	146
query20	78	74	72	72
query21	200	96	98	96
query22	5296	5075	5068	5068
query23	32533	31828	31888	31828
query24	6831	6502	6544	6502
query25	511	427	420	420
query26	521	161	162	161
query27	1808	298	293	293
query28	6058	2222	2196	2196
query29	3067	2796	2914	2796
query30	244	165	157	157
query31	935	741	743	741
query32	66	58	59	58
query33	388	250	251	250
query34	848	462	484	462
query35	1124	910	935	910
query36	1512	1477	1410	1410
query37	89	62	62	62
query38	3092	2956	2914	2914
query39	1385	1314	1329	1314
query40	202	97	93	93
query41	33	32	33	32
query42	93	78	90	78
query43	680	612	634	612
query44	1111	705	710	705
query45	240	229	227	227
query46	1227	948	980	948
query47	2014	1804	1660	1660
query48	970	674	668	668
query49	604	358	360	358
query50	878	592	610	592
query51	5666	5512	5550	5512
query52	87	87	88	87
query53	447	329	318	318
query54	2560	2286	2278	2278
query55	90	84	88	84
query56	218	198	195	195
query57	1101	1252	1147	1147
query58	209	212	203	203
query59	3594	3219	3108	3108
query60	201	204	196	196
query61	87	90	85	85
query62	807	557	476	476
query63	474	345	337	337
query64	2040	1337	1311	1311
query65	3704	3663	3638	3638
query66	810	352	365	352
query67	16278	15439	18012	15439
query68	8605	671	664	664
query69	564	346	349	346
query70	1865	1747	1750	1747
query71	399	303	328	303
query72	4653	3374	3368	3368
query73	715	320	323	320
query74	6379	5828	5872	5828
query75	4371	3539	3483	3483
query76	4888	1194	1219	1194
query77	766	256	257	256
query78	32423	50714	47630	47630
query79	16471	638	658	638
query80	4978	384	400	384
query81	566	231	229	229
query82	1089	97	99	97
query83	455	142	132	132
query84	257	72	70	70
query85	2583	285	288	285
query86	487	351	403	351
query87	3283	3050	2983	2983
query88	7003	2333	2329	2329
query89	445	315	294	294
query90	2511	203	207	203
query91	156	115	117	115
query92	66	53	51	51
query93	6324	593	591	591
query94	1870	201	212	201
query95	1114	1088	1073	1073
query96	654	330	331	330
query97	6484	6279	6397	6279
query98	191	173	181	173
query99	3945	860	925	860
Total cold run time: 350019 ms
Total hot run time: 238100 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.85% (8036/21232)
Line Coverage: 29.55% (65485/221607)
Region Coverage: 28.97% (33635/116096)
Branch Coverage: 24.83% (17260/69502)
Coverage Report: http://coverage.selectdb-in.cc/coverage/c193663895f68a4541eb40250797bdd245b78422_c193663895f68a4541eb40250797bdd245b78422/report/index.html

@doris-robot

ClickBench: Total hot run time: 31.42 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit c193663895f68a4541eb40250797bdd245b78422, data reload: false

query1	0.03	0.02	0.02
query2	0.06	0.02	0.02
query3	0.25	0.04	0.05
query4	1.83	0.06	0.07
query5	0.53	0.54	0.52
query6	1.30	0.62	0.62
query7	0.02	0.00	0.00
query8	0.03	0.02	0.02
query9	0.51	0.50	0.48
query10	0.55	0.54	0.55
query11	0.11	0.08	0.09
query12	0.11	0.09	0.09
query13	0.62	0.60	0.61
query14	0.80	0.78	0.80
query15	0.79	0.76	0.77
query16	0.38	0.37	0.37
query17	1.01	0.99	1.04
query18	0.23	0.27	0.22
query19	1.93	1.76	1.84
query20	0.02	0.01	0.01
query21	15.47	0.56	0.57
query22	1.70	2.32	1.77
query23	17.24	0.91	1.04
query24	7.25	1.58	1.57
query25	1.53	0.12	0.11
query26	0.39	0.14	0.13
query27	0.10	0.11	0.11
query28	5.83	0.71	0.71
query29	12.72	2.35	2.26
query30	0.59	0.50	0.51
query31	2.80	0.37	0.39
query32	3.43	0.50	0.49
query33	3.07	3.06	3.11
query34	15.25	4.82	4.79
query35	4.87	4.82	4.85
query36	1.05	1.03	1.02
query37	0.05	0.05	0.05
query38	0.03	0.02	0.02
query39	0.02	0.01	0.02
query40	0.16	0.14	0.14
query41	0.06	0.01	0.02
query42	0.02	0.02	0.02
query43	0.02	0.02	0.01
Total cold run time: 104.76 s
Total hot run time: 31.42 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit c193663895f68a4541eb40250797bdd245b78422 with default session variables
Stream load json:         20 seconds loaded 2358488459 Bytes, about 112 MB/s
Stream load orc:          59 seconds loaded 1101869774 Bytes, about 17 MB/s
Stream load parquet:      31 seconds loaded 861443392 Bytes, about 26 MB/s
Insert into select:       21.9 seconds inserted 10000000 Rows, about 456K ops/s

@xiaokang xiaokang merged commit beeac85 into apache:branch-2.0 Jan 31, 2024
23 of 26 checks passed
mongo360 pushed a commit to mongo360/doris that referenced this pull request Aug 16, 2024