[opt](fqdn) Add DNS Cache for FE and BE (#32869) #32995

Merged
merged 2 commits into from
Mar 29, 2024

Conversation

morningman
Contributor

bp #32869

Previously, when FQDN was enabled, Doris called the DNS resolver to get the IP for a hostname
each time 1) an FE got a BE's grpc client, or 2) a BE got another BE's brpc client.
Under high concurrency, the DNS resolver could become overloaded and fail to resolve hostnames.

This PR mainly changes:

1. Add DNSCache for both FE and BE.
    The DNSCache runs on every FE and BE node. It holds a cache whose key is the hostname and whose value is the IP.
    A caller gets the IP for a hostname from this cache; if the hostname is not in the cache, DNSCache resolves it
    and updates the cache.
    In addition, DNSCache has a daemon thread that refreshes the cache every 1 minute, because the IP may
    change at any time.
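
The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the class name `DnsCacheSketch`, the injected `Resolver` callback, and the public `refresh()` method are all hypothetical, and the refresh interval is a constructor parameter rather than the fixed 1 minute mentioned above.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical sketch; names and structure are assumptions, not the PR's DNSCache.
class DnsCacheSketch {
public:
    // Returns the IP for a hostname, or std::nullopt if resolution fails.
    using Resolver = std::function<std::optional<std::string>(const std::string&)>;

    DnsCacheSketch(Resolver resolver, std::chrono::milliseconds refresh_interval)
            : _resolver(std::move(resolver)),
              _interval(refresh_interval),
              _refresher([this] { _refresh_loop(); }) {}

    ~DnsCacheSketch() {
        {
            std::lock_guard<std::mutex> lock(_stop_mutex);
            _stopped = true;
        }
        _stop_cv.notify_all();
        _refresher.join();
    }

    // Cache hit: return the stored IP. Cache miss: resolve, store, return.
    std::optional<std::string> get(const std::string& hostname) {
        {
            std::shared_lock<std::shared_mutex> rlock(_cache_mutex);
            auto it = _cache.find(hostname);
            if (it != _cache.end()) return it->second;
        }
        auto ip = _resolver(hostname);  // resolve without holding the cache lock
        if (ip) {
            std::unique_lock<std::shared_mutex> wlock(_cache_mutex);
            _cache[hostname] = *ip;
        }
        return ip;
    }

    // Re-resolve every cached hostname so that IP changes are picked up.
    void refresh() {
        std::vector<std::string> hosts;
        {
            std::shared_lock<std::shared_mutex> rlock(_cache_mutex);
            for (const auto& entry : _cache) hosts.push_back(entry.first);
        }
        for (const auto& host : hosts) {
            auto ip = _resolver(host);
            if (!ip) continue;  // keep the previous IP if resolution fails
            std::unique_lock<std::shared_mutex> wlock(_cache_mutex);
            _cache[host] = *ip;
        }
    }

private:
    // Background loop: wake up every _interval and refresh until stopped.
    void _refresh_loop() {
        std::unique_lock<std::mutex> lock(_stop_mutex);
        while (!_stop_cv.wait_for(lock, _interval, [this] { return _stopped; })) {
            lock.unlock();
            refresh();
            lock.lock();
        }
    }

    Resolver _resolver;
    std::chrono::milliseconds _interval;
    std::shared_mutex _cache_mutex;
    std::unordered_map<std::string, std::string> _cache;
    std::mutex _stop_mutex;
    std::condition_variable _stop_cv;
    bool _stopped = false;
    std::thread _refresher;  // declared last so it starts after the other members exist
};
```

In this sketch the resolver is injected so the cache can be exercised without real DNS. A caller would construct it with something like `DnsCacheSketch cache(my_resolver, std::chrono::minutes(1));` and call `cache.get(hostname)` wherever a client is created, so that repeated lookups hit the map instead of the resolver.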

There are other implementations of this DNS cache:

1.  kaka11chen@36fed13
    This is for the BE side, but it does not handle the IP change case.

2. apache#28479
    This is for the FE side, but it only works with the Master FE. Other FE nodes will not be aware of the IP change.
    Also, there are a number of BackendServiceProxy instances, and that PR only handles the cache in one of them.
@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@morningman
Contributor Author

run buildall

Contributor

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

return Status::OK();
}

void DNSCache::_refresh_cache() {

warning: method '_refresh_cache' can be made const [readability-make-member-function-const]

Suggested change
- void DNSCache::_refresh_cache() {
+ void DNSCache::_refresh_cache() const {

be/src/util/dns_cache.h:46:

-     void _refresh_cache();
+     void _refresh_cache() const;

// update cache at fix internal
void _refresh_cache();

private:

warning: redundant access specifier has the same accessibility as the previous access specifier [readability-redundant-access-specifiers]

Suggested change
- private:

Additional context

be/src/util/dns_cache.h:40: previously declared here

private:
^

@doris-robot

TPC-H: Total hot run time: 50557 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit be3d7da8904103817b3dcd3e1f169930cf0ea10a, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17734	4552	4435	4435
q2	2046	151	146	146
q3	10564	1933	1950	1933
q4	10251	1254	1357	1254
q5	8759	4065	4025	4025
q6	231	122	125	122
q7	2046	1593	1592	1592
q8	9296	2752	2772	2752
q9	11179	10853	10823	10823
q10	8673	3547	3514	3514
q11	422	241	234	234
q12	466	295	301	295
q13	18374	3985	4048	3985
q14	354	325	335	325
q15	516	464	453	453
q16	708	604	597	597
q17	1144	983	949	949
q18	7334	6805	6973	6805
q19	1698	1602	1505	1505
q20	526	319	295	295
q21	4528	4146	4126	4126
q22	518	392	396	392
Total cold run time: 117367 ms
Total hot run time: 50557 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4395	4338	4320	4320
q2	323	226	223	223
q3	4186	4130	4173	4130
q4	2790	2745	2775	2745
q5	7389	7262	7235	7235
q6	239	121	119	119
q7	3236	2863	2816	2816
q8	4427	4503	4521	4503
q9	17731	17226	17350	17226
q10	4247	4255	4257	4255
q11	774	723	702	702
q12	1023	837	849	837
q13	6923	3730	3704	3704
q14	463	430	422	422
q15	502	449	463	449
q16	753	714	707	707
q17	3935	3880	3913	3880
q18	8836	8878	8825	8825
q19	1742	1700	1695	1695
q20	2422	2191	2147	2147
q21	8611	8607	8543	8543
q22	1065	960	945	945
Total cold run time: 86012 ms
Total hot run time: 80428 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.80% (8048/21293)
Line Coverage: 29.47% (65729/223073)
Region Coverage: 28.93% (33827/116925)
Branch Coverage: 24.79% (17370/70080)
Coverage Report: http://coverage.selectdb-in.cc/coverage/be3d7da8904103817b3dcd3e1f169930cf0ea10a_be3d7da8904103817b3dcd3e1f169930cf0ea10a/report/index.html

@doris-robot

TPC-DS: Total hot run time: 203563 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit be3d7da8904103817b3dcd3e1f169930cf0ea10a, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	937	394	384	384
query2	6534	2145	2147	2145
query3	6914	200	197	197
query4	21204	18211	18106	18106
query5	19722	6557	6536	6536
query6	291	224	230	224
query7	4157	320	324	320
query8	293	257	274	257
query9	3238	2785	2711	2711
query10	435	281	329	281
query11	11315	10761	10855	10761
query12	124	86	74	74
query13	5582	660	638	638
query14	17861	13560	13716	13560
query15	359	243	232	232
query16	6432	270	253	253
query17	1724	1445	871	871
query18	2325	416	432	416
query19	214	149	149	149
query20	79	78	78	78
query21	183	97	98	97
query22	5313	4980	4957	4957
query23	32826	32063	32227	32063
query24	6998	6533	6577	6533
query25	542	432	414	414
query26	528	163	163	163
query27	1861	309	305	305
query28	5985	2266	2233	2233
query29	2994	2777	2683	2683
query30	244	164	160	160
query31	925	766	715	715
query32	64	58	58	58
query33	400	255	242	242
query34	848	474	504	474
query35	1141	924	938	924
query36	1314	1221	1073	1073
query37	89	62	64	62
query38	3128	2937	2946	2937
query39	1374	1331	1330	1330
query40	200	99	101	99
query41	36	34	33	33
query42	82	79	89	79
query43	574	563	617	563
query44	1163	720	725	720
query45	241	227	229	227
query46	1237	985	982	982
query47	1832	1830	1870	1830
query48	984	697	703	697
query49	615	378	388	378
query50	873	611	632	611
query51	4785	4666	4696	4666
query52	101	81	86	81
query53	445	321	320	320
query54	2690	2501	2515	2501
query55	87	85	83	83
query56	219	217	210	210
query57	1241	1108	1080	1080
query58	209	213	197	197
query59	3568	3238	3176	3176
query60	236	192	204	192
query61	85	83	86	83
query62	864	508	489	489
query63	475	343	343	343
query64	2550	1531	1465	1465
query65	3666	3565	3558	3558
query66	765	373	370	370
query67	17007	15844	17324	15844
query68	7950	675	688	675
query69	592	364	373	364
query70	1609	1371	1466	1371
query71	404	301	316	301
query72	6549	3491	3412	3412
query73	733	330	313	313
query74	6352	5920	5914	5914
query75	4590	3818	3672	3672
query76	4546	1144	1239	1144
query77	554	254	262	254
query78	12886	11812	12698	11812
query79	7971	662	663	662
query80	1313	407	392	392
query81	508	233	244	233
query82	779	102	99	99
query83	171	143	124	124
query84	256	72	71	71
query85	1117	289	289	289
query86	335	301	328	301
query87	3322	2999	3048	2999
query88	4906	2319	2328	2319
query89	358	290	291	290
query90	1776	211	216	211
query91	157	120	122	120
query92	54	54	57	54
query93	3487	613	601	601
query94	729	203	200	200
query95	1136	1084	1081	1081
query96	631	320	325	320
query97	6587	6397	6428	6397
query98	194	177	162	162
query99	2934	1039	913	913
Total cold run time: 308813 ms
Total hot run time: 203563 ms

@doris-robot

ClickBench: Total hot run time: 30.42 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit be3d7da8904103817b3dcd3e1f169930cf0ea10a, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.02	0.02	0.02
query2	0.07	0.02	0.03
query3	0.25	0.05	0.05
query4	1.78	0.07	0.08
query5	0.54	0.52	0.53
query6	1.25	0.62	0.62
query7	0.01	0.01	0.01
query8	0.04	0.02	0.02
query9	0.53	0.49	0.49
query10	0.55	0.52	0.55
query11	0.12	0.08	0.08
query12	0.12	0.09	0.09
query13	0.62	0.61	0.62
query14	0.77	0.80	0.81
query15	0.80	0.77	0.78
query16	0.40	0.37	0.40
query17	1.01	1.00	1.03
query18	0.22	0.27	0.25
query19	1.92	1.86	1.83
query20	0.02	0.01	0.01
query21	15.50	0.56	0.56
query22	2.33	2.24	1.62
query23	17.03	0.97	0.94
query24	6.78	0.57	0.75
query25	0.40	0.08	0.07
query26	0.71	0.15	0.16
query27	0.03	0.04	0.03
query28	7.05	0.80	0.71
query29	12.64	2.26	2.22
query30	0.63	0.59	0.55
query31	2.82	0.39	0.38
query32	3.38	0.49	0.50
query33	3.10	3.09	3.05
query34	15.25	4.82	4.81
query35	4.93	4.87	4.85
query36	1.05	1.02	1.01
query37	0.06	0.05	0.05
query38	0.03	0.02	0.02
query39	0.02	0.01	0.02
query40	0.15	0.14	0.14
query41	0.07	0.01	0.01
query42	0.02	0.01	0.02
query43	0.02	0.02	0.02
Total cold run time: 105.04 s
Total hot run time: 30.42 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit be3d7da8904103817b3dcd3e1f169930cf0ea10a with default session variables
Stream load json:         20 seconds loaded 2358488459 Bytes, about 112 MB/s
Stream load orc:          59 seconds loaded 1101869774 Bytes, about 17 MB/s
Stream load parquet:      31 seconds loaded 861443392 Bytes, about 26 MB/s
Insert into select:       20.8 seconds inserted 10000000 Rows, about 480K ops/s

@morningman morningman merged commit ea048ee into apache:branch-2.0 Mar 29, 2024
24 of 26 checks passed
mongo360 pushed a commit to mongo360/doris that referenced this pull request Aug 16, 2024