one cluster should have only one transport to reduce the number of TCP connections #5615

Open
lcw2 wants to merge 1 commit into master from transport

lcw2 commented Sep 27, 2024

What type of PR is this?

What this PR does / why we need it:
It solves the problem of too many TCP connections between the aggregated apiserver and the apiservers of member clusters.
Which issue(s) this PR fixes:
Fixes #5574

Special notes for your reviewer:

[Screenshot 2024-09-27 10:46:40]

test.sh

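#!/bin/bash

# Start MAX_JOBS background loops; each runs 50 member-scoped "get node"
# requests through karmadactl, sleeping SLEEP_INTERVAL seconds between them.
KUBECONFIG="/root/.kube/karmada.config"
SLEEP_INTERVAL=0.1
MAX_JOBS=100

function run_karmadactl() {
  for ((i = 1; i <= 50; i++)); do
    karmadactl --kubeconfig "$KUBECONFIG" get node --operation-scope=members
    sleep "$SLEEP_INTERVAL"
  done
}

for ((i = 1; i <= MAX_JOBS; i++)); do
  run_karmadactl &
done
wait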

result.sh

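#!/bin/bash

# Print, once per second, how many netstat lines mention port 6443
# (the apiserver port). Note: this also counts LISTEN sockets and any
# other line that happens to contain "6443".
while true
do
  tcp_count=$(netstat -anp | grep 6443 | wc -l)
  sleep 1
  echo "$(date '+%Y-%m-%d %H:%M:%S') - Current total TCP connections: $((tcp_count))"
done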

Before the fix:
[WeCom screenshot]

After the fix:
[WeCom screenshot]

karmada-bot (Collaborator):

Welcome @lcw2! It looks like this is your first PR to karmada-io/karmada 🎉

karmada-bot added the size/M label (denotes a PR that changes 30-99 lines, ignoring generated files) on Sep 27, 2024
lcw2 force-pushed the transport branch 2 times, most recently from 9289780 to ee350be, on September 27, 2024 09:46
karmada-bot added the size/XL label (denotes a PR that changes 500-999 lines, ignoring generated files) and removed size/M on Sep 27, 2024
lcw2 force-pushed the transport branch 2 times, most recently from 5b909be to 423abb3, on September 27, 2024 09:55
lcw2 changed the title from "one cluster have only one transport" to "one cluster should have only one transport to reduce the number of TCP connections" on Sep 27, 2024
karmada-bot added the size/M label and removed size/XL on Sep 27, 2024
codecov-commenter commented Sep 27, 2024

⚠️ Please install the Codecov app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 68.96552% with 9 lines in your changes missing coverage. Please review.

Project coverage is 38.28%. Comparing base (58612d3) to head (884052c).
Report is 76 commits behind head on master.

| File with missing lines | Patch % | Lines |
|---|---|---|
| pkg/util/proxy/proxy.go | 68.96% | 5 missing and 4 partials ⚠️ |

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5615      +/-   ##
==========================================
+ Coverage   35.20%   38.28%   +3.07%     
==========================================
  Files         645      649       +4     
  Lines       44869    45160     +291     
==========================================
+ Hits        15795    17288    +1493     
+ Misses      27844    26559    -1285     
- Partials     1230     1313      +83     
| Flag | Coverage Δ |
|---|---|
| unittests | 38.28% <68.96%> (+3.07%) ⬆️ |

Flags with carried forward coverage won't be shown.


clusterEndpoint := clusterEndpointInfo{
	Transport: proxyTransport,
}
clusterEndpointInfoStore.Store(cluster.UID, clusterEndpointInfo{
Member (review comment):

Just a question: When will the clusterEndpointInfo be removed from the clusterEndpointInfoStore?

Member:

+1

@@ -174,7 +175,6 @@ require (
golang.org/x/exp v0.0.0-20231226003508-02704c960a9b // indirect
golang.org/x/mod v0.17.0 // indirect
golang.org/x/oauth2 v0.18.0 // indirect
golang.org/x/sync v0.7.0 // indirect
Member (review comment):

It seems the go.mod file has not changed substantially; you need to restore it.

Author:

The difference is in the indirect dependencies.

Member:

I don't think it's necessary.

lcw2 (Author) commented Oct 10, 2024

Two questions:

  1. When the ProxyURL or ProxyHeader of a cluster changes, how should the transport be handled?
    One option is to store ProxyURL and ProxyHeader in clusterEndpointInfo and, when a request comes in, compare the current values with the stored ones, creating a new transport if they have changed. Is there a better idea?

  2. When a cluster is deleted, should its transport be deleted as well?
    If so, could this be achieved by periodically listing clusters (say, every 30s)?
    Any suggestions? @whitewindmills @chaunceyjiang

whitewindmills (Member):

I feel this approach is too complicated and easy to get wrong. Could we instead hash all the key fields (such as APIEndpoint, ProxyURL, ...) into a value and use it as the clusterEndpointInfoStore key? Then we might not need to remove entries when a cluster is deleted.
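A sketch of the hash-key approach, assuming the key is built from the cluster UID, APIEndpoint, ProxyURL and ProxyHeader (the same fields a later revision of the PR collects into usedFields). The clusterKeyFields struct and transportKey function are hypothetical. Header names are sorted first because Go randomizes map iteration order, so an unsorted walk would not produce a stable key.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// clusterKeyFields is a hypothetical stand-in for the Cluster fields that
// affect how a proxy transport is built.
type clusterKeyFields struct {
	UID         string
	APIEndpoint string
	ProxyURL    string
	ProxyHeader map[string]string
}

// transportKey returns a hex SHA-256 digest of the transport-relevant fields.
func transportKey(f clusterKeyFields) string {
	headerKeys := make([]string, 0, len(f.ProxyHeader))
	for k := range f.ProxyHeader {
		headerKeys = append(headerKeys, k)
	}
	sort.Strings(headerKeys) // deterministic order

	fields := make([]string, 0, 2*len(headerKeys)+3)
	for _, k := range headerKeys {
		fields = append(fields, k, f.ProxyHeader[k])
	}
	fields = append(fields, f.UID, f.ProxyURL, f.APIEndpoint)

	// The "\x00" separator keeps ("ab", "c") distinct from ("a", "bc").
	sum := sha256.Sum256([]byte(strings.Join(fields, "\x00")))
	return hex.EncodeToString(sum[:])
}

func main() {
	base := clusterKeyFields{
		UID:         "uid-1",
		APIEndpoint: "https://10.0.0.1:6443",
		ProxyHeader: map[string]string{"X-A": "1", "X-B": "2"},
	}
	viaProxy := base
	viaProxy.ProxyURL = "http://proxy.example:3128"
	fmt.Println(transportKey(base) == transportKey(base))     // true
	fmt.Println(transportKey(base) == transportKey(viaProxy)) // false
}
```

With this scheme a changed ProxyURL or a rejoined cluster (new UID) simply maps to a new key. The old entry is never looked up again, but it stays in the map, and its idle connections only close if the transport has an IdleConnTimeout set or the peer closes them.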

lcw2 force-pushed the transport branch 3 times, most recently from 60a61a6 to 63a7ba2, on October 13, 2024 18:57
karmada-bot (Collaborator):

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign chaunceyjiang for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

XiShanYongYe-Chang (Member):

/cc @whitewindmills @chaunceyjiang @yanfeng1992
Hi all, could you take another look?

for _, key := range proxyHeaderKeys {
	usedFields = append(usedFields, key, cluster.Spec.ProxyHeader[key])
}
usedFields = append(usedFields, string(cluster.UID), cluster.Spec.ProxyURL, cluster.Spec.APIEndpoint)
Member (review comment):

Is UID used by createProxyTransport?

Author:

A changed UID means the cluster has rejoined, so a new transport should be created.

whitewindmills (Member) left a comment:

Have you tested it yet? I thought of a scenario: when the cluster goes offline, its connections will be closed automatically after a while. When the cluster comes back online, will it still work?

if value, ok := clusterEndpointInfoStore.Load(clusterHash); ok {
	return value, nil
}
proxyTransport, err := createProxyTransport(cluster, tlsConfig)
Member (review comment):

What happens if tlsConfig changes?

Author:

It will not work: requests will fail. But I don't think we need to consider the scenario where the secrets change.
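If secret rotation does need to be handled, one option (an assumption, not something the PR does) is to fold a digest of the TLS material into the cache key, so rotated credentials produce a new key and therefore a new transport. The sketch takes the CA bundle, client certificate and client key as raw bytes; tlsFingerprint is a hypothetical helper, and reading those bytes from the cluster's secret is left out.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// tlsFingerprint digests the raw TLS inputs a transport is built from.
// Appending the result to the cache key makes a rotated secret map to a
// new key, so a fresh transport is built instead of reusing a stale one.
func tlsFingerprint(caData, certData, keyData []byte) string {
	h := sha256.New()
	for _, part := range [][]byte{caData, certData, keyData} {
		// Length-prefix each part so the boundaries between parts are unambiguous.
		fmt.Fprintf(h, "%d:", len(part))
		h.Write(part)
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	before := tlsFingerprint([]byte("ca-v1"), []byte("cert"), []byte("key"))
	after := tlsFingerprint([]byte("ca-v2"), []byte("cert"), []byte("key"))
	fmt.Println(before == after) // false
}
```

The cost is that every rotation leaves the previous transport behind in the store, so this pairs naturally with some form of pruning.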

lcw2 (Author) commented Oct 17, 2024

> have you tested it yet? I thought of a scenario: when the cluster goes offline, the connection will be automatically disconnected after a period of time. when the cluster comes back online, can it work fine?

I stopped the apiserver for two minutes and then restarted it; karmadactl get nodes worked fine.

whitewindmills (Member):

I'm not sure whether you built the cache before stopping the apiserver. Maybe we haven't considered this comprehensively enough. Can we discuss it at the next community meeting?

lcw2 (Author) commented Oct 17, 2024

> I'm not sure if you built the cache before stoping apiserver. maybe we haven't considered it comprehensively enough. can we talk about it in the next community meeting?

OK

Labels: size/M (Denotes a PR that changes 30-99 lines, ignoring generated files.)
Projects: None yet
Development: successfully merging this pull request may close this issue:
Too many duplicated tcp connections from aggregated apiserver to one member cluster
7 participants