Improve ingress handling #197

Open
jacekn opened this issue Aug 23, 2024 · 0 comments
Labels
core-team (Issue can be worked on by the core team), ops-team (Issue can be worked on by the ops team)


What problem does your feature solve?

We currently see very high CoreDNS traffic during some missions. For example, even small tests can push us to 80k or even 100k rps, at which point CoreDNS starts failing.

What happens is this:

  1. When SSC starts, we create a Service that selects all pods. For example:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: stellar-core
  name: ssc-1015z-15a2d2-stellar-core
  namespace: stellar-supercluster
spec:
  clusterIP: None
  selector:
    app: stellar-core
  2. We also create a Service of type ExternalName for each pod:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: stellar-core
  name: ssc-1015z-15a2d2-sts-complete1-0
  namespace: stellar-supercluster
spec:
  externalName: ssc-1015z-15a2d2-sts-complete1-0.ssc-1015z-15a2d2-stellar-core.stellar-supercluster.svc.cluster.local
  ports:
  - name: core
    port: 11626
    protocol: TCP
    targetPort: 11626
  - name: history
    port: 80
    protocol: TCP
    targetPort: 80
  type: ExternalName
  3. We also create fairly large Ingresses that use the ExternalName Services:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: private
    nginx.ingress.kubernetes.io/rewrite-target: /$2
  generation: 1
  name: ssc-1015z-15a2d2-stellar-core-ingress
  namespace: stellar-supercluster
spec:
  rules:
  - host: ssc-1015z-15a2d2.stellar-supercluster.example.com
    http:
      paths:
      - backend:
          service:
            name: ssc-1015z-15a2d2-sts-complete1-0
            port:
              number: 11626
        path: /ssc-1015z-15a2d2-sts-complete1-0/core(/|$)(.*)
        pathType: Prefix
      - backend:
          service:
            name: ssc-1015z-15a2d2-sts-complete2-0
            port:
              number: 11626
        path: /ssc-1015z-15a2d2-sts-complete2-0/core(/|$)(.*)
        pathType: Prefix
  4. When the nginx ingress controller sets things up, it needs to resolve every ExternalName.
  5. While pods are not ready, lookups for their service endpoints return NXDOMAIN.
  6. As a result, each nginx controller, of which there are many, floods CoreDNS with requests.
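
To make the amplification in the steps above a bit more concrete, here is a rough back-of-the-envelope sketch in Python. The function names are hypothetical, and the per-name lookup rate is an assumed input: how often a controller actually re-resolves an ExternalName target has not been measured here. The only thing taken from the manifests above is the naming pattern of the per-pod FQDN.

```python
def external_name(run_prefix, pod, namespace="stellar-supercluster"):
    """Build the per-pod FQDN used as externalName.

    Follows the naming seen in the example manifests:
    <prefix>-<pod>.<prefix>-stellar-core.<namespace>.svc.cluster.local
    """
    return (
        f"{run_prefix}-{pod}."
        f"{run_prefix}-stellar-core."
        f"{namespace}.svc.cluster.local"
    )


def estimated_dns_qps(controllers, pods, lookups_per_name_per_sec):
    """Rough estimate of total DNS queries per second.

    Assumes every controller resolves every ExternalName independently,
    so the load scales with controllers * pods. The lookup rate is an
    assumption supplied by the caller, not a measured value.
    """
    return controllers * pods * lookups_per_name_per_sec


if __name__ == "__main__":
    print(external_name("ssc-1015z-15a2d2", "sts-complete1-0"))
    # Made-up inputs, purely to show how the product grows.
    print(estimated_dns_qps(controllers=10, pods=200, lookups_per_name_per_sec=5))
```

The point of the sketch is only that the load is a product of three factors, so adding pods or controllers multiplies the query rate rather than adding to it.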

What would you like to see?

I found a way to significantly simplify the whole setup. What we need is:

  1. Create one Service that selects all pods. Let's call it foobar.
  2. Create a small "proxy" nginx instance. It uses the Service above to proxy traffic to the pods. Example config:
apiVersion: v1
kind: ConfigMap
metadata:
  name: proxy
  namespace: stellar-supercluster
data:
  default.conf: |
    server {
      listen 80 default_server;
      server_name _;
      resolver 10.96.0.10 ipv6=off;
      location ~ ^/(.+)/core$ {
        proxy_pass http://$1.foobar.stellar-supercluster.svc.cluster.local:11626/;
      }
      location ~ ^/(.+)/core/(.*)$ {
        proxy_pass http://$1.foobar.stellar-supercluster.svc.cluster.local:11626/$2;
      }

      location ~ ^/(.+)/history$ {
        proxy_pass http://$1.foobar.stellar-supercluster.svc.cluster.local:80/;
      }
      location ~ ^/(.+)/history/(.*)$ {
        proxy_pass http://$1.foobar.stellar-supercluster.svc.cluster.local:80/$2;
      }
    }
  3. Expose the proxy nginx using an Ingress.
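
As a quick way to sanity-check the routing in the example config, the four location regexes can be mirrored in a small Python sketch. This is only a model of the path-to-upstream mapping, not how nginx is implemented: `upstream_for` is a hypothetical helper, `foobar` is the placeholder Service name from step 1, and query strings are not modeled. Patterns are tried in the order they appear in the config, and the first match wins.

```python
import re

# Placeholder Service name from the proposal, plus the namespace from the manifests.
SERVICE = "foobar"
NAMESPACE = "stellar-supercluster"

# (pattern, port) pairs mirroring the four location blocks, in config order.
ROUTES = [
    (re.compile(r"^/(.+)/core$"), 11626),
    (re.compile(r"^/(.+)/core/(.*)$"), 11626),
    (re.compile(r"^/(.+)/history$"), 80),
    (re.compile(r"^/(.+)/history/(.*)$"), 80),
]


def upstream_for(path):
    """Return the upstream URL for a request path, or None if no route matches."""
    for pattern, port in ROUTES:
        match = pattern.match(path)
        if match:
            pod = match.group(1)
            # Only the two-group patterns carry a trailing sub-path.
            tail = match.group(2) if pattern.groups > 1 else ""
            return (
                f"http://{pod}.{SERVICE}.{NAMESPACE}"
                f".svc.cluster.local:{port}/{tail}"
            )
    return None


if __name__ == "__main__":
    print(upstream_for("/ssc-1015z-15a2d2-sts-complete1-0/core/info"))
    print(upstream_for("/ssc-1015z-15a2d2-sts-complete1-0/history"))
    print(upstream_for("/healthz"))
```

One thing the sketch makes easy to test: `(.+)` is greedy and also matches `/`, so a path with a second `/core/` segment would capture more than the pod name. Using `([^/]+)` instead would restrict the capture to a single path segment, if that turns out to matter.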

What alternatives are there?

It may be possible to throw resources at the problem, but because of the way DNS traffic is amplified, that won't take us very far.

sisuresh added the core-team and ops-team labels Oct 7, 2024