Gateway API / Cert-manager Crds #78

Open, wants to merge 4 commits into base: main

luminosita

Minor update to CRDs

cert-manager: the "installCRDs" flag is deprecated.
Gateway API CRDs: the GatewayClass was not accepted after the Cilium deployment during Talos bootstrap. This fixes that issue. I guess you opted to reinstall Cilium once the Talos bootstrap finishes.
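
For context on the first item, here is a minimal sketch of the corresponding Helm values change. It assumes cert-manager chart v1.15 or later, where `crds.enabled` and `crds.keep` replace the deprecated `installCRDs`; the values file touched by this PR may look different.

```yaml
# Helm values for the cert-manager chart (assumes chart v1.15+).
# Deprecated form:
# installCRDs: true
crds:
  enabled: true   # install the CRDs as part of the Helm release
  keep: true      # leave the CRDs in place if the release is uninstalled
```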

All the best,

vehagn and others added 4 commits September 8, 2024 19:22
Use Authelia in an attempt to replace Keycloak. Kanidm is another alternative we're going to try later.

Mdleal commented Sep 12, 2024

I spent 2 days trying to figure this out, and as this is my first time using kube and terra, it was a great learning experience.


Mdleal commented Sep 12, 2024

I was not able to get this working with the suggested commit, but I was able to load the CRDs with the following:

extraManifests:
- https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.1.0/standard-install.yaml
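
In a Talos machine configuration, `extraManifests` sits under the `cluster` key. A sketch of the surrounding context, assuming the snippet is applied as a control-plane config patch (how the patch is wired in through Terraform is not shown here and may differ in this repo):

```yaml
# Talos machine config patch (control plane).
# Talos fetches and applies these manifests while bootstrapping the cluster,
# so the Gateway API CRDs are installed as part of cluster bootstrap.
cluster:
  extraManifests:
    - https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.1.0/standard-install.yaml
```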

Owner

vehagn commented Sep 13, 2024

Thank you for the corrections @luminosita!

Do you mind rebasing your commits to resolve the conflicts? I can do this if it's OK with you. I'd also appreciate it if you could prefix your commit messages with the appropriate conventional commits prefix, e.g. fix:.

@Mdleal I'm glad you were able to figure it out. I'll update the article with a mention of the extraManifests for the Gateway API CRDs in this fix. I do "reinstall" Cilium after the bootstrap since it's being synced by Argo CD. This process is (poorly) documented in k8s/README.md.

@mitchross

Would this PR/issue explain why all my apps return

"upstream connect error or disconnect/reset before headers. reset reason: connection timeout"

when I try to reach them? The best lead I've found is that a cert is bad...

Owner

vehagn commented Sep 13, 2024

@mitchross I don't think so.

Are you trying to use the Gateway API? I think I've seen that error message from there before, though it sounds very generic.
Cilium 1.16 changed the Envoy containers to standalone. This means that if you want to use privileged ports, e.g. 80 and 443, you need to give Envoy the correct privileges (docs). You can see my config for this here and here.

gatewayAPI:
  enabled: true
## Uncomment to run on the host network, e.g. when LoadBalancer Services are not available
#  hostNetwork:
#    enabled: true
envoy:
  securityContext:
    capabilities:
      keepCapNetBindService: true
      envoy:
        - NET_ADMIN
        - PERFMON
        - BPF
        ## Enable SYS_ADMIN capability instead of PERFMON and BPF if running on Linux Kernel < 5.8 and Cri-O < 1.22.0 or containerd < 1.5.0
        # - SYS_ADMIN
        ## Enable NET_BIND_SERVICE capability to use port numbers < 1024, e.g. 80 or 443
        # - NET_BIND_SERVICE

Edit: I see I've not added the NET_BIND_SERVICE capability in my config, but I'm still somehow able to use port 443.
Maybe it has something to do with BPF + keepCapNetBindService: true?

This should definitely work though:

gatewayAPI:
  enabled: true
envoy:
  securityContext:
    capabilities:
      keepCapNetBindService: true
      envoy: [ NET_ADMIN, SYS_ADMIN, NET_BIND_SERVICE ]
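
To check whether Envoy can actually bind the privileged ports, a Gateway that listens on 80 and 443 is a quick test. The following is only a sketch: the names `example-gateway` and `example-tls` and the `gateway` namespace are hypothetical, and it assumes the `cilium` GatewayClass that Cilium creates when `gatewayAPI.enabled` is true.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: example-gateway        # hypothetical name
  namespace: gateway           # hypothetical namespace
spec:
  gatewayClassName: cilium     # assumes the GatewayClass created by Cilium
  listeners:
    - name: http
      protocol: HTTP
      port: 80                 # privileged port (< 1024)
    - name: https
      protocol: HTTPS
      port: 443                # privileged port (< 1024)
      tls:
        mode: Terminate
        certificateRefs:
          - kind: Secret
            name: example-tls  # hypothetical Secret, e.g. issued by cert-manager
```

If Envoy lacks the required capability, the expectation is that the listeners never become reachable, which may help separate a port-binding problem from a certificate problem.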

@mitchross

Yeah, I basically have a 1:1 clone of your repo (minus the Terraform and automation, just the k8s part),

and I've had it all working multiple times. Then one day I got the connection reset error, and I have been chasing it for weeks. When I redeployed, deleted, or otherwise messed with cert-manager, I got it working for a bit...

The closest issues I found:

cilium/cilium#24146
cert-manager/cert-manager#6799

I'm going to nuke my cluster and start over to see if it's an order-of-operations issue.
