Cluster Recreate Causes Kustomize Module To Fail #156
This needs more investigation, but I've also seen this myself. The module receives the credentials that the cluster resources output as an input. My preliminary investigation suggests that when planning the creation of the cluster and the cluster services, Terraform has the dependency graph correct. On destroy, the order also seems correct: K8s resources first, then the cluster. But if the cluster gets destroyed and recreated, the graph does not first destroy the K8s resources, then destroy the cluster, then recreate the cluster and finally recreate the resources. That means the resources stay in the state, but there are no cluster credentials to refresh them during plan.
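For context, this is roughly the pattern in question, sketched with a plain google_container_cluster and the official kubernetes provider rather than the actual module code, so all names here are illustrative assumptions:

```hcl
# Minimal sketch (hypothetical names): the Kubernetes provider is configured
# from attributes of the cluster resource. If the cluster is planned for
# replacement, these attributes are unknown or stale at plan time, so the
# provider has no valid credentials to refresh the K8s resources in state.
data "google_client_config" "default" {}

resource "google_container_cluster" "cluster" {
  name               = "example"
  location           = "europe-west1"
  initial_node_count = 1
}

provider "kubernetes" {
  host  = "https://${google_container_cluster.cluster.endpoint}"
  token = data.google_client_config.default.access_token
  cluster_ca_certificate = base64decode(
    google_container_cluster.cluster.master_auth[0].cluster_ca_certificate
  )
}

# A resource managed through that provider, e.g. a namespace, stays in state
# when the cluster is replaced, but can no longer be refreshed during plan.
resource "kubernetes_namespace" "example" {
  metadata {
    name = "example"
  }
}
```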
To make it easier to understand, I created a simple config to reproduce the issue: https://github.com/pst/debugrecreateplan. The example repo shows the behaviour with both the official Kubernetes provider and my kustomize provider, on top of a KinD cluster, so it's not specific to the Google provider either. And so far it seems to support my theory: create and destroy plans correctly handle the resources and the cluster, but destroy-and-recreate plans do not handle the K8s resources on the cluster at all.

Create plan
Destroy plan
Destroy & recreate plan

Triggered by changing
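For illustration only: any change to an attribute that forces replacement produces this kind of destroy-and-recreate plan. A hypothetical example, assuming a plain google_container_cluster rather than the KinD setup in the repro repo (so not necessarily the change the repo actually makes):

```hcl
# Hypothetical replacement-forcing change: "name" cannot be updated in place
# on google_container_cluster, so changing it plans a destroy and recreate of
# the cluster, while the K8s resources already in state are not re-planned.
resource "google_container_cluster" "cluster" {
  name               = "example-v2" # changed from "example", forces replacement
  location           = "europe-west1"
  initial_node_count = 1
}
```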
Likely related upstream issue: hashicorp/terraform#22572
This seems to be fixed, judging by the last couple of times I've used it; I'll do a test to make sure.
This is definitely still an issue, and it's not Kubestack specific but a general issue with Terraform. I hope moving away from the in-module manifests and towards the new native modules will make the issue less frequent. But even then, e.g. the auth ConfigMap for EKS in the module may still cause this. The only real workaround is a
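To make the EKS point concrete: the auth ConfigMap (aws-auth) is itself a Kubernetes resource managed through the cluster's credentials, so it is affected even when no user manifests are in the module. A minimal sketch using the official kubernetes provider; the resource name and mapRoles content are illustrative assumptions, not the module's actual code:

```hcl
# Sketch only: an aws-auth style ConfigMap managed via the kubernetes
# provider. Like any other K8s resource in state, it cannot be refreshed
# during plan once the cluster it belongs to is scheduled for replacement.
resource "kubernetes_config_map" "aws_auth" {
  metadata {
    name      = "aws-auth"
    namespace = "kube-system"
  }

  data = {
    # mapRoles content shortened; illustrative only
    mapRoles = <<-YAML
      - rolearn: arn:aws:iam::111122223333:role/eks-node-role
        username: system:node:{{EC2PrivateDNSName}}
        groups:
          - system:bootstrappers
          - system:nodes
    YAML
  }
}
```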
Ah right, I recreated a GKE cluster that had a couple of manifests and it didn't break; I was hoping that meant it was fixed.
Problem
We are recreating our cluster to enable Private Node Pools. The issue seems to be that, because the cluster is being recreated, the Kustomize provider tries to communicate with the Kubernetes cluster on the default localhost address.
Logs
Steps To Reproduce
Create a cluster with the setting:
Then, once created, change the value (a hedged illustration of this kind of change follows these steps):
and run on that TF workspace:
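As a hedged illustration of the kind of setting change involved, sketched against a plain google_container_cluster rather than the Kubestack module (the module's variable names will differ): enabling private nodes is a change GKE cannot apply in place, so it forces the cluster to be replaced.

```hcl
# Illustrative only: on a plain google_container_cluster, turning on private
# nodes after creation forces replacement of the cluster, which is the
# situation that breaks the plan for the Kubernetes/Kustomize resources.
resource "google_container_cluster" "cluster" {
  name               = "example"
  location           = "europe-west1"
  initial_node_count = 1

  private_cluster_config {
    enable_private_nodes    = true # was false / block absent at creation
    enable_private_endpoint = false
    master_ipv4_cidr_block  = "172.16.0.0/28"
  }
}
```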
Workaround
Currently there is a workaround by using:
This will update the cluster, which should then fix the problem.