[kubectl_path_documents] for_each causes map cannot be determined until apply error #153

Open
berglh opened this issue Jul 30, 2024 · 6 comments

Comments


berglh commented Jul 30, 2024

Preamble

This is a continuation of the long-standing problem with using for_each on the kubectl_manifest resource together with the kubectl_path_documents data source. It appears never to have been resolved in any version of the terraform-provider-kubectl module. I'm wondering whether there is a more deterministic way to ensure that whatever the provider considers undetermined about the manifest map can be resolved (for example, by validating during plan). The Terraform issue I reference at the bottom of my description states:

Providers built using the modern Provider Framework don't run into that particular malfunction [for_each error], but it still isn't really clear what a provider ought to do when a crucial argument is unknown and so e.g. the AWS Cloud Control provider -- a flagship use of the new framework -- reacts to unknown provider arguments by returning an error, causing a similar effect as we see for count and for_each above.

Does this mean the "modern" Provider Framework should be adopted to avoid this issue?
Can the resource be improved to resolve during the plan phase?

Issue

I believe the issue is due to newer versions of Terraform not resolving the map of manifests/documents during the plan phase. I am using Terraform v1.9.2 and am trying to deploy a Karpenter EC2NodeClass template from a sub-directory to an EKS cluster running Kubernetes v1.29. We deploy the Terraform project using GitLab CI, and the pipeline fails if terraform plan fails.

My code runs inside a sub-module of my Terraform project rather than at the top-level main.tf, but I wouldn't expect this to affect things.

|- modules
|--- nodes
|----- class
|------- al2023_test.yaml
|----- main.tf
|----- variables.tf

main.tf

// Amazon Linux 2023 node classes
data "kubectl_path_documents" "al2023_node_classes" {
  pattern = "./class/al2023*.yaml"
  vars = {
    karpenter_node_role = var.karpenter.node_role_name
    cluster_name        = var.cluster.name
    authorized_keys     = local.authorized_keys_sh
  }
}

resource "kubectl_manifest" "al2023_node_classes" {
  for_each   = data.kubectl_path_documents.al2023_node_classes.manifests
  yaml_body  = each.value
}

I interpolate some variables sourced from other modules; however, this error also occurs when no variables are applied at all. I have a Karpenter NodePool manifest file that follows the same structure as the documentation, and it suffers from the same issue.

// Node pools
data "kubectl_path_documents" "node_pools" {
  pattern = "./pool/*.yaml"
}

resource "kubectl_manifest" "node_pools" {
  for_each   = data.kubectl_path_documents.node_pools.manifests
  yaml_body  = each.value
}

When doing a terraform plan, I get the following error:

│ Error: Invalid for_each argument
│ 
│   on modules/node/main.tf line 44, in resource "kubectl_manifest" "al2023_node_classes":
│   44:   for_each   = data.kubectl_path_documents.al2023_node_classes.manifests
│     ├────────────────
│     │ data.kubectl_path_documents.al2023_node_classes.manifests is a map of string, known only after apply
│ 
│ The "for_each" map includes keys derived from resource attributes that cannot be determined until apply, and so Terraform cannot determine the full set of keys that will identify the instances of this resource.
│ 
│ When working with unknown values in for_each, it's better to define the map keys statically in your configuration and place apply-time results only in the map values.
│ 
│ Alternatively, you could use the -target planning option to first apply only the resources that the for_each value depends on, and then apply a second time to fully converge.

If I try the count method with the documents attribute instead, I get a similar error:

╷
│ Error: Invalid count argument
│ 
│   on modules/node/main.tf line 44, in resource "kubectl_manifest" "al2023_node_classes":
│   44:   count     = length(data.kubectl_path_documents.al2023_node_classes.documents)
│ 
│ The "count" value depends on resource attributes that cannot be determined until apply, so Terraform cannot predict how many instances will be created. To work around this, use the -target argument to first apply only the resources that the count depends on.
╵
╷
│ Error: Invalid count argument
│ 
│   on modules/node/main.tf line 54, in resource "kubectl_manifest" "node_pools":
│   54:   count     = length(data.kubectl_path_documents.node_pools.documents)
│ 
│ The "count" value depends on resource attributes that cannot be determined until apply, so Terraform cannot predict how many instances will be created. To work around this, use the -target argument to first apply only the resources that the count depends on.

Related Issues

There is a long history of this issue, and it seems to be related to the last issue (two links) in this list.

Work-around

The above linked comment does work around the issue, but needless to say, it remains an ongoing problem for me whether I apply plain manifests from a sub-directory or use variable interpolation into the manifest file. I can even interpolate the values using the templatefile function, so this isn't a blocker, but the documentation provided for this module doesn't work with my current version of Terraform.
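Roughly, the work-around looks like the sketch below (paths and variables simplified; my full version is in a comment further down). The key point is that the for_each keys come from fileset(), which Terraform evaluates at plan time, so only the values contain apply-time results.

// Sketch of the fileset + templatefile work-around; replaces the
// kubectl_path_documents-based resource above.
resource "kubectl_manifest" "al2023_node_classes" {
  for_each = fileset("${path.module}/class", "al2023*.yaml")
  yaml_body = templatefile("${path.module}/class/${each.value}", {
    // Apply-time values are fine here, because they only appear in the
    // map values, never in the keys.
    cluster_name = var.cluster.name
  })
}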

alekc (Owner) commented Jul 30, 2024

Can you maybe try a slightly different pattern? I am doing something similar, but I:

  1. keep only one manifest per file
  2. use either file or templatefile for rendering
locals {
  files = { for fileName in fileset(path.module, "static/**/[a-z]*.yaml") : fileName => templatefile("${path.module}/${fileName}", {}) }
}

resource "kubectl_manifest" "example" {
  for_each = local.files
  yaml_body = each.value
}

Output:

➜ terraform apply              

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # kubectl_manifest.example["static/secret.yaml"] will be created
  + resource "kubectl_manifest" "example" {
      + api_version             = "v1"
      + apply_only              = false
      + field_manager           = "kubectl"
      + force_conflicts         = false
      + force_new               = false
      + id                      = (known after apply)
      + kind                    = "Secret"
      + live_manifest_incluster = (sensitive value)
      + live_uid                = (known after apply)
      + name                    = "secret-basic-auth"
      + namespace               = "default"
      + server_side_apply       = false
      + uid                     = (known after apply)
      + validate_schema         = true
      + wait_for_rollout        = true
      + yaml_body               = (sensitive value)
      + yaml_body_parsed        = <<-EOT
            apiVersion: v1
            data: (sensitive value)
            kind: Secret
            metadata:
              name: secret-basic-auth
              namespace: default
            stringData: (sensitive value)
            type: Opaque
        EOT
      + yaml_incluster          = (sensitive value)
    }

  # kubectl_manifest.example["static/secret2.yaml"] will be created
  + resource "kubectl_manifest" "example" {
      + api_version             = "v1"
      + apply_only              = false
      + field_manager           = "kubectl"
      + force_conflicts         = false
      + force_new               = false
      + id                      = (known after apply)
      + kind                    = "Secret"
      + live_manifest_incluster = (sensitive value)
      + live_uid                = (known after apply)
      + name                    = "secret-basic-auth2"
      + namespace               = "default"
      + server_side_apply       = false
      + uid                     = (known after apply)
      + validate_schema         = true
      + wait_for_rollout        = true
      + yaml_body               = (sensitive value)
      + yaml_body_parsed        = <<-EOT
            apiVersion: v1
            data: (sensitive value)
            kind: Secret
            metadata:
              name: secret-basic-auth2
              namespace: default
            stringData: (sensitive value)
            type: Opaque
        EOT
      + yaml_incluster          = (sensitive value)
    }

Plan: 2 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

kubectl_manifest.example["static/secret.yaml"]: Creating...
kubectl_manifest.example["static/secret2.yaml"]: Creating...
kubectl_manifest.example["static/secret2.yaml"]: Creation complete after 1s [id=/api/v1/namespaces/default/secrets/secret-basic-auth2]
kubectl_manifest.example["static/secret.yaml"]: Creation complete after 1s [id=/api/v1/namespaces/default/secrets/secret-basic-auth]

Apply complete! Resources: 2 added, 0 changed, 0 destroyed.


➜ 

➜ terraform version
Terraform v1.9.3
on darwin_arm64
+ provider registry.terraform.io/alekc/kubectl v2.0.4

@erezhazan1

I'm having the same issue: when I apply, everything works, but later on I suddenly get the same error.

berglh (Author) commented Jul 30, 2024

@alekc - Thanks for the prompt reply!

I only have one Kubernetes resource per YAML file, and I currently have only a single file; I am just about to create additional node pools and classes. To be clear, this is only occurring for me during terraform plan. I haven't tried apply without a plan, as I can't skip the plan stage in GitLab CI.

It's possible that the first apply did work, as per @erezhazan1, and that subsequent plans then failed. This Terraform project configures a lot, including the EKS cluster and all the related VPC/IAM/KMS resources, and as I have been iterating I've slowly been adding more resources to the project and fixing issues with related AWS services.

Regarding your suggestion, I am effectively doing what you have written, using the fileset and templatefile functions as I linked in the work-around section of my OP. This is working; it just doesn't read as cleanly in HCL as kubectl_path_documents would if it worked for me.

Edit: I don't know if this also makes a difference, but I am using S3 as the Terraform backend with DynamoDB locking. I would struggle to see this being an issue, though, as the state should be the same regardless of the backend.

// Amazon Linux 2023 node classes
resource "kubectl_manifest" "al2023_node_classes" {
  for_each = fileset("${abspath(path.module)}/class", "al2023*.yaml")
  yaml_body = templatefile("${abspath(path.module)}/class/${each.value}", {
    karpenter_node_role = var.karpenter.node_role_name
    cluster_name        = var.cluster.name
    authorized_keys     = local.authorized_keys_sh
  })
}

// Node pools
resource "kubectl_manifest" "node_pools" {
  for_each  = fileset("${abspath(path.module)}/pool", "*.yaml")
  yaml_body = file("${abspath(path.module)}/pool/${each.value}")
}

🙏

alekc (Owner) commented Jul 31, 2024

Not sure about kubectl_path_documents (it's pretty much legacy and, imho, not very useful). I would suspect that fileset is processed before the plan, while kubectl_path_documents is a data source, so until you run a plan you don't know how many entries there are, which might trigger the issue.

That's the most reasonable explanation that comes to mind.
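Roughly, the difference looks like this (hypothetical paths; just a sketch of the idea, not tested against your setup):

locals {
  // fileset() only reads the local filesystem, so this set - and therefore the
  // for_each keys derived from it - is already known while the plan is built.
  plan_time_files = fileset(path.module, "class/*.yaml")
}

data "kubectl_path_documents" "docs" {
  pattern = "${path.module}/class/*.yaml"
}

resource "kubectl_manifest" "docs" {
  // This map is a data source attribute; if the read is deferred, the whole
  // map is "known only after apply" and for_each fails during plan.
  for_each  = data.kubectl_path_documents.docs.manifests
  yaml_body = each.value
}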

berglh (Author) commented Aug 1, 2024

Hi @alekc,

Were you able to replicate the plan issue after apply?

Initially I found this strange: many other data sources, like terraform_remote_state or CloudFormation outputs, have similarly lazy evaluation, where the values of the referenced attributes are not determined until apply, yet they are able to pass planning even when resources in other modules reference them.

My only guess is that the trouble comes from the data source generating a map of unknown length with unknown keys, which now throws the error. I'm not sure the data source can be updated to resolve this; I guess it is difficult when the YAML might only be generated onto the file system by some other step during apply, in which case you would want it to remain dynamic and unresolved at the planning stage.

The main benefit of kubectl_path_documents is that it's cleaner and easier to read when interpolating variables; fileset and templatefile seem a little harder to grok for contributors new to the project. I can expand my comments to explain what's happening, but at a minimum, perhaps the kubectl documentation should be adjusted to indicate that this data source isn't reliable any more? It seems like a nice solution for YAML templates where you want to dynamically change values based on variables, which can change with the environment/workspace of the Terraform project.

I did find another, unrelated error when I changed the resource name inside the Kubernetes manifest while the filename remained the same - in this case it failed to apply but passed planning, using the work-around in my last comment.

Looking at the kubectl_manifest docs, I can see that force_new is required for things to update correctly via delete/create; a sketch of that adjustment follows the error output below. I would expect the error message to reflect this requirement rather than stating that there is a bug in the provider. I can open a new issue if you want to track that; just let me know if you would prefer to leave it as is.

╷
│ Error: Provider produced inconsistent final plan
│ 
│ When expanding the plan for
│ module.node.kubectl_manifest.al2023_node_classes["al2023_test.yaml"] to
│ include new values learned so far during apply, provider
│ "registry.terraform.io/alekc/kubectl" produced an invalid new value for
│ .name: was cty.StringVal("test"), but now cty.StringVal("al2023").
│ 
│ This is a bug in the provider, which should be reported in the provider's
│ own issue tracker.
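For reference, the adjustment I mean is just adding force_new to the work-around resource from my earlier comment (a sketch only; I'm assuming, based on the docs, that this lets the provider delete and re-create the object when the name changes):

resource "kubectl_manifest" "al2023_node_classes" {
  for_each  = fileset("${abspath(path.module)}/class", "al2023*.yaml")
  force_new = true // delete and re-create instead of patching when fields such as metadata.name change
  yaml_body = templatefile("${abspath(path.module)}/class/${each.value}", {
    karpenter_node_role = var.karpenter.node_role_name
    cluster_name        = var.cluster.name
    authorized_keys     = local.authorized_keys_sh
  })
}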

@yongzhang

This is a known issue with Terraform; see my original issue here. There's no solution so far, but I noticed that Terraform v1.10 has a new feature, deferred actions, which can probably fix this; see the last part of EXPERIMENTS here.
