CI improvement: Remove explicit steps that are already addressed by org policies #1243
Comments
Hi @apeabody, looks like you made a recent change on the core-project-factory module terraform-google-modules/terraform-google-project-factory@cfd7f3f that might obviate the fix I'm working on. I started working on a new PR to override the default behavior of the core-project-factory (replace
Hi @eeaton! - Likely. We often see Note: Here is the PR for the updated version: #1221
Good news, thanks. I'm seeing quite a few of those 409 errors on terraform retry, so I'll prioritize getting #1221 merged and see if that helps reduce the errors.
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days
TL;DR
While investigating the cause of flaky CI errors, I'm seeing a high rate of failures in the following steps, which the project factory module performs but which are better addressed through organization policies:
Deleting the default VPC is unnecessary when its creation is already blocked by org policy. The provider docs recommend using the organization policy constraint instead of setting auto_create_network to false, which is what the project factory does.
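Following that recommendation, a project created directly with the google_project resource can keep the provider default and rely on the constraint. This is a minimal sketch, assuming compute.skipDefaultNetworkCreation is already enforced on the organization. The project name, project ID, and the org_id and billing_account variables are hypothetical placeholders, not values from this repository.

```hcl
# Hypothetical placeholder inputs; supply real values via tfvars.
variable "org_id" {
  type = string
}

variable "billing_account" {
  type = string
}

resource "google_project" "example" {
  name            = "example-project"    # placeholder
  project_id      = "example-project-id" # placeholder
  org_id          = var.org_id
  billing_account = var.billing_account

  # Provider default: Terraform does not enable compute.googleapis.com or try
  # to delete a default network. Assuming compute.skipDefaultNetworkCreation is
  # enforced, the platform does not create that network in the first place.
  auto_create_network = true
}
```

With auto_create_network = false, Terraform would instead enable the Compute Engine API and delete the network the platform created, which is the step this issue proposes to avoid.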
The default behavior of the project factory is a bit unintuitive. Because Google Cloud creates a default network in every new project, the project factory module overrides this with auto_create_network = false. With that setting, it enables the Compute Engine API, queries it for the auto-created network, and then attempts to delete the default VPC, which can run into eventual-consistency issues. Conversely, when auto_create_network = true, the project factory does not query the Compute Engine API at all. If the org policy that prevents the default network is enforced and auto_create_network = true, we get the desired (if unintuitive) behavior: no default VPC is created, and nothing tries to query the Compute Engine API immediately after project creation.
Note that the provider docs also state that the resource for managing default service accounts (google_project_default_service_accounts) works on a best-effort basis, as no API formally describes the default service accounts, and that it is only intended for use cases that cannot use the org policy.
The foundation blueprint already sets these org policies, so I expect that setting the policies first and skipping these steps will eliminate some of the flaky eventual-consistency errors.
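For reference, the two constraints can be enforced at the organization level roughly as follows. This is a sketch that uses the google_org_policy_policy resource directly; the foundation blueprint may set these policies through a different resource or module, and org_id is a hypothetical placeholder.

```hcl
# Hypothetical placeholder: numeric organization ID.
variable "org_id" {
  type = string
}

# New projects in the organization do not get a default VPC network.
resource "google_org_policy_policy" "skip_default_network" {
  name   = "organizations/${var.org_id}/policies/compute.skipDefaultNetworkCreation"
  parent = "organizations/${var.org_id}"

  spec {
    rules {
      enforce = "TRUE"
    }
  }
}

# Default service accounts no longer receive automatic IAM grants (Editor).
resource "google_org_policy_policy" "no_default_sa_grants" {
  name   = "organizations/${var.org_id}/policies/iam.automaticIamGrantsForDefaultServiceAccounts"
  parent = "organizations/${var.org_id}"

  spec {
    rules {
      enforce = "TRUE"
    }
  }
}
```

Because policy enforcement is eventually consistent, creating these resources earlier in the same apply orders the API calls but does not by itself guarantee that the policy is in effect when a project is created.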
Terraform Resources
Some projects explicitly try to deprivilege the default service account. Once the org policy is enforced, this is no longer necessary. However, the org policy is created in stage 1-org and takes effect with eventual consistency, and some projects are also created in 1-org, so it is hard to guarantee that the policy is actually enforced before those projects are created.
The terraform-google-project-factory module sets auto_create_network to false by default, whereas the google_project resource in the Google provider defaults it to true. As a result, the project factory always enables the Compute Engine API, lets the platform create the default network, and then immediately deletes it. None of this is necessary if the org policy is already in place.
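A sketch of the corresponding module inputs, assuming both org policies are already enforced. auto_create_network and default_service_account are inputs of terraform-google-project-factory, but defaults and accepted values can change between releases, so check the README for the version you pin. The project name and the org_id and billing_account variables are hypothetical placeholders.

```hcl
# Hypothetical placeholder inputs.
variable "org_id" {
  type = string
}

variable "billing_account" {
  type = string
}

module "example_project" {
  source = "terraform-google-modules/project-factory/google"
  # Pin a specific module version in real use.

  name            = "example-project" # placeholder
  org_id          = var.org_id
  billing_account = var.billing_account

  # Override the module default (false) so the module does not enable the
  # Compute Engine API only to delete the default network.
  auto_create_network = true

  # Leave the default service account untouched and rely on the
  # iam.automaticIamGrantsForDefaultServiceAccounts org policy instead.
  default_service_account = "keep"
}
```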
Detailed design
The goal of removing the default VPC and deprivileging the default service account is already addressed by the org policies compute.skipDefaultNetworkCreation and iam.automaticIamGrantsForDefaultServiceAccounts in the 1-org step. Once these policies are enforced, there is no need to explicitly delete the default VPC or disable the default service account. Worse, attempting these actions contributes to flaky failures when Terraform references APIs or resources whose state is eventually consistent.
Fixes:
Additional information
Sample error logs for #1
[...]
Step #7 - "converge-org": Error: Received unexpected error:
Step #7 - "converge-org": FatalError{Underlying: error while running command: exit status 1;
Step #7 - "converge-org": Error: error creating project tyj-net-dns-oo3v (tyj-net-dns): googleapi: Error 409: Requested entity already exists, alreadyExists. If you received a 403 error, make sure you have the roles/resourcemanager.projectCreator permission
Step #7 - "converge-org":
Step #7 - "converge-org": with module.dns_hub.module.project-factory.google_project.main,
Step #7 - "converge-org": on .terraform/modules/dns_hub/modules/core_project_factory/main.tf line 73, in resource "google_project" "main":
Step #7 - "converge-org": 73: resource "google_project" "main" {
Step #7 - "converge-org":
Step #7 - "converge-org":
Step #7 - "converge-org": Error: Error creating service account: googleapi: Error 409: Service account project-service-account already exists within project projects/tyj-
Sample error logs for #2: