Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable ASO #3723

Merged
merged 2 commits into from
Aug 29, 2023
Merged

Enable ASO #3723

merged 2 commits into from
Aug 29, 2023

Conversation

nojnhuh
Copy link
Contributor

@nojnhuh nojnhuh commented Jul 13, 2023

What type of PR is this?
/kind feature

What this PR does / why we need it:

This PR enables all of the currently-merged ASO functionality which was previously inaccessible to users. Specifically, this change will install ASO alongside CAPZ and use it to manage resource groups.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #3527, fixes #3523, fixes #3532

Special notes for your reviewer:

This PR is a work-in-progress opened early to start getting test signal and accumulate changes from other contributors as necessary. Reviews are welcome but I plan to rebase/force-push this branch frequently before removing the [WIP] status.

  • cherry-pick candidate

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

Azure Service Operator is now installed alongside CAPZ and manages resource groups

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jul 13, 2023
azure/services/aso/tags.go Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Jul 13, 2023

Codecov Report

Patch coverage: 31.74% and project coverage change: -0.05% ⚠️

Comparison is base (e812eae) 55.57% compared to head (ec469b3) 55.53%.
Report is 8 commits behind head on main.

❗ Current head ec469b3 differs from pull request most recent head 270f646. Consider uploading reports for the commit 270f646 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3723      +/-   ##
==========================================
- Coverage   55.57%   55.53%   -0.05%     
==========================================
  Files         190      190              
  Lines       19523    19547      +24     
==========================================
+ Hits        10850    10855       +5     
- Misses       8064     8080      +16     
- Partials      609      612       +3     
Files Changed Coverage Δ
controllers/azurecluster_controller.go 37.07% <ø> (ø)
controllers/azuremanagedcontrolplane_controller.go 40.66% <ø> (ø)
controllers/helpers.go 56.82% <12.50%> (-0.10%) ⬇️
azure/scope/managedcontrolplane.go 23.88% <14.28%> (+0.01%) ⬆️
controllers/azurecluster_reconciler.go 68.14% <35.71%> (-1.51%) ⬇️
azure/scope/cluster.go 56.48% <40.00%> (-0.33%) ⬇️
controllers/azuremanagedcontrolplane_reconciler.go 37.50% <40.00%> (-0.74%) ⬇️
azure/services/asogroups/groups.go 85.18% <100.00%> (ø)

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jul 28, 2023
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 4, 2023
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 9, 2023
@nojnhuh
Copy link
Contributor Author

nojnhuh commented Aug 9, 2023

/test ?

@k8s-ci-robot
Copy link
Contributor

@nojnhuh: The following commands are available to trigger required jobs:

  • /test pull-cluster-api-provider-azure-build
  • /test pull-cluster-api-provider-azure-ci-entrypoint
  • /test pull-cluster-api-provider-azure-e2e
  • /test pull-cluster-api-provider-azure-e2e-aks
  • /test pull-cluster-api-provider-azure-test
  • /test pull-cluster-api-provider-azure-verify

The following commands are available to trigger optional jobs:

  • /test pull-cluster-api-provider-azure-apidiff
  • /test pull-cluster-api-provider-azure-apiversion-upgrade
  • /test pull-cluster-api-provider-azure-capi-e2e
  • /test pull-cluster-api-provider-azure-conformance
  • /test pull-cluster-api-provider-azure-conformance-with-ci-artifacts
  • /test pull-cluster-api-provider-azure-e2e-optional
  • /test pull-cluster-api-provider-azure-e2e-workload-upgrade
  • /test pull-cluster-api-provider-azure-windows-containerd-upstream-with-ci-artifacts
  • /test pull-cluster-api-provider-azure-windows-containerd-upstream-with-ci-artifacts-serial-slow

Use /test all to run the following jobs that were automatically triggered:

  • pull-cluster-api-provider-azure-apidiff
  • pull-cluster-api-provider-azure-build
  • pull-cluster-api-provider-azure-ci-entrypoint
  • pull-cluster-api-provider-azure-e2e
  • pull-cluster-api-provider-azure-e2e-aks
  • pull-cluster-api-provider-azure-test
  • pull-cluster-api-provider-azure-verify

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@nojnhuh
Copy link
Contributor Author

nojnhuh commented Aug 9, 2023

/test pull-cluster-api-provider-azure-e2e
^ this passed but took ~30m longer than usual. Seems like an etcd fluke, but running again to confirm.

/test pull-cluster-api-provider-azure-apiversion-upgrade
/test pull-cluster-api-provider-azure-capi-e2e
/test pull-cluster-api-provider-azure-e2e-optional

@nojnhuh
Copy link
Contributor Author

nojnhuh commented Aug 9, 2023

pull-cluster-api-provider-azure-e2e-optional failing because ASO auth is missing UserAssignedMSI (#3743) and WorkloadIdentity (#3732).

/test pull-cluster-api-provider-azure-apiversion-upgrade
^ VM failed provisioning on the old version before upgrade.

/test pull-cluster-api-provider-azure-e2e
^ Passed before, but at least this one didn't take any longer.

/test pull-cluster-api-provider-azure-capi-e2e
^ Looks like another VM provisioning failure.

@Jont828
Copy link
Contributor

Jont828 commented Aug 9, 2023

@nojnhuh Is this ready for review or would you like some more time to hack away at this?

@nojnhuh
Copy link
Contributor Author

nojnhuh commented Aug 9, 2023

@nojnhuh Is this ready for review or would you like some more time to hack away at this?

Feel free to leave any feedback. I'm not expecting to make any significant changes to this, but there's some orthogonal work that will land in other PRs that this PR depends on. So I'll continue to rebase/force-push this branch until I remove the [WIP].

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 10, 2023
@nojnhuh
Copy link
Contributor Author

nojnhuh commented Aug 14, 2023

Pardon me, just flake hunting 🔍

/test pull-cluster-api-provider-azure-apiversion-upgrade
/test pull-cluster-api-provider-azure-e2e
/test pull-cluster-api-provider-azure-e2e-optional
/test pull-cluster-api-provider-azure-capi-e2e

@nojnhuh
Copy link
Contributor Author

nojnhuh commented Aug 14, 2023

Workload Identity test should be passing now?
/test pull-cluster-api-provider-azure-e2e-optional

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 21, 2023
@nojnhuh
Copy link
Contributor Author

nojnhuh commented Aug 23, 2023

Just pushed a couple of commits to bring back the old groups service and wire up some logic to preserve the old behavior when a cluster is using UserAssignedMSI auth since that isn't yet supported by ASO (#3743).

/test pull-cluster-api-provider-azure-e2e-optional
^ Private cluster test should pass now

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Aug 23, 2023
Copy link
Contributor

@CecileRobertMichon CecileRobertMichon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall lgtm

/cc @matthchr

Tiltfile Outdated Show resolved Hide resolved
@@ -59,7 +59,7 @@ func (s *Service) Name() string {

// Reconcile idempotently creates or updates a resource group.
func (s *Service) Reconcile(ctx context.Context) error {
ctx, _, done := tele.StartSpanWithLogger(ctx, "groups.Service.Reconcile")
ctx, _, done := tele.StartSpanWithLogger(ctx, "asogroups.Service.Reconcile")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this won't be possible yet since we're having to keep both packages side by side for now for UserAssigned identity but thoughts on removing the "aso" prefix everywhere in naming and going back to just "groups" as service name once we remove legacy groups?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's the plan.

AZURE_CLIENT_ID: ${AZURE_CLIENT_ID_B64:=""}
AZURE_CLIENT_SECRET: ${AZURE_CLIENT_SECRET_B64:=""}
# Per-resource Secrets will be created based on a Cluster's AzureClusterIdentity.
AZURE_SUBSCRIPTION_ID: ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uh is that behavior we expect? if not is there an issue tracking this in ASO to make them optional?

@@ -0,0 +1,55 @@
# Azure Service Operator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for adding docs! 📖

@k8s-ci-robot
Copy link
Contributor

@CecileRobertMichon: GitHub didn't allow me to request PR reviews from the following users: matthchr.

Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

Overall lgtm

/cc @matthchr

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Aug 24, 2023
@nojnhuh
Copy link
Contributor Author

nojnhuh commented Aug 24, 2023

/test pull-cluster-api-provider-azure-e2e-optional
/test pull-cluster-api-provider-azure-capi-e2e
/test pull-cluster-api-provider-azure-apiversion-upgrade

Copy link
Contributor

@CecileRobertMichon CecileRobertMichon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 24, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 7674a93b9bb98dcef7b90350d647a9c6b5d2e6b1

Copy link

@matthchr matthchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had one small question but overall this looks reasonable to me.


### Installation

Beginning with CAPZ v1.11.0, ASO's control plane will be installed automatically by `clusterctl` in the

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a permanent behavior or at some future date BYO ASO might be supported?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We haven't yet discussed any plans to support BYO ASO. I think the main roadblock to that is that clusterctl expects certain labels applied to the CRDs for things like clusterctl move to work. In theory, if users add those themselves to the ASO CRDs, then things should work, but that would require a fair bit of testing that we haven't done yet.

And I think we would need clusterctl init to support some conditional logic that could enable/disable the ASO install alongside CAPZ to support existing users relying on ASO being installed that way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have an issue to discuss BYO ASO?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not necessarily saying you need one until such time as a user asks about it - just something worth thinking about ahead of time as it is possible it will come up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #3900 to track

Copy link
Contributor

@willie-yao willie-yao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me @nojnhuh! Just a few minor questions from my end.

{
Scope: azure.ManagedClusterID(s.SubscriptionID(), s.ResourceGroup(), s.ManagedClusterSpec().ResourceName()),
Tags: s.AdditionalTags(),
Annotation: azure.ManagedClusterTagsLastAppliedAnnotation,
},
}
if s.UseLegacyGroups {
specs = append(specs, azure.TagsSpec{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If using legacy groups, would we need both Tags Specs? I may be misunderstanding here but it looks like we will have two copies of the additional tags here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One tags spec is for the resource group and one is for the managed cluster. With ASO, tags for a resource are reconciled alongside the rest of the resource instead of with the separate tags service which these serve as input to. So for an ASO resource, we don't need to define a tags spec.

Namespace: namespace,
DeletionTimestamp: &metav1.Time{Time: time.Now()},
},
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be out of scope for this PR but it could be worth adding some test cases for UseLegacyGroups: true to cover all cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did try adding some but it turned out to be super messy and I couldn't justify fighting through that since ideally we'll drop that flag entirely in the next week or so. I can come back to these if we think it will stick around much longer than that.

@willie-yao
Copy link
Contributor

/lgtm

@nojnhuh
Copy link
Contributor Author

nojnhuh commented Aug 28, 2023

I squashed most of this but left two commits so we can revert using ASO for resource groups independently from installing ASO altogether if need be.

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 28, 2023
Copy link
Contributor

@CecileRobertMichon CecileRobertMichon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CecileRobertMichon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 28, 2023
@nojnhuh
Copy link
Contributor Author

nojnhuh commented Aug 28, 2023

Quota issue should be fixed.
/retest

@k8s-ci-robot k8s-ci-robot merged commit 34aa125 into kubernetes-sigs:main Aug 29, 2023
12 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.11 milestone Aug 29, 2023
@nojnhuh nojnhuh deleted the aso-wired branch August 29, 2023 00:33
@nojnhuh nojnhuh mentioned this pull request Sep 6, 2023
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Collect ASO resources/logs in e2e artifacts Enable ASO Expose ASO metrics in Tilt's Prometheus
6 participants