ASO Managed Clusters (AKS)

  • Feature status: alpha, not experimental, fully supported
  • Feature gate: MachinePool=true

CAPZ v1.15.0 introduces a new flavor of APIs that addresses the following limitations of the existing CAPZ APIs in advanced AKS provisioning use cases:

  • A limited set of Azure resource types can be represented.
  • A limited set of Azure resource topologies can be expressed. For example, only a single Virtual Network resource can be reconciled for each CAPZ-managed AKS cluster.
  • For each Azure resource type supported by CAPZ, CAPZ generally only uses a single Azure API version to define resources of that type.
  • For each Azure API version known by CAPZ, only a subset of fields defined in that version by the Azure API spec are exposed by the CAPZ API.

This new API defines new AzureASOManagedCluster, AzureASOManagedControlPlane, and AzureASOManagedMachinePool resources. An AzureASOManagedCluster might look like this:

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureASOManagedCluster
metadata:
  name: my-cluster
  namespace: default
spec:
  resources:
  - apiVersion: resources.azure.com/v1api20200601
    kind: ResourceGroup
    metadata:
      name: my-resource-group
    spec:
      location: eastus

See here for a full AKS example using all the new resources.

The main element of the new API is spec.resources in each new resource, which defines arbitrary, literal ASO resources inline to be managed by CAPZ. These inline ASO resource definitions take the place of almost all other configuration currently defined by CAPZ. For example, instead of setting a CAPZ-specific spec.location field as on the existing AzureManagedControlPlane, users set the same value on an ASO ManagedCluster resource defined in an AzureASOManagedControlPlane's spec.resources. This pattern allows users to define, in full, any ASO-supported version of a resource type in any of these new CAPZ resources.
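
As a rough illustration of that pattern, the shell function below prints an AzureASOManagedControlPlane whose location is set on the inline ASO ManagedCluster. The function name render_control_plane and its arguments are hypothetical, and the ManagedCluster API version is borrowed from the example later on this page. The manifest is deliberately partial: a real ManagedCluster needs further fields, such as spec.owner.

```shell
# Hypothetical helper: prints a partial AzureASOManagedControlPlane manifest.
# $1 = cluster name, $2 = Azure location
render_control_plane() {
  name=$1
  location=$2
  cat <<EOF
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureASOManagedControlPlane
metadata:
  name: ${name}
spec:
  resources:
  - apiVersion: containerservice.azure.com/v1api20240901
    kind: ManagedCluster
    metadata:
      name: ${name}
    spec:
      # Location is set on the ASO resource itself rather than on a CAPZ-specific field.
      location: ${location}
EOF
}

# Example: render_control_plane my-cluster eastus
```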

The obvious tradeoff with this new style of API is that CAPZ resource definitions can become more verbose for basic use cases. To address this, CAPZ still offers flavor templates that use this API with all of the boilerplate predefined to serve as a starting point for customization.

The overall theme of this API is to leverage ASO as much as possible for representing Azure resources in the Kubernetes API, thereby making CAPZ the thinnest possible translation layer between ASO and Cluster API.

This experiment will help inform CAPZ whether this pattern may be a candidate for a potential v2 API. This functionality is enabled by default and can be disabled with the ASOAPI feature flag (set by the EXP_ASO_API environment variable). Please try it out and offer any feedback!
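
A minimal sketch of turning the feature off when installing CAPZ, assuming the EXP_ASO_API variable is read by clusterctl at init time like other CAPZ feature flags. The wrapper name init_capz_without_aso_api is hypothetical.

```shell
# Hypothetical wrapper: install CAPZ with the ASOAPI feature flag disabled.
# The variable is set only for this one clusterctl invocation.
init_capz_without_aso_api() {
  EXP_ASO_API=false clusterctl init --infrastructure azure
}

# Example: init_capz_without_aso_api
```

This sketch covers a fresh clusterctl init only.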

Disable Local Accounts

When local accounts are disabled, as they are for AKS Automatic clusters, the kubeconfig generated by AKS assumes that clients can run the kubelogin utility locally to authenticate with Entra. That is not the case for clients like the Cluster API controllers, which need to access Nodes in the workload cluster. To give those controllers access, CAPZ modifies the kubeconfig from AKS: it removes the exec plugin and adds a token, an Entra ID access token that clients can handle natively by sending it in an Authorization: Bearer ... header. CAPZ authenticates with Entra using the same ASO credentials used to create the ManagedCluster resource. These may be any of the options described in ASO's documentation, and the identity must be assigned the Azure Kubernetes Service RBAC Cluster Admin role.
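
One way to see which form a given kubeconfig takes is to look for an exec or token entry in it. The helper below is a sketch only: kubeconfig_auth_mode is a hypothetical name, and it does a simple line match rather than parsing YAML.

```shell
# Hypothetical helper: reads a kubeconfig on stdin and reports how its user
# authenticates: "exec" (plugin such as kubelogin), "token", or "other".
kubeconfig_auth_mode() {
  kubeconfig=$(cat)
  if printf '%s\n' "$kubeconfig" | grep -Eq '^[[:space:]-]*exec:'; then
    echo exec
  elif printf '%s\n' "$kubeconfig" | grep -Eq '^[[:space:]-]*token:'; then
    echo token
  else
    echo other
  fi
}

# Example: clusterctl get kubeconfig <cluster-name> | kubeconfig_auth_mode
```

Going by the description above, the kubeconfig managed by CAPZ would be expected to report token, while the one retrieved directly from AKS would report exec.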

When defining the embedded ManagedCluster in an AzureASOManagedControlPlane, ASO will fail to retrieve adminCredentials when local accounts are disabled, so userCredentials must be specified instead. In order to leave room for CAPZ to manage the canonical ${CLUSTER_NAME}-kubeconfig secret well-known to Cluster API, another name must be specified for this Secret to avoid CAPZ and ASO overwriting each other:

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureASOManagedControlPlane
metadata:
  name: ${CLUSTER_NAME}
spec:
  resources:
  - apiVersion: containerservice.azure.com/v1api20240901
    kind: ManagedCluster
    metadata:
      name: ${CLUSTER_NAME}
    spec:
      operatorSpec:
        secrets:
          userCredentials:
            name: ${CLUSTER_NAME}-user-kubeconfig # NOT ${CLUSTER_NAME}-kubeconfig
            key: value
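
The naming rule can be checked before a manifest is applied. The function below is an illustrative sketch, not part of CAPZ; check_user_credentials_name is a hypothetical name.

```shell
# Hypothetical pre-flight check: reject a userCredentials secret name that
# collides with the ${CLUSTER_NAME}-kubeconfig secret managed by CAPZ.
# $1 = cluster name, $2 = secret name chosen for userCredentials
check_user_credentials_name() {
  cluster_name=$1
  secret_name=$2
  if [ "$secret_name" = "${cluster_name}-kubeconfig" ]; then
    echo "error: ${secret_name} is reserved for the CAPZ-managed kubeconfig" >&2
    return 1
  fi
}

# Example: check_user_credentials_name my-cluster my-cluster-user-kubeconfig
```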

Migrating existing Clusters to AzureASOManagedControlPlane

Existing CAPI Clusters using the AzureManagedControlPlane and associated APIs can be migrated to use the new AzureASOManagedControlPlane and its associated APIs. This process relies on CAPZ's ability to adopt existing clusters that may not have been created by CAPZ, which comes with some caveats that should be reviewed first.

To migrate one cluster to the ASO-based APIs:

  1. Pause the cluster by setting the Cluster's spec.paused to true.
    kubectl patch cluster <name> --type merge -p '{"spec": {"paused": true}}'
    
  2. Wait for the pause to take effect: the clusterctl.cluster.x-k8s.io/block-move annotation should disappear from the AzureManagedControlPlane and its AzureManagedMachinePools. This should be nearly instantaneous.
  3. Disarm the old ASO resources to prevent them from deleting the underlying Azure resources during cleanup. The old ASO resources (ResourceGroup, ManagedCluster, ManagedClustersAgentPools) have owner references to old CAPZ resources and ASO finalizers. Without this step, deleting or accidentally garbage-collecting the old CAPZ resources would cascade to the ASO resources, causing ASO to delete the actual Azure resources. This must be done before creating new ASO resources or deleting any old resources.
    # For each old ASO resource (ManagedClustersAgentPools, ManagedCluster, ResourceGroup):
    kubectl patch <resource> --type merge -p '{"metadata": {"annotations": {"serviceoperator.azure.com/reconcile-policy": "skip"}, "finalizers": null}}'
    
  4. Create a new namespace to contain the new resources to avoid conflicting ASO definitions.
    kubectl create namespace <new-namespace>
    
  5. Copy the ASO credential secret used by the existing cluster into the new namespace. The secret name can be found in the serviceoperator.azure.com/credential-from annotation on the existing ASO ManagedCluster resource.
    kubectl get secret <aso-credential-secret> -o json | \
      jq '.metadata = {name: .metadata.name, namespace: "<new-namespace>"}' | \
      kubectl apply -f -
    
  6. Adopt the underlying AKS resources from the new namespace, which creates the new CAPI and CAPZ resources. This must be done in order:
    1. Create an ASO ResourceGroup resource in the new namespace pointing at the existing resource group. The adoption controller requires this resource to exist before it can process the ManagedCluster. Wait for it to become Ready.

      apiVersion: resources.azure.com/v1api20200601
      kind: ResourceGroup
      metadata:
        name: <name>
        namespace: <new-namespace>
        annotations:
          serviceoperator.azure.com/credential-from: <aso-credential-secret>
      spec:
        azureName: <azure-resource-group-name>
        location: <location>
      
    2. Create an ASO ManagedCluster resource in the new namespace with the sigs.k8s.io/cluster-api-provider-azure-adopt: "true" annotation. Its spec.owner.name must reference the ResourceGroup created above. Wait for the ManagedCluster to become Ready and for CAPZ to scaffold the Cluster, AzureASOManagedCluster, and AzureASOManagedControlPlane resources.

      Important: The spec must include agentPoolProfiles. The old CAPZ controller strips agentPoolProfiles from the ASO ManagedCluster spec (it manages pools separately via ManagedClustersAgentPool resources), so the existing ASO resource's .spec will not have them. Extract them from .status.agentPoolProfiles instead:

      kubectl get managedcluster.containerservice.azure.com/<name> -o json | jq '.status.agentPoolProfiles'
      

      Without agentPoolProfiles, ASO will PUT the spec to Azure without pool definitions, triggering an update cycle that prevents the AzureASOManagedControlPlane from becoming ready.

    3. Create ASO ManagedClustersAgentPool resources in the new namespace, one per node pool, each with the sigs.k8s.io/cluster-api-provider-azure-adopt: "true" annotation. Their spec.owner.name must reference the ManagedCluster. Wait for CAPZ to scaffold the MachinePools and AzureASOManagedMachinePools.

    4. Wait for the new Cluster to show AVAILABLE=True.

  7. Forcefully delete the old Cluster. This is more complicated than normal because CAPI controllers do not reconcile paused resources at all, even when they are deleted. The underlying Azure resources will not be affected because the old ASO resources were disarmed earlier.
    • Delete the cluster: kubectl delete cluster <name> --wait=false
    • Delete the cluster infrastructure object: kubectl delete azuremanagedcluster <name> --wait=false
    • Delete the cluster control plane object: kubectl delete azuremanagedcontrolplane <name> --wait=false
    • Delete the machine pools: kubectl delete machinepool <names...> --wait=false
    • Delete the machine pool infrastructure resources: kubectl delete azuremanagedmachinepool <names...> --wait=false
    • Remove finalizers from the machine pool infrastructure resources: kubectl patch azuremanagedmachinepool <names...> --type merge -p '{"metadata": {"finalizers": null}}'
    • Remove finalizers from the machine pools: kubectl patch machinepool <names...> --type merge -p '{"metadata": {"finalizers": null}}'
    • Remove finalizers from the cluster control plane object: kubectl patch azuremanagedcontrolplane <name> --type merge -p '{"metadata": {"finalizers": null}}'
    • Note: the cluster infrastructure object should not have any finalizers and should already be deleted
    • Remove finalizers from the cluster: kubectl patch cluster <name> --type merge -p '{"metadata": {"finalizers": null}}'
    • Verify the old ASO resources like ResourceGroup and ManagedCluster are deleted from Kubernetes. The actual Azure resources should still exist, now managed by the new ASO resources in the new namespace.
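
The wait in step 2 can be scripted rather than checked by hand. The polling loop below is a sketch under assumptions: wait_for_block_move_removed is a hypothetical helper, and the two-second interval and default of 30 polls are arbitrary choices.

```shell
# Hypothetical helper for step 2: poll until the block-move annotation is gone.
# $1 = resource, e.g. azuremanagedcontrolplane/<name>
# $2 = maximum number of polls (default 30), two seconds apart
# Returns 0 once the annotation is absent, 1 on timeout, 2 if kubectl fails.
wait_for_block_move_removed() {
  resource=$1
  attempts=${2:-30}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Print all annotations; stop immediately if the resource cannot be read.
    annotations=$(kubectl get "$resource" -o jsonpath='{.metadata.annotations}') || return 2
    if ! printf '%s' "$annotations" | grep -Fq 'clusterctl.cluster.x-k8s.io/block-move'; then
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  return 1
}

# Example: wait_for_block_move_removed azuremanagedcontrolplane/<name>
```

The same function can be called once per AzureManagedMachinePool before moving on to step 3.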

Migrating from v1alpha1 to v1beta1

With the introduction of v1beta1 for ASO Managed APIs in CAPZ, users should migrate their clusters and manifests from v1alpha1 to v1beta1. Note: v1alpha1 and v1beta1 are equivalent, so this migration is straightforward and low risk.

Steps to Migrate

  1. Upgrade CAPZ using clusterctl upgrade

    The CRDs will be updated automatically as part of the upgrade process.

  2. Update API Versions in Manifests

    For each AzureASOManaged... resource, change the apiVersion from infrastructure.cluster.x-k8s.io/v1alpha1 to infrastructure.cluster.x-k8s.io/v1beta1. For example:

    ...
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AzureASOManagedCluster
    ...
    

    Make these changes for AzureASOManagedCluster(Template), AzureASOManagedControlPlane(Template) and AzureASOManagedMachinePool(Template) definitions.

  3. Update References in CAPI Objects

    Update any references in CAPI objects (such as a Cluster’s spec.infrastructureRef and spec.controlPlaneRef) to point to the new apiVersion:

    spec:
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AzureASOManagedCluster
        name: my-cluster
      controlPlaneRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AzureASOManagedControlPlane
        name: my-cluster
    

    Similarly, update any other references to the API version for those object kinds.
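
For manifests kept in files, steps 2 and 3 amount to a text substitution. The filter below is a sketch: migrate_api_version is a hypothetical name, and it rewrites every occurrence of the v1alpha1 API version on its input regardless of kind, so the result should be reviewed before it is applied.

```shell
# Hypothetical filter: rewrite the CAPZ infrastructure API version from
# v1alpha1 to v1beta1 in a manifest read from stdin.
# Caution: this does not check the kind of each object.
migrate_api_version() {
  sed 's|infrastructure\.cluster\.x-k8s\.io/v1alpha1|infrastructure.cluster.x-k8s.io/v1beta1|g'
}

# Example: migrate_api_version < cluster.yaml > cluster-v1beta1.yaml
```

Writing to a new file, as in the example, keeps the original available for comparison.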

What to Expect After Migration

After completing the steps above, the following should be true:

  • All resources are healthy and visible. For an informative snapshot of your cluster and its resources, you can run:

    clusterctl describe cluster <your-cluster-name>
    
  • The resources are now using v1beta1 and reconciliation is working as expected.

  • The CRD storage version is set to v1beta1.
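
The last point can be checked from the CRDs themselves. The sketch below assumes the CRD names follow the usual lowercase-plural convention (for example azureasomanagedclusters.infrastructure.cluster.x-k8s.io); both function names are hypothetical, and the Template CRDs are left out for brevity.

```shell
# Hypothetical helper: print the version marked storage=true for one CRD.
crd_storage_version() {
  kubectl get crd "$1" -o jsonpath='{.spec.versions[?(@.storage==true)].name}'
}

# Hypothetical check: succeed only if the three ASO Managed CRDs store v1beta1.
# The plural names below are assumed, not taken from this page.
check_aso_crds_storage_version() {
  for plural in azureasomanagedclusters azureasomanagedcontrolplanes azureasomanagedmachinepools; do
    crd="${plural}.infrastructure.cluster.x-k8s.io"
    version=$(crd_storage_version "$crd") || return 2
    if [ "$version" != "v1beta1" ]; then
      echo "${crd}: storage version is ${version:-unknown}, expected v1beta1" >&2
      return 1
    fi
  done
}

# Example: check_aso_crds_storage_version && echo "storage version OK"
```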