Kubernetes Cluster API Provider Azure

Kubernetes-native declarative infrastructure for Azure.

What is the Cluster API Provider Azure

The Cluster API brings declarative, Kubernetes-style APIs to cluster creation, configuration and management.

The API itself is shared across multiple cloud providers allowing for true Azure hybrid deployments of Kubernetes.

Quick Start

Check out the Cluster API Quick Start to create your first Kubernetes cluster on Azure using Cluster API.

Flavors

See the flavors documentation to learn which cluster templates are provided by CAPZ.


Support Policy

This provider’s versions are compatible with the following versions of Cluster API:

  • Azure Provider v0.3.x → Cluster API v1alpha2 (v0.2.x)
  • Azure Provider v0.4.x → Cluster API v1alpha3 (v0.3.x)
  • Azure Provider v0.5.x → Cluster API v1alpha4 (v0.4.x)

This provider’s versions (Azure Provider v0.3.x, v0.4.x, and v0.5.x) are able to install and manage the following versions of Kubernetes, with each provider version supporting a subset of these releases:

  • Kubernetes 1.15
  • Kubernetes 1.16
  • Kubernetes 1.17
  • Kubernetes 1.18
  • Kubernetes 1.19
  • Kubernetes 1.20
  • Kubernetes 1.21
  • Kubernetes 1.22

Each version of Cluster API for Azure will attempt to support at least two Kubernetes versions, e.g., Cluster API for Azure v0.1 may support Kubernetes 1.13 and Kubernetes 1.14.

NOTE: As the versioning for this project is tied to the versioning of Cluster API, future modifications to this policy may be made to more closely align with other providers in the Cluster API ecosystem.


Documentation

Please see our Book for in-depth user documentation.

Additional docs can be found in the /docs directory, and the index is here.

Getting involved and contributing

Are you interested in contributing to cluster-api-provider-azure? We, the maintainers and community, would love your suggestions, contributions, and help! Also, the maintainers can be contacted at any time to learn more about how to get involved.

To set up your environment, check out the development guide.

In the interest of getting more new people involved, we tag issues with good first issue. These are typically issues that have smaller scope but are good ways to start to get acquainted with the codebase.

We also encourage ALL active community participants to act as if they are maintainers, even if you don’t have “official” write permissions. This is a community effort; we are here to serve the Kubernetes community. If you have an active interest and you want to get involved, you have real power! Don’t assume that the only people who can get things done around here are the “maintainers”.

We also would love to add more “official” maintainers, so show us what you can do!

This repository uses the Kubernetes bots. See a full list of the commands here.

Office hours

The community holds office hours every two weeks, with sessions open to all users and developers.

Office hours are hosted on a Zoom video chat every other Thursday at 08:00 (PT) / 11:00 (ET) / 16:00 (UTC), and are published on the Kubernetes community meetings calendar.

Other ways to communicate with the contributors

Please check in with us in the #cluster-api-azure channel on Slack.

Github issues

Bugs

If you think you have found a bug please follow the instructions below.

  • Please spend a small amount of time giving due diligence to the issue tracker. Your issue might be a duplicate.
  • Get the logs from the cluster controllers and paste them into your issue.
  • Open a bug report.
  • Remember users might be searching for your issue in the future, so please give it a meaningful title to help others.
  • Feel free to reach out to the cluster-api community on kubernetes slack.

Tracking new features

We also use the issue tracker to track features. If you have an idea for a feature, or think you can help Cluster API Provider Azure become even more awesome, then follow the steps below.

  • Open a feature request.
  • Remember users might be searching for your issue in the future, so please give it a meaningful title to help others.
  • Clearly define the use case, using concrete examples, e.g., “I type this and cluster-api-provider-azure does that.”
  • Some of our larger features will require some design. If you would like to include a technical design for your feature, please include it in the issue.
  • After the new feature is well understood and the design agreed upon, we can start coding the feature. We would love for you to code it, so please open up a WIP (work in progress) pull request. Happy coding!

Cluster API Azure Roadmap

This roadmap is a constant work in progress, subject to frequent revision. Dates are approximations. Features are listed in no particular order.

v0.5 (v1alpha4) ~ Q1 2021

Area        Description                                  Issue/Proposal
OS          Windows worker nodes                         #153
Identity    Multi-tenancy within one manager instance    #586
UX          Bootstrap failure detection                  #603
UX          Add tracing and metrics                      #311

v1beta1/v1 ~ TBD

Area        Description                                  Issue/Proposal
Network     Allow multiple subnets of role “node”        #664
Network     Azure Bastion hosts                          #165
Identity    AAD Support                                  #481

Backlog

Items within this category have been identified as potential candidates for the project and can be moved up into a milestone if there is enough interest.

Area            Description                                                     Issue/Proposal
Network         Peering of Cluster VNet to Existing VNet                        #532
OS              Flatcar Support                                                 #629
Compute         SGX-enabled VMs                                                 #488
Compute         Azure Dedicated hosts                                           #675
AKS             Integrate AzureMachine with AzureManagedControlPlane            #826
Cloud Provider  Use Out of Tree cloud-controller-manager and Storage Drivers    #715

Topics

This section contains information about enabling and configuring various Azure features with Cluster API Provider Azure.

Getting started with cluster-api-provider-azure

Prerequisites

Requirements

Setting up your Azure environment

An Azure Service Principal is needed for deploying Azure resources. The below instructions utilize environment-based authentication.

  1. Log in with the Azure CLI.
az login
  2. List your Azure subscriptions.
az account list -o table
  3. If more than one account is present, select the account that you want to use.
az account set -s <SubscriptionId>
  4. Save your Subscription ID in an environment variable.
export AZURE_SUBSCRIPTION_ID="<SubscriptionId>"
  5. Create an Azure Service Principal by running the following command, or skip this step and use a previously created Azure Service Principal. NOTE: the “owner” role is required to be able to create role assignments for system-assigned managed identity.
az ad sp create-for-rbac --role contributor
  6. Save the output from the above command in environment variables.
export AZURE_TENANT_ID="<Tenant>"
export AZURE_CLIENT_ID="<AppId>"
export AZURE_CLIENT_SECRET='<Password>'
export AZURE_LOCATION="eastus" # this should be an Azure region that your subscription has quota for.

Building your first cluster

Check out the Cluster API Quick Start to create your first Kubernetes cluster on Azure using Cluster API. Make sure to select the “Azure” tabs.

Documentation

Please see the CAPZ book for in-depth user documentation.

Troubleshooting Guide

Common issues users might run into when using Cluster API Provider for Azure. This list is work-in-progress. Feel free to open a PR to add to it if you find that useful information is missing.

Examples of troubleshooting real-world issues

No Azure resources are getting created

This is likely due to missing or invalid Azure credentials.

Check the CAPZ controller logs on the management cluster:

kubectl logs deploy/capz-controller-manager -n capz-system manager

If you see an error similar to this:

azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/123/providers/Microsoft.Compute/skus?%24filter=location+eq+%27eastus2%27&api-version=2019-04-01: StatusCode=401 -- Original Error: adal: Refresh request failed. Status Code = '401'. Response body: {\"error\":\"invalid_client\",\"error_description\":\"AADSTS7000215: Invalid client secret is provided.

Make sure the provided Service Principal client ID and client secret are correct and that the password has not expired.
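One quick way to verify the credentials outside of CAPZ is to try logging in with them directly (a minimal sketch, assuming the same environment variables set during environment setup):

az login --service-principal \
  --username "${AZURE_CLIENT_ID}" \
  --password "${AZURE_CLIENT_SECRET}" \
  --tenant "${AZURE_TENANT_ID}"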

The AzureCluster infrastructure is provisioned but no virtual machines are coming up

Your Azure subscription might have no quota for the requested VM size in the specified Azure location.

Check the CAPZ controller logs on the management cluster:

kubectl logs deploy/capz-controller-manager -n capz-system manager

If you see an error similar to this:

"error"="failed to reconcile AzureMachine: failed to create virtual machine: failed to create VM capz-md-0-qkg6m in resource group capz-fkl3tp: compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: autorest/azure: Service returned an error. Status=\u003cnil\u003e Code=\"OperationNotAllowed\" Message=\"Operation could not be completed as it results in exceeding approved standardDSv3Family Cores quota.

Follow these steps to request a quota increase. Alternatively, you can specify another Azure location and/or VM size during cluster creation.

A virtual machine is running but the k8s node did not join the cluster

Check the AzureMachine (or AzureMachinePool if using a MachinePool) status:

kubectl get azuremachines -o wide

If you see an output like this:

NAME                                       READY   STATE
default-template-md-0-w78jt                false   Updating

This indicates that the bootstrap script has not yet succeeded. Check the AzureMachine status.conditions field for more information.
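For example, a minimal way to inspect those conditions (the machine name below is taken from the sample output above):

kubectl get azuremachine default-template-md-0-w78jt -o jsonpath='{.status.conditions}' | jq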

Take a look at the cloud-init logs for further debugging.

One or more control plane replicas are missing

Take a look at the KubeadmControlPlane controller logs and look for any potential errors:

kubectl logs deploy/capi-kubeadm-control-plane-controller-manager -n capi-kubeadm-control-plane-system manager

In addition, make sure all pods on the workload cluster are healthy, including pods in the kube-system namespace.

Nodes are in NotReady state

Make sure you have installed a CNI on the workload cluster and that all the pods on the workload cluster are in running state.
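A quick, hedged way to spot the problem is to list the nodes and any pods that are not running on the workload cluster:

kubectl get nodes -o wide
kubectl get pods -A --field-selector=status.phase!=Running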

Load Balancer service fails to come up

Check the cloud-controller-manager logs on the workload cluster.

If running the Azure cloud provider in-tree:

kubectl logs kube-controller-manager-<control-plane-node-name> -n kube-system 

If running the Azure cloud provider out-of-tree:

kubectl logs cloud-controller-manager -n kube-system 

Watching Kubernetes resources

To watch progression of all Cluster API resources on the management cluster you can run:

kubectl get cluster-api

Looking at controller logs

To check the CAPZ controller logs on the management cluster, run:

kubectl logs deploy/capz-controller-manager -n capz-system manager

Checking cloud-init logs (Ubuntu)

Cloud-init logs can provide more information on any issues that happened when running the bootstrap script.

Option 1: Using the Azure Portal

Located in the virtual machine blade, the boot diagnostics option is under the Support and Troubleshooting section in the Azure portal.

For more information, see here.

Option 2: Using the Azure CLI

az vm boot-diagnostics get-boot-log --name MyVirtualMachine --resource-group MyResourceGroup

For more information, see here.

Option 3: With SSH

Using the ssh information provided during cluster creation (environment variable AZURE_SSH_PUBLIC_KEY_B64):

# connect to the first control plane node - capi is the default linux user created by the deployment
API_SERVER=$(kubectl get azurecluster capz-cluster -o jsonpath='{.spec.controlPlaneEndpoint.host}')
ssh capi@${API_SERVER}
# list nodes
kubectl get azuremachines
NAME                               READY   STATE
capz-cluster-control-plane-2jprg   true    Succeeded
capz-cluster-control-plane-ck5wv   true    Succeeded
capz-cluster-control-plane-w4tv6   true    Succeeded
capz-cluster-md-0-s52wb            false   Failed
capz-cluster-md-0-w8xxw            true    Succeeded
# pick a node name from the output above:
node=$(kubectl get azuremachine capz-cluster-md-0-s52wb -o jsonpath='{.status.addresses[0].address}')
ssh -J capi@${API_SERVER} capi@${node}
# look at cloud-init logs

less /var/log/cloud-init-output.log

Automated log collection

As part of CI there is a log collection script which you can also leverage to pull all the logs for machines. It will dump logs to ${PWD}/_artifacts by default:

./hack/log/log-dump.sh

There are also some provided scripts that can help automate a few common tasks.

AAD Integration

CAPZ can be configured to use Azure Active Directory (AD) for user authentication. In this configuration, you can log into a CAPZ cluster using an Azure AD token. Cluster operators can also configure Kubernetes role-based access control (Kubernetes RBAC) based on a user’s identity or directory group membership.

Create Azure AD server component

Create the Azure AD application

export CLUSTER_NAME=my-aad-cluster
export AZURE_SERVER_APP_ID=$(az ad app create \
    --display-name "${CLUSTER_NAME}Server" \
    --identifier-uris "https://${CLUSTER_NAME}Server" \
    --query appId -o tsv)

Update the application group membership claims

az ad app update --id ${AZURE_SERVER_APP_ID} --set groupMembershipClaims=All

Create a service principal

az ad sp create --id ${AZURE_SERVER_APP_ID}

Create Azure AD client component

AZURE_CLIENT_APP_ID=$(az ad app create \
    --display-name "${CLUSTER_NAME}Client" \
    --native-app \
    --reply-urls "https://${CLUSTER_NAME}Client" \
    --query appId -o tsv)

Create a service principal

az ad sp create --id ${AZURE_CLIENT_APP_ID}

Grant the application API permissions

oAuthPermissionId=$(az ad app show --id ${AZURE_SERVER_APP_ID} --query "oauth2Permissions[0].id" -o tsv)
az ad app permission add --id ${AZURE_CLIENT_APP_ID} --api ${AZURE_SERVER_APP_ID} --api-permissions ${oAuthPermissionId}=Scope
az ad app permission grant --id ${AZURE_CLIENT_APP_ID} --api ${AZURE_SERVER_APP_ID}

Create the cluster

To deploy a cluster with support for AAD, use the aad flavor.

Make sure that AZURE_SERVER_APP_ID is set to the ID of the server AD application created above.
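For example, a minimal sketch of generating and applying the manifest with the aad flavor (the Kubernetes version and machine counts are placeholders; adjust them to your setup):

export AZURE_SERVER_APP_ID   # set to the server application ID created above
clusterctl generate cluster ${CLUSTER_NAME} \
  --flavor aad \
  --kubernetes-version v1.21.2 \
  --control-plane-machine-count=3 \
  --worker-machine-count=3 > aad-cluster.yaml
kubectl apply -f aad-cluster.yaml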

Get the admin kubeconfig

clusterctl get kubeconfig ${CLUSTER_NAME} > ./kubeconfig
export KUBECONFIG=./kubeconfig

Create Kubernetes RBAC binding

Get the user principal name (UPN) for the user currently logged in using the az ad signed-in-user show command. This user account is enabled for Azure AD integration in the next step:

az ad signed-in-user show --query objectId -o tsv

Create a YAML manifest my-azure-ad-binding.yaml:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: my-cluster-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: your_objectId

Create the ClusterRoleBinding using the kubectl apply command and specify the filename of your YAML manifest:

kubectl apply -f my-azure-ad-binding.yaml

Accessing the cluster

Install kubelogin

kubelogin is a client-go credential (exec) plugin implementing Azure authentication. Follow the setup instructions here.

Set the config user context

kubectl config set-credentials ad-user \
  --exec-command kubelogin \
  --exec-api-version=client.authentication.k8s.io/v1beta1 \
  --exec-arg=get-token \
  --exec-arg=--environment --exec-arg=$AZURE_ENVIRONMENT \
  --exec-arg=--server-id --exec-arg=$AZURE_SERVER_APP_ID \
  --exec-arg=--client-id --exec-arg=$AZURE_CLIENT_APP_ID \
  --exec-arg=--tenant-id --exec-arg=$AZURE_TENANT_ID
kubectl config set-context ${CLUSTER_NAME}-ad-user@${CLUSTER_NAME} --user ad-user --cluster ${CLUSTER_NAME}

To verify it works, run:

kubectl config use-context ${CLUSTER_NAME}-ad-user@${CLUSTER_NAME}
kubectl get pods -A

You will receive a sign in prompt to authenticate using Azure AD credentials using a web browser. After you’ve successfully authenticated, the kubectl command should display the pods in the CAPZ cluster.

Adding AAD Groups

To add a group to the admin role run:

AZURE_GROUP_OID=<Your Group ObjectID>
kubectl create clusterrolebinding aad-group-cluster-admin-binding --clusterrole=cluster-admin --group=${AZURE_GROUP_OID}

Adding users

To add another user, create an additional role binding for that user:

USER_OID=<Your User ObjectID or UserPrincipalName>
kubectl create clusterrolebinding aad-user-binding --clusterrole=cluster-admin --user ${USER_OID}

You can update the cluster role bindings to suit your needs for that user or group. See the default role bindings for more details, and the general guide to Kubernetes RBAC.

Known Limitations

  • The user must not be a member of more than 200 groups.

API Server Endpoint

This document describes how to configure your clusters’ api server load balancer and IP.

Load Balancer Type

CAPZ supports two load balancer types, Public and Internal. Public, which is also the default, means that your API Server Load Balancer will have a publicly accessible IP address. Internal, also known as a “private cluster”, means that the API Server endpoint will only be accessible from within the cluster’s virtual network (or peered VNets).

A Public cluster will have an Azure public load balancer load balancing internet traffic to the control plane nodes.

A Private cluster will have an Azure internal load balancer load balancing traffic inside the VNet to the control plane nodes.

For more information on Azure load balancing, see Load Balancer documentation.

Here is an example of configuring the API Server LB type:

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AzureCluster
metadata:
  name: my-private-cluster
  namespace: default
spec:
  location: eastus
  networkSpec:
    apiServerLB:
      type: Internal

Private IP

When using an api server load balancer of type Internal, the default private IP address associated with that load balancer will be 10.0.0.100. If also specifying a custom virtual network, make sure you provide a private IP address that is in the range of your control plane subnet and not in use.

For example:

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AzureCluster
metadata:
  name: my-private-cluster
  namespace: default
spec:
  location: eastus
  networkSpec:
    vnet:
      name: my-vnet
      cidrBlocks: 
        - 172.16.0.0/16
    subnets:
      - name: my-subnet-cp
        role: control-plane
        cidrBlocks: 
          - 172.16.0.0/24
      - name: my-subnet-node
        role: node
        cidrBlocks: 
          - 172.16.2.0/24
    apiServerLB:
      type: Internal
      frontendIPs:
        - name: lb-private-ip-frontend
          privateIP: 172.16.0.100

Public IP

When using an api server load balancer of type Public, a dynamic public IP address will be created, along with a unique FQDN.

You can also choose to provide your own public api server IP. To do so, specify the existing public IP as follows:

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AzureCluster
metadata:
  name: my-cluster
  namespace: default
spec:
  location: eastus
  networkSpec:
    apiServerLB:
      type: Public
      frontendIPs:
        - name: lb-public-ip-frontend
          publicIP:
            name: my-public-ip
            dns: my-cluster-986b4408.eastus.cloudapp.azure.com

Note that dns is the FQDN associated to your public IP address (look for “DNS name” in the Azure Portal).

When you bring your own (BYO) API server IP, CAPZ does not manage its lifecycle, i.e. the IP will not get deleted as part of cluster deletion.
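For example, a hedged sketch of creating such a public IP up front with the Azure CLI (the resource group and names are placeholders):

az network public-ip create \
  --resource-group my-cluster-rg \
  --name my-public-ip \
  --sku Standard \
  --allocation-method Static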

Load Balancer SKU

At this time, CAPZ only supports Azure Standard Load Balancers. See SKU comparison for more information on Azure Load Balancer SKUs.

Configuring the Kubernetes Cloud Provider for Azure

The Azure cloud provider has a number of configuration options driven by a file on cluster nodes. This file canonically lives on a node at /etc/kubernetes/azure.json. The Azure cloud provider documentation details the configuration options exposed by this file.

CAPZ automatically generates this file based on user-provided values in AzureMachineTemplate and AzureMachine. All AzureMachines in the same MachineDeployment or control plane share a single cloud provider secret, while AzureMachines created individually have their own secret.

For AzureMachineTemplate and standalone AzureMachines, the generated secret will have the name “${RESOURCE}-azure-json”, where “${RESOURCE}” is the name of either the AzureMachineTemplate or AzureMachine. The secret will have two data fields: control-plane-azure.json and worker-node-azure.json, with the raw content for that file containing the control plane and worker node data respectively. When the secret ${RESOURCE}-azure-json already exists in the same namespace as an AzureCluster and does not have the label "${CLUSTER_NAME}": "owned", CAPZ will not generate the default described above. Instead it will directly use whatever the user provides in that secret.
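For example, a minimal sketch of inspecting the generated cloud provider config on the management cluster (the secret name assumes an AzureMachineTemplate called my-cluster-control-plane):

kubectl get secret my-cluster-control-plane-azure-json \
  -o jsonpath='{.data.control-plane-azure\.json}' | base64 -d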

Overriding Cloud Provider Config

While many of the cloud provider config values are inferred from the CAPZ infrastructure spec, there are other configuration parameters that cannot be inferred and hence default to the values set by the Azure cloud provider. To provide custom values for such configuration options through CAPZ, use spec.cloudProviderConfigOverrides in AzureCluster. The following example overrides the load balancer rate limit configuration:

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AzureCluster
metadata:
  name: ${CLUSTER_NAME}
  namespace: default
spec:
  location: eastus
  networkSpec:
    vnet:
      name: ${CLUSTER_NAME}-vnet
  resourceGroup: cherry
  subscriptionID: ${AZURE_SUBSCRIPTION_ID}
  cloudProviderConfigOverrides:
    rateLimits:
      - name: "defaultRateLimit"
        config:
          cloudProviderRateLimit: true
          cloudProviderRateLimitBucket: 1
          cloudProviderRateLimitBucketWrite: 1
          cloudProviderRateLimitQPS: 1
          cloudProviderRateLimitQPSWrite: 1
      - name: "loadBalancerRateLimit"
        config:
          cloudProviderRateLimit: true
          cloudProviderRateLimitBucket: 2
          cloudProviderRateLimitBucketWrite: 2
          cloudProviderRateLimitQPS: 0
          cloudProviderRateLimitQPSWrite: 0

External Cloud Provider

To deploy a cluster using the external cloud provider, create a cluster configuration with the external cloud provider template.
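For example, a hedged sketch using clusterctl (this assumes the template is exposed as the external-cloud-provider flavor; adjust the flavor name and Kubernetes version to your setup):

clusterctl generate cluster ${CLUSTER_NAME} \
  --flavor external-cloud-provider \
  --kubernetes-version v1.21.2 > external-cloud-provider-cluster.yaml
kubectl apply -f external-cloud-provider-cluster.yaml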

After deploying the cluster, you should eventually see a set of pods like the following in a Running state:

kube-system   cloud-controller-manager                                            1/1     Running   0          41s
kube-system   cloud-node-manager-5pklx                                            1/1     Running   0          26s
kube-system   cloud-node-manager-hbbqt                                            1/1     Running   0          30s
kube-system   cloud-node-manager-mfsdg                                            1/1     Running   0          39s
kube-system   cloud-node-manager-qrz74                                            1/1     Running   0          24s

Storage Drivers

Azure File CSI Driver

To install the Azure File CSI driver please refer to the installation guide

Repository: https://github.com/kubernetes-sigs/azurefile-csi-driver

Azure Disk CSI Driver

To install the Azure Disk CSI driver please refer to the installation guide

Repository: https://github.com/kubernetes-sigs/azuredisk-csi-driver

Control Plane Outbound Load Balancer

This document describes how to configure your clusters’ control plane outbound load balancer.

Public Clusters

For public clusters, i.e. clusters with the API server load balancer type set to Public, CAPZ does not support adding a control plane outbound load balancer. This is because the API server load balancer already allows for outbound traffic in public clusters.

Private Clusters

For private clusters, i.e. clusters with the API server load balancer type set to Internal, CAPZ does not create a control plane outbound load balancer by default. To create a control plane outbound load balancer, include the controlPlaneOutboundLB section with the desired settings.

Here is an example of configuring a control plane outbound load balancer with 1 front end ip for a private cluster:

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AzureCluster
metadata:
  name: my-private-cluster
  namespace: default
spec:
  location: eastus
  networkSpec:
    apiServerLB:
      type: Internal
    controlPlaneOutboundLB:
      frontendIPsCount: 1

Custom Private DNS Zone Name

It is possible to set the DNS zone name to a custom value by setting PrivateDNSZoneName in the NetworkSpec. By default the DNS zone name is ${CLUSTER_NAME}.capz.io.

This feature is enabled only if the apiServerLB.type is Internal.

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AzureCluster
metadata:
  name: cluster-example
  namespace: default
spec:
  location: southcentralus
  networkSpec:
    privateDNSZoneName: "kubernetes.myzone.com"
    vnet:
      name: my-vnet
      cidrBlocks:
        - 10.0.0.0/16
    subnets:
      - name: my-subnet-cp
        role: control-plane
        cidrBlocks:
          - 10.0.1.0/24
      - name: my-subnet-node
        role: node
        cidrBlocks:
          - 10.0.2.0/24
    apiServerLB:
      type: Internal
      frontendIPs:
        - name: lb-private-ip-frontend
          privateIP: 10.0.1.100
  resourceGroup: cluster-example

Custom images

This document will help you get a CAPZ Kubernetes cluster up and running with your custom image.

Reference images

An image defines the operating system and Kubernetes components that will populate the disk of each node in your cluster.

By default, images offered by “capi” in the Azure Marketplace are used. You can list these reference images with this command:

az vm image list --publisher cncf-upstream --offer capi --all -o table

Note: These images are not updated for security fixes and it is recommended to always use the latest patch version for the Kubernetes version you wish to run. For production-like environments, and for more control over your nodes, it is highly recommended to build and use your own custom images.

Building a custom image

Cluster API uses the Kubernetes Image Builder tools. You should use the Azure images from that project as a starting point for your custom image.

The Image Builder Book explains how to build the images defined in that repository, with instructions for Azure CAPI Images in particular.

Operating system requirements

For your custom image to work with Cluster API, it must meet the operating system requirements of the bootstrap provider. For example, the default kubeadm bootstrap provider has a set of preflight checks that a VM is expected to pass before it can join the cluster.

Kubernetes version requirements

The reference images are each built to support a specific version of Kubernetes. When using your custom images based on them, take care to match the image to the version: field of the KubeadmControlPlane and MachineDeployment in the YAML template for your workload cluster.

Upgrading to a new Kubernetes release with custom images requires this preparation:

  • create a new custom image which supports the Kubernetes release version
  • copy the existing AzureMachineTemplate and change its image: section to reference the new custom image
  • create the new AzureMachineTemplate on the management cluster
  • modify the existing KubeadmControlPlane and MachineDeployment to reference the new AzureMachineTemplate and update the version: field to match

See Upgrading workload clusters for more details.
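As a rough sketch of the last step, you could patch a MachineDeployment to point at the new template and version (the resource names and version below are illustrative only):

kubectl patch machinedeployment my-cluster-md-0 --type merge -p \
  '{"spec":{"template":{"spec":{"version":"v1.22.1","infrastructureRef":{"name":"my-cluster-md-0-v1-22-1"}}}}}'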

Creating a cluster from a custom image

To use a custom image, it needs to be referenced in an image: section of your AzureMachineTemplate. See below for more specific examples.

Using Shared Image Gallery (Recommended)

To use an image from the Shared Image Gallery, fill in the resourceGroup, name, subscriptionID, gallery, and version fields:

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AzureMachineTemplate
metadata:
  name: capz-shared-gallery-example
spec:
  template:
    spec:
      image:
        sharedGallery:
          resourceGroup: "cluster-api-images"
          name: "capi-1234567890"
          subscriptionID: "01234567-89ab-cdef-0123-4567890abcde"
          gallery: "ClusterAPI"
          version: "0.3.1234567890"

If you build Azure CAPI images with the make targets in Image Builder, these required values are printed after a successful build. For example:

$ make -C images/capi/ build-azure-sig-ubuntu-1804
# many minutes later...
==> sig-ubuntu-1804:
Build 'sig-ubuntu-1804' finished.

==> Builds finished. The artifacts of successful builds are:
--> sig-ubuntu-1804: Azure.ResourceManagement.VMImage:

OSType: Linux
ManagedImageResourceGroupName: cluster-api-images
ManagedImageName: capi-1234567890
ManagedImageId: /subscriptions/01234567-89ab-cdef-0123-4567890abcde/resourceGroups/cluster-api-images/providers/Microsoft.Compute/images/capi-1234567890
ManagedImageLocation: southcentralus
ManagedImageSharedImageGalleryId: /subscriptions/01234567-89ab-cdef-0123-4567890abcde/resourceGroups/cluster-api-images/providers/Microsoft.Compute/galleries/ClusterAPI/images/capi-ubuntu-1804/versions/0.3.1234567890

Please also see the replication recommendations for the Shared Image Gallery.

If the image you want to use is based on an image released by a third party publisher, such as Flatcar Linux by Kinvolk, then you need to specify the publisher, offer, and sku fields as well:

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AzureMachineTemplate
metadata:
  name: capz-shared-gallery-example
spec:
  template:
    spec:
      image:
        sharedGallery:
          resourceGroup: "cluster-api-images"
          name: "capi-1234567890"
          subscriptionID: "01234567-89ab-cdef-0123-4567890abcde"
          gallery: "ClusterAPI"
          version: "0.3.1234567890"
          publisher: "kinvolk"
          offer: "flatcar-container-linux-free"
          sku: "stable"

This ensures that the API calls made to create Virtual Machines or Virtual Machine Scale Sets have the image Plan correctly set.

Using image ID

To use a managed image resource by ID, only the id field must be set:

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AzureMachineTemplate
metadata:
  name: capz-image-id-example
spec:
  template:
    spec:
      image:
        id: "/subscriptions/01234567-89ab-cdef-0123-4567890abcde/resourceGroups/myResourceGroup/providers/Microsoft.Compute/images/myImage"

A managed image resource can be created from a Virtual Machine. Please refer to Azure documentation on creating a managed image for more detail.
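For reference, a hedged sketch of creating a managed image from an existing, generalized VM with the Azure CLI (the resource group and names are placeholders):

az image create \
  --resource-group cluster-api-images \
  --name myImage \
  --source myVirtualMachine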

Managed images support only 20 simultaneous deployments, so for most use cases Shared Image Gallery is recommended.

Using Azure Marketplace

To use an image from Azure Marketplace, populate the publisher, offer, sku, and version fields and, if this image is published by a third party publisher, set the thirdPartyImage flag to true so an image Plan can be generated for it. In the case of a third party image, you must accept the license terms with the Azure CLI before consuming it.

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AzureMachineTemplate
metadata:
  name: capz-marketplace-example
spec:
  template:
    spec:
      image:
        marketplace:
          publisher: "example-publisher"
          offer: "example-offer"
          sku: "k8s-1dot18dot8-ubuntu-1804"
          version: "2020-07-25"
          thirdPartyImage: true
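For a third party image, accepting the license terms could look like the following sketch (the publisher, offer, and plan values are taken from the example above and are placeholders):

az vm image terms accept \
  --publisher example-publisher \
  --offer example-offer \
  --plan k8s-1dot18dot8-ubuntu-1804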

Data Disks

This document describes how to specify data disks to be provisioned and attached to VMs provisioned in Azure.

Azure Machine Data Disks

Azure Machines support optionally specifying a list of data disks to be attached to the virtual machine. Each data disk must have:

  • nameSuffix - the name suffix of the disk to be created. Each disk will be named <machineName>_<nameSuffix> to ensure uniqueness.
  • diskSizeGB - the disk size in GB.
  • managedDisk - (optional) the managed disk for a VM (see below)
  • lun - the logical unit number (see below)

Managed Disk Options

See Introduction to Azure managed disks for more information.

Disk LUN

The LUN specifies the logical unit number of the data disk, between 0 and 63. Its value is used to identify data disks within the VM and therefore must be unique for each data disk attached to a VM.

When adding data disks to a Linux VM, you may encounter errors if a disk does not exist at LUN 0. It is therefore recommended to ensure that the first data disk specified is always added at LUN 0.

See Attaching a disk to a Linux VM on Azure for more information.

IMPORTANT! The lun specified in the AzureMachine Spec must match the LUN used to refer to the device in Kubeadm diskSetup. See below for an example.

Ultra disk support for data disks

If you set StorageAccountType to UltraSSD_LRS in the managed disk options, ultra disk support will be enabled, provided the region and zone support the UltraSSDAvailable capability.

To check all available VM sizes in a given region and availability zone that support the UltraSSDAvailable capability, execute the following using the Azure CLI:

az vm list-skus -l <location> -z -s <VM-size>

See Ultra disk for ultra disk performance and GA scope.
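As an illustration, a minimal sketch of an AzureMachineTemplate data disk using UltraSSD_LRS (the template name and disk size are placeholders):

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AzureMachineTemplate
metadata:
  name: capz-ultra-disk-example
spec:
  template:
    spec:
      dataDisks:
        - nameSuffix: ultradisk
          diskSizeGB: 128
          managedDisk:
            storageAccountType: UltraSSD_LRS
          lun: 0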

Configuring partitions, file systems and mounts

KubeadmConfig makes it easy to partition, format, and mount your data disk so your Linux VM can use it. Use the diskSetup and mounts options to describe partitions, file systems and mounts.

You may refer to your device as /dev/disk/azure/scsi1/lun<i> where <i> is the LUN.

See cloud-init documentation for more information about cloud-init disk setup.

Example

The below example shows how to create and attach a custom data disk “my_disk” at LUN 1 for every control plane machine, in addition to the etcd data disk. NOTE: the same can be applied to worker machines.

kind: KubeadmControlPlane
apiVersion: controlplane.cluster.x-k8s.io/v1alpha4
metadata:
  name: "${CLUSTER_NAME}-control-plane"
spec:
    [...]
    diskSetup:
      partitions:
        - device: /dev/disk/azure/scsi1/lun0
          tableType: gpt
          layout: true
          overwrite: false
        - device: /dev/disk/azure/scsi1/lun1
          tableType: gpt
          layout: true
          overwrite: false
      filesystems:
        - label: etcd_disk
          filesystem: ext4
          device: /dev/disk/azure/scsi1/lun0
          extraOpts:
            - "-E"
            - "lazy_itable_init=1,lazy_journal_init=1"
        - label: ephemeral0
          filesystem: ext4
          device: ephemeral0.1
          replaceFS: ntfs
        - label: my_disk
          filesystem: ext4
          device: /dev/disk/azure/scsi1/lun1
    mounts:
      - - LABEL=etcd_disk
        - /var/lib/etcddisk
      - - LABEL=my_disk
        - /var/lib/mydir
---
kind: AzureMachineTemplate
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
metadata:
  name: "${CLUSTER_NAME}-control-plane"
spec:
  template:
    spec:
      [...]
      dataDisks:
        - nameSuffix: etcddisk
          diskSizeGB: 256
          managedDisk:
            storageAccountType: Standard_LRS
          lun: 0
        - nameSuffix: mydisk
          diskSizeGB: 128
          lun: 1

OS Disk

This document describes how to configure the OS disk for VMs provisioned in Azure.

Managed Disk Options

Storage Account Type

By default, Azure will pick the supported storage account type for your AzureMachine based on the specified VM size. If you’d like to specify a specific storage type, you can do so by specifying a storageAccountType:

        managedDisk:
          storageAccountType: Premium_LRS

Supported values are Premium_LRS, Standard_LRS, and StandardSSD_LRS. Note that UltraSSD_LRS can only be used with data disks; it cannot be used for the OS disk.

Also, note that not all Azure VM sizes support Premium storage. To learn more about which sizes are premium storage-compatible, see Sizes for virtual machines in Azure.

See Azure documentation on disk types to learn more about the different storage types.

See Introduction to Azure managed disks for more information on managed disks.

If the optional field diskSizeGB is not provided, it will default to 30GB.

Ephemeral OS

Ephemeral OS uses local VM storage for changes to the OS disk. Storage devices local to the VM host will not be bound by normal managed disk SKU limits. Instead they will always be capable of saturating the VM level limits. This can significantly improve performance on the OS disk. Ephemeral storage used for the OS will not persist between maintenance events and VM redeployments. This is ideal for stateless base OS disks, where any stateful data is kept elsewhere.

There are a few kinds of local storage devices available on Azure VMs. Each VM size will have a different combination. For example, some sizes support premium storage caching, some sizes have a temp disk while others do not, and some sizes have local nvme devices with direct access. Ephemeral OS uses the cache for the VM size, if one exists. Otherwise it will try to use the temp disk if the VM has one. These are the only supported options, and we do not expose the ability to manually choose between these two disks (the default behavior is typically most desirable). This corresponds to the placement property in the Azure Compute REST API.

See the Azure documentation for full details.

Azure Machine DiffDiskSettings

Azure Machines support optionally specifying a field called diffDiskSettings. This mirrors the Azure Compute REST API.

When diffDiskSettings.option is set to Local, ephemeral OS will be enabled. We use the API shape provided by compute directly as they expose other options, although this is the main one relevant at this time.

Known Limitations

Not all SKU sizes support ephemeral OS. CAPZ will query Azure’s resource SKUs API to check if the requested VM size supports ephemeral OS. If not, the azuremachine controller will log an event with the corresponding error on the AzureMachine object.
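You can also check this yourself before creating a machine; for example, a hedged sketch that lists the SKU capabilities for a given size and location (look for EphemeralOSDiskSupported in the output):

az vm list-skus -l eastus --size Standard_DS3_v2 --query "[].capabilities" -o json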

Example

The below example shows how to enable ephemeral OS for a machine template. For control plane nodes, we strongly recommend using etcd data disks to avoid data loss.

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AzureMachineTemplate
metadata:
  name: ${CLUSTER_NAME}-md-0
  namespace: default
spec:
  template:
    spec:
      location: ${AZURE_LOCATION}
      osDisk:
        diffDiskSettings:
          option: Local
        diskSizeGB: 30
        managedDisk:
          storageAccountType: Standard_LRS
        osType: Linux
      sshPublicKey: ${AZURE_SSH_PUBLIC_KEY_B64:=""}
      vmSize: ${AZURE_NODE_MACHINE_TYPE}

Failure Domains

The Azure provider includes support for failure domains, introduced as part of v1alpha3.

Failure domains in Azure

A failure domain in the Azure provider maps to an availability zone within an Azure region. In Azure, an availability zone is a separate data center within a region that offers redundancy and separation from the other availability zones in that region.

To ensure a cluster (or any application) is resilient to failure, it is best to spread instances across all the availability zones within a region. If a zone goes down, your cluster will continue to run because the other zones are physically separate.

Full details of availability zones and regions can be found in the Azure docs.

How to use failure domains

Default Behaviour

By default, only control plane machines get automatically spread to all cluster zones. A workaround for spreading worker machines is to create N MachineDeployments for your N failure domains, scaling them independently. Resiliency to failures comes through having multiple MachineDeployments (see below).

apiVersion: cluster.x-k8s.io/v1alpha4
kind: MachineDeployment
metadata:
  name: ${CLUSTER_NAME}-md-0
  namespace: default
spec:
  clusterName: ${CLUSTER_NAME}
  replicas: ${WORKER_MACHINE_COUNT}
  selector:
    matchLabels: null
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1alpha4
          kind: KubeadmConfigTemplate
          name: ${CLUSTER_NAME}-md-0
      clusterName: ${CLUSTER_NAME}
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
        kind: AzureMachineTemplate
        name: ${CLUSTER_NAME}-md-0
      version: ${KUBERNETES_VERSION}
      failureDomain: "1"
---
apiVersion: cluster.x-k8s.io/v1alpha4
kind: MachineDeployment
metadata:
  name: ${CLUSTER_NAME}-md-1
  namespace: default
spec:
  clusterName: ${CLUSTER_NAME}
  replicas: ${WORKER_MACHINE_COUNT}
  selector:
    matchLabels: null
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1alpha4
          kind: KubeadmConfigTemplate
          name: ${CLUSTER_NAME}-md-1
      clusterName: ${CLUSTER_NAME}
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
        kind: AzureMachineTemplate
        name: ${CLUSTER_NAME}-md-1
      version: ${KUBERNETES_VERSION}
      failureDomain: "2"
---
apiVersion: cluster.x-k8s.io/v1alpha4
kind: MachineDeployment
metadata:
  name: ${CLUSTER_NAME}-md-2
  namespace: default
spec:
  clusterName: ${CLUSTER_NAME}
  replicas: ${WORKER_MACHINE_COUNT}
  selector:
    matchLabels: null
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1alpha4
          kind: KubeadmConfigTemplate
          name: ${CLUSTER_NAME}-md-2
      clusterName: ${CLUSTER_NAME}
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
        kind: AzureMachineTemplate
        name: ${CLUSTER_NAME}-md-2
      version: ${KUBERNETES_VERSION}
      failureDomain: "3"

The Cluster API controller will look for the FailureDomains status field and will set the FailureDomain field in a Machine if a value hasn’t already been explicitly set. It will try to ensure that the machines are spread across all the failure domains.

The AzureMachine controller looks for a failure domain (i.e. availability zone) to use from the Machine first, before falling back to the AzureMachine. This failure domain is then used when provisioning the virtual machine.
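To see which failure domains the controllers discovered for your cluster, you can inspect the Cluster status (a minimal sketch; the cluster name is a placeholder):

kubectl get cluster my-cluster -o jsonpath='{.status.failureDomains}' | jq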

Explicit Placement

If you would rather control the placement of virtual machines into a failure domain (i.e. availability zones) then you can explicitly state the failure domain. The best way is to specify this using the FailureDomain field within the Machine (or MachineDeployment) spec.

DEPRECATION NOTE: Failure domains were introduced in v1alpha3. Prior to this you might have used the AvailabilityZone on the AzureMachine. This has been deprecated in v1alpha3, and now removed in v1alpha4. Please update your definitions and use FailureDomain instead.

For example:

apiVersion: cluster.x-k8s.io/v1alpha4
kind: Machine
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: my-cluster
    cluster.x-k8s.io/control-plane: "true"
  name: controlplane-0
  namespace: default
spec:
  version: "v1.21.2"
  clusterName: my-cluster
  failureDomain: "1"
  bootstrap:
    configRef:
        apiVersion: bootstrap.cluster.x-k8s.io/v1alpha4
        kind: KubeadmConfigTemplate
        name: my-cluster-md-0
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
    kind: AzureMachineTemplate
    name: my-cluster-md-0

Using Virtual Machine Scale Sets

You can use an AzureMachinePool object to deploy a Virtual Machine Scale Set which automatically distributes VM instances across the configured availability zones. Set the FailureDomains field to the list of availability zones that you want to use. Be aware that not all regions have the same availability zones. You can use az vm list-skus -l <location> --zone -o table to list all the available zones per vm size in that location/region.

apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachinePool
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: my-cluster
  name: ${CLUSTER_NAME}-vmss-0
  namespace: default
spec:
  clusterName: my-cluster
  failureDomains:
    - "1"
    - "3"
  replicas: 3
  template:
    spec:
      clusterName: my-cluster
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
          kind: KubeadmConfigTemplate
          name: ${CLUSTER_NAME}-vmss-0
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
        kind: AzureMachinePool
        name: ${CLUSTER_NAME}-vmss-0
      version: ${KUBERNETES_VERSION}
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AzureMachinePool
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: my-cluster
  name: ${CLUSTER_NAME}-vmss-0
  namespace: default
spec:
  location: westeurope
  template:
    osDisk:
      diskSizeGB: 30
      osType: Linux
    vmSize: Standard_D2s_v3

Availability sets when there are no failure domains

Although failure domains provide protection against datacenter failures, not all Azure regions support availability zones. In such cases, Azure availability sets can be used to provide redundancy and high availability.

When Cluster API detects that the region has no failure domains, it creates availability sets for different groups of virtual machines. The virtual machines, when created, are assigned an availability set based on the group they belong to.

The availability sets created are as follows:

  1. For control plane VMs, an availability set will be created and suffixed with the string “control-plane”.
  2. For worker node VMs, an availability set will be created for each machine deployment, and suffixed with the machine deployment name.

Consider the following cluster configuration:

apiVersion: cluster.x-k8s.io/v1alpha3
kind: Cluster
metadata:
  labels:
    cni: calico
  name: ${CLUSTER_NAME}
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
      - 192.168.0.0/16
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
    kind: KubeadmControlPlane
    name: ${CLUSTER_NAME}-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: AzureCluster
    name: ${CLUSTER_NAME}
---
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineDeployment
metadata:
  name: ${CLUSTER_NAME}-md-0
  namespace: default
spec:
  clusterName: ${CLUSTER_NAME}
  replicas: ${WORKER_MACHINE_COUNT}
  selector:
    matchLabels: null
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
          kind: KubeadmConfigTemplate
          name: ${CLUSTER_NAME}-md-0
      clusterName: ${CLUSTER_NAME}
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
        kind: AzureMachineTemplate
        name: ${CLUSTER_NAME}-md-0
      version: ${KUBERNETES_VERSION}
---
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineDeployment
metadata:
  name: ${CLUSTER_NAME}-md-1
  namespace: default
spec:
  clusterName: ${CLUSTER_NAME}
  replicas: ${WORKER_MACHINE_COUNT}
  selector:
    matchLabels: null
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
          kind: KubeadmConfigTemplate
          name: ${CLUSTER_NAME}-md-1
      clusterName: ${CLUSTER_NAME}
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
        kind: AzureMachineTemplate
        name: ${CLUSTER_NAME}-md-1
      version: ${KUBERNETES_VERSION}
---
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineDeployment
metadata:
  name: ${CLUSTER_NAME}-md-2
  namespace: default
spec:
  clusterName: ${CLUSTER_NAME}
  replicas: ${WORKER_MACHINE_COUNT}
  selector:
    matchLabels: null
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
          kind: KubeadmConfigTemplate
          name: ${CLUSTER_NAME}-md-2
      clusterName: ${CLUSTER_NAME}
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
        kind: AzureMachineTemplate
        name: ${CLUSTER_NAME}-md-2
      version: ${KUBERNETES_VERSION}

In the example above, there will be 4 availability sets created, 1 for the control plane, and 1 for each of the 3 machine deployments.
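One hedged way to verify this is to list the availability sets in the cluster's resource group with the Azure CLI (this assumes the default resource group named after the cluster):

az vm availability-set list --resource-group ${CLUSTER_NAME} -o table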

Flannel

This document describes how to use Flannel as your CNI solution. By default, the CNI plugin is not installed for self-managed clusters, so you have to install your own (e.g. Calico with VXLAN).

Modify the Cluster resources

Before deploying the cluster, change the KubeadmControlPlane value at spec.kubeadmConfigSpec.clusterConfiguration.controllerManager.extraArgs.allocate-node-cidrs to "true":

apiVersion: controlplane.cluster.x-k8s.io/v1alpha4
kind: KubeadmControlPlane
spec:
  kubeadmConfigSpec:
    clusterConfiguration:
      controllerManager:
        extraArgs:
          allocate-node-cidrs: "true"

Modify Flannel config

NOTE: This is based off of the instructions at: https://github.com/flannel-io/flannel#deploying-flannel-manually

You need to make an adjustment to the default flannel configuration so that the CIDR inside your CAPZ cluster matches the Flannel Network CIDR.

View your capi-cluster.yaml and make note of the Cluster Network CIDR Block. For example:

apiVersion: cluster.x-k8s.io/v1alpha4
kind: Cluster
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
      - 192.168.0.0/16

Download the file at https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml and modify the kube-flannel-cfg ConfigMap. Set the data.net-conf.json.Network value to match your Cluster Network CIDR Block.

wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

Edit kube-flannel.yml and change this section so that the Network value matches your Cluster CIDR:

kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
data:
  net-conf.json: |
    {
      "Network": "192.168.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }

Apply kube-flannel.yml

kubectl apply -f kube-flannel.yml
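To confirm Flannel is running, a quick hedged check on the workload cluster:

kubectl get pods -A | grep flannel
kubectl get nodes   # nodes should move to Ready once the CNI is up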

GPU-enabled clusters

Overview

With CAPZ you can create GPU-enabled Kubernetes clusters on Microsoft Azure.

Before you begin, be aware that:

  • Scheduling GPUs is a Kubernetes beta feature
  • NVIDIA GPUs are supported on Azure NC-series, NV-series, and NVv3-series VMs
  • NVIDIA GPU Operator allows administrators of Kubernetes clusters to manage GPU nodes just like CPU nodes in the cluster.

To deploy a cluster with support for GPU nodes, use the nvidia-gpu flavor.

An example GPU cluster

Let’s create a CAPZ cluster with an N-series node and run a GPU-powered vector calculation.

Generate an nvidia-gpu cluster template

Use the clusterctl generate cluster command to generate a manifest that defines your GPU-enabled workload cluster.

Remember to use the nvidia-gpu flavor with N-series nodes.

AZURE_CONTROL_PLANE_MACHINE_TYPE=Standard_D2s_v3 \
AZURE_NODE_MACHINE_TYPE=Standard_NC6s_v3 \
AZURE_LOCATION=southcentralus \
clusterctl generate cluster azure-gpu \
  --kubernetes-version=v1.21.2 \
  --worker-machine-count=1 \
  --flavor=nvidia-gpu > azure-gpu-cluster.yaml

Create the cluster

Apply the manifest from the previous step to your management cluster to have CAPZ create a workload cluster:

$ kubectl apply -f azure-gpu-cluster.yaml --server-side
cluster.cluster.x-k8s.io/azure-gpu serverside-applied
azurecluster.infrastructure.cluster.x-k8s.io/azure-gpu serverside-applied
kubeadmcontrolplane.controlplane.cluster.x-k8s.io/azure-gpu-control-plane serverside-applied
azuremachinetemplate.infrastructure.cluster.x-k8s.io/azure-gpu-control-plane serverside-applied
machinedeployment.cluster.x-k8s.io/azure-gpu-md-0 serverside-applied
azuremachinetemplate.infrastructure.cluster.x-k8s.io/azure-gpu-md-0 serverside-applied
kubeadmconfigtemplate.bootstrap.cluster.x-k8s.io/azure-gpu-md-0 serverside-applied
clusterresourceset.addons.cluster.x-k8s.io/crs-gpu-operator serverside-applied
configmap/nvidia-clusterpolicy-crd serverside-applied
configmap/nvidia-gpu-operator-components serverside-applied
clusterresourceset.addons.cluster.x-k8s.io/azure-gpu-crs-0 serverside-applied

Wait until the cluster and nodes are finished provisioning. The GPU nodes may take several minutes to provision, since each one must install drivers and supporting software.

$ kubectl get cluster azure-gpu
NAME        PHASE
azure-gpu   Provisioned
$ kubectl get machines
NAME                             PROVIDERID                                                                                                                                     PHASE     VERSION
azure-gpu-control-plane-t94nm    azure:////subscriptions/<subscription_id>/resourceGroups/azure-gpu/providers/Microsoft.Compute/virtualMachines/azure-gpu-control-plane-nnb57   Running   v1.21.2
azure-gpu-md-0-f6b88dd78-vmkph   azure:////subscriptions/<subscription_id>/resourceGroups/azure-gpu/providers/Microsoft.Compute/virtualMachines/azure-gpu-md-0-gcc8v            Running   v1.21.2

Install a CNI of your choice. Once the nodes are Ready, run the following commands against the workload cluster to check if all the gpu-operator resources are installed:

$ clusterctl get kubeconfig azure-gpu > azure-gpu-cluster.conf
$ export KUBECONFIG=azure-gpu-cluster.conf
$ kubectl get pods | grep gpu-operator
default                  gpu-operator-1612821988-node-feature-discovery-master-664dnsmww   1/1     Running                 0          107m
default                  gpu-operator-1612821988-node-feature-discovery-worker-64mcz       1/1     Running                 0          107m
default                  gpu-operator-1612821988-node-feature-discovery-worker-h5rws       1/1     Running                 0          107m
$ kubectl get pods -n gpu-operator-resources
NAME                                       READY   STATUS      RESTARTS   AGE
gpu-feature-discovery-66d4f                1/1     Running     0          2s
nvidia-container-toolkit-daemonset-lxpkx   1/1     Running     0          3m11s
nvidia-dcgm-exporter-wwnsw                 1/1     Running     0          5s
nvidia-device-plugin-daemonset-lpdwz       1/1     Running     0          13s
nvidia-device-plugin-validation            0/1     Completed   0          10s
nvidia-driver-daemonset-w6lpb              1/1     Running     0          3m16s

Then run the following commands against the workload cluster to verify that the NVIDIA device plugin has initialized and the nvidia.com/gpu resource is available:

$ kubectl -n kube-system get po | grep nvidia
kube-system   nvidia-device-plugin-daemonset-d5dn6                    1/1     Running   0          16m
$ kubectl get nodes
NAME                            STATUS   ROLES    AGE   VERSION
azure-gpu-control-plane-nnb57   Ready    master   42m   v1.21.2
azure-gpu-md-0-gcc8v            Ready    <none>   38m   v1.21.2
$ kubectl get node azure-gpu-md-0-gcc8v -o jsonpath={.status.allocatable} | jq
{
  "attachable-volumes-azure-disk": "12",
  "cpu": "6",
  "ephemeral-storage": "119716326407",
  "hugepages-1Gi": "0",
  "hugepages-2Mi": "0",
  "memory": "115312060Ki",
  "nvidia.com/gpu": "1",
  "pods": "110"
}

Run a test app

Let’s create a pod manifest for the cuda-vector-add example from the Kubernetes documentation and deploy it:

$ cat > cuda-vector-add.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      # https://github.com/kubernetes/kubernetes/blob/v1.7.11/test/images/nvidia-cuda/Dockerfile
      image: "k8s.gcr.io/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 1 # requesting 1 GPU
EOF
$ kubectl apply -f cuda-vector-add.yaml

The container will download, run, and perform a CUDA calculation with the GPU.

$ kubectl get po cuda-vector-add
cuda-vector-add   0/1     Completed   0          91s
$ kubectl logs cuda-vector-add
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done

If you see output like the above, your GPU cluster is working!

How to use Identities with CAPZ

CAPZ controller:

This is the identity used by the management cluster to provision infrastructure in Azure

  • Multi-tenant config via AzureClusterIdentity

    • AAD Pod Identity using Service Principals and Managed Identities: by default, the identity used by the workload cluster running on Azure is the same Service Principal assigned to the management cluster. If an identity is specified on the Azure Cluster Resource, that identity will be used when creating Azure resources related to that cluster. See Multi-tenancy page for details.
  • Env config (deprecated)

    • Service Principal: A service principal is an identity in AAD which is described by a TenantID, ClientID, and ClientSecret. The set of these three values will enable the holder to exchange the values for a JWT token to communicate with Azure. The values are normally stored in a file or environment variables.
    • Configuration:
      • Scope: Subscription
      • Role: Contributor since the controller is responsible for creating resource groups and cluster resources within the group. To create a resource group within a subscription, one must have subscription contributor rights. Note, this role’s scope can be reduced to Resource Group Contributor if all resource groups are created prior to cluster creation.
      • If the workload clusters are going to use system-assigned managed identities, then the role here should be Owner so that the controller can create role assignments for the system-assigned managed identity (see the sketch after this list). More details are in the Azure built-in roles documentation.
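
If the controller identity does need Owner rights as described above, a hedged Azure CLI sketch follows; the Service Principal client ID and subscription ID are placeholders to substitute with your own values:

az role assignment create \
  --assignee <capz-service-principal-client-id> \
  --role "Owner" \
  --scope "/subscriptions/<subscription-id>"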

Azure Host Identity:

The identity assigned to the Azure host. On the control plane, it provides the identity to the Azure Cloud Provider, and it can be used on all nodes to provide access to Azure services during cloud-init, etc.

  • User-assigned Managed Identity
  • System-assigned Managed Identity
  • Service Principal
  • See details about each type in the VM identity page

Pod Identity:

The identity used by pods running within the workload cluster to provide access to Azure services during runtime. For example, to access blobs stored in Azure Storage or to access Azure database services.

  • AAD Pod Identity: The workload cluster requires an identity to communicate with Azure. This identity can be either a managed identity (in the form of a system-assigned identity or a user-assigned identity) or a service principal. The AAD Pod Identity pod allows the cluster to use the identity referenced by the AzureCluster to access cloud resources securely with Azure Active Directory.

User Stories

Story 1 - Locked down with Service Principal Per Subscription

Alex is an engineer in a large organization which has a strict Azure account architecture. This architecture dictates that Kubernetes clusters must be hosted in dedicated Subscriptions with AAD identity having RBAC rights to provision the infrastructure only in the Subscription. The workload clusters must run with a System Assigned machine identity. The organization has adopted Cluster API in order to manage Kubernetes infrastructure, and expects ‘management’ clusters running the Cluster API controllers to manage ‘workload’ clusters in dedicated Azure Subscriptions with an AAD account which only has access to that Subscription.

The current configuration exists:

  • Subscription for each cluster
  • AAD Service Principals with Subscription Owner rights for each Subscription
  • A management Kubernetes cluster running Cluster API Provider Azure controllers

Alex can provision a new workload cluster in the specified Subscription with the corresponding AAD Service Principal by creating new Cluster API resources in the management cluster. Each of the workload cluster machines would run as the System Assigned identity described in the Cluster API resources. The CAPZ controller in the management cluster uses the Service Principal credentials when reconciling the AzureCluster so that it can create/use/destroy resources in the workload cluster.

Story 2 - Locked down by Namespace and Subscription

Alex is an engineer in a large organization which has a strict Azure account architecture. This architecture dictates that Kubernetes clusters must be hosted in dedicated Subscriptions with AAD identity having RBAC rights to provision the infrastructure only in the Subscription. The workload clusters must run with a System Assigned machine identity.

Erin is a security engineer in the same company as Alex. Erin is responsible for provisioning identities. Erin will create a Service Principal for use by Alex to provision the infrastructure in Alex’s cluster. The identity Erin creates should only be able to be used in a predetermined Kubernetes namespace where Alex will define the workload cluster. The identity should not be able to be used by CAPZ to provision workload clusters in other namespaces.

The organization has adopted Cluster API in order to manage Kubernetes infrastructure, and expects ‘management’ clusters running the Cluster API controllers to manage ‘workload’ clusters in dedicated Azure Subscriptions with an AAD account which only has access to that Subscription.

The current configuration exists:

  • Subscription for each cluster
  • AAD Service Principals with Subscription Owner rights for each Subscription
  • A management Kubernetes cluster running Cluster API Provider Azure controllers

Alex can provision a new workload cluster in the specified Subscription with the corresponding AAD Service Principal by creating new Cluster API resources in the management cluster in the predetermined namespace. Each of the workload cluster machines would run as the System Assigned identity described in the Cluster API resources. The CAPZ controller in the management cluster uses the Service Principal credentials when reconciling the AzureCluster so that it can create/use/destroy resources in the workload cluster.

Erin can provision an identity in a namespace of limited access and define the allowed namespaces, which will include the predetermined namespace for the workload cluster.

Story 3 - Using an Azure User Assigned Identity

Erin is an engineer working in a large organization. Erin does not want to be responsible for ensuring Service Principal secrets are rotated on a regular basis. Erin would like to use an Azure User Assigned Identity to provision workload cluster infrastructure. The User Assigned Identity will have the RBAC rights needed to provision the infrastructure in Erin’s subscription.

The current configuration exists:

  • Subscription for the workload cluster
  • A User Assigned Identity with RBAC with Subscription Owner rights for the Subscription
  • A management Kubernetes cluster running Cluster API Provider Azure controllers

Erin can provision a new workload cluster in the specified Subscription with the Azure User Assigned Identity by creating new Cluster API resources in the management cluster. The CAPZ controller in the management cluster uses the User Assigned Identity credentials when reconciling the AzureCluster so that it can create/use/destroy resources in the workload cluster.

Story 4 - Legacy Behavior Preserved

Dascha is an engineer in a smaller, less strict organization with a few Azure accounts intended to build all infrastructure. There is a single Azure Subscription named ‘dev’, and Dascha wants to provision a new cluster in this Subscription. An existing Kubernetes cluster is already running the Cluster API operators and managing resources in the dev Subscription. Dascha can provision a new cluster by creating Cluster API resources in the existing cluster, omitting the ProvisionerIdentity field in the AzureCluster spec. The CAPZ operator will use the Azure credentials provided in its deployment template.

IPv6 clusters

Overview

CAPZ enables you to create IPv6 Kubernetes clusters on Microsoft Azure.

  • IPv6 support is available for Kubernetes version 1.18.0 and later on Azure.
  • IPv6 support is in beta as of Kubernetes version 1.18 in Kubernetes community.

To deploy a cluster using IPv6, use the ipv6 flavor template.
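
As a minimal sketch, the template can be generated with the ipv6 flavor; the cluster name and Kubernetes version below are illustrative and assume your Azure environment variables are already exported:

clusterctl generate cluster my-ipv6-cluster --kubernetes-version v1.21.2 --flavor ipv6 > my-ipv6-cluster.yaml
kubectl apply -f my-ipv6-cluster.yaml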

Things to try out after the cluster is created:

  • Nodes are Kubernetes version 1.18.0 or later
  • Nodes have an IPv6 Internal-IP
kubectl get nodes -o wide
NAME                         STATUS   ROLES    AGE   VERSION   INTERNAL-IP              EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
ipv6-0-control-plane-8xqgw   Ready    master   53m   v1.18.8   2001:1234:5678:9abc::4   <none>        Ubuntu 18.04.5 LTS   5.3.0-1034-azure   containerd://1.3.4
ipv6-0-control-plane-crpvf   Ready    master   49m   v1.18.8   2001:1234:5678:9abc::5   <none>        Ubuntu 18.04.5 LTS   5.3.0-1034-azure   containerd://1.3.4
ipv6-0-control-plane-nm5v9   Ready    master   46m   v1.18.8   2001:1234:5678:9abc::6   <none>        Ubuntu 18.04.5 LTS   5.3.0-1034-azure   containerd://1.3.4
ipv6-0-md-0-7k8vm            Ready    <none>   49m   v1.18.8   2001:1234:5678:9abd::5   <none>        Ubuntu 18.04.5 LTS   5.3.0-1034-azure   containerd://1.3.4
ipv6-0-md-0-mwfpt            Ready    <none>   50m   v1.18.8   2001:1234:5678:9abd::4   <none>        Ubuntu 18.04.5 LTS   5.3.0-1034-azure   containerd://1.3.4
  • Nodes have 2 internal IPs, one from each IP family. IPv6 clusters on Azure run on dual-stack hosts. The IPv6 is the primary IP.
kubectl get nodes ipv6-0-md-0-7k8vm -o go-template --template='{{range .status.addresses}}{{printf "%s: %s \n" .type .address}}{{end}}'
Hostname: ipv6-0-md-0-7k8vm
InternalIP: 2001:1234:5678:9abd::5
InternalIP: 10.1.0.5
  • Nodes have an IPv6 PodCIDR
kubectl get nodes ipv6-0-md-0-7k8vm -o go-template --template='{{.spec.podCIDR}}'
2001:1234:5678:9a40:200::/72
  • Pods have an IPv6 IP
kubectl get pods nginx-f89759699-h65lt -o go-template --template='{{.status.podIP}}'
2001:1234:5678:9a40:300::1f
  • Able to reach other pods in cluster using IPv6
# inside the nginx-pod
# ifconfig eth0
  eth0      Link encap:Ethernet  HWaddr 3E:DA:12:82:4C:C2
            inet6 addr: fe80::3cda:12ff:fe82:4cc2/64 Scope:Link
            inet6 addr: 2001:1234:5678:9a40:100::4/128 Scope:Global
            UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
            RX packets:15 errors:0 dropped:0 overruns:0 frame:0
            TX packets:20 errors:0 dropped:1 overruns:0 carrier:0
            collisions:0 txqueuelen:0
            RX bytes:1562 (1.5 KiB)  TX bytes:1832 (1.7 KiB)
# ping 2001:1234:5678:9a40::2
PING 2001:1234:5678:9a40::2 (2001:1234:5678:9a40::2): 56 data bytes
64 bytes from 2001:1234:5678:9a40::2: seq=0 ttl=62 time=1.690 ms
64 bytes from 2001:1234:5678:9a40::2: seq=1 ttl=62 time=1.009 ms
64 bytes from 2001:1234:5678:9a40::2: seq=2 ttl=62 time=1.388 ms
64 bytes from 2001:1234:5678:9a40::2: seq=3 ttl=62 time=0.925 ms
  • Kubernetes services have IPv6 ClusterIP and ExternalIP
kubectl get svc
NAME            TYPE           CLUSTER-IP   EXTERNAL-IP           PORT(S)          AGE
kubernetes      ClusterIP      fd00::1      <none>                443/TCP          94m
nginx-service   LoadBalancer   fd00::4a12   2603:1030:805:2::b    80:32136/TCP     40m
  • Able to reach the workload on IPv6 ExternalIP

NOTE: this will only work if your ISP has IPv6 enabled. Alternatively, you can connect from an Azure VM with IPv6.

curl [2603:1030:805:2::b] -v
* Rebuilt URL to: [2603:1030:805:2::b]/
*   Trying 2603:1030:805:2::b...
* TCP_NODELAY set
* Connected to 2603:1030:805:2::b (2603:1030:805:2::b) port 80 (#0)
> GET / HTTP/1.1
> Host: [2603:1030:805:2::b]
> User-Agent: curl/7.58.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx/1.17.0
< Date: Fri, 18 Sep 2020 23:07:12 GMT
< Content-Type: text/html
< Content-Length: 612
< Last-Modified: Tue, 21 May 2019 15:33:12 GMT
< Connection: keep-alive
< ETag: "5ce41a38-264"
< Accept-Ranges: bytes

Known Limitations

The reference ipv6 flavor takes care of most of these for you, but it is important to be aware of these if you decide to write your own IPv6 cluster template, or use a different bootstrap provider.

  • Kubernetes version needs to be 1.18+

  • etcd needs to listen on 127.0.0.1:2379 in addition to IPv6 IPs to resolve an issue with the etcd health check as the dial transport is only doing IPv4. This is done by modifying the listen-client-urls etcd arg in postKubeadmCommands as follows:

    - sed -i '\#--listen-client-urls#s#$#,https://127.0.0.1:2379#' /etc/kubernetes/manifests/etcd.yaml
  • The :53 port needs to be free on the host so coredns can use it. On Ubuntu 18.04, systemd-resolved listens on port :53 on the host and is used by default for DNS. This causes the coredns pods to crash with "bind: address already in use" on single-stack IPv6, since the coredns pods run on hostNetwork to leverage the host routes for DNS resolution. The port is freed by running the following commands in postKubeadmCommands:
    - echo "DNSStubListener=no" >> /etc/systemd/resolved.conf
    - mv /etc/resolv.conf /etc/resolv.conf.OLD && ln -s /run/systemd/resolve/resolv.conf /etc/resolv.conf
    - systemctl restart systemd-resolved
  • The coredns pod needs to run on the host network, so it can leverage host routes for the v4 network to do the DNS resolution. The workaround is to edit the coredns deployment and add hostNetwork: true:
kubectl patch deploy/coredns -n kube-system --type=merge -p '{"spec": {"template": {"spec":{"hostNetwork": true}}}}'
  • When using Calico CNI, the selected pod’s subnet should be part of your Azure virtual network IP range.

MachinePools

  • Feature status: Experimental
  • Feature gate: MachinePool=true

In Cluster API (CAPI) v1alpha2, users can create MachineDeployment, MachineSet or Machine custom resources. When you create a MachineDeployment or MachineSet, Cluster API components react and eventually Machine resources are created. Cluster API’s current architecture mandates that a Machine maps to a single machine (virtual or bare metal) with the provider being responsible for the management of the underlying machine’s infrastructure.

Nearly all infrastructure providers have a way for their users to manage a group of machines (virtual or bare metal) as a single entity. Each infrastructure provider offers their own unique features, but nearly all are concerned with managing availability, health, and configuration updates.

A MachinePool is similar to a MachineDeployment in that they both define configuration and policy for how a set of machines is managed. They both define a common configuration, the number of desired machine replicas, and a policy for updates. Both types also combine information from Kubernetes as well as the underlying provider infrastructure to give a view of the overall health of the machines in the set.

MachinePool diverges from MachineDeployment in that the MachineDeployment controller uses MachineSets to achieve the aforementioned desired number of machines and to orchestrate updates to the Machines in the managed set, while MachinePool delegates the responsibility of these concerns to an infrastructure provider specific resource such as AWS Auto Scale Groups, GCP Managed Instance Groups, and Azure Virtual Machine Scale Sets.

MachinePool is optional and doesn’t replace the need for MachineSet/Machine since not every infrastructure provider will have an abstraction for managing multiple machines (i.e. bare metal). Users may always opt to choose MachineSet/Machine when they don’t see additional value in MachinePool for their use case.

Source: MachinePool API Proposal

AzureMachinePool

Cluster API Provider Azure (CAPZ) has experimental support for MachinePool through the infrastructure type AzureMachinePool and AzureMachinePoolMachine. An AzureMachinePool corresponds to an Azure Virtual Machine Scale Set, which provides the cloud provider specific resource for orchestrating a group of Virtual Machines. The AzureMachinePoolMachine corresponds to a virtual machine instance within the Virtual Machine Scale Set.

Safe Rolling Upgrades and Delete Policy

AzureMachinePools provide the ability to safely deploy new versions of Kubernetes, or more generally, changes to the Virtual Machine Scale Set model, e.g., updating the OS image run by the virtual machines in the scale set. For example, if a cluster operator wanted to change the Kubernetes version of the MachinePool, they would update the Version field on the MachinePool, and AzureMachinePool would respond by rolling out the new OS image for the specified Kubernetes version to each of the virtual machines in the scale set, progressively cordoning, draining, and then replacing each machine. This enables AzureMachinePools to upgrade the underlying pool of virtual machines with minimal interruption to the workloads running on them.
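
As a hedged example (the MachinePool name capz-mp-0 and the target version are placeholders, and the target version must be one supported by your CAPI/CAPZ releases), the upgrade could be triggered with a patch like:

kubectl patch machinepool capz-mp-0 --type merge -p '{"spec":{"template":{"spec":{"version":"v1.22.2"}}}}'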

AzureMachinePools also provide the ability to specify the order of virtual machine deletion.

Describing the Deployment Strategy

Below we see a partially described AzureMachinePool. The strategy field describes the AzureMachinePoolDeploymentStrategy. At the time of writing this, there is only one strategy type, RollingUpdate, which provides the ability to specify delete policy, max surge, and max unavailable.

  • deletePolicy: provides three options for the order of deletion: Oldest, Newest, and Random
  • maxSurge: provides the ability to specify how many machines can be added in addition to the current replica count during an upgrade operation. This can be a percentage, or a fixed number.
  • maxUnavailable: provides the ability to specify how many machines can be unavailable at any time. This can be a percentage, or a fixed number.
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AzureMachinePool
metadata:
  name: capz-mp-0
spec:
  strategy:
    rollingUpdate:
      deletePolicy: Oldest
      maxSurge: 25%
      maxUnavailable: 1
    type: RollingUpdate

AzureMachinePoolMachines

AzureMachinePoolMachine represents a virtual machine in the scale set. AzureMachinePoolMachines are created by the AzureMachinePool controller and are used to track the life cycle of a virtual machine in the scale set. When an AzureMachinePool is created, each virtual machine instance will be represented as an AzureMachinePoolMachine resource. A cluster operator can delete an AzureMachinePoolMachine resource if they would like to delete a specific virtual machine from the scale set, which is useful if one would like to manually control upgrades and rollouts through CAPZ.
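
For example (a sketch; the resource name is illustrative), you could list the AzureMachinePoolMachines in the management cluster and delete a specific one to have that instance removed from the scale set:

kubectl get azuremachinepoolmachines
kubectl delete azuremachinepoolmachine <azuremachinepoolmachine-name>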

Using clusterctl to deploy

To deploy a MachinePool / AzureMachinePool via clusterctl generate, there is a flavor for that.

Make sure to set up your Azure environment as described here.
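
Because MachinePool is gated behind a feature flag (see the feature gate above), you will likely also need to enable it before running clusterctl, for example:

export EXP_MACHINE_POOL=true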

clusterctl generate cluster my-cluster --kubernetes-version v1.21.2 --flavor machinepool > my-cluster.yaml

The template used for this flavor is located here.

Example MachinePool, AzureMachinePool and KubeadmConfig Resources

Below is an example of the resources needed to create a pool of Virtual Machines orchestrated with a Virtual Machine Scale Set.

---
apiVersion: cluster.x-k8s.io/v1alpha4
kind: MachinePool
metadata:
  name: capz-mp-0
spec:
  clusterName: capz
  replicas: 2
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1alpha4
          kind: KubeadmConfig
          name: capz-mp-0
      clusterName: capz
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
        kind: AzureMachinePool
        name: capz-mp-0
      version: v1.21.2
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AzureMachinePool
metadata:
  name: capz-mp-0
spec:
  location: westus2
  strategy:
    rollingUpdate:
      deletePolicy: Oldest
      maxSurge: 25%
      maxUnavailable: 1
    type: RollingUpdate
  template:
    osDisk:
      diskSizeGB: 30
      managedDisk:
        storageAccountType: Premium_LRS
      osType: Linux
    sshPublicKey: ${YOUR_SSH_PUB_KEY}
    vmSize: Standard_D2s_v3
---
apiVersion: bootstrap.cluster.x-k8s.io/v1alpha4
kind: KubeadmConfig
metadata:
  name: capz-mp-0
spec:
  files:
  - content: |
      {
        "cloud": "AzurePublicCloud",
        "tenantId": "tenantID",
        "subscriptionId": "subscriptionID",
        "aadClientId": "clientID",
        "aadClientSecret": "secret",
        "resourceGroup": "capz",
        "securityGroupName": "capz-node-nsg",
        "location": "westus2",
        "vmType": "vmss",
        "vnetName": "capz-vnet",
        "vnetResourceGroup": "capz",
        "subnetName": "capz-node-subnet",
        "routeTableName": "capz-node-routetable",
        "loadBalancerSku": "Standard",
        "maximumLoadBalancerRuleCount": 250,
        "useManagedIdentityExtension": false,
        "useInstanceMetadata": true
      }
    owner: root:root
    path: /etc/kubernetes/azure.json
    permissions: "0644"
  joinConfiguration:
    nodeRegistration:
      kubeletExtraArgs:
        cloud-config: /etc/kubernetes/azure.json
        cloud-provider: azure
      name: '{{ ds.meta_data["local_hostname"] }}'

Managed Clusters (AKS)

  • Feature status: Experimental
  • Feature gate: AKS=true,MachinePool=true

Cluster API Provider Azure (CAPZ) experimentally supports managing Azure Kubernetes Service (AKS) clusters. CAPZ implements this with three custom resources:

  • AzureManagedControlPlane
  • AzureManagedCluster
  • AzureManagedMachinePool

The combination of AzureManagedControlPlane/AzureManagedCluster corresponds to provisioning an AKS cluster. AzureManagedMachinePool corresponds one-to-one with AKS node pools. This also means that creating an AzureManagedControlPlane requires at least one AzureManagedMachinePool with spec.mode System, since AKS expects at least one system pool at creation time. For more documentation on system node pools, refer to the AKS docs.

Deploy with clusterctl

A clusterctl flavor exists to deploy an AKS cluster with CAPZ. This flavor requires the following environment variables to be set before executing clusterctl.

# Kubernetes values
export CLUSTER_NAME="my-cluster"
export WORKER_MACHINE_COUNT=2
export KUBERNETES_VERSION="v1.19.6"

# Azure values
export AZURE_LOCATION="southcentralus"
export AZURE_RESOURCE_GROUP="${CLUSTER_NAME}"
# set AZURE_SUBSCRIPTION_ID to the GUID of your subscription
# this example uses an sdk authentication file and parses the subscriptionId with jq
# this file may be created using
#
# `az ad sp create-for-rbac --role Contributor --sdk-auth > sp.json`
#
# when logged in with a service principal, it's also available using
#
# `az account show --sdk-auth`
#
# Otherwise, you can set this value manually.
#
export AZURE_SUBSCRIPTION_ID="$(cat ~/sp.json | jq -r .subscriptionId | tr -d '\n')"
export AZURE_NODE_MACHINE_TYPE="Standard_D2s_v3"

Managed clusters also require the following feature flags set as environment variables:

export EXP_MACHINE_POOL=true
export EXP_AKS=true

Execute clusterctl to template the resources, then apply to a management cluster:

clusterctl init --infrastructure azure
clusterctl generate cluster ${CLUSTER_NAME} --kubernetes-version ${KUBERNETES_VERSION} --flavor aks > cluster.yaml

# assumes an existing management cluster
kubectl apply -f cluster.yaml

# check status of created resources
kubectl get cluster-api -o wide

Specification

We’ll walk through an example to view available options.

apiVersion: cluster.x-k8s.io/v1alpha4
kind: Cluster
metadata:
  name: my-cluster
spec:
  clusterNetwork:
    services:
      cidrBlocks:
      - 192.168.0.0/16
  controlPlaneRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
    kind: AzureManagedControlPlane
    name: my-cluster-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
    kind: AzureManagedCluster
    name: my-cluster
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AzureManagedControlPlane
metadata:
  name: my-cluster-control-plane
spec:
  location: southcentralus
  resourceGroup: foo-bar
  sshPublicKey: ${AZURE_SSH_PUBLIC_KEY_B64:=""}
  subscriptionID: fae7cc14-bfba-4471-9435-f945b42a16dd # fake uuid
  version: v1.20.5
  networkPolicy: azure # or calico
  networkPlugin: azure # or kubenet
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AzureManagedCluster
metadata:
  name: my-cluster
spec:
  subscriptionID: fae7cc14-bfba-4471-9435-f945b42a16dd # fake uuid
---
apiVersion: cluster.x-k8s.io/v1alpha4
kind: MachinePool
metadata:
  name: agentpool0
spec:
  clusterName: my-cluster
  replicas: 2
  template:
    spec:
      clusterName: my-cluster
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
        kind: AzureManagedMachinePool
        name: agentpool0
        namespace: default
      version: v1.20.5
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AzureManagedMachinePool
metadata:
  name: agentpool0
spec:
  mode: System
  osDiskSizeGB: 512
  sku: Standard_D2s_v3
---
apiVersion: cluster.x-k8s.io/v1alpha4
kind: MachinePool
metadata:
  name: agentpool1
spec:
  clusterName: my-cluster
  replicas: 2
  template:
    spec:
      clusterName: my-cluster
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
        kind: AzureManagedMachinePool
        name: agentpool1
        namespace: default
      version: v1.20.5
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AzureManagedMachinePool
metadata:
  name: agentpool1
spec:
  mode: User
  osDiskSizeGB: 1024
  sku: Standard_D2s_v4

The main features for configuration today are networkPolicy and networkPlugin. Other configuration values like subscriptionId and node machine type should be fairly clear from context.

option           available values
networkPlugin    azure, kubenet
networkPolicy    azure, calico

Multitenancy

Multitenancy for managed clusters can be configured by using the aks-multi-tenancy flavor. The steps for creating an Azure managed identity and mapping it to an AzureClusterIdentity are similar to the ones described here. The AzureClusterIdentity object is then mapped to a managed cluster through the identityRef field in AzureManagedControlPlane.spec. The following is an example configuration:

apiVersion: cluster.x-k8s.io/v1alpha4
kind: Cluster
metadata:
  name: ${CLUSTER_NAME}
  namespace: default
spec:
  clusterNetwork:
    services:
      cidrBlocks:
      - 192.168.0.0/16
  controlPlaneRef:
    apiVersion: exp.infrastructure.cluster.x-k8s.io/v1alpha4
    kind: AzureManagedControlPlane
    name: ${CLUSTER_NAME}
  infrastructureRef:
    apiVersion: exp.infrastructure.cluster.x-k8s.io/v1alpha4
    kind: AzureManagedCluster
    name: ${CLUSTER_NAME}
---
apiVersion: exp.infrastructure.cluster.x-k8s.io/v1alpha4
kind: AzureManagedControlPlane
metadata:
  name: ${CLUSTER_NAME}
  namespace: default
spec:
  defaultPoolRef:
    name: agentpool0
  identityRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
    kind: AzureClusterIdentity
    name: ${CLUSTER_IDENTITY_NAME}
    namespace: ${CLUSTER_IDENTITY_NAMESPACE}
  location: ${AZURE_LOCATION}
  resourceGroupName: ${AZURE_RESOURCE_GROUP:=${CLUSTER_NAME}}
  sshPublicKey: ${AZURE_SSH_PUBLIC_KEY_B64:=""}
  subscriptionID: ${AZURE_SUBSCRIPTION_ID}
  version: ${KUBERNETES_VERSION}
---

Features

AKS clusters deployed from CAPZ currently only support a limited, “blessed” configuration. This was primarily done to keep the initial implementation simple. If you’d like to run a managed AKS cluster with CAPZ and need an additional feature, please open a pull request or issue with details. We’re happy to help!

Current limitations

  • DNS IP is hardcoded to x.x.x.10 inside the service CIDR.
    • primarily due to lack of validation, see https://github.com/kubernetes-sigs/cluster-api-provider-azure/issues/612
  • Only supports system managed identities.
    • We would like to support user managed identities where appropriate.
  • Only supports Standard load balancer (SLB).
    • We will not support Basic load balancer in CAPZ. SLB is generally the path forward in Azure.

Troubleshooting

If a user tries to delete the MachinePool which refers to the last system node pool, the AzureManagedMachinePool webhook will reject the deletion, so the deletion timestamp never gets set on the AzureManagedMachinePool. However, the timestamp will be set on the MachinePool, which will remain in a deleting state. To recover from this state, create a new MachinePool manually referencing the AzureManagedMachinePool, then edit the required references and finalizers to link the new MachinePool to the AzureManagedMachinePool. In the AzureManagedMachinePool, remove the owner reference to the old MachinePool and point it to the new MachinePool. Once the new MachinePool is pointing to the AzureManagedMachinePool, you can delete the old MachinePool by removing its finalizers.

Here is an Example:

# MachinePool deleted 
apiVersion: cluster.x-k8s.io/v1alpha4
kind: MachinePool
metadata:
  finalizers:             # remove finalizers once new object is pointing to the AzureManagedMachinePool
  - machinepool.cluster.x-k8s.io
  labels:
    cluster.x-k8s.io/cluster-name: capz-managed-aks
  name: agentpool0
  namespace: default
  ownerReferences:
  - apiVersion: cluster.x-k8s.io/v1alpha4
    kind: Cluster
    name: capz-managed-aks
    uid: 152ecf45-0a02-4635-987c-1ebb89055fa2
  uid: ae4a235a-f0fa-4252-928a-0e3b4c61dbea
spec:
  clusterName: capz-managed-aks
  minReadySeconds: 0
  providerIDList:
  - azure:///subscriptions/9107f2fb-e486-a434-a948-52e2929b6f18/resourceGroups/MC_rg_capz-managed-aks_eastus/providers/Microsoft.Compute/virtualMachineScaleSets/aks-agentpool0-10226072-vmss/virtualMachines/0
  replicas: 1
  template:
    metadata: {}
    spec:
      bootstrap:
        dataSecretName: ""
      clusterName: capz-managed-aks
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
        kind: AzureManagedMachinePool
        name: agentpool0
        namespace: default
      version: v1.20.5

---
# New Machinepool
apiVersion: cluster.x-k8s.io/v1alpha4
kind: MachinePool
metadata:
  finalizers:
  - machinepool.cluster.x-k8s.io
  generation: 2
  labels:
    cluster.x-k8s.io/cluster-name: capz-managed-aks
  name: agentpool2    # change the name of the machinepool
  namespace: default 
  ownerReferences:
  - apiVersion: cluster.x-k8s.io/v1alpha4
    kind: Cluster
    name: capz-managed-aks
    uid: 152ecf45-0a02-4635-987c-1ebb89055fa2   
  # uid: ae4a235a-f0fa-4252-928a-0e3b4c61dbea     # remove the uid set for machinepool
spec:
  clusterName: capz-managed-aks
  minReadySeconds: 0
  providerIDList:
  - azure:///subscriptions/9107f2fb-e486-a434-a948-52e2929b6f18/resourceGroups/MC_rg_capz-managed-aks_eastus/providers/Microsoft.Compute/virtualMachineScaleSets/aks-agentpool0-10226072-vmss/virtualMachines/0  
  replicas: 1
  template:
    metadata: {}
    spec:
      bootstrap:
        dataSecretName: ""
      clusterName: capz-managed-aks
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
        kind: AzureManagedMachinePool
        name: agentpool0
        namespace: default
      version: v1.20.5

Multi-tenancy

To enable single controller multi-tenancy, a different Identity can be added to the Azure Cluster that will be used as the Azure Identity when creating Azure resources related to that cluster.

This is achieved using the aad-pod-identity library.

Service Principal Identity

Once a new SP Identity is created in Azure, the corresponding values should be used to create an AzureClusterIdentity resource:

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AzureClusterIdentity
metadata:
  name: example-identity
  namespace: default
spec:
  type: ServicePrincipal
  tenantID: <azure-tenant-id>
  clientID: <client-id-of-SP-identity>
  clientSecret: {"name":"<secret-name-for-client-password>","namespace":"default"}
  allowedNamespaces: 
    list:
    - <cluster-namespace>

The password will need to be added in a secret similar to the following example:

apiVersion: v1
kind: Secret
metadata:
  name: <secret-name-for-client-password>
type: Opaque
data:
  clientSecret: <client-secret-of-SP-identity>

Alternatively, the password can also be added as a certificate:

apiVersion: v1
kind: Secret
metadata:
  name: <secret-name-for-client-password>
type: Opaque
data:
  certificate: CERTIFICATE
  password: PASSWORD

allowedNamespaces

AllowedNamespaces is used to identify the namespaces that clusters are allowed to use the identity from. Namespaces can be selected either with a list of namespaces or with a label selector. An empty allowedNamespaces object indicates that AzureClusters can use this identity from any namespace. If this object is nil, no namespaces are allowed (this is the default behaviour when the field is not provided). To use the identity, a namespace must either be in the NamespaceList or match the Selector. Please note that NamespaceList takes precedence over Selector if both are set.
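
As an illustrative sketch of the selector form, the following AzureClusterIdentity would be usable from any namespace carrying the given label; the label key and value are placeholders:

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AzureClusterIdentity
metadata:
  name: example-identity
  namespace: default
spec:
  type: ServicePrincipal
  tenantID: <azure-tenant-id>
  clientID: <client-id-of-SP-identity>
  clientSecret: {"name":"<secret-name-for-client-password>","namespace":"default"}
  allowedNamespaces:
    selector:
      matchLabels:
        purpose: capz-clusters   # placeholder label; namespaces with this label may use the identity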

IdentityRef in AzureCluster

The identity can be added to an AzureCluster by using the identityRef field:

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AzureCluster
metadata:
  name: example-cluster
  namespace: default
spec:
  location: eastus
  networkSpec:
    vnet:
      name: example-cluster-vnet
  resourceGroup: example-cluster
  subscriptionID: <AZURE_SUBSCRIPTION_ID>
  identityRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
    kind: AzureClusterIdentity
    name: <name-of-identity>
    namespace: <namespace-of-identity>

For more details on how aad-pod-identity works, please check the guide here.

User Assigned Identity

User-assigned identities will be supported in a future release.

Node Outbound

This document describes how to configure your clusters’ node outbound traffic.

Node Outbound Load Balancer

Public Clusters

For public clusters, i.e. clusters with the API server load balancer type set to Public, CAPZ automatically configures a node outbound load balancer with the default settings.

To provide custom settings for the node outbound load balancer, use the nodeOutboundLB section in cluster configuration.

The idleTimeoutInMinutes specifies the number of minutes to keep a TCP connection open for the outbound rule (defaults to 4). See here for more details.

Here is an example of a node outbound load balancer with frontendIPsCount set to 3. CAPZ will read this value and create 3 front end ips for this load balancer.

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AzureCluster
metadata:
  name: my-public-cluster
  namespace: default
spec:
  location: eastus
  networkSpec:
    apiServerLB:
      type: Public
    nodeOutboundLB:
      frontendIPsCount: 3
      idleTimeoutInMinutes: 4

Private Clusters

For private clusters, i.e. clusters with the API server load balancer type set to Internal, CAPZ does not create a node outbound load balancer by default. To create a node outbound load balancer, include the nodeOutboundLB section with the desired settings.

Here is an example of configuring a node outbound load balancer with 1 front end ip for a private cluster:

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AzureCluster
metadata:
  name: my-private-cluster
  namespace: default
spec:
  location: eastus
  networkSpec:
    apiServerLB:
      type: Internal
    nodeOutboundLB:
      frontendIPsCount: 1

Node Outbound Nat Gateway

You can configure a Nat Gateway in a subnet to enable outbound traffic in the cluster nodes by setting the Nat Gateway’s name in the subnet configuration. A Public IP will also be created for the Nat Gateway.

Using this configuration, a load balancer for the nodes’ outbound traffic won’t be created.

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AzureCluster
metadata:
  name: cluster-natgw
  namespace: default
spec:
  location: southcentralus
  networkSpec:
    vnet:
      name: my-vnet
    subnets:
      - name: subnet-cp
        role: control-plane
      - name: subnet-node
        role: node
        natGateway:
          name: node-natgw
          NatGatewayIP:
            name: pip-cluster-natgw-subnet-node-natgw
  resourceGroup: cluster-natgw

You can also define the Public IP name that should be used when creating the Public IP for the Nat Gateway. If you don’t specify it, CAPZ will automatically generate a name for it.

Spot Virtual Machines

Azure Spot Virtual Machines allow users to reduce the costs of their compute resources by utilising Azure’s spare capacity for a lower price.

With this lower cost, comes the risk of preemption. When capacity within a particular Availability Zone is increased, Azure may need to reclaim Spot Virtual Machines to satisfy the demand on their data centres.

When should I use Spot Virtual Machines?

Spot Virtual Machines are ideal for workloads that can be interrupted. For example, short jobs or stateless services that can be rescheduled quickly, without data loss, and resume operation with limited degradation to a service.

How do I use Spot Virtual Machines?

To enable a Machine to be backed by a Spot Virtual Machine, add spotVMOptions to your AzureMachineTemplate:

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AzureMachineTemplate
metadata:
  name: capz-md-0
spec:
  location: westus2
  template:
    osDisk:
      diskSizeGB: 30
      managedDisk:
        storageAccountType: Premium_LRS
      osType: Linux
    sshPublicKey: ${YOUR_SSH_PUB_KEY}
    vmSize: Standard_D2s_v3
    spotVMOptions: {}

You may also add a maxPrice to the options to limit the maximum spend for the instance. It is, however, recommended not to set a maxPrice: if this field is left empty, Azure will cap your spending at the on-demand price and you will experience fewer interruptions.

spec:
  template:
    spotVMOptions:
      maxPrice: 0.04 # Price in USD per hour (up to 5 decimal places)

The experimental MachinePool also supports using spot instances. To enable a MachinePool to be backed by spot instances, add spotVMOptions to your AzureMachinePool spec:

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AzureMachinePool
metadata:
  name: capz-mp-0
spec:
  location: westus2
  template:
    osDisk:
      diskSizeGB: 30
      managedDisk:
        storageAccountType: Premium_LRS
      osType: Linux
    sshPublicKey: ${YOUR_SSH_PUB_KEY}
    vmSize: Standard_D2s_v3
    spotVMOptions: {}

Custom Virtual Networks

Pre-existing vnet and subnets

To deploy a cluster using a pre-existing vnet, modify the AzureCluster spec to include the name and resource group of the existing vnet, as well as the control plane and node subnets, as follows:

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AzureCluster
metadata:
  name: cluster-byo-vnet
  namespace: default
spec:
  location: southcentralus
  networkSpec:
    vnet:
      resourceGroup: custom-vnet
      name: my-vnet
    subnets:
      - name: control-plane-subnet
        role: control-plane
      - name: node-subnet
        role: node
  resourceGroup: cluster-byo-vnet

When providing a vnet, it is required to also provide the two subnets that should be used for control planes and nodes.

If providing an existing vnet and subnets with existing network security groups, make sure that the control plane security group allows inbound to port 6443, as port 6443 is used by kubeadm to bootstrap the control planes. Alternatively, you can provide a custom control plane endpoint in the KubeadmConfig spec.
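
As a hedged sketch using the Azure CLI, an inbound rule allowing the API server port could be added to an existing control plane NSG; the resource group, NSG name, and priority are placeholders to adapt to your environment:

az network nsg rule create \
  --resource-group custom-vnet \
  --nsg-name my-control-plane-nsg \
  --name allow-apiserver \
  --direction Inbound \
  --access Allow \
  --protocol Tcp \
  --destination-port-ranges 6443 \
  --priority 2201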

The pre-existing vnet can be in the same resource group or a different resource group in the same subscription as the target cluster. When deleting the AzureCluster, the vnet and resource group will only be deleted if they are “managed” by capz, i.e. they were created during cluster deployment. Pre-existing vnets and resource groups will not be deleted.

Custom Network Spec

It is also possible to customize the vnet to be created without providing an already existing vnet. To do so, simply modify the AzureCluster NetworkSpec as desired. Here is an illustrative example of a cluster with a customized vnet address space (CIDR) and customized subnets:

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AzureCluster
metadata:
  name: cluster-example
  namespace: default
spec:
  location: southcentralus
  networkSpec:
    vnet:
      name: my-vnet
      cidrBlocks: 
        - 10.0.0.0/16
    subnets:
      - name: my-subnet-cp
        role: control-plane
        cidrBlocks: 
          - 10.0.1.0/24
      - name: my-subnet-node
        role: node
        cidrBlocks: 
          - 10.0.2.0/24
  resourceGroup: cluster-example

If no CIDR block is provided, 10.0.0.0/8 will be used by default, with default internal LB private IP 10.0.0.100.

Whenever using custom vnet and subnet names and/or a different vnet resource group, please make sure to update the azure.json content in both the nodes’ and control planes’ kubeadmConfigSpec accordingly before creating the cluster (see the illustrative excerpt below).
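
For illustration only, and assuming the custom network spec above, the azure.json keys most likely to need updating would be along these lines (other keys omitted; the security group and route table names are placeholders):

  "vnetName": "my-vnet",
  "vnetResourceGroup": "cluster-example",
  "subnetName": "my-subnet-node",
  "securityGroupName": "<your-node-nsg-name>",
  "routeTableName": "<your-node-routetable-name>"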

Custom Security Rules

Security rules can also be customized as part of the subnet specification in a custom network spec. Note that ingress rules for the Kubernetes API server port (default 6443) and SSH (22) are automatically added to the control plane subnet only if security rules aren’t specified. If using custom rules, it is the responsibility of the user to supply those rules themselves.

Here is an illustrative example of customizing rules that builds on the one above by adding an egress rule to the control plane nodes:

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AzureCluster
metadata:
  name: cluster-example
  namespace: default
spec:
  location: southcentralus
  networkSpec:
    vnet:
      name: my-vnet
      cidrBlocks: 
        - 10.0.0.0/16
    subnets:
      - name: my-subnet-cp
        role: control-plane
        cidrBlocks: 
          - 10.0.1.0/24
        securityGroup:
          name: my-subnet-cp-nsg
          securityRules:
            - name: "allow_ssh"
              description: "allow SSH"
              direction: "Inbound"
              priority: 2200
              protocol: "*"
              destination: "*"
              destinationPorts: "22"
              source: "*"
              sourcePorts: "*"
            - name: "allow_apiserver"
              description: "Allow K8s API Server"
              direction: "Inbound"
              priority: 2201
              protocol: "*"
              destination: "*"
              destinationPorts: "6443"
              source: "*"
              sourcePorts: "*"
            - name: "allow_port_50000"
              description: "allow port 50000"
              direction: "Outbound"
              priority: 2202
              protocol: "Tcp"
              destination: "*"
              destinationPorts: "50000"
              source: "*"
              sourcePorts: "*"
      - name: my-subnet-node
        role: node
        cidrBlocks: 
          - 10.0.2.0/24
  resourceGroup: cluster-example

Custom subnets

Sometimes it’s desirable to use different subnets for different node pools. Several subnets can be specified in the networkSpec to be later referenced by name from other CRs like AzureMachine or AzureMachinePool. When more than one node subnet is specified, the subnetName field in those other CRs becomes mandatory, because otherwise the controllers wouldn’t know which subnet to use.

The subnet used for the control plane must use the role control-plane while the subnets for the worker nodes must use the role node.

---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AzureCluster
metadata:
  name: cluster-example
  namespace: default
spec:
  location: southcentralus
  networkSpec:
    subnets:
    - name: control-plane-subnet
      role: control-plane
    - name: subnet-mp-1
      role: node
    - name: subnet-mp-2
      role: node
    vnet:
      name: my-vnet
      cidrBlocks:
        - 10.0.0.0/16
  resourceGroup: cluster-example
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AzureMachinePool
metadata:
  name: mp1
  namespace: default
spec:
  location: southcentralus
  strategy:
    rollingUpdate:
      deletePolicy: Oldest
      maxSurge: 25%
      maxUnavailable: 1
    type: RollingUpdate
  template:
    osDisk:
      diskSizeGB: 30
      managedDisk:
        storageAccountType: Premium_LRS
      osType: Linux
    sshPublicKey: ${YOUR_SSH_PUB_KEY}
    subnetName: subnet-mp-1
    vmSize: Standard_D2s_v3
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AzureMachinePool
metadata:
  name: mp2
  namespace: default
spec:
  location: southcentralus
  strategy:
    rollingUpdate:
      deletePolicy: Oldest
      maxSurge: 25%
      maxUnavailable: 1
    type: RollingUpdate
  template:
    osDisk:
      diskSizeGB: 30
      managedDisk:
        storageAccountType: Premium_LRS
      osType: Linux
    sshPublicKey: ${YOUR_SSH_PUB_KEY}
    subnetName: subnet-mp-2
    vmSize: Standard_D2s_v3

If you don’t specify any node subnets, one subnet with role node will be created and added to the networkSpec definition.

VM Identity

This document describes the available identities that can be configured on the Azure host. For example, this is what grants permissions to the Azure Cloud Provider to provision LB services in Azure on the control plane nodes.

Flavors of Identities in Azure

All identities used in Azure are owned by Azure Active Directory (AAD). An identity, or principal, in AAD will provide the basis for each of the flavors of identities we will describe.

Managed Identities

Managed identity is a feature of Azure Active Directory (AAD) and Azure Resource Manager (ARM) which assigns ARM Role-Based Access Control (RBAC) rights to AAD identities for use in Azure resources, like Virtual Machines. Each of the Azure services that support managed identities for Azure resources is subject to its own timeline. Make sure you review the availability status of managed identities for your resource and known issues before you begin.

Managed identity is used to create nodes which have an AAD identity provisioned onto the node by Azure Resource Manager (the Azure control plane) rather than providing credentials in the azure.json file. Managed identities are the preferred way to provide RBAC rights for a given resource in Azure as the lifespan of the identity is linked to the lifespan of the resource.

User-assigned managed identity (recommended)

A standalone Azure resource that is created by the user outside of the scope of this provider. The identity can be assigned to one or more Azure Machines. The lifecycle of a user-assigned identity is managed separately from the lifecycle of the Azure Machines to which it’s assigned.

This lifecycle allows you to separate your resource creation and identity administration responsibilities. User-assigned identities and their role assignments can be configured in advance of the resources that require them. Users who create the resources only require the access to assign a user-assigned identity, without the need to create new identities or role assignments.

Full details on how to create and manage user assigned identities using Azure CLI can be found in the Azure docs.
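
As a brief sketch (the resource group and identity names are placeholders), a user-assigned identity can be created and its ARM resource ID read back with the Azure CLI:

az identity create --resource-group my-identities-rg --name my-capz-node-identity
az identity show --resource-group my-identities-rg --name my-capz-node-identity --query id -o tsv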

System-assigned managed identity

A system-assigned identity is a managed identity which is tied to the lifespan of a resource in Azure. The identity is created by Azure in AAD for the resource it is applied upon and reaped when the resource is deleted. Unlike a service principal, a system assigned identity is available on the local resource through a local port service via the instance metadata service.

⚠️ When a Node is created with a System Assigned Identity, a role of Subscription Contributor is added to this generated identity.

How to use managed identity

User-assigned

  • In Machines
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AzureMachineTemplate
metadata:
  name: ${CLUSTER_NAME}-md-0
  namespace: default
spec:
  template:
    spec:
      identity: UserAssigned
      userAssignedIdentities:
      - providerID: ${USER_ASSIGNED_IDENTITY_PROVIDER_ID}
      ...

The CAPZ controller will look for the UserAssigned value in the identity field of the AzureMachineTemplate, and assign the user identities listed in userAssignedIdentities to the virtual machine.

  • In Machine Pool
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AzureMachinePool
metadata:
  name: ${CLUSTER_NAME}-mp-0
  namespace: default
spec:
  identity: UserAssigned
  userAssignedIdentities:
  - providerID: ${USER_ASSIGNED_IDENTITY_PROVIDER_ID}
  ...

The CAPZ controller will look for the UserAssigned value in the identity field of the AzureMachinePool, and assign the user identities listed in userAssignedIdentities to the virtual machine scale set.

Alternatively, you can use the user-assigned-identity and machinepool-user-assigned-identity flavors by setting {flavor} in clusterctl generate cluster --flavor {flavor} to use a user-assigned managed identity in a machine deployment or machine pool, respectively (see the example below).
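
For example (a sketch; the cluster name and Kubernetes version are placeholders):

clusterctl generate cluster my-cluster --kubernetes-version v1.21.2 --flavor user-assigned-identity > my-cluster.yaml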

System-assigned

  • In Machines
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AzureMachineTemplate
metadata:
  name: ${CLUSTER_NAME}-md-0
  namespace: default
spec:
  template:
    spec:
      identity: SystemAssigned
      ...

The CAPZ controller will look for the SystemAssigned value in the identity field of the AzureMachineTemplate, and enable system-assigned managed identity in the virtual machine.

  • In Machine Pool
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AzureMachinePool
metadata:
  name: ${CLUSTER_NAME}-mp-0
  namespace: default
spec:
  identity: SystemAssigned
  ...

The CAPZ controller will look for the SystemAssigned value in the identity field of the AzureMachinePool, and enable system-assigned managed identity in the virtual machine scale set.

Alternatively, you can also use the system-assigned-identity and machinepool-system-assigned-identity flavors by setting {flavor} in clusterctl generate cluster --flavor {flavor} to use a system-assigned managed identity in a machine deployment or machine pool, respectively.

Service Principal (not recommended)

A service principal is an identity in AAD which is described by a tenant ID and client (or “app”) ID. It can have one or more associated secrets or certificates. The set of these values will enable the holder to exchange the values for a JWT token to communicate with Azure. The user generally creates a service principal, saves the credentials, and then uses the credentials in applications. To read more about Service Principals and AD Applications see “Application and service principal objects in Azure Active Directory”.

To use a client id/secret for authentication for Cloud Provider, simply leave the identity empty, or set it to None. The autogenerated cloud provider config secret will contain the client id and secret used in your AzureClusterIdentity for AzureCluster creation as aadClientID and aadClientSecret.

To use a certificate/password for authentication, you will need to write the certificate file on the VM (for example using the files option if using CABPK/cloud-init) and mount it to the cloud-controller-manager, then refer to it as aadClientCertPath, along with aadClientCertPassword, in your cloud provider config. Please consider using a user-assigned identity instead before going down that route as they are more secure and flexible, as described above.

Creating a Service Principal

  • With the Azure CLI

    • Subscription level Scope

      az login
      az account set --subscription="${AZURE_SUBSCRIPTION_ID}"
      az ad sp create-for-rbac --role="Contributor" --scopes="/subscriptions/${AZURE_SUBSCRIPTION_ID}"
      
    • Resource group level scope

      az login
      az account set --subscription="${AZURE_SUBSCRIPTION_ID}"
      az ad sp create-for-rbac --role="Contributor" --scopes="/subscriptions/${AZURE_SUBSCRIPTION_ID}/resourceGroups/${AZURE_RESOURCE_GROUP}"
      

    This will output your appId, password, name, and tenant. The name or appId is used for the AZURE_CLIENT_ID and the password is used for AZURE_CLIENT_SECRET.

    Confirm your service principal by opening a new shell and run the following commands substituting in name, password, and tenant:

    az login --service-principal -u NAME -p PASSWORD --tenant TENANT
    az vm list-sizes --location eastus
    

Windows clusters

Overview

CAPZ enables you to create Windows Kubernetes clusters on Microsoft Azure.

To deploy a cluster using Windows, use the Windows flavor template.

Deploy a workload

After your Windows VM is up and running, you can deploy a workload using the deployment file below:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: iis-1809
  labels:
    app: iis-1809
spec:
  replicas: 1
  template:
    metadata:
      name: iis-1809
      labels:
        app: iis-1809
    spec:
      containers:
      - name: iis
        image: mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2019
        resources:
          limits:
            cpu: 1
            memory: 800Mi
          requests:
            cpu: .1
            memory: 300Mi
        ports:
          - containerPort: 80
      nodeSelector:
        "kubernetes.io/os": windows
  selector:
    matchLabels:
      app: iis-1809
---
apiVersion: v1
kind: Service
metadata:
  name: iis
spec:
  type: LoadBalancer
  ports:
  - protocol: TCP
    port: 80
  selector:
    app: iis-1809

Save this file to iis.yaml then deploy it:

kubectl apply -f .\iis.yaml

Get the Service endpoint and curl the website:

kubectl get services
NAME         TYPE           CLUSTER-IP   EXTERNAL-IP   PORT(S)        AGE
iis          LoadBalancer   10.0.9.47    <pending>     80:31240/TCP   1m
kubernetes   ClusterIP      10.0.0.1     <none>        443/TCP        46m


curl <EXTERNAL-IP>

Details

See the CAPI proposal for implementation details: https://github.com/kubernetes-sigs/cluster-api/blob/master/docs/proposals/20200804-windows-support.md

VM and VMSS naming

Azure does not support creating Windows VMs with names longer than 15 characters (see additional details on historical restrictions).

When creating a cluster with AzureMachine, if the AzureMachine name is longer than 15 characters, the first 9 characters of the cluster name are used and the last 5 characters of the machine name are appended to create a unique machine name.

When creating a cluster with a MachinePool, if the MachinePool name is longer than 9 characters, the machine pool uses the prefix win and appends the last 5 characters of the machine pool name.

VM password and access

The VM password is randomly generated by Cloudbase-init during provisioning of the VM. For access to the VM, you can use SSH, which will be configured with the SSH public key you provided during deployment.

To SSH:

ssh -t -i .sshkey -o 'ProxyCommand ssh -i .sshkey -W %h:%p capi@<api-server-ip>' capi@<windows-ip> powershell.exe

There is also a CAPZ kubectl plugin that automates the SSH connection using the management cluster.

To RDP you can proxy through the api server:

ssh -L 5555:10.1.0.4:3389 capi@20.69.66.232

And then open an RDP client on your local machine to localhost:5555

Image creation

The images are built using image-builder and published to the Azure Marketplace. They use Cloudbase-init to bootstrap the machines via Kubeadm.

Find the latest published images:

az vm image list --publisher cncf-upstream --offer capi-windows -o table --all  
Offer         Publisher      Sku                           Urn                                                                 Version
------------  -------------  ----------------------------  ------------------------------------------------------------------  ----------
capi-windows  cncf-upstream  k8s-1dot18dot13-windows-2019  cncf-upstream:capi-windows:k8s-1dot18dot13-windows-2019:2020.12.11  2020.12.11
capi-windows  cncf-upstream  k8s-1dot19dot5-windows-2019   cncf-upstream:capi-windows:k8s-1dot19dot5-windows-2019:2020.12.11   2020.12.11
capi-windows  cncf-upstream  k8s-1dot20dot0-windows-2019   cncf-upstream:capi-windows:k8s-1dot20dot0-windows-2019:2020.12.11   2020.12.11

If you would like to customize your images, please refer to the documentation on building your own custom images.

Kube-proxy and CNIs

Kube-proxy and Windows CNIs are deployed via Cluster Resource Sets. Windows does not have a kube-proxy image because Windows lacks privileged containers, which would provide access to the host. The current solution is using wins.exe, as demonstrated in the Kubeadm support for Windows.

Windows HostProcess Container support is in KEP form with plans to implement in 1.22. Kube-proxy and other CNIs will then be replaced with HostProcess containers.

Flannel is being used as the default CNI. An important note for Flannel vxlan deployments is that the MTU for the Linux nodes must be set to 1400.
This is because Azure’s VNET MTU is 1400, which can cause fragmentation of packets sent from a Linux node to a Windows node, resulting in dropped packets. To mitigate this, we set the Linux eth0 MTU to 1400, and Flannel automatically picks this up and subtracts 50 for the flannel network it creates.

SSH access to nodes

This document describes how to get SSH access to virtual machines that are part of a CAPZ cluster.

In order to get SSH access to a Virtual Machine on Azure, two requirements have to be met:

  • get network-level access to the SSH service;
  • get authentication sorted.

This document describes some possible strategies to fulfil both requirements.

Network Access

Default behavior

By default, control plane VMs have SSH access allowed from any source in their Network Security Groups. Also by default, VMs don’t have a public IP address assigned.

To get SSH access to one of the Control Plane VMs you can use the API Load Balancer‘s IP, because by default an Inbound NAT Rule is created to route traffic coming to the load balancer on TCP port 22 (the SSH port) to one of the nodes with role master in the workload cluster.

This of course works only for clusters that are using a Public Load Balancer.

In order to reach all other VMs, you can use the NATted control plane VM as a bastion host and use the private IP address for the other nodes.

For example, let’s consider this CAPZ cluster (using a Public Load Balancer) with two nodes:

NAME                        STATUS   ROLES    AGE    VERSION    INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
test1-control-plane-cn9lm   Ready    master   111m   v1.18.16   10.0.0.4      <none>        Ubuntu 18.04.5 LTS   5.4.0-1039-azure   containerd://1.4.3
test1-md-0-scctm            Ready    <none>   109m   v1.18.16   10.1.0.4      <none>        Ubuntu 18.04.5 LTS   5.4.0-1039-azure   containerd://1.4.3

You can SSH to the control plane node using the load balancer’s public DNS name:

$ kubectl get azurecluster test1 -o json | jq .spec.networkSpec.apiServerLB.frontendIPs[0].publicIP.dnsName
test1-21192f78.eastus.cloudapp.azure.com

$ ssh username@test1-21192f78.eastus.cloudapp.azure.com hostname
test1-control-plane-cn9lm

As you can see, the Load Balancer routed the request to node test1-control-plane-cn9lm, which is the only node with role master in this workload cluster.

In order to SSH to node ‘test1-md-0-scctm’, you can use the other node as a bastion:

$ ssh -J username@test1-21192f78.eastus.cloudapp.azure.com username@10.1.0.4 hostname
test1-md-0-scctm

Clusters using an Internal Load Balancer (private clusters) can’t use this approach. Network-level SSH access to those clusters has to be made via the private IP addresses of the VMs by first getting access to the Virtual Network. How to do that is out of the scope of this document. A possible alternative that works for private clusters as well is described in the next section.

Azure Bastion

A possible alternative to the process described above is to use the Azure Bastion feature. This approach works the same way for workload clusters using either type of Load Balancers.

In order to enable Azure Bastion on a CAPZ workload cluster, edit the AzureCluster CR and set the spec/bastionSpec/azureBastion field. It is enough to set the field’s value to the empty object {} and the default configuration settings will be used while deploying the Azure Bastion.

For example, this is an AzureCluster CR with the Azure Bastion feature enabled:

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AzureCluster
metadata:
  name: test1
  namespace: default
spec:
  bastionSpec:
    azureBastion: {}
  ...
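
If you prefer not to edit the CR by hand, the same field can be set with a patch; a minimal sketch, assuming the cluster is named test1 in the default namespace:

kubectl patch azurecluster test1 --type merge -p '{"spec":{"bastionSpec":{"azureBastion":{}}}}'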

Once the Azure Bastion is deployed, it will be possible to SSH to any of the cluster VMs through the Azure Portal. Please follow the official documentation for a deeper explanation on how to do that.

Advanced settings

When the AzureBastion feature is enabled in a CAPZ cluster, 3 new resources will be deployed in the resource group:

  • The Azure Bastion resource;
  • A subnet named AzureBastionSubnet (the name is mandatory and can’t be changed);
  • A public IP address.

The default values for the new resources should work for most use cases, but if you need to customize them you can provide your own values. Here is a detailed example:

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AzureCluster
metadata:
  name: test1
  namespace: default
spec:
  bastionSpec:
    azureBastion:
      name: "..." # The name of the Azure Bastion; defaults to '<cluster name>-azure-bastion'.
      subnet:
        name: "..." # The name of the subnet. The only supported name is 'AzureBastionSubnet' (this is an Azure limitation).
        securityGroup: {} # No security group is assigned by default. You can choose to have one created and assigned by defining it.
      publicIP:
        name: "..." # The name of the public IP; defaults to '<cluster name>-azure-bastion-pip'.

If you specify a security group to be associated with the Azure Bastion subnet, it needs to have some networking rules defined or the Azure Bastion resource creation will fail. Please refer to the documentation for more details.

Authentication

With the networking part sorted, we still have to work out a way of authenticating to the VMs via SSH.

Provisioning SSH keys using Machine Templates

In order to add an SSH authorized key for user username and provide sudo access to the control plane VMs, you can adjust the KubeadmControlPlane CR as in the following example:

apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
kind: KubeadmControlPlane
...
spec:
  ...
  kubeadmConfigSpec:
    ...
    users:
    - name: username
      sshAuthorizedKeys:
      - "ssh-rsa AAAA..."
    files:
    - content: "username ALL = (ALL) NOPASSWD: ALL"
      owner: root:root
      path: /etc/sudoers.d/username
      permissions: "0440"
    ...

Similarly, you can achieve the same result for Machine Deployments by customizing the KubeadmConfigTemplate CR:

apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
kind: KubeadmConfigTemplate
metadata:
  name: test1-md-0
  namespace: default
spec:
  template:
    spec:
      files:
      ...
      - content: "username ALL = (ALL) NOPASSWD: ALL"
        owner: root:root
        path: /etc/sudoers.d/username
        permissions: "0440"
      ...
      users:
      - name: username
        sshAuthorizedKeys:
        - "ssh-rsa AAAA..."

Setting SSH keys or passwords using the Azure Portal

An alternative way of gaining SSH access to VMs on Azure is to set the password or authorized key via the Azure Portal. In the Portal, navigate to the Virtual Machine details page and find the Reset password function in the left pane.
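
The same reset can also be done with the Azure CLI; a minimal sketch, assuming hypothetical resource group, VM, and user names:

az vm user update \
  --resource-group test1 \
  --name test1-control-plane-cn9lm \
  --username username \
  --ssh-key-value "$(cat ~/.ssh/id_rsa.pub)"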

Developing Cluster API Provider Azure

Contents

Setting up

Base requirements

  1. Install go
    • Get the latest patch version for go v1.16.
  2. Install jq
    • brew install jq on macOS.
    • chocolatey install jq on Windows.
    • sudo apt install jq on Ubuntu Linux.
  3. Install gettext package
    • brew install gettext && brew link --force gettext on macOS.
    • install instructions on Windows.
    • sudo apt install gettext on Ubuntu Linux.
  4. Install KIND
    • GO111MODULE="on" go get sigs.k8s.io/kind@v0.9.0.
  5. Install Kustomize
    • brew install kustomize on macOS.
    • choco install kustomize on Windows.
    • install instructions on Linux
  6. Install Python 3.x or 2.7.x, if neither is already installed.
  7. Install make.
  8. Install timeout
    • brew install coreutils on macOS.

Get the source

go get -d sigs.k8s.io/cluster-api-provider-azure
cd "$(go env GOPATH)/src/sigs.k8s.io/cluster-api-provider-azure"

Get familiar with basic concepts

This provider is modeled after the upstream Cluster API project. To get familiar with Cluster API resources, concepts and conventions (such as CAPI and CAPZ), refer to the Cluster API Book.

Dev manifest files

Part of running cluster-api-provider-azure is generating manifests to run. Generating dev manifests allows you to test dev images instead of the default releases.

Dev images

Container registry

Any public container registry can be leveraged for storing cluster-api-provider-azure container images.

Developing

Change some code!

Modules and dependencies

This repository uses Go Modules to track and vendor dependencies.

To pin a new dependency:

  • Run go get <repository>@<version>.
  • (Optional) Add a replace statement in go.mod.
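
For example, pinning a hypothetical dependency example.com/some/module at v1.2.3:

go get example.com/some/module@v1.2.3

# Optional: add a replace directive in go.mod, e.g. to point at a fork.
# replace example.com/some/module => github.com/my-org/some-module v1.2.3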

Makefile targets and scripts are offered to work with go modules:

  • make verify-modules checks whether go module files are out of date.
  • make modules runs go mod tidy to ensure proper vendoring.
  • hack/ensure-go.sh checks that the Go version and environment variables are properly set.

Setting up the environment

Your environment must have the Azure credentials as outlined in the getting started prerequisites section.

Using Tilt

Both of the Tilt setups below will get you started developing CAPZ in a local kind cluster. The main difference is the number of components you will build from source and the scope of the changes you’d like to make. If you only want to make changes in CAPZ, then follow CAPZ instructions. This will save you from having to build all of the images for CAPI, which can take a while. If the scope of your development will span both CAPZ and CAPI, then follow the CAPI and CAPZ instructions.

Tilt for dev in CAPZ

If you want to develop in CAPZ and get a local development cluster working quickly, this is the path for you.

From the root of the CAPZ repository and after configuring the environment variables, you can run the following to generate your tilt-settings.json file:

cat <<EOF > tilt-settings.json
{
  "kustomize_substitutions": {
      "AZURE_SUBSCRIPTION_ID_B64": "$(echo "${AZURE_SUBSCRIPTION_ID}" | tr -d '\n' | base64 | tr -d '\n')",
      "AZURE_TENANT_ID_B64": "$(echo "${AZURE_TENANT_ID}" | tr -d '\n' | base64 | tr -d '\n')",
      "AZURE_CLIENT_SECRET_B64": "$(echo "${AZURE_CLIENT_SECRET}" | tr -d '\n' | base64 | tr -d '\n')",
      "AZURE_CLIENT_ID_B64": "$(echo "${AZURE_CLIENT_ID}" | tr -d '\n' | base64 | tr -d '\n')"
  }
}
EOF

To build a kind cluster and start Tilt, just run:

make tilt-up

By default, the Cluster API components deployed by Tilt have experimental features turned off. If you would like to enable these features, add extra_args as specified in The Cluster API Book.

Once your kind management cluster is up and running, you can deploy a workload cluster.

You can also deploy a flavor cluster as a local tilt resource.

To tear down the kind cluster built by the command above, just run:

make kind-reset

Tilt for dev in both CAPZ and CAPI

If you want to develop in both CAPI and CAPZ at the same time, then this is the path for you.

To use Tilt for a simplified development workflow, follow the instructions in the cluster-api repo. The instructions will walk you through cloning the Cluster API (CAPI) repository and configuring Tilt to use kind to deploy the cluster api management components.

You may wish to check out the correct version of CAPI to match the version used in CAPZ.

Note that tilt up will be run from the cluster-api repository directory and the tilt-settings.json file will point back to the cluster-api-provider-azure repository directory. Any changes you make to the source code in the cluster-api or cluster-api-provider-azure repositories will automatically be redeployed to the kind cluster.

After you have cloned both repositories, your folder structure should look like:

|-- src/cluster-api-provider-azure
|-- src/cluster-api (run `tilt up` here)

After configuring the environment variables, run the following to generate your tilt-settings.json file:

cat <<EOF > tilt-settings.json
{
  "default_registry": "${REGISTRY}",
  "provider_repos": ["../cluster-api-provider-azure"],
  "enable_providers": ["azure", "docker", "kubeadm-bootstrap", "kubeadm-control-plane"],
  "kustomize_substitutions": {
      "AZURE_SUBSCRIPTION_ID_B64": "$(echo "${AZURE_SUBSCRIPTION_ID}" | tr -d '\n' | base64 | tr -d '\n')",
      "AZURE_TENANT_ID_B64": "$(echo "${AZURE_TENANT_ID}" | tr -d '\n' | base64 | tr -d '\n')",
      "AZURE_CLIENT_SECRET_B64": "$(echo "${AZURE_CLIENT_SECRET}" | tr -d '\n' | base64 | tr -d '\n')",
      "AZURE_CLIENT_ID_B64": "$(echo "${AZURE_CLIENT_ID}" | tr -d '\n' | base64 | tr -d '\n')"
  }
}
EOF

$REGISTRY should be in the format docker.io/<dockerhub-username>
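
For example (hypothetical Docker Hub username):

export REGISTRY="docker.io/your-dockerhub-username"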

The cluster-api management components that are deployed are configured in the /config folder of each repository respectively. Making changes to those files will trigger a redeploy of the management cluster components.

Deploying a workload cluster

After your kind management cluster is up and running with Tilt, you can configure workload cluster settings and deploy a workload cluster with the following:

make create-workload-cluster

To delete the cluster:

make delete-workload-cluster

Check out the troubleshooting guide for common errors you might run into.

Viewing Telemetry

The CAPZ controller emits tracing and metrics data. When run in Tilt, the KinD cluster is provisioned with a development deployment of OpenTelemetry for distributed tracing, and Prometheus for metrics scraping and visualization.

The OpenTelemetry and Prometheus deployments are for development purposes only. These illustrate the hooks for tracing and metrics, but lack the robustness of production cluster deployments.

After the Tilt cluster has been initialized, if you followed the tracing documentation, you can view distributed traces in the Azure Portal. Open the App Insights resource identified by the AZURE_INSTRUMENTATION_KEY you specified, choose “Transaction search” from the “Investigate” menu on the left, and click “Refresh” or make a specific query of the trace data.

To view metrics, run kubectl port-forward -n capz-system prometheus-prometheus-0 9090 and open http://localhost:9090 to see the Prometheus UI.

Manual Testing

Creating a dev cluster

The steps below are provided in a convenient script in hack/create-dev-cluster.sh. Be sure to set AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, AZURE_SUBSCRIPTION_ID, and AZURE_TENANT_ID before running. Optionally, you can override the different cluster configuration variables. For example, to override the workload cluster name:

CLUSTER_NAME=<my-capz-cluster-name> ./hack/create-dev-cluster.sh

NOTE: CLUSTER_NAME can only include letters, numbers, and hyphens and can’t be longer than 44 characters.

Building and pushing dev images
  1. To build images with custom tags, run the make docker-build target as follows:

    export REGISTRY="<container-registry>"
    export MANAGER_IMAGE_TAG="<image-tag>" # optional - defaults to `dev`.
    PULL_POLICY=IfNotPresent make docker-build
    
  2. (optional) Push your docker images:

    2.1. Login to your container registry using docker login.

    e.g., docker login quay.io

    2.2. Push to your custom image registry:

    REGISTRY="<container-registry>" MANAGER_IMAGE_TAG="<image-tag>" make docker-push
    

    NOTE: make create-cluster will fetch the manager image locally and load it onto the kind cluster if it is present.

Customizing the cluster deployment

Here is a list of required configuration parameters (the full list is available in templates/cluster-template.yaml):

# Cluster settings.
export CLUSTER_NAME="capz-cluster"
export AZURE_VNET_NAME=${CLUSTER_NAME}-vnet

# Azure settings.
export AZURE_LOCATION="southcentralus"
export AZURE_RESOURCE_GROUP=${CLUSTER_NAME}
export AZURE_SUBSCRIPTION_ID_B64="$(echo -n "$AZURE_SUBSCRIPTION_ID" | base64 | tr -d '\n')"
export AZURE_TENANT_ID_B64="$(echo -n "$AZURE_TENANT_ID" | base64 | tr -d '\n')"
export AZURE_CLIENT_ID_B64="$(echo -n "$AZURE_CLIENT_ID" | base64 | tr -d '\n')"
export AZURE_CLIENT_SECRET_B64="$(echo -n "$AZURE_CLIENT_SECRET" | base64 | tr -d '\n')"

# Machine settings.
export CONTROL_PLANE_MACHINE_COUNT=3
export AZURE_CONTROL_PLANE_MACHINE_TYPE="Standard_D2s_v3"
export AZURE_NODE_MACHINE_TYPE="Standard_D2s_v3"
export WORKER_MACHINE_COUNT=2
export KUBERNETES_VERSION="v1.21.2"

# Generate SSH key.
# If you want to provide your own key, skip this step and set AZURE_SSH_PUBLIC_KEY_B64 to your existing file.
SSH_KEY_FILE=.sshkey
rm -f "${SSH_KEY_FILE}" 2>/dev/null
ssh-keygen -t rsa -b 2048 -f "${SSH_KEY_FILE}" -N '' 1>/dev/null
echo "Machine SSH key generated in ${SSH_KEY_FILE}"
# For Linux, the SSH key needs to be base64 encoded because we use the Azure API to set it.
# Windows doesn't support setting SSH keys via the Azure API, so we use cloudbase-init to set the key, which doesn't require base64 encoding.
export AZURE_SSH_PUBLIC_KEY_B64=$(cat "${SSH_KEY_FILE}.pub" | base64 | tr -d '\r\n')
export AZURE_SSH_PUBLIC_KEY=$(cat "${SSH_KEY_FILE}.pub" | tr -d '\r\n')

⚠️ Please note that the generated templates include default values and therefore require the use of clusterctl to create the cluster, or the use of envsubst to replace these values.
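
A minimal sketch of the envsubst approach, assuming the variables above are exported and the default template path is used:

envsubst < templates/cluster-template.yaml > cluster.yaml
kubectl apply -f cluster.yaml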

Creating the cluster

⚠️ Make sure you followed the previous two steps to build the dev image and set the required environment variables before proceeding.

Ensure dev environment has been reset:

make clean kind-reset

Create the cluster:

make create-cluster

Check out the troubleshooting guide for common errors you might run into.

Instrumenting Telemetry

Telemetry is the key to operational transparency. We strive to provide insight into the internal behavior of the system through observable traces and metrics.

Distributed Tracing

Distributed tracing provides a hierarchical view of how and why an event occurred. CAPZ is instrumented to trace each controller reconcile loop. When the reconcile loop begins, a trace span begins and is stored in loop context.Context. As the context is passed on to functions below, new spans are created, tied to the parent span by the parent span ID. The spans form a hierarchical representation of the activities in the controller.

These spans can also be propagated across service boundaries. The span context can be passed on through metadata such as HTTP headers. By propagating span context, it creates a distributed, causal relationship between services and functions.

For tracing, we use OpenTelemetry.

Here is an example of starting a span at the beginning of a controller reconcile.

ctx, span := tele.Tracer().Start(ctx, "controllers.AzureMachineReconciler.Reconcile",
    trace.WithAttributes(
        attribute.String("namespace", req.Namespace),
        attribute.String("name", req.Name),
        attribute.String("kind", "AzureMachine"),
    ))
defer span.End()

The code above creates a context with a new span stored in the context.Context value bag. If a span already existed in the ctx argument, then the new span takes on the parent span ID of the existing span; otherwise the new span becomes a “root span”, one that does not have a parent. The span is also created with labels, or tags, which provide metadata about the span and can be used to query in many distributed tracing systems.

You should consider adding tracing if your func accepts a context.

Metrics

Metrics provide quantitative data about the operations of the controller. This includes cumulative data like counters, single numerical values like gauges, and distributions of counts/samples like histograms and summaries.

In CAPZ we expose metrics using the Prometheus client. The Kubebuilder project provides a guide for metrics and for exposing new ones.
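
As a hedged sketch (not the project’s actual metrics) of what registering a new metric with the Prometheus client can look like, here is a hypothetical counter registered with the controller-runtime metrics registry so it is served on the same /metrics endpoint Prometheus scrapes:

package controllers

import (
    "github.com/prometheus/client_golang/prometheus"
    "sigs.k8s.io/controller-runtime/pkg/metrics"
)

// reconcileErrors counts reconcile errors per controller (hypothetical example metric).
var reconcileErrors = prometheus.NewCounterVec(
    prometheus.CounterOpts{
        Name: "capz_example_reconcile_errors_total",
        Help: "Total number of reconcile errors, labeled by controller.",
    },
    []string{"controller"},
)

func init() {
    // Register with the controller-runtime registry so the metric is exposed
    // by the manager's metrics endpoint.
    metrics.Registry.MustRegister(reconcileErrors)
}

A reconcile loop would then call reconcileErrors.WithLabelValues("azuremachine").Inc() when an error occurs.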

Submitting PRs and testing

Pull requests and issues are highly encouraged! If you’re interested in submitting PRs to the project, please be sure to run some initial checks prior to submission:

make lint # Runs a suite of quick scripts to check code structure
make test # Runs tests on the Go code

Executing unit tests

make test executes the project’s unit tests. These tests do not stand up a Kubernetes cluster, nor do they have external dependencies.

Automated Testing

Mocks

Mocks for the services tests are generated using GoMock.

To generate the mocks, you can run:

make generate-go

E2E Testing

To run E2E locally, set AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, AZURE_SUBSCRIPTION_ID, AZURE_TENANT_ID, and run:

./scripts/ci-e2e.sh

You can optionally set the following variables:

  • E2E_CONF_FILE: The path of the E2E configuration file. Default: ${GOPATH}/src/sigs.k8s.io/cluster-api-provider-azure/test/e2e/config/azure-dev.yaml
  • SKIP_CLEANUP: Set to true if you do not want the bootstrap and workload clusters to be cleaned up after running E2E tests. Default: false
  • SKIP_CREATE_MGMT_CLUSTER: Skip management cluster creation. If skipping management cluster creation, you must specify KUBECONFIG and SKIP_CLEANUP. Default: false
  • LOCAL_ONLY: Use the kind local registry and run the subset of tests which don’t require a remotely pushed controller image. Default: true
  • REGISTRY: Registry to push the controller image to. Default: capzci.azurecr.io/ci-e2e
  • CLUSTER_NAME: Name of an existing workload cluster; specs will run against this existing workload cluster. Use in conjunction with SKIP_CREATE_MGMT_CLUSTER, GINKGO_FOCUS and KUBECONFIG. You must specify exactly one e2e spec to run against with GINKGO_FOCUS, such as export GINKGO_FOCUS=Creating.a.VMSS.cluster.with.a.single.control.plane.node.
  • KUBECONFIG: Used with SKIP_CREATE_MGMT_CLUSTER set to true. Location of the kubeconfig for the management cluster you would like to use. Use kind get kubeconfig --name capz-e2e > kubeconfig.capz-e2e to get the capz e2e kind cluster config. Default: ~/.kube/config
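
For example, a sketch of running a single spec against an existing workload cluster, using only the variables listed above (cluster name is hypothetical):

kind get kubeconfig --name capz-e2e > kubeconfig.capz-e2e
export KUBECONFIG="$(pwd)/kubeconfig.capz-e2e"
export SKIP_CREATE_MGMT_CLUSTER="true"
export SKIP_CLEANUP="true"
export CLUSTER_NAME="my-existing-workload-cluster"
export GINKGO_FOCUS="Creating.a.VMSS.cluster.with.a.single.control.plane.node"
./scripts/ci-e2e.sh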

You can also customize the configuration of the CAPZ cluster created by the E2E tests (except for CLUSTER_NAME, AZURE_RESOURCE_GROUP, AZURE_VNET_NAME, CONTROL_PLANE_MACHINE_COUNT, and WORKER_MACHINE_COUNT, since they are generated by individual test cases). See Customizing the cluster deployment for more details.

Conformance Testing

To run the Kubernetes Conformance test suite locally, you can run

./scripts/ci-conformance.sh

Optional settings are:

  • WINDOWS: Run conformance against Windows nodes. Default: false
  • CONFORMANCE_NODES: Number of parallel ginkgo nodes to run. Default: 1
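
For example, to run the suite with four parallel ginkgo nodes (a sketch using only the variables above):

export CONFORMANCE_NODES="4"
./scripts/ci-conformance.sh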

With the following environment variables defined, you can build a CAPZ cluster from the HEAD of Kubernetes main branch or release branch, and run the Conformance test suite against it. This is not enabled for Windows currently.

  • E2E_ARGS: -kubetest.use-ci-artifacts
  • KUBERNETES_VERSION: latest - extract the Kubernetes version from https://dl.k8s.io/ci/latest.txt (main’s HEAD), or latest-1.21 - extract the Kubernetes version from https://dl.k8s.io/ci/latest-1.21.txt (the release branch’s HEAD)
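
For example, a sketch that builds the cluster from the CI artifacts of main’s HEAD and then runs conformance:

export E2E_ARGS="-kubetest.use-ci-artifacts"
export KUBERNETES_VERSION="latest"
./scripts/ci-conformance.sh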

With the following environment variables defined, CAPZ runs ./scripts/ci-build-kubernetes.sh as part of ./scripts/ci-conformance.sh, which allows developers to build Kubernetes from source and run the Kubernetes Conformance test suite against a CAPZ cluster based on the custom build:

  • AZURE_STORAGE_ACCOUNT: Your Azure storage account name
  • AZURE_STORAGE_KEY: Your Azure storage key
  • JOB_NAME: test (an environment variable used by CI; it can be any non-empty string)
  • LOCAL_ONLY: false
  • REGISTRY: Your registry
  • TEST_K8S: true

Running custom test suites on CAPZ clusters

To run a custom test suite on a CAPZ cluster locally, set AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, AZURE_SUBSCRIPTION_ID, AZURE_TENANT_ID and run:

./scripts/ci-entrypoint.sh bash -c "cd ${GOPATH}/src/github.com/my-org/my-project && make e2e"

You can optionally set the following variables:

  • AZURE_SSH_PUBLIC_KEY_FILE: Use your own SSH key.
  • SKIP_CLEANUP: Skip deleting the cluster after the tests finish running.
  • KUBECONFIG: Provide your existing cluster kubeconfig filepath. If no kubeconfig is provided, ./kubeconfig will be used.
  • USE_CI_ARTIFACTS: Use a CI version of Kubernetes, i.e. not a released version (e.g. v1.19.0-alpha.1.426+0926c9c47677e9).
  • CI_VERSION: Provide a custom CI version of Kubernetes. By default, the latest master commit will be used.
  • TEST_CCM: Build a cluster that uses custom versions of the Azure cloud-provider cloud-controller-manager and node-controller-manager images.
  • EXP_MACHINE_POOL: Use Machine Pool for worker machines.
  • REGISTRY: Registry to push any custom k8s images or cloud provider images built.
  • CLUSTER_TEMPLATE: Use a custom cluster template. By default, the script will choose the appropriate cluster template based on existing environment variables.
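
For example, a hedged sketch of running a custom suite against a Machine Pool based cluster and keeping the cluster afterwards (the project path is hypothetical):

export EXP_MACHINE_POOL="true"
export SKIP_CLEANUP="1"
export AZURE_SSH_PUBLIC_KEY_FILE="${HOME}/.ssh/id_rsa.pub"
./scripts/ci-entrypoint.sh bash -c "cd ${GOPATH}/src/github.com/my-org/my-project && make e2e"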

You can also customize the configuration of the CAPZ cluster (assuming that SKIP_CREATE_WORKLOAD_CLUSTER is not set). See Customizing the cluster deployment for more details.

For Kubernetes Developers

If you are working on Kubernetes upstream, you can use the Cluster API Azure Provider to test your build of Kubernetes in an Azure environment.

Kubernetes 1.17+

Kubernetes has removed the make WHAT=cmd/hyperkube command, so you will have to build the individual Kubernetes components and deploy them separately. That includes:

  • Run the following commands to build Kubernetes and upload artifacts to a registry and Azure blob storage:
export AZURE_STORAGE_ACCOUNT=<AzureStorageAccount>
export AZURE_STORAGE_KEY=<AzureStorageKey>
export REGISTRY=<Registry>
export TEST_K8S="true"
export JOB_NAME="test" # an environment variable used by CI, can be any non-empty string

source ./scripts/ci-build-kubernetes.sh

A template is provided that enables building clusters from custom built Kubernetes components:

export CLUSTER_TEMPLATE="test/dev/cluster-template-custom-builds.yaml"
./hack/create-dev-cluster.sh

Testing the out-of-tree cloud provider

To test changes made to the Azure cloud provider, first build and push images for cloud-controller-manager and/or cloud-node-manager from the root of the cloud-provider-azure repo.

Then, use the external-cloud-provider flavor to create a cluster:

AZURE_CLOUD_CONTROLLER_MANAGER_IMG=myrepo/my-ccm:v0.0.1 \
AZURE_CLOUD_NODE_MANAGER_IMG=myrepo/my-cnm:v0.0.1 \
CLUSTER_TEMPLATE=cluster-template-external-cloud-provider.yaml \
make create-workload-cluster

Release Process

Change milestone

  • Create a new GitHub milestone for the next release
  • Change milestone applier so new changes can be applied to the appropriate release
    • Open a PR in https://github.com/kubernetes/test-infra to change this line
      • Example PR: https://github.com/kubernetes/test-infra/pull/16827

Prepare branch, tag and release notes

  • Identify a known good commit on the main branch
  • Fast-forward the release branch to the selected commit. :warning: Always release from the release branch and not from master!
    • git checkout release-0.x
    • git fetch upstream
    • git merge --ff-only upstream/master
    • git push
  • Create tag with git
    • export RELEASE_TAG=v0.4.6 (the tag of the release to be cut)
    • git tag -s ${RELEASE_TAG} -m "${RELEASE_TAG}"
    • git push upstream ${RELEASE_TAG}
  • Update the file metadata.yaml if it is a major or minor release
  • Run make release from the repo; this will create the release artifacts in the out/ folder
  • Install the release-notes tool according to instructions
  • Export GITHUB_TOKEN
  • Run the release-notes tool with the appropriate commits. Commits range from the first commit after the previous release to the new release commit.
release-notes --github-org kubernetes-sigs --github-repo cluster-api-provider-azure \
--start-sha 1cf1ec4a1effd9340fe7370ab45b173a4979dc8f  \
--end-sha e843409f896981185ca31d6b4a4c939f27d975de
  • Manually format and categorize the release notes

Promote image to prod repo

Promote image

  • Images are built by the post push images job
  • Create a PR in https://github.com/kubernetes/k8s.io to add the image and tag
    • Example PR: https://github.com/kubernetes/k8s.io/pull/1030/files
  • Location of image: https://console.cloud.google.com/gcr/images/k8s-staging-cluster-api-azure/GLOBAL/cluster-api-azure-controller?rImageListsize=30

Release in GitHub

Create the GitHub release in the UI

  • Create a draft release in GitHub and associate it with the tag that was created
  • Copy and paste the release notes
  • Upload artifacts from the out/ folder
  • Publish release
  • Announce the release

Versioning

cluster-api-provider-azure follows the semantic versioning specification.

Example versions:

  • Pre-release: v0.1.1-alpha.1
  • Minor release: v0.1.0
  • Patch release: v0.1.1
  • Major release: v1.0.0

Expected artifacts

  1. A release yaml file infrastructure-components.yaml containing the resources needed to deploy to Kubernetes
  2. A cluster-templates.yaml for each supported flavor
  3. A metadata.yaml which maps release series to cluster-api contract version
  4. Release notes

Communication

Patch Releases

  1. Announce the release in Kubernetes Slack on the #cluster-api-azure channel.

Minor/Major Releases

  1. Follow the communications process for pre-releases
  2. An announcement email is sent to kubernetes-sig-azure@googlegroups.com and kubernetes-sig-cluster-lifecycle@googlegroups.com with the subject [ANNOUNCE] cluster-api-provider-azure <version> has been released