Autoscaling from Zero
- Feature status: Experimental
Overview
To enable Cluster API providers to dynamically scale node groups from zero to one and from one to zero, the Cluster API project has proposed a mechanism for providers to annotate MachineSet and MachinePool infrastructure templates with capacity and node information. This enables the cluster autoscaler to make informed scheduling decisions even when no nodes exist in a node group.
The autoscaling-from-zero feature works by having the infrastructure provider populate status fields on infrastructure templates (e.g., AzureMachineTemplate) with capacity (CPU, memory) and node information (architecture, operating system). The cluster-autoscaler then reads these status fields to simulate node capacity for scale-from-zero decisions.
Source: Opt-in Autoscaling from Zero Proposal
CAPZ implements this proposal by automatically populating AzureMachineTemplate status fields based on Azure VM SKU information. This enables cluster-autoscaler to scale MachineDeployments to zero replicas and back up based on workload demand.
Key benefits:
- Cost optimization by scaling unused node groups to zero
- Efficient resource utilization for dev/test environments
- Support for batch workloads that scale between job runs
How It Works
CAPZ's AzureMachineTemplate controller automatically populates status fields when a template is created or reconciled:
- The controller queries the Azure Resource SKUs API for VM size specifications
- It extracts capacity information (CPU cores, memory) from the SKU
- It determines node architecture (amd64/arm64) from SKU capabilities
- It derives the operating system (linux/windows) from the template's
osDisk.osTypefield - This information is written to
status.capacityandstatus.nodeInfofields
The cluster-autoscaler reads these status fields to simulate node capacity for pending pods, enabling scale-from-zero decisions without requiring actual nodes to exist.
The controller respects cluster pause annotations and requires the template to have an owner reference to a Cluster resource.
Configuration
Example MachineDeployment with Autoscaling from Zero
Below is an example of the resources needed to enable autoscaling from zero for a worker node pool.
AzureMachineTemplate
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureMachineTemplate
metadata:
name: ${CLUSTER_NAME}-worker
namespace: default
spec:
template:
spec:
vmSize: Standard_D2s_v3
osDisk:
diskSizeGB: 128
osType: Linux
sshPublicKey: ${AZURE_SSH_PUBLIC_KEY_B64}
# Status is automatically populated by CAPZ controller:
# status:
# capacity:
# cpu: "2"
# memory: "8Gi"
# nodeInfo:
# architecture: amd64
# operatingSystem: linux
MachineDeployment
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
name: ${CLUSTER_NAME}-worker
namespace: default
spec:
clusterName: ${CLUSTER_NAME}
replicas: 0 # Can start at zero
selector:
matchLabels:
cluster.x-k8s.io/cluster-name: ${CLUSTER_NAME}
template:
spec:
clusterName: ${CLUSTER_NAME}
version: ${KUBERNETES_VERSION}
bootstrap:
configRef:
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
name: ${CLUSTER_NAME}-worker
infrastructureRef:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureMachineTemplate
name: ${CLUSTER_NAME}-worker
Status Fields
The CAPZ controller populates the following fields in AzureMachineTemplate status:
| Field | Description | Example | Source |
|---|---|---|---|
status.capacity.cpu | Number of vCPUs | "2", "4", "8" | Azure SKU API |
status.capacity.memory | Memory size | "8Gi", "16Gi" | Azure SKU API |
status.nodeInfo.architecture | CPU architecture | amd64, arm64 | Azure SKU API |
status.nodeInfo.operatingSystem | OS type | linux, windows | Template osDisk.osType |
Inspect the status of an AzureMachineTemplate:
kubectl get azuremachinetemplate ${CLUSTER_NAME}-worker -o jsonpath='{.status}' | jq
Example output:
{
"capacity": {
"cpu": "2",
"memory": "8Gi"
},
"nodeInfo": {
"architecture": "amd64",
"operatingSystem": "linux"
}
}
Related Resources
- ClusterClass - Using autoscaling-from-zero with ClusterClass
- Machine Pools (VMSS) - Alternative scaling approach
- Cluster API Autoscaling
- Autoscaling from Zero Proposal
- Kubernetes Cluster Autoscaler
- Azure VM Sizes