GPU-enabled clusters

Overview

With CAPZ you can create GPU-enabled Kubernetes clusters on Microsoft Azure.

Before you begin, be aware that:

  • Scheduling GPUs is a Kubernetes beta feature
  • NVIDIA GPUs are supported on Azure NC-series, NV-series, and NVv3-series VMs
  • The NVIDIA GPU Operator allows Kubernetes cluster administrators to manage GPU nodes just like CPU nodes in the cluster

To deploy a cluster with support for GPU nodes, use the nvidia-gpu flavor.
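
If you want to see which configuration variables the flavor's template expects before generating anything, clusterctl can list them (the exact set depends on your CAPZ version):

$ clusterctl generate cluster azure-gpu --flavor nvidia-gpu --list-variables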

An example GPU cluster

Let’s create a CAPZ cluster with an N-series node and run a GPU-powered vector calculation.

Generate an nvidia-gpu cluster template

Use the clusterctl generate cluster command to generate a manifest that defines your GPU-enabled workload cluster.

Remember to use the nvidia-gpu flavor with N-series nodes.

AZURE_CONTROL_PLANE_MACHINE_TYPE=Standard_B2s \
AZURE_NODE_MACHINE_TYPE=Standard_NC6s_v3 \
AZURE_LOCATION=southcentralus \
clusterctl generate cluster azure-gpu \
  --kubernetes-version=v1.22.1 \
  --worker-machine-count=1 \
  --flavor=nvidia-gpu > azure-gpu-cluster.yaml
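
As a quick sanity check, confirm that the worker machine template in the generated manifest picked up the N-series VM size (the surrounding field layout may differ between template versions):

$ grep -n 'Standard_NC6s_v3' azure-gpu-cluster.yaml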

Create the cluster

Apply the manifest from the previous step to your management cluster to have CAPZ create a workload cluster:

$ kubectl apply -f azure-gpu-cluster.yaml --server-side
cluster.cluster.x-k8s.io/azure-gpu serverside-applied
azurecluster.infrastructure.cluster.x-k8s.io/azure-gpu serverside-applied
kubeadmcontrolplane.controlplane.cluster.x-k8s.io/azure-gpu-control-plane serverside-applied
azuremachinetemplate.infrastructure.cluster.x-k8s.io/azure-gpu-control-plane serverside-applied
machinedeployment.cluster.x-k8s.io/azure-gpu-md-0 serverside-applied
azuremachinetemplate.infrastructure.cluster.x-k8s.io/azure-gpu-md-0 serverside-applied
kubeadmconfigtemplate.bootstrap.cluster.x-k8s.io/azure-gpu-md-0 serverside-applied

Wait until the cluster and nodes are finished provisioning…

$ kubectl get cluster azure-gpu
NAME        PHASE
azure-gpu   Provisioned
$ kubectl get machines
NAME                             PROVIDERID                                                                                                                                     PHASE     VERSION
azure-gpu-control-plane-t94nm    azure:////subscriptions/<subscription_id>/resourceGroups/azure-gpu/providers/Microsoft.Compute/virtualMachines/azure-gpu-control-plane-nnb57   Running   v1.22.1
azure-gpu-md-0-f6b88dd78-vmkph   azure:////subscriptions/<subscription_id>/resourceGroups/azure-gpu/providers/Microsoft.Compute/virtualMachines/azure-gpu-md-0-gcc8v            Running   v1.22.1

… and then you can install a CNI of your choice.
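
While you wait, clusterctl can summarize the provisioning status of every resource in the cluster:

$ clusterctl describe cluster azure-gpu

Any CNI that works with Cluster API on Azure will do; apply its manifest against the workload cluster using the kubeconfig retrieved in the next step (the manifest path below is a placeholder for whichever CNI you choose):

$ kubectl --kubeconfig ./azure-gpu-cluster.conf apply -f <your-cni-manifest.yaml>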

Once all nodes are Ready, install the official NVIDIA gpu-operator via Helm.

Install nvidia gpu-operator Helm chart

If you don’t have Helm installed, follow the installation guide for your environment at https://helm.sh/docs/intro/install/.

First, grab the kubeconfig from your newly created cluster and save it to a file:

$ clusterctl get kubeconfig azure-gpu > ./azure-gpu-cluster.conf
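
It’s worth verifying the kubeconfig works before proceeding; cluster-info should report the workload cluster’s control plane endpoint:

$ kubectl --kubeconfig ./azure-gpu-cluster.conf cluster-info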

Now we can use Helm to install the official chart:

$ helm install --kubeconfig ./azure-gpu-cluster.conf --repo https://helm.ngc.nvidia.com/nvidia gpu-operator --generate-name

The installation of GPU drivers via gpu-operator will take several minutes. Coffee or tea may be appropriate at this time.
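
If you’d rather watch than wait, you can follow the operator’s pods as each stage (driver, container toolkit, device plugin) rolls out on the GPU node:

$ kubectl --kubeconfig ./azure-gpu-cluster.conf get pods -A -w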

After a few minutes, run the following command against the workload cluster to check that all the gpu-operator resources are installed:

$ kubectl --kubeconfig ./azure-gpu-cluster.conf get pods -A -o wide | grep 'gpu\|nvidia'
NAMESPACE          NAME                                                              READY   STATUS      RESTARTS   AGE     IP               NODE                            NOMINATED NODE   READINESS GATES
default            gpu-feature-discovery-r6zgh                                       1/1     Running     0          7m21s   192.168.132.75   azure-gpu-md-0-gcc8v            <none>           <none>
default            gpu-operator-1674686292-node-feature-discovery-master-79d8pbcg6   1/1     Running     0          8m15s   192.168.96.7     azure-gpu-control-plane-nnb57   <none>           <none>
default            gpu-operator-1674686292-node-feature-discovery-worker-g9dj2       1/1     Running     0          8m15s   192.168.132.66   azure-gpu-md-0-gcc8v            <none>           <none>
default            gpu-operator-95b545d6f-rmlf2                                      1/1     Running     0          8m15s   192.168.132.67   azure-gpu-md-0-gcc8v            <none>           <none>
default            nvidia-container-toolkit-daemonset-hstgw                          1/1     Running     0          7m21s   192.168.132.70   azure-gpu-md-0-gcc8v            <none>           <none>
default            nvidia-cuda-validator-pdmkl                                       0/1     Completed   0          3m47s   192.168.132.74   azure-gpu-md-0-gcc8v            <none>           <none>
default            nvidia-dcgm-exporter-wjm7p                                        1/1     Running     0          7m21s   192.168.132.71   azure-gpu-md-0-gcc8v            <none>           <none>
default            nvidia-device-plugin-daemonset-csv6k                              1/1     Running     0          7m21s   192.168.132.73   azure-gpu-md-0-gcc8v            <none>           <none>
default            nvidia-device-plugin-validator-gxzt2                              0/1     Completed   0          2m49s   192.168.132.76   azure-gpu-md-0-gcc8v            <none>           <none>
default            nvidia-driver-daemonset-zww52                                     1/1     Running     0          7m46s   192.168.132.68   azure-gpu-md-0-gcc8v            <none>           <none>
default            nvidia-operator-validator-kjr6m                                   1/1     Running     0          7m21s   192.168.132.72   azure-gpu-md-0-gcc8v            <none>           <none>

All pods should be in either a Running or Completed state. If they are, the driver installation and GPU node configuration were successful.
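
If a pod is stuck instead, its logs usually identify the failing stage. For example, to inspect the driver installation (substitute the driver daemonset pod name from your own output):

$ kubectl --kubeconfig ./azure-gpu-cluster.conf logs nvidia-driver-daemonset-zww52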

Then run the following commands against the workload cluster to verify that the NVIDIA device plugin has initialized and the nvidia.com/gpu resource is available:

$ kubectl --kubeconfig ./azure-gpu-cluster.conf get nodes
NAME                            STATUS   ROLES    AGE   VERSION
azure-gpu-control-plane-nnb57   Ready    master   42m   v1.22.1
azure-gpu-md-0-gcc8v            Ready    <none>   38m   v1.22.1
$ kubectl --kubeconfig ./azure-gpu-cluster.conf get node azure-gpu-md-0-gcc8v -o jsonpath={.status.allocatable} | jq
{
  "attachable-volumes-azure-disk": "12",
  "cpu": "6",
  "ephemeral-storage": "119716326407",
  "hugepages-1Gi": "0",
  "hugepages-2Mi": "0",
  "memory": "115312060Ki",
  "nvidia.com/gpu": "1",
  "pods": "110"
}
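
To check allocatable GPUs across all nodes at once, a jsonpath query works as well (note the escaped dots in the resource name):

$ kubectl --kubeconfig ./azure-gpu-cluster.conf get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}'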

Run a test app

Let’s create a pod manifest for the cuda-vector-add example from the Kubernetes documentation and deploy it:

$ cat > cuda-vector-add.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      # https://github.com/kubernetes/kubernetes/blob/v1.7.11/test/images/nvidia-cuda/Dockerfile
      image: "registry.k8s.io/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 1 # requesting 1 GPU
EOF
$ kubectl --kubeconfig ./azure-gpu-cluster.conf apply -f cuda-vector-add.yaml

The node will pull the image, run the container, and perform a CUDA vector addition on the GPU.

$ kubectl --kubeconfig ./azure-gpu-cluster.conf get po cuda-vector-add
NAME              READY   STATUS      RESTARTS   AGE
cuda-vector-add   0/1     Completed   0          91s
$ kubectl --kubeconfig ./azure-gpu-cluster.conf logs cuda-vector-add
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done

If you see output like the above, your GPU cluster is working!
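
If the pod instead sits in Pending, the device plugin probably isn’t ready yet; describing the pod will show a FailedScheduling event reporting insufficient nvidia.com/gpu:

$ kubectl --kubeconfig ./azure-gpu-cluster.conf describe po cuda-vector-add

When you’re finished, clean up the test pod:

$ kubectl --kubeconfig ./azure-gpu-cluster.conf delete -f cuda-vector-add.yaml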