Lab 5: Advanced Fake-GPU Scheduling with HAMi and nvml‑mock
This lab uses NVIDIA's nvml‑mock library to simulate a high‑end GPU node — 8 fake A100 GPUs — inside a local kind cluster. You will build HAMi directly from the main branch, then verify GPU scheduling features: sharing, memory/core limits, percentage-based memory requests, and multi‑GPU allocation — all without physical hardware.
What You'll Get
After completing this lab, you will have a local Kubernetes cluster with:
- nvml‑mock making the node report 8 fake A100 GPUs (nvidia.com/gpu: 80 after HAMi slices each physical GPU into 10 virtual slots)
- HAMi device‑plugin and scheduler running from the current main branch image
- Pods verified for: single GPU, GPU sharing, memory/core limits, percentage-based memory, and multi‑GPU allocation
No real CUDA runtime exists in this environment. Pods use busybox with CUDA_DISABLE_CONTROL=true to prevent HAMi's control library from attempting real device access. Runtime enforcement of memory and core limits still requires physical GPUs.
Installation Overview
| Step | Purpose |
|---|---|
| 1. Create kind Cluster | Local Kubernetes via kind |
| 2. Deploy nvml‑mock | 8 fake A100 GPUs via libnvidia‑ml shim |
| 3. Build HAMi Image | Build device‑plugin + scheduler from main |
| 4. Deploy HAMi | Install from local Helm chart and label node |
| 5. Verify Resources | Confirm nvidia.com/gpu: 80 |
| 6. Basic GPU Pod | Single‑GPU allocation |
| 7. GPU Sharing | 4 Pods each request 1 GPU concurrently |
| 8. Memory and Core Limits | gpumem + gpucores in one Pod |
| 9. Percentage Memory | Request 30% of GPU memory |
| 10. Multi‑GPU Pod | Single Pod requests 2 GPUs |
Prerequisites
macOS
- macOS, Intel or Apple Silicon
- Docker Desktop or OrbStack installed and running
- Homebrew available
brew install kind kubectl helm git go
Verify versions:
kind version # 0.20+
kubectl version --client # 1.31+ (the --short flag was removed in kubectl 1.28)
helm version # 3.x
go version # 1.21+
Linux (Ubuntu)
- Ubuntu 20.04 LTS or later, x86_64
- Docker Engine installed and running
# kind
KIND_VERSION=v0.23.0
curl -Lo ./kind "https://kind.sigs.k8s.io/dl/${KIND_VERSION}/kind-linux-amd64"
chmod +x ./kind && sudo mv ./kind /usr/local/bin/kind
# kubectl
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl && rm kubectl
# Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
# Go
GO_VERSION=1.24.0
curl -LO "https://go.dev/dl/go${GO_VERSION}.linux-amd64.tar.gz"
sudo rm -rf /usr/local/go && sudo tar -C /usr/local -xzf go${GO_VERSION}.linux-amd64.tar.gz
echo 'export PATH=$PATH:/usr/local/go/bin' >> ~/.bashrc && source ~/.bashrc
Verify versions:
kind version # 0.20+
kubectl version --client # 1.31+ (the --short flag was removed in kubectl 1.28)
helm version # 3.x
go version # 1.21+
Windows: use WSL2 with Ubuntu and follow the Linux (Ubuntu) instructions above.
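The minimum versions noted in the comments above can also be checked in a script. The following is a minimal sketch, not part of the lab itself: the helper name version_ge is hypothetical, and it assumes a sort that supports -V (GNU coreutils and recent macOS do).

```shell
# version_ge HAVE WANT: succeed when HAVE is at least WANT.
# Accepts dotted versions with an optional leading "v" (hypothetical helper).
version_ge() {
  have="${1#v}"
  want="${2#v}"
  # sort -V orders version strings; if WANT sorts first, HAVE is new enough.
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

# Sample values only; substitute the numbers printed by the commands above.
version_ge "v0.23.0" "0.20" && echo "kind ok"
version_ge "1.24.0" "1.21" && echo "go ok"
version_ge "1.30.2" "1.31" || echo "kubectl too old"
```

Comparing with sort -V avoids hand-splitting the major, minor, and patch fields.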
Step 1: Create the kind Cluster
kind create cluster --name nvml-mock-test
Set the NODE_NAME variable once — all subsequent commands use it:
NODE_NAME=$(kubectl get nodes -o jsonpath='{.items[0].metadata.name}')
echo "NODE_NAME=${NODE_NAME}"
Expected output:
NODE_NAME=nvml-mock-test-control-plane
Step 2: Build and Deploy nvml‑mock
nvml‑mock provides a fake libnvidia-ml.so, virtual /dev/nvidia* device nodes, and PCI topology entries so HAMi's device‑plugin sees 8 A100 GPUs on the node.
2.1 Clone and Build
git clone https://github.com/NVIDIA/k8s-test-infra.git
cd k8s-test-infra
docker build -t nvml-mock:local -f deployments/nvml-mock/Dockerfile .
The first build downloads base layers and may take 5–10 minutes. Subsequent builds use Docker layer cache.
2.2 Load into kind
kind load docker-image nvml-mock:local --name nvml-mock-test
2.3 Install via Helm
helm install nvml-mock oci://ghcr.io/nvidia/k8s-test-infra/chart/nvml-mock \
  --set image.repository=nvml-mock \
  --set image.tag=local \
  --wait --timeout 120s
The chart configures an A100 profile by default: 8 GPUs per node, driver version 550.163.01, fake driver root at /var/lib/nvml-mock/driver. This driver root path is passed to HAMi in Step 4.
2.4 Verify GPU Discovery
kubectl get node ${NODE_NAME} \
  -o custom-columns=NAME:.metadata.name,GPU_PRESENT:.metadata.labels.nvidia\\.com/gpu\\.present
Expected output:
NAME GPU_PRESENT
nvml-mock-test-control-plane true
Step 3: Build HAMi from the main Branch
The main branch contains a fix preventing nvidia‑mig‑parted from being called when MIG is not enabled. Building from source ensures the fix is present without waiting for a tagged release.
3.1 Clone and Initialise Submodules
cd ~
git clone https://github.com/Project-HAMi/HAMi.git
cd HAMi
git submodule update --init --recursive
3.2 Build the Docker Image
docker build -t hami:local -f docker/Dockerfile .
HAMi uses a three‑stage Dockerfile: a Go build stage, a CUDA library build stage, and a final runtime stage. The first build takes several minutes as it pulls the CUDA base images; subsequent runs use the layer cache.
3.3 Load into kind
kind load docker-image hami:local --name nvml-mock-test
Both the scheduler and device‑plugin binaries are packaged into the single hami:local image.
Step 4: Deploy HAMi
4.1 Install via Helm
helm install hami ./charts/hami \
  -n kube-system \
  --set devicePlugin.image.repository=hami \
  --set devicePlugin.image.tag=local \
  --set scheduler.image.repository=hami \
  --set scheduler.image.tag=local \
  --set devicePlugin.nvidiaDriverRoot=/var/lib/nvml-mock/driver \
  --set scheduler.kubeScheduler.imageTag=v1.35.0
devicePlugin.nvidiaDriverRoot points HAMi at the fake driver libraries installed by nvml‑mock.
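HAMi's installation instructions recommend setting scheduler.kubeScheduler.imageTag to the Kubernetes version of your cluster, so the v1.35.0 value above may need adjusting to match your kind node. The sketch below shows one way to derive the tag from a version string; the helper name scheduler_tag and the sample versions are illustrative assumptions, not taken from the lab.

```shell
# scheduler_tag VERSION: drop any "+build" suffix so the value can be used
# as a kube-scheduler image tag (hypothetical helper).
scheduler_tag() {
  printf '%s\n' "${1%%+*}"
}

# Sample inputs only. In the lab you would pass the node's reported version:
#   scheduler_tag "$(kubectl get node "${NODE_NAME}" \
#     -o jsonpath='{.status.nodeInfo.kubeletVersion}')"
scheduler_tag "v1.30.0"
scheduler_tag "v1.30.4+k3s1"
```

The result can then be passed as --set scheduler.kubeScheduler.imageTag=<tag> in the helm install command.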
4.2 Label the Node
The HAMi device-plugin DaemonSet uses the node selector gpu=on. Without this label on the node, DESIRED stays at 0, no device-plugin Pod is scheduled, and no GPUs are registered.
kubectl label node ${NODE_NAME} gpu=on
Confirm the DaemonSet now schedules a Pod:
kubectl -n kube-system get daemonset hami-device-plugin
Expected output:
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
hami-device-plugin 1 1 0 1 0 gpu=on 4m22s
4.3 Set the NVML Device Discovery Strategy
kubectl -n kube-system set env daemonset/hami-device-plugin \
  -c device-plugin \
  DEVICE_DISCOVERY_STRATEGY=nvml
This tells the device‑plugin to enumerate GPUs via the NVML API rather than scanning /dev. Without this, the plugin defaults to a file-based strategy that cannot see nvml‑mock's virtual devices.
4.4 Roll Out and Verify
kubectl -n kube-system rollout restart daemonset/hami-device-plugin
kubectl -n kube-system rollout status daemonset/hami-device-plugin --timeout=120s
Check for MIG errors — an empty response is the expected output:
kubectl -n kube-system logs daemonset/hami-device-plugin -c device-plugin | grep -i mig
Check overall Pod status:
kubectl -n kube-system get pods -l app.kubernetes.io/name=hami
Expected output:
NAME READY STATUS RESTARTS AGE
hami-device-plugin-lbctx 1/2 CrashLoopBackOff 6 9m24s
hami-scheduler-7858c744cc-7pb79 2/2 Running 0 13m
The vgpu-monitor sidecar crashes because it requires real GPU monitoring infrastructure. The device-plugin container is running correctly — 1/2 is expected here and does not affect GPU scheduling.
Step 5: Verify GPU Resources
HAMi partitions each physical GPU into 10 virtual slots. With 8 physical GPUs the node should advertise 80 allocatable virtual GPUs.
kubectl describe node ${NODE_NAME} | grep nvidia.com/gpu
Expected output:
nvidia.com/gpu.present=true
nvidia.com/gpu: 80
nvidia.com/gpu: 80
nvidia.com/gpu 0 0
Both Capacity and Allocatable showing 80 confirms the device‑plugin registered all virtual GPU slots. The final line is the Allocated resources table — currently 0 because no Pods have claimed GPUs yet.
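The expected figure is simply physical GPUs multiplied by slots per GPU. A minimal sketch of that arithmetic follows; the helper name expected_vgpus is hypothetical. The slot count is believed to correspond to the chart value devicePlugin.deviceSplitCount (default 10); treat that name as an assumption and check the chart's values.yaml.

```shell
# expected_vgpus PHYSICAL SLOTS: virtual GPU count the node should advertise.
expected_vgpus() {
  echo $(( $1 * $2 ))
}

# Values from this lab: 8 mocked GPUs, 10 slots each.
want="$(expected_vgpus 8 10)"
echo "expected nvidia.com/gpu: ${want}"
```

To compare against the live node, read the allocatable count with kubectl get node "${NODE_NAME}" -o jsonpath='{.status.allocatable.nvidia\.com/gpu}' and check it against the computed value.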
Step 6: Test Basic GPU Scheduling
Deploy a minimal Pod requesting one GPU. CUDA_DISABLE_CONTROL=true prevents HAMi's injected CUDA shim from attempting real device access:
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test-1
spec:
  containers:
  - name: sleep
    image: busybox
    command: ["sleep", "3600"]
    env:
    - name: CUDA_DISABLE_CONTROL
      value: "true"
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
kubectl get pod gpu-test-1 -w
Expected output:
NAME READY STATUS RESTARTS AGE
gpu-test-1 1/1 Running 0 9s
kubectl describe pod gpu-test-1 | grep vgpu-devices-allocated
Expected output:
hami.io/vgpu-devices-allocated: GPU-12345678-1234-1234-1234-123456780006,NVIDIA,40960,100:;
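The annotation format is <UUID>,<vendor>,<memMiB>,<cores>. Each device entry ends with a colon, and each container's device list ends with a semicolon. The mocked A100 profile reports 40 960 MiB of VRAM per GPU, so this annotation confirms that one virtual GPU was allocated and recorded by the scheduler.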
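To read the annotation fields programmatically, the entry can be split on commas. This is a minimal sketch that assumes the single-device layout shown above; the helper name parse_vgpu_entry is hypothetical.

```shell
# parse_vgpu_entry ENTRY: print the four fields of one device entry.
parse_vgpu_entry() {
  # Keep everything before the first ":" to drop the trailing terminators.
  entry="${1%%:*}"
  printf '%s\n' "$entry" |
    awk -F',' '{ printf "uuid=%s vendor=%s mem_mib=%s cores=%s\n", $1, $2, $3, $4 }'
}

# The annotation value from the expected output above.
parse_vgpu_entry 'GPU-12345678-1234-1234-1234-123456780006,NVIDIA,40960,100:;'
```

The same function works on the value returned by kubectl get pod gpu-test-1 -o jsonpath='{.metadata.annotations.hami\.io/vgpu-devices-allocated}'.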
Step 7: Test GPU Sharing (Time‑slicing)
Deploy three more Pods each requesting 1 GPU. Use an unquoted heredoc (<<EOF) so that $i expands to the Pod index:
for i in 2 3 4; do
  kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test-$i
spec:
  containers:
  - name: sleep
    image: busybox
    command: ["sleep", "3600"]
    env:
    - name: CUDA_DISABLE_CONTROL
      value: "true"
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
done
Use <<EOF, not <<'EOF', inside the loop. Single-quoting the delimiter suppresses shell expansion: $i would not be substituted, so every manifest would carry the literal name gpu-test-$i, which Kubernetes rejects as an invalid Pod name.
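The difference between the two heredoc forms can be seen without a cluster. The following sketch uses two hypothetical helper functions that render the same line with an unquoted and a quoted delimiter.

```shell
# Unquoted delimiter: the shell expands $i before the text is passed on.
render_unquoted() {
  i="$1"
  cat <<EOF
name: gpu-test-$i
EOF
}

# Quoted delimiter: the text is passed through literally, $i included.
render_quoted() {
  i="$1"
  cat <<'EOF'
name: gpu-test-$i
EOF
}

render_unquoted 2
render_quoted 2
```

The first call prints the expanded name; the second prints the text with $i left as written.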
kubectl get pods | grep gpu-test
Expected output:
gpu-test-1 1/1 Running 0 3m19s
gpu-test-2 1/1 Running 0 10s
gpu-test-3 1/1 Running 0 10s
gpu-test-4 1/1 Running 0 9s
All four Pods run concurrently against the pool of 80 virtual GPU slots. The scheduler independently tracks each allocation via its own vgpu-devices-allocated annotation.
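To see how Pods are distributed across physical GPUs, the annotations can be grouped by UUID. This is a sketch under stated assumptions: the helper name pods_per_gpu is hypothetical, and the sample lines below are made-up values for illustration, not output from this lab.

```shell
# pods_per_gpu: read "POD ANNOTATION" lines on stdin and print
# "UUID COUNT" for each physical GPU that appears.
pods_per_gpu() {
  # Skip Pods without an annotation (lines with fewer than two fields).
  awk 'NF >= 2 { split($2, f, ","); n[f[1]]++ }
       END { for (u in n) print u, n[u] }' | sort
}

# Made-up sample input; two Pods share one GPU, a third uses another.
printf '%s\n' \
  'pod-a GPU-12345678-1234-1234-1234-123456780001,NVIDIA,20480,50:;' \
  'pod-b GPU-12345678-1234-1234-1234-123456780001,NVIDIA,20480,50:;' \
  'pod-c GPU-12345678-1234-1234-1234-123456780002,NVIDIA,40960,100:;' |
  pods_per_gpu
```

In the lab, the input lines could be produced with kubectl get pods -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.metadata.annotations.hami\.io/vgpu-devices-allocated}{"\n"}{end}' and piped into the function.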
Step 8: Test Memory and Core Limits
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-limits
spec:
  containers:
  - name: sleep
    image: busybox
    command: ["sleep", "3600"]
    env:
    - name: CUDA_DISABLE_CONTROL
      value: "true"
    resources:
      limits:
        nvidia.com/gpu: 1
        nvidia.com/gpumem: "10"
        nvidia.com/gpucores: "30"
EOF
nvidia.com/gpumem takes an absolute value in MiB, so "10" means 10 MiB. nvidia.com/gpucores is a percentage of the GPU's compute capacity, so "30" requests 30% of the selected GPU's cores.
kubectl describe pod gpu-limits | grep vgpu-devices-allocated
Expected output:
hami.io/vgpu-devices-allocated: GPU-12345678-1234-1234-1234-123456780002,NVIDIA,10,30:;
The annotation records 10 MiB of memory and a core share of 30, exactly the values requested.
Step 9: Test Percentage-Based Memory Request
Instead of a fixed MiB value, nvidia.com/gpumem-percentage lets you request a fraction of the GPU's total memory. On the mocked A100 (40 960 MiB), requesting 30% allocates 12 288 MiB. This is useful when you want workloads to scale proportionally across different GPU models without hardcoding absolute sizes.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-mem-30pct
spec:
  containers:
  - name: sleep
    image: busybox
    command: ["sleep", "3600"]
    env:
    - name: CUDA_DISABLE_CONTROL
      value: "true"
    resources:
      limits:
        nvidia.com/gpu: 1
        nvidia.com/gpumem-percentage: "30"
EOF
Wait for the Pod to reach Running:
kubectl get pod gpu-mem-30pct -w
Expected output:
NAME READY STATUS RESTARTS AGE
gpu-mem-30pct 1/1 Running 0 8s
Inspect the allocation annotation to confirm the scheduler resolved 30% into an absolute MiB value:
kubectl get pod gpu-mem-30pct \
  -o jsonpath='{.metadata.annotations.hami\.io/vgpu-devices-allocated}'
Expected output:
GPU-12345678-1234-1234-1234-123456780003,NVIDIA,12288,100:;
The third field shows 12288 MiB — 30% of 40 960 MiB — confirming the scheduler correctly translated the percentage into an absolute memory budget for the allocation.
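The percentage-to-MiB conversion can be reproduced with integer arithmetic. This is a minimal sketch; the helper name pct_to_mib is hypothetical, and how HAMi itself rounds non-integer results is not covered by this lab.

```shell
# pct_to_mib TOTAL_MIB PERCENT: memory budget in MiB, truncated to an integer.
pct_to_mib() {
  echo $(( $1 * $2 / 100 ))
}

# 30% of the mocked A100's 40960 MiB.
pct_to_mib 40960 30
```

For this lab's values the division is exact, so the result matches the third field of the annotation.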
Step 10: Test Multi‑GPU Allocation
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-multi
spec:
  containers:
  - name: sleep
    image: busybox
    command: ["sleep", "3600"]
    env:
    - name: CUDA_DISABLE_CONTROL
      value: "true"
    resources:
      limits:
        nvidia.com/gpu: "2"
EOF
kubectl get pod gpu-multi
Expected output:
NAME READY STATUS RESTARTS AGE
gpu-multi 1/1 Running 0 64s
kubectl describe pod gpu-multi | tail -20
Expected output:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 70s hami-scheduler Successfully assigned default/gpu-multi to nvml-mock-test-control-plane
Normal FilteringSucceed 70s hami-scheduler find fit node(nvml-mock-test-control-plane), 0 nodes not fit, 1 nodes fit(nvml-mock-test-control-plane:13.63)
Normal BindingSucceed 70s hami-scheduler Successfully binding node [nvml-mock-test-control-plane] to default/gpu-multi
Normal Pulling 69s kubelet spec.containers{sleep}: Pulling image "busybox"
Normal Pulled 67s kubelet spec.containers{sleep}: Successfully pulled image "busybox" in 3.548s
Normal Created 67s kubelet spec.containers{sleep}: Container created
Normal Started 67s kubelet spec.containers{sleep}: Container started
The hami-scheduler events — FilteringSucceed, Scheduled, and BindingSucceed — confirm HAMi's scheduler handled this Pod and successfully bound it to the node with 2 GPU slots.
kubectl get pod gpu-multi \
  -o jsonpath='{.metadata.annotations.hami\.io/vgpu-devices-allocated}'
You will see two device entries, one per allocated vGPU slot. Entries for the same container are separated by colons, and the container's list ends with a semicolon.
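To count the devices a Pod received, the annotation can be split into one entry per line. This sketch assumes the layout seen in Step 6, where each device entry ends with ":" and each container's list ends with ";". The helper name list_devices and the UUIDs in the sample are hypothetical.

```shell
# list_devices ANNOTATION: print one device entry per line
# for the first container in the annotation.
list_devices() {
  # Keep everything before the first ";" (the first container's list).
  first_container="${1%%;*}"
  # Split on ":" and drop the empty line left by the trailing separator.
  printf '%s\n' "$first_container" | tr ':' '\n' | awk 'NF'
}

# Made-up two-device value in the assumed layout.
sample='GPU-12345678-1234-1234-1234-123456780004,NVIDIA,40960,100:GPU-12345678-1234-1234-1234-123456780005,NVIDIA,40960,100:;'
list_devices "$sample"
echo "devices: $(list_devices "$sample" | awk 'END { print NR }')"
```

For the gpu-multi Pod, the count should equal the nvidia.com/gpu limit of 2.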
Summary of Verified Features
| Feature | Test Pod | How It Is Verified |
|---|---|---|
| Basic GPU scheduling | gpu-test-1 | Annotation shows 1 vGPU UUID + 40 960 MiB |
| GPU sharing (time‑slicing) | gpu-test-1 through gpu-test-4 | All 4 Pods run concurrently |
| Memory limit (gpumem) | gpu-limits | Annotation shows 10 MiB |
| Core limit (gpucores) | gpu-limits | Annotation shows 30 (percent of cores) |
| Percentage memory (gpumem-percentage) | gpu-mem-30pct | Annotation shows 12288 MiB (30% of A100) |
| Multi‑GPU allocation | gpu-multi | hami-scheduler events show BindingSucceed |
What this lab does not verify:
- Actual CUDA program execution
- Runtime enforcement of gpumem and gpucores limits
- Real DCGM GPU metrics (temperature, utilisation)
- Memory overcommit and memory override features
Cleanup
kubectl delete pod gpu-test-1 gpu-test-2 gpu-test-3 gpu-test-4 \
  gpu-limits gpu-mem-30pct gpu-multi
kubectl label node ${NODE_NAME} gpu-
helm uninstall hami -n kube-system
helm uninstall nvml-mock
kind delete cluster --name nvml-mock-test
Skip the cluster deletion step if you want to keep the environment for further experimentation.
Next Steps
- Move to a real GPU cluster (see Lab 1: Online HAMi Installation) to test memory and core isolation with actual CUDA workloads.
- Add Prometheus and HAMi WebUI for visual resource tracking (see Lab 2: Local Fake GPU Setup).