

Nvidia driver happy path

Definition

The Nvidia driver "happy path" is the shape of host configuration that Nvidia's proprietary driver stack is engineered to support turnkey: a Linux host running either (a) Kubernetes with a shared kernel (all tenants on one OS image), or (b) a conventional hypervisor — primarily VMware (with vGPU) or QEMU (with KVM). Any deviation from this path — notably, micro-VM hypervisors like Firecracker or Intel Cloud Hypervisor — costs the platform operator driver-integration engineering that may never work at all.

Canonical wiki statement

Fly.io, 2025-02-14:

We could have shipped GPUs very quickly by doing what Nvidia recommended: standing up a standard K8s cluster to schedule GPU jobs on. Had we taken that path, and let our GPU users share a single Linux kernel, we'd have been on Nvidia's driver happy-path.

Alternatively, we could have used a conventional hypervisor. Nvidia suggested VMware (heh). But they could have gotten things working had we used QEMU.

Instead, we burned months trying (and ultimately failing) to get Nvidia's host drivers working to map virtualized GPUs into Intel Cloud Hypervisor. At one point, we hex-edited the closed-source drivers to trick them into thinking our hypervisor was QEMU.

(Source: sources/2025-02-14-flyio-we-were-wrong-about-gpus)
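The hex-editing anecdote implies the closed driver fingerprints the hypervisor it runs under. Which surface Nvidia's driver actually inspects is not public; one guest-visible surface such a check *could* plausibly key on is the SMBIOS/DMI product string the VMM advertises. A purely illustrative sketch (the QEMU and VMware strings are the well-known DMI defaults those platforms report; the check itself is hypothetical, not verified driver behaviour):

```python
# Illustrative only: a guest-side check of the kind a driver *might* use to
# decide whether it is running under a hypervisor it supports. The allow-list
# strings are the standard DMI product names QEMU and VMware expose; the
# fingerprinting logic itself is an assumption, not documented Nvidia behaviour.

KNOWN_GOOD_PRODUCTS = {
    "Standard PC (Q35 + ICH9, 2009)",    # QEMU q35 machine type
    "Standard PC (i440FX + PIIX, 1996)", # QEMU legacy machine type
    "VMware Virtual Platform",           # VMware ESXi guests
}

def hypervisor_on_happy_path(dmi_product_name: str) -> bool:
    """Hypothetical allow-list check on the guest's DMI product string."""
    return dmi_product_name in KNOWN_GOOD_PRODUCTS

print(hypervisor_on_happy_path("Standard PC (Q35 + ICH9, 2009)"))  # True
print(hypervisor_on_happy_path("Cloud Hypervisor"))                # False
```

Under this (hypothetical) model, making the driver "think our hypervisor was QEMU" amounts to making the off-path VMM present a string the allow-list accepts.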

Why there's a happy path

  • Nvidia's driver is proprietary and closed-source. The platform operator can't fork, patch, or even cleanly audit it. Whatever hypervisor / kernel / virtualization surface Nvidia's driver team happens to test against is what works.
  • Nvidia's enterprise customers are K8s + VMware + QEMU. Hyperscalers, enterprise-on-prem, and the Nvidia DGX reference designs all land on one of these three shapes. Nvidia's driver QA targets them.
  • Micro-VM hypervisors are a small fraction of the install base. Firecracker and Cloud Hypervisor are relatively recent; they're optimised for density and boot speed rather than the general-purpose device model Nvidia's driver expects.

What's on and off the path

| Hypervisor / surface | On path? | Notes |
| --- | --- | --- |
| Bare-metal Linux + container (shared kernel) | ✅ | K8s GPU operator, Docker + Nvidia runtime |
| VMware ESXi with vGPU | ✅ | Nvidia vGPU licensing; canonical enterprise |
| QEMU + KVM + VFIO PCI passthrough | ✅ | OpenStack, libvirt, oVirt |
| QEMU + KVM + vGPU (GRID/vGPU mediated) | ✅ | Same vGPU model as VMware |
| Intel Cloud Hypervisor | ⚠️ | PCI passthrough in principle; Fly.io failed to get virtualized-GPU drivers working on it |
| Firecracker | ❌ | No PCI passthrough |
| Custom / rolled Rust VMM | ❌ | No Nvidia support contract |
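The on-path "QEMU + KVM + VFIO PCI passthrough" row boils down to binding the GPU to `vfio-pci` on the host and handing its PCI address to QEMU. A minimal sketch of assembling such an invocation — the PCI address, disk image, and memory size are hypothetical placeholders, and the host is assumed to have already bound the device to `vfio-pci`:

```python
# Sketch: build a QEMU command line that passes a whole GPU through via VFIO.
# Assumes the host has IOMMU enabled and the GPU already bound to vfio-pci.
# The BDF address and image path below are placeholders, not real values.

def qemu_vfio_cmdline(gpu_bdf: str, disk_image: str, mem_mib: int = 8192) -> list[str]:
    """Return argv for a KVM guest with one VFIO-passed-through GPU."""
    return [
        "qemu-system-x86_64",
        "-enable-kvm",                          # use the KVM hypervisor
        "-machine", "q35",                      # modern chipset layout
        "-cpu", "host",
        "-m", str(mem_mib),
        "-device", f"vfio-pci,host={gpu_bdf}",  # the passed-through GPU
        "-drive", f"file={disk_image},format=qcow2",
    ]

cmd = qemu_vfio_cmdline("0000:3b:00.0", "guest.qcow2")
print(" ".join(cmd))
```

The point of the table is that this exact shape — a full PCI device handed to a QEMU guest — is what Nvidia's driver QA exercises; the same passthrough idea on Cloud Hypervisor or a custom VMM is where Fly.io's integration stalled.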

Cost of deviating

  • Driver work that may never converge. Fly.io's Cloud Hypervisor + Nvidia-virtualized-GPU integration "burned months" and didn't ship. The hex-edit-the-driver-to-impersonate-QEMU episode is a characterisation of how far Fly.io was willing to push — and it still didn't produce a productisable path.
  • Locked out of thin-slicing. MIG / vGPU — the fractional-GPU surface — expects driver cooperation that off-path hypervisors don't get. Fly.io: "MIG gives you a UUID to talk to the host driver, not a PCI device" — and a PCI-passthrough-based VMM can only pass through PCI devices. The thin-slicing customer segment is thereby unreachable.
  • Driver-version churn risk. Even a working integration can break on a driver update. The platform operator owns the driver-upgrade cadence forever.
  • Support inaccessibility. Without an enterprise support contract on a supported configuration, the platform operator is on their own for debugging driver issues.
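The thin-slicing lockout comes down to a namespace mismatch: a passthrough VMM names devices by PCI bus/device/function (BDF) address, while a MIG instance is named by a driver-level UUID with no PCI identity of its own. A sketch of that distinction (the regex is an illustrative BDF grammar and the MIG UUID is made up, not a real device):

```python
# Sketch of why a PCI-passthrough VMM cannot attach a MIG slice.
# A VFIO device is addressed by a PCI BDF like "0000:3b:00.0"; a MIG
# instance is addressed by a driver-level UUID. The regex below is an
# illustrative BDF grammar, not an exhaustive one.
import re

BDF = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")

def attachable_via_vfio(device_id: str) -> bool:
    """A passthrough VMM can only accept devices addressable on the PCI bus."""
    return bool(BDF.match(device_id))

print(attachable_via_vfio("0000:3b:00.0"))  # whole GPU (placeholder BDF): True
# Hypothetical MIG UUID -- only the host driver can resolve it:
print(attachable_via_vfio("MIG-d4f2b8a1-1234-5678-9abc-def012345678"))  # False
```

There is nothing for the VMM to pass through: the MIG slice only exists as an object inside the host driver, which is exactly the cooperation off-path hypervisors don't get.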

Why you might deviate anyway

  • Millisecond-boot DX. QEMU and VMware boot guests in seconds. For serverless / ephemeral / fast-scale-to-zero workloads, seconds-scale boot is unusable. Fly.io chose the Cloud Hypervisor path explicitly because of this: "the whole point of Fly Machines is that they take milliseconds to start."
  • Security posture. Shared-kernel K8s is the simplest on-path choice, but it weakens the isolation posture (see concepts/gpu-as-hostile-peripheral). A platform selling micro-VM isolation as a feature can't regress to shared kernels for GPU workloads without eroding the product claim.
  • Rust-stack culture. Institutional fit with Rust-written micro-VMs is hard to abandon for VMware.

Implications

  • GPU virtualisation in micro-VMs is a research project, not a product. Fly.io's failed attempt is the wiki's cleanest public datum that the path exists in principle (PCI passthrough + IOMMU + a Linux + a micro-VM hypervisor) but does not work with Nvidia drivers out of the box.
  • Platform GPU offerings converge on one of three shapes. Shared-kernel K8s (hyperscalers, GPU-specialist clouds), VMware/QEMU-based VMs (enterprise), or bare-metal / dedicated hosts (HPC). Micro-VM-shaped platforms have to either adopt one of these or absorb the integration cost.
  • Thin-sliced GPUs are out of reach for off-path platforms. Platforms that can't productise MIG / vGPU can only sell whole-GPU attachment. That excludes the small-GPU-for-developer market segment.

Caveats

  • Not a permanent state of nature. Nvidia's driver team can (and occasionally does) add surfaces; the Cloud Hypervisor community is actively lobbying. The happy path can widen.
  • "On-path" isn't a binary. Different Nvidia SKUs have different virtualization support matrices (MIG is an A100 / H100 feature; vGPU licensing varies; the consumer L40S may have different support than a data-centre A100).
  • Applies beyond Nvidia. The same "vendor driver happy path" pressure exists for AMD / Intel GPU drivers, FPGA vendors, DPU vendors, and any hardware accelerator with a proprietary driver.

Seen in (wiki)
