SYSTEM Cited by 2 sources

NVIDIA MIG

NVIDIA MIG (Multi-Instance GPU) is the hardware-level partitioning feature introduced on the A100 (and carried forward to H100) that splits a single physical GPU into up to 7 isolated instances, each with its own slice of SMs, memory, L2 cache, and memory bandwidth. Each MIG instance presents as a distinct CUDA device, with hardware-level QoS and fault isolation. It is NVIDIA's canonical answer to "fractional-GPU-as-a-service" for multi-tenant inference.
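
As a rough illustration of the slice accounting, the sketch below checks whether a requested set of MIG profiles fits within one GPU's compute-slice and memory-slice totals. The profile names and slice counts are assumed from the A100 40GB profile table in NVIDIA's MIG documentation, and the function name `fits_slice_budget` is hypothetical. The check is only a necessary condition: real MIG placement also restricts where each instance may sit, which is not modelled here.

```python
# Assumed A100 40GB profiles: name -> (compute slices, memory slices).
# Assumption: the GPU exposes 7 compute slices and 8 memory slices in total.
A100_40GB_PROFILES = {
    "1g.5gb": (1, 1),
    "2g.10gb": (2, 2),
    "3g.20gb": (3, 4),
    "4g.20gb": (4, 4),
    "7g.40gb": (7, 8),
}
TOTAL_COMPUTE_SLICES = 7
TOTAL_MEMORY_SLICES = 8


def fits_slice_budget(requested):
    """Return True if the requested profiles fit the slice totals.

    Necessary, not sufficient: real MIG placement rules are stricter
    than this simple sum and are not modelled in this sketch.
    """
    compute = sum(A100_40GB_PROFILES[name][0] for name in requested)
    memory = sum(A100_40GB_PROFILES[name][1] for name in requested)
    return compute <= TOTAL_COMPUTE_SLICES and memory <= TOTAL_MEMORY_SLICES


if __name__ == "__main__":
    # Seven of the smallest instances use all 7 compute slices.
    print(fits_slice_budget(["1g.5gb"] * 7))
    # Two 4g instances would need 8 compute slices, one more than exists.
    print(fits_slice_budget(["4g.20gb", "4g.20gb"]))
```

Under these assumptions, the "up to 7" ceiling comes from the compute-slice count: an eighth `1g.5gb` instance would still fit in memory but has no compute slice left.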

Seen in (wiki)

  • Fly.io tried and abandoned MIG (and vGPU) productisation via IOMMU PCI passthrough into its micro-VMs (Fly Machines): a roughly quarter-long project in 2023, described as "a project so cursed that Thomas has forsworn ever programming again". Fly pivoted to whole-A100 attachment instead. The post does not disclose the reasons, but the datum itself is load-bearing: IOMMU PCI-passthrough-based fractional-GPU virtualisation is not a turnkey path even for a platform already deep in micro-VM virtualisation. (Source: sources/2024-08-15-flyio-were-cutting-l40s-prices-in-half)
  • Fly.io 2025-02-14: "MIG gives you a UUID, not a PCI device." The 2025-02 retrospective discloses the specific reason MIG / vGPU never reached productisation on Fly's micro-VM hypervisor path: the MIG slice "gives you a UUID to talk to the host driver, not a PCI device", which breaks the PCI-passthrough model Cloud Hypervisor uses to surface hardware into Fly Machines. "For fully-virtualized workloads, it's not baked; we can't use it." This is the canonical wiki statement of why thin-sliced GPUs for developers are off the NVIDIA driver happy path for a micro-VM platform, and of a load-bearing cost of the security-first hypervisor choice. (Source: sources/2025-02-14-flyio-we-were-wrong-about-gpus)
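
The UUID-versus-PCI distinction is visible in the identifiers themselves. A passthrough hypervisor needs a PCI address (domain:bus:device.function) to hand a device to a guest, whereas a MIG slice is named by a `MIG-`-prefixed UUID that a CUDA process passes to the host driver, for example through `CUDA_VISIBLE_DEVICES`. The sketch below only classifies identifier strings. The function name `classify_gpu_handle` and the sample values are made up for illustration, and the patterns cover the common formats rather than every form the driver accepts.

```python
import re

# PCI address: domain:bus:device.function, e.g. 0000:3b:00.0
# (the domain is allowed 4 to 8 hex digits to cover both common spellings).
PCI_BDF = re.compile(
    r"^[0-9a-fA-F]{4,8}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$"
)

# MIG device name in the UUID form: "MIG-" followed by a standard UUID.
# Assumption: older "MIG-GPU-<uuid>/<gi>/<ci>" names are not handled here.
MIG_UUID = re.compile(
    r"^MIG-[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)


def classify_gpu_handle(handle):
    """Classify an identifier string as 'pci', 'mig-uuid', or 'unknown'."""
    if PCI_BDF.match(handle):
        return "pci"
    if MIG_UUID.match(handle):
        return "mig-uuid"
    return "unknown"


if __name__ == "__main__":
    # Illustrative values only; neither refers to a real device.
    for handle in ("0000:3b:00.0", "MIG-12345678-1234-1234-1234-123456789abc"):
        print(handle, "->", classify_gpu_handle(handle))
```

Only a handle of the first kind names something a PCI-passthrough hypervisor can attach to a guest. The second kind is meaningful only to software that already talks to the host's NVIDIA driver, which is the mismatch the retrospective describes.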

Why it matters

  • Fractional GPU is the economically obvious fit for inference multi-tenancy, because most inference workloads don't saturate a whole A100. MIG is the hardware answer; the fact that Fly.io, an operationally sophisticated micro-VM platform, couldn't productise it is evidence that the path is harder than the spec implies.
  • Negative-space wiki datum. If and when another source publishes a successful or failed MIG-in-micro-VM integration, this page is the anchor for the comparison. Given Fly.io's later pivot to whole-device attach plus cheaper cards for inference (L40S), MIG ends up skipped in the productised path.