
NVIDIA MIG

NVIDIA MIG (Multi-Instance GPU) is the hardware-level partitioning feature introduced on the A100 and carried forward to the H100. It splits a single physical GPU into up to 7 isolated instances, each with its own slice of SMs, memory, L2 cache, and memory bandwidth. Each MIG instance presents as a distinct CUDA device, with hardware-level QoS and fault isolation. It is NVIDIA's canonical answer to "fractional-GPU-as-a-service" for multi-tenant inference.
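To make "each MIG instance presents as a distinct CUDA device" concrete, here is a minimal sketch that parses a device listing in the style of `nvidia-smi -L` and picks out the MIG instances. The sample listing, its UUIDs, and the helper names `list_mig_instances` and `env_for_instance` are illustrative assumptions, and the exact listing format can differ between driver versions. The sketch runs on the embedded sample text, so it needs no GPU.

```python
import re

# Illustrative listing in the style of `nvidia-smi -L`.
# The UUIDs are made up; real output may differ by driver version.
SAMPLE = """\
GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-11111111-2222-3333-4444-555555555555)
  MIG 3g.20gb     Device  0: (UUID: MIG-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee)
  MIG 1g.5gb      Device  1: (UUID: MIG-ffffffff-0000-1111-2222-333333333333)
"""

# Matches only the indented MIG lines, not the parent "GPU 0: ..." line.
MIG_LINE = re.compile(
    r"^\s+MIG\s+(?P<profile>\S+)\s+Device\s+(?P<index>\d+):"
    r"\s+\(UUID:\s+(?P<uuid>MIG-[^)]+)\)"
)


def list_mig_instances(listing):
    """Return one dict per MIG instance found in the listing text."""
    instances = []
    for line in listing.splitlines():
        match = MIG_LINE.match(line)
        if match:
            instances.append(
                {
                    "profile": match["profile"],
                    "index": int(match["index"]),
                    "uuid": match["uuid"],
                }
            )
    return instances


def env_for_instance(uuid):
    """Environment that restricts a CUDA process to a single MIG instance."""
    return {"CUDA_VISIBLE_DEVICES": uuid}


if __name__ == "__main__":
    for inst in list_mig_instances(SAMPLE):
        print(inst["profile"], inst["uuid"], env_for_instance(inst["uuid"]))
```

Note that the handle each instance exposes here is a UUID consumed by the host driver stack, which is convenient for containers and bare-metal processes sharing that driver.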

Seen in (wiki)

Fly.io tried and abandoned MIG (and vGPU) productisation via IOMMU PCI passthrough into Firecracker micro-VMs. It was a roughly quarter-long project in 2023, described as "a project so cursed that Thomas has forsworn ever programming again". Fly pivoted to whole-A100 attachment instead. That post does not disclose the reasons, but the datum itself is load-bearing: IOMMU PCI-passthrough-based fractional-GPU virtualisation is not a turnkey path even for a platform already deep in Firecracker-based virtualisation. (Source: sources/2024-08-15-flyio-were-cutting-l40s-prices-in-half)
Fly.io 2025-02-14: "MIG gives you a UUID, not a PCI device." The 2025-02 retrospective discloses the specific reason MIG / vGPU never reached productisation on Fly's micro-VM hypervisor path: the MIG slice "gives you a UUID to talk to the host driver, not a PCI device", which breaks the PCI-passthrough model Cloud Hypervisor uses to surface hardware into Fly Machines. "For fully-virtualized workloads, it's not baked; we can't use it." This is the canonical wiki statement of why thin-sliced GPU for developers is off the NVIDIA driver happy path for a micro-VM platform, and of a load-bearing cost of the security-first hypervisor choice. (Source: sources/2025-02-14-flyio-we-were-wrong-about-gpus)
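The "UUID, not a PCI device" mismatch can be illustrated with a toy model. In this sketch, a passthrough-style attach step accepts only a PCI address in `domain:bus:device.function` form, so a MIG UUID has nothing to offer it. The function names `classify` and `can_passthrough`, the example handles, and the rule itself are hypothetical simplifications for illustration; they are not Fly.io's or Cloud Hypervisor's actual logic.

```python
import re

# PCI address in domain:bus:device.function form, e.g. 0000:3b:00.0
PCI_BDF = re.compile(r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$")

# MIG instance handle: "MIG-" followed by a standard 8-4-4-4-12 hex UUID.
MIG_UUID = re.compile(
    r"^MIG-[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$"
)


def classify(handle):
    """Label a device handle as 'pci', 'mig-uuid', or 'unknown'."""
    if PCI_BDF.match(handle):
        return "pci"
    if MIG_UUID.match(handle):
        return "mig-uuid"
    return "unknown"


def can_passthrough(handle):
    """Hypothetical rule: only a PCI function can be handed to a guest VM."""
    return classify(handle) == "pci"


if __name__ == "__main__":
    # Example handles are made up for illustration.
    for handle in ("0000:3b:00.0", "MIG-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"):
        print(handle, classify(handle), can_passthrough(handle))
```

Under this model a whole GPU, addressed by its PCI function, passes the check, while a MIG slice does not. That matches the outcome recorded above: whole-device attachment shipped and MIG slices did not.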

Why it matters

Fractional GPU is the economically obvious fit for inference multi-tenancy, because most inference workloads don't saturate a whole A100. MIG is the hardware answer. That Fly.io, an operationally sophisticated Firecracker-based platform, couldn't productise it is evidence that the path is harder than the spec implies.
Negative-space wiki datum. If and when another source publishes a successful or failed MIG-in-micro-VM integration, this page is the anchor for the comparison. Given Fly.io's later pivot to whole-device attach plus cheaper cards for inference (L40S), MIG ends up skipped in the productised path.

Related

systems/nvidia-a100 — MIG's introduction card.
systems/nvidia-h100 — MIG also supported.
systems/firecracker — Fly.io's micro-VM substrate; the IOMMU passthrough integration target that didn't stick.
systems/fly-machines — ended up attaching whole GPUs rather than MIG slices.
companies/flyio — source of the negative datum.
