PATTERN Cited by 1 source
Dedicated host pool for hostile peripheral¶
Pattern¶
Segregate worker hardware so that VMs using a risky hardware peripheral (GPU, FPGA, DPU, custom NIC, HSM) never share a physical host with VMs that don't use that peripheral. The peripheral is treated as a blast-radius expander — an exploit in the peripheral's driver or firmware can escape the per-VM boundary and reach co-resident tenants on the same host. Dedicating a pool of hosts to peripheral-using tenants bounds that blast radius to other tenants of the same peripheral, accepting worse bin-packing and lower utilisation in exchange for a cleaner isolation story.
Canonical instance: Fly.io GPU Machines¶
Fly.io, 2025-02-14:
We did a couple expensive things to mitigate the risk. We shipped GPUs on dedicated server hardware, so that GPU- and non-GPU workloads weren't mixed. Because of that, the only reason for a Fly Machine to be scheduled on a GPU machine was that it needed a PCI BDF for an Nvidia GPU, and there's a limited number of those available on any box. Those GPU servers were drastically less utilized and thus less cost-effective than our ordinary servers.
Every Fly GPU Machine runs on a GPU-only worker host under Intel Cloud Hypervisor. The scheduler lands a Machine on a GPU worker only if it requests a PCI BDF for a GPU. General-compute Machines run on Firecracker-based workers that never host GPU workloads.
When to use¶
- Multi-tenant platform + risky hardware peripheral. DMA-capable peripherals with proprietary drivers or firmware (GPU, FPGA, custom NIC, DPU) whose exploit-surface the platform operator can't fully audit.
- Per-VM isolation posture is a product claim. The platform promises isolation; regressing to shared-kernel-for-the-peripheral-class contradicts the claim.
- The peripheral has peripheral-to-peripheral I/O paths. NVLink / PCIe-P2P / other accelerator-to-accelerator fabrics mean a compromised device can attack neighbours; per-host isolation matters, not just per-VM.
- Security-posture audit is part of the lifecycle. Regulated industries (FedRAMP / IL-tiers, HIPAA), customers with strong-isolation SLAs, or insurance-driven security postures.
When not to use¶
- Single-tenant clusters (HPC, bare-metal research). No co-resident tenants to isolate from.
- Shared-kernel K8s GPU clusters where the cloud vendor has already accepted the shared-kernel trade-off — the pattern isn't applicable; the vendor has chosen a different isolation posture.
- Peripherals without DMA or without tenant-controlled compute. A plain NIC with firmware that only interprets packets isn't a hostile peripheral in the same sense.
Structural parts¶
- Worker-class labels. The platform's scheduler knows which hosts are GPU-enabled. Fly.io's scheduler respects this at placement time.
- Peripheral-bounded placement. A VM that doesn't request a peripheral is never placed on a peripheral host. A VM that does is only placed on one. The PCI BDF count per host bounds concurrency.
- Workload-class-separated billing. GPU workers cost more per hour; customers pay the premium when they claim a GPU.
- Independent security assessment per peripheral class — see patterns/independent-security-assessment-for-hardware-peripheral, the companion process-level pattern.
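The placement invariant the first two structural parts describe can be sketched in a few lines. This is a hypothetical illustration — `WorkerHost` and `place` are made-up names, not Fly.io's scheduler API:

```python
from dataclasses import dataclass, field

@dataclass
class WorkerHost:
    name: str
    gpu_bdfs: list           # PCI BDFs of GPUs on this host; empty on general-compute workers
    claimed: set = field(default_factory=set)

    @property
    def free_bdfs(self):
        return [b for b in self.gpu_bdfs if b not in self.claimed]

def place(hosts, wants_gpu: bool):
    """Return a host for the VM, or None if no valid host exists.

    Invariant: GPU VMs land only on GPU hosts, non-GPU VMs never do,
    and the PCI BDF count per host bounds GPU-VM concurrency.
    """
    for h in hosts:
        if wants_gpu:
            if h.free_bdfs:           # peripheral host with a free device
                h.claimed.add(h.free_bdfs[0])
                return h
        elif not h.gpu_bdfs:          # non-GPU VM: only peripheral-free hosts
            return h
    return None
```

Once every BDF on a host is claimed, `place` returns `None` for further GPU requests — the scarcity of BDFs is what makes the GPU pool under-utilised relative to bin-packed general-compute workers.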
Trade-offs¶
| Axis | Cost | Benefit |
|---|---|---|
| Bin-packing | Worse — GPU hosts sit partially idle whenever BDFs go unclaimed | Non-GPU tenants are never co-resident with GPU exploit surface |
| Utilisation | Lower — Fly.io: "drastically less utilized" | Bounded blast radius |
| Capex | Higher per effective vCPU | Isolation posture is clean |
| Placement complexity | Scheduler must carry workload-class labels at placement time | Peripheral class doesn't affect non-peripheral tenants |
| Upgrade cadence | Independent per peripheral class | Peripheral-driver-version churn doesn't destabilise general-compute fleet |
Known uses¶
- Fly.io GPU Machines (canonical wiki instance). GPU-only workers on Cloud Hypervisor; non-GPU workers on Firecracker. 2025-02-14 retrospective disclosed the utilisation cost.
- AWS P-instances / G-instances — separate instance families from the general-purpose M-class; hardware segregation below the surface is widely assumed but not directly disclosed.
- Hyperscaler "bare metal" GPU tiers (AWS EC2 Bare Metal, GCP A3-Ultra) — the extreme form of the pattern: single tenant per host, no bin-packing at all.
Architectural neighbours¶
- patterns/minimize-vm-permissions — Figma's Lambda sandboxing approach. Same isolation-by-design logic at a different boundary: minimise what the VM can reach, rather than segregate where the VM runs. Composable.
- concepts/micro-vm-isolation — the per-VM isolation primitive this pattern sits on top of. Hostile-peripheral dedicated-host-pool is the host-level pattern; micro-VM is the VM-level pattern; capability-sandbox is the runtime-level pattern.
- concepts/gpu-as-hostile-peripheral — the framing this pattern operationalises.
Caveats¶
- Doesn't eliminate risk — only bounds it. A GPU-to-GPU exploit on a dedicated GPU host still affects other tenants of that host.
- Reset/scrub between tenants is a separate problem. GPU state (VRAM, driver state) needs to be cleaned between VMs on the same host; this pattern doesn't solve that.
- Utilisation cost scales with peripheral density. Few PCI BDFs per host = low density = worse utilisation. More PCI BDFs per host = higher density = better utilisation but more tenants sharing the peripheral-to-peripheral fabric.
- The pattern bills through. Customers pay the premium for dedicated-host-pool utilisation. For price-sensitive workloads (see concepts/developers-want-llms-not-gpus), this can make the platform uncompetitive vs hyperscalers that have absorbed the utilisation cost at scale.
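The density caveat lends itself to a back-of-envelope calculation. The numbers below are illustrative assumptions, not Fly.io's actual figures:

```python
def host_utilisation(bdfs_per_host: int, vcpus_per_gpu_vm: int,
                     host_vcpus: int) -> float:
    """Fraction of host vCPUs claimable when only GPU VMs may land here."""
    return min(1.0, bdfs_per_host * vcpus_per_gpu_vm / host_vcpus)

# Two BDFs on a 64-vCPU host, 8 vCPUs per GPU VM: at most 25% of the
# host is ever claimable, even when every GPU is taken. The remaining
# 75% is the price of never co-locating non-GPU tenants there.
low_density = host_utilisation(2, 8, 64)    # 0.25
high_density = host_utilisation(8, 8, 64)   # 1.0 — but 8 tenants now share the P2P fabric
```

Raising density recovers utilisation but widens the intra-pool blast radius, which is exactly the tension the pattern trades on.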
Seen in (wiki)¶
- sources/2025-02-14-flyio-we-were-wrong-about-gpus — canonical Fly.io wiki instance; the 2025-02 retrospective explicitly names the utilisation cost.
Related¶
- patterns/independent-security-assessment-for-hardware-peripheral — companion process-level pattern.
- patterns/minimize-vm-permissions — adjacent isolation-by-design pattern at a different boundary.
- concepts/gpu-as-hostile-peripheral — the concept this pattern instantiates.
- concepts/micro-vm-isolation — the per-VM isolation primitive this pattern builds on.
- systems/intel-cloud-hypervisor — the hypervisor Fly uses on the dedicated GPU workers.
- systems/firecracker — the hypervisor Fly uses on the general-compute workers.
- systems/fly-machines — the compute primitive whose peripheral-class scheduling is the canonical instance.
- systems/nvidia-l40s — one of the peripherals in question.
- companies/flyio — canonical wiki source.