
NVIDIA A100

The NVIDIA A100 (Ampere architecture, 2020) is the training-first datacenter GPU that preceded the H100 as the frontier part. Two variants are commonly distinguished: A100 40G PCIe (air-cooled PCIe slot, no NVLink switch fabric) and A100 80G SXM (liquid-capable, NVLink / NVSwitch, forms the tight-coupling domain in DGX/HGX reference designs).

Seen in (wiki)

  • Fly.io GPU lineup. Fly.io stocks both variants — 40G PCIe and 80G SXM — as the top two steps of its GPU catalogue. Fly.io's 2023 product strategy was built around selling them: first fractional A100 slices via MIG / vGPUs over IOMMU PCI passthrough (abandoned after "a whole quarter"), then whole A100s, then NVLink-ganged A100 clusters intended for distributed training. Customer data then showed the lower end of the lineup (A10) dominating by usage — the inflection that led to the 2024-08-15 L40S price cut. (Source: sources/2024-08-15-flyio-were-cutting-l40s-prices-in-half)
  • Fly.io 2025-02-14 retrospective — A100 as "compromise position" for serious-AI customers. Fly.io's GPU course-correction post names the A100 as the middle rung serious-AI buyers wouldn't settle for: "People doing serious AI work want galactically huge amounts of GPU compute. A whole enterprise A100 is a compromise position for them; they want an SXM cluster of H100s." A framing datum on the ceiling Fly.io cannot reach — an insurgent-cloud capacity-supply constraint (concepts/insurgent-cloud-constraints). Whole-A100 attachment on Fly remains productised; the bigger-than-A100 frontier cluster shape is absent from the product. (Source: sources/2025-02-14-flyio-we-were-wrong-about-gpus)
  • Fly.io 2024-05-09 image-description walkthrough — explicit a100-40gb preset string + LLaVA-34b workload on a single Machine. The Fly Machine preset name is disclosed verbatim ("the a100-40gb Fly Machine preset") and pinned as the GPU shape the 34B-parameter LLaVA vision-language model runs on. Canonical wiki datum for the three-stage cold-start budget (concepts/gpu-scale-to-zero-cold-start) on this SKU: a few seconds Machine boot + several tens of seconds to load the model into GPU RAM + several seconds for first-response → ~45 seconds end-to-end cold start. Warm per-response latency "several seconds". Illustrates A100 40G PCIe as a single-Machine inference workhorse even for mid-30B-parameter multimodal models — no cluster required, no NVLink exploited, a scale-to-zero workload with proxy-managed lifecycle. (Source: sources/2024-05-09-flyio-picture-this-open-source-ai-for-image-description)
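The three-stage budget above is serial, so the end-to-end figure is just a sum. A minimal sketch — the per-stage numbers are assumptions picked to match the source's rough figures ("a few seconds", "several tens of seconds", "several seconds"), not measurements:

```python
# Illustrative cold-start budget for LLaVA-34b on the a100-40gb preset.
# Stage durations are assumed values consistent with the source's rough
# figures, not measured data.
COLD_START_STAGES = {
    "machine_boot_s": 3,      # "a few seconds" Fly Machine boot
    "model_load_s": 35,       # "several tens of seconds" loading weights into GPU RAM
    "first_response_s": 7,    # "several seconds" for the first response
}

def cold_start_total(stages: dict) -> int:
    """The stages run serially, so the cold start is their sum."""
    return sum(stages.values())

print(cold_start_total(COLD_START_STAGES))  # 45 → the ~45 s end-to-end budget
```

The model-load stage dominates, which is why the budget scales with model size rather than with Machine boot time.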

Why it matters

  • Training default. For distributed training, the A100 SXM's NVLink / NVSwitch fabric is the architectural feature that makes tensor/pipeline parallelism tractable — the same property H100 SXM inherits.
  • MIG as a fractional-GPU primitive. The A100 introduced MIG — up to seven isolated instances per physical GPU, each with its own memory, SMs, and caches — enabling multi-tenancy on a shared card. Fly.io's attempt to surface MIG to customers via Firecracker micro-VMs over IOMMU PCI passthrough is a negative-space wiki datum: tried and abandoned.
  • Over-provisioned for most inference. The Fly.io customer-usage revelation is that most inference workloads don't need A100-class compute; an A10 or L40S at a fraction of the cost is capable enough. The A100 remains the fit for training and for a minority of heavyweight inference workloads where its HBM capacity / interconnect genuinely pays.
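The MIG partitioning constraint can be sketched as simple slice arithmetic. A minimal model, assuming the A100 40G's published profile names (1g.5gb, 2g.10gb, etc.); the `MigProfile` dataclass and `fits` helper are illustrative, not an NVIDIA API:

```python
# Minimal model of MIG partitioning on an A100 40G: the card exposes
# 7 compute slices, and a mix of instances fits if it uses at most 7.
# Profile names are NVIDIA's; the classes here are illustrative only.
from dataclasses import dataclass

@dataclass(frozen=True)
class MigProfile:
    name: str
    compute_slices: int  # GPU compute slices consumed (of 7 total)
    memory_gb: int       # dedicated HBM per instance

A100_40G_PROFILES = [
    MigProfile("1g.5gb", 1, 5),
    MigProfile("2g.10gb", 2, 10),
    MigProfile("3g.20gb", 3, 20),
    MigProfile("7g.40gb", 7, 40),
]

def fits(instances: list[MigProfile]) -> bool:
    """A requested mix fits on one card if it stays within 7 compute slices."""
    return sum(p.compute_slices for p in instances) <= 7

# Seven isolated 1g.5gb tenants share one physical A100:
seven_tenants = [A100_40G_PROFILES[0]] * 7
print(fits(seven_tenants))  # True
```

Each instance gets hard-partitioned memory and SMs, which is what made MIG look viable as a per-customer primitive before Fly.io abandoned the passthrough approach.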