NVIDIA A100¶
The NVIDIA A100 (Ampere architecture, 2020) is the training-first datacenter GPU that preceded the H100 as the frontier part. Two variants are commonly distinguished: A100 40G PCIe (air-cooled PCIe slot, no NVLink switch fabric) and A100 80G SXM (liquid-capable, NVLink / NVSwitch, forms the tight-coupling domain in DGX/HGX reference designs).
Seen in (wiki)¶
- Fly.io GPU lineup. Fly.io stocks both variants — 40G PCIe and 80G SXM — as the top two steps of its GPU catalogue. Fly.io's 2023 product strategy was built around selling them: first fractional A100 slices via MIG / vGPUs over IOMMU PCI passthrough (abandoned after "a whole quarter"), then whole A100s, then NVLink-ganged A100 clusters intended for distributed training. Customer data then showed the lower end of the lineup (A10) dominating by usage — the inflection that led to the 2024-08-15 L40S price cut. (Source: sources/2024-08-15-flyio-were-cutting-l40s-prices-in-half)
- Fly.io 2025-02-14 retrospective — A100 as "compromise position" for serious-AI customers. Fly.io's GPU course-correction post names the A100 as the middle rung serious-AI buyers wouldn't settle for: "People doing serious AI work want galactically huge amounts of GPU compute. A whole enterprise A100 is a compromise position for them; they want an SXM cluster of H100s." A framing datum on the ceiling Fly.io cannot reach — an insurgent-cloud capacity-supply constraint (concepts/insurgent-cloud-constraints). Whole-A100 attachment on Fly remains productised; the bigger-than-A100 frontier cluster shape is absent from the product. (Source: sources/2025-02-14-flyio-we-were-wrong-about-gpus)
- Fly.io 2024-05-09 image-description walkthrough — explicit `a100-40gb` preset string + LLaVA-34b workload on a single Machine. The Fly Machine preset name is disclosed verbatim (the `a100-40gb` Fly Machine preset) and pinned as the GPU shape the 34b-parameter LLaVA vision-language model runs on. Canonical wiki datum for the three-stage cold-start budget (concepts/gpu-scale-to-zero-cold-start) on this SKU: a few seconds Machine boot + several tens of seconds to load the model into GPU RAM + several seconds for first response → ~45 seconds end-to-end cold start. Warm per-response latency "several seconds". Illustrates A100 40G PCIe as a single-Machine inference workhorse even for mid-30B-parameter multimodal models — no cluster required, no NVLink exploited, a scale-to-zero workload with proxy-managed lifecycle. (Source: sources/2024-05-09-flyio-picture-this-open-source-ai-for-image-description)
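The three-stage cold-start budget above can be sketched as a simple additive model. The stage durations here are illustrative assumptions consistent with the "~45 seconds" figure, not measured Fly.io numbers:

```python
# Hypothetical model of a scale-to-zero GPU cold start on an a100-40gb-style
# Machine. Stage durations are assumed values chosen to match the ~45 s
# end-to-end budget described in the source, not telemetry.

COLD_START_STAGES = {
    "machine_boot": 5.0,       # "a few seconds" to boot the micro-VM
    "model_load_to_vram": 35.0, # "several tens of seconds" to load LLaVA-34b into GPU RAM
    "first_response": 5.0,      # "several seconds" for the first inference
}

def cold_start_seconds(stages=COLD_START_STAGES):
    """End-to-end cold start is just the sum of the serial stages."""
    return sum(stages.values())

def request_latency_seconds(machine_is_warm: bool, warm_response: float = 5.0) -> float:
    """A warm Machine skips boot and model load and pays only per-response latency."""
    return warm_response if machine_is_warm else cold_start_seconds() 

print(cold_start_seconds())            # ~45 s cold
print(request_latency_seconds(True))   # ~5 s warm
```

The point of the model is that the middle stage dominates: shrinking boot time barely moves the total while the model sits on the load path to GPU RAM.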
Why it matters¶
- Training default. For distributed training, the A100 SXM's NVLink / NVSwitch fabric is the architectural feature that makes tensor/pipeline parallelism tractable — the same property H100 SXM inherits.
- MIG as a fractional-GPU primitive. The A100 introduced MIG — up to 7 isolated slices per physical GPU, each with its own memory, SMs, and caches — enabling multi-tenancy on a shared card. Fly.io's attempt to surface MIG to customers via Firecracker micro-VMs over IOMMU PCI passthrough is a negative-space wiki datum: tried and abandoned.
- Over-provisioned for most inference. The Fly.io customer-usage revelation is that most inference workloads don't need A100-class compute; an A10 or L40S at a fraction of the cost is capable enough. The A100 remains the fit for training and for a minority of heavyweight inference workloads where its HBM capacity / interconnect genuinely pays.
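The MIG partitioning described above is driven from `nvidia-smi`. A minimal sketch of slicing an A100 40GB into seven `1g.5gb` instances (requires root and an actual MIG-capable GPU; profile ID 19 is the standard `1g.5gb` profile on the 40GB part):

```shell
# Enable MIG mode on GPU 0 (a GPU reset or reboot may be required afterwards).
sudo nvidia-smi -i 0 -mig 1

# Create seven 1g.5gb GPU instances and their default compute instances (-C).
sudo nvidia-smi mig -i 0 -cgi 19,19,19,19,19,19,19 -C

# List the resulting MIG devices; each shows up with its own UUID,
# which is what a VMM or container runtime would be handed.
nvidia-smi -L
```

Each MIG device presents as an independently addressable GPU, which is what made the Firecracker-plus-passthrough design conceivable in the first place; the difficulty Fly.io hit was in the plumbing around it, not in MIG's isolation model.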
Related¶
- systems/nvidia-h100 — successor frontier part (Hopper).
- systems/nvidia-l40s — Ada Lovelace inference alternative that claims A100-class AI compute at a gaming-GPU cost basis.
- systems/nvidia-a10 — the inference volume leader in the Fly.io dataset; a tier below the A100.
- systems/nvidia-mig — A100-introduced fractional-GPU primitive; Fly.io tried and abandoned exposing this.
- systems/nvlink — intra-node interconnect that defines the SXM variant's training-cluster shape.
- systems/fly-machines — whole-A100 attachment path.
- concepts/training-serving-boundary — A100 sits on the training side of the historical split.