
NVIDIA A100

The NVIDIA A100 (Ampere architecture, 2020) is the training-first datacenter GPU that preceded the H100 as the frontier part. Two variants are commonly distinguished: A100 40G PCIe (air-cooled PCIe slot, no NVLink switch fabric) and A100 80G SXM (liquid-capable, NVLink / NVSwitch, forms the tight-coupling domain in DGX/HGX reference designs).

Seen in (wiki)

  • Fly.io GPU lineup. Fly.io stocks both variants — 40G PCIe and 80G SXM — as the top two steps of its GPU catalogue. Fly.io's 2023 product strategy was built around selling them: first fractional A100 slices via MIG / vGPUs over IOMMU PCI passthrough (abandoned after "a whole quarter"), then whole A100s, then NVLink-ganged A100 clusters intended for distributed training. Customer data then showed the lower end of the lineup (A10) dominating by usage — the inflection that led to the 2024-08-15 L40S price cut. (Source: sources/2024-08-15-flyio-were-cutting-l40s-prices-in-half)
  • Fly.io 2025-02-14 retrospective — A100 as "compromise position" for serious-AI customers. Fly.io's GPU course-correction post names the A100 as the middle rung serious-AI buyers wouldn't settle for: "People doing serious AI work want galactically huge amounts of GPU compute. A whole enterprise A100 is a compromise position for them; they want an SXM cluster of H100s." A framing datum on the ceiling Fly.io cannot reach — an insurgent-cloud capacity-supply constraint (concepts/insurgent-cloud-constraints). Whole-A100 attachment on Fly remains productised; the bigger-than-A100 frontier cluster shape is absent from the product. (Source: sources/2025-02-14-flyio-we-were-wrong-about-gpus)
  • Fly.io 2024-05-09 image-description walkthrough — explicit a100-40gb preset string + LLaVA-34b workload on a single Machine. The Fly Machine preset name is disclosed verbatim ("the a100-40gb Fly Machine preset") and pinned as the GPU shape the 34B-parameter LLaVA vision-language model runs on. Canonical wiki datum for the three-stage cold-start budget (concepts/gpu-scale-to-zero-cold-start) on this SKU: a few seconds Machine boot + several tens of seconds to load the model into GPU RAM + several seconds for first-response → ~45 seconds end-to-end cold start. Warm per-response latency "several seconds". Illustrates A100 40G PCIe as a single-Machine inference workhorse even for mid-30B-parameter multimodal models — no cluster required, no NVLink exploited, a scale-to-zero workload with proxy-managed lifecycle. (Source: sources/2024-05-09-flyio-picture-this-open-source-ai-for-image-description)
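The three-stage budget above is serial, so the end-to-end figure is just a sum. A minimal sketch — the per-stage numbers are assumptions picked to match the source's rough figures ("a few seconds", "several tens of seconds", "several seconds"), not measurements:

```python
# Illustrative cold-start budget for LLaVA-34b on the a100-40gb preset.
# Stage durations are assumed values consistent with the source's rough
# figures, not measured data.
COLD_START_STAGES = {
    "machine_boot_s": 3,      # "a few seconds" Fly Machine boot
    "model_load_s": 35,       # "several tens of seconds" loading weights into GPU RAM
    "first_response_s": 7,    # "several seconds" for the first response
}

def cold_start_total(stages: dict) -> int:
    """The stages run serially, so the cold start is their sum."""
    return sum(stages.values())

print(cold_start_total(COLD_START_STAGES))  # 45 → the ~45 s end-to-end budget
```

The model-load stage dominates, which is why the budget scales with model size rather than with Machine boot time.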

Why it matters

  • Training default. For distributed training, the A100 SXM's NVLink / NVSwitch fabric is the architectural feature that makes tensor/pipeline parallelism tractable — the same property H100 SXM inherits.
  • MIG as a fractional-GPU primitive. The A100 introduced MIG — up to seven isolated instances per physical GPU, each with its own memory, SMs, and caches — enabling multi-tenancy on a shared card. Fly.io's attempt to surface MIG to customers via Firecracker micro-VMs over IOMMU PCI passthrough is a negative-space wiki datum: tried and abandoned.
  • Over-provisioned for most inference. The Fly.io customer-usage revelation is that most inference workloads don't need A100-class compute; an A10 or L40S at a fraction of the cost is capable enough. The A100 remains the fit for training and for a minority of heavyweight inference workloads where its HBM capacity / interconnect genuinely pays.
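The MIG partitioning constraint can be sketched as simple slice arithmetic. A minimal model, assuming the A100 40G's published profile names (1g.5gb, 2g.10gb, etc.); the `MigProfile` dataclass and `fits` helper are illustrative, not an NVIDIA API:

```python
# Minimal model of MIG partitioning on an A100 40G: the card exposes
# 7 compute slices, and a mix of instances fits if it uses at most 7.
# Profile names are NVIDIA's; the classes here are illustrative only.
from dataclasses import dataclass

@dataclass(frozen=True)
class MigProfile:
    name: str
    compute_slices: int  # GPU compute slices consumed (of 7 total)
    memory_gb: int       # dedicated HBM per instance

A100_40G_PROFILES = [
    MigProfile("1g.5gb", 1, 5),
    MigProfile("2g.10gb", 2, 10),
    MigProfile("3g.20gb", 3, 20),
    MigProfile("7g.40gb", 7, 40),
]

def fits(instances: list[MigProfile]) -> bool:
    """A requested mix fits on one card if it stays within 7 compute slices."""
    return sum(p.compute_slices for p in instances) <= 7

# Seven isolated 1g.5gb tenants share one physical A100:
seven_tenants = [A100_40G_PROFILES[0]] * 7
print(fits(seven_tenants))  # True
```

Each instance gets hard-partitioned memory and SMs, which is what made MIG look viable as a per-customer primitive before Fly.io abandoned the passthrough approach.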