SYSTEM Cited by 3 sources

NVIDIA L40S¶

The NVIDIA L40S is the AI-optimized variant of the L40, which is itself the data-centre edition of the GeForce RTX 4090 gaming GPU — "resembling two 4090s stapled together", per Fly.io's framing. The L40S delivers AI-compute performance "comparable to that of the A100" (Fly.io's summary, with an explicit caveat that F32 vs F16 comparisons differ), while retaining the full rendering pipeline and gaming-card cost base.

Seen in (wiki)¶

Fly.io 2024-08-15 — "Volkswagen GTI" framing. Fly.io cut L40S pricing to $1.25/hour — the same price as the A10 — and made the L40S the default recommendation for inference. Named workloads: Llama 3.1 70B, Flux (Black Forest Labs image-gen), Whisper (ASR), SegAlign (whole-genome alignment), DOOM Eternal (showcasing retained graphics hardware). (Source: sources/2024-08-15-flyio-were-cutting-l40s-prices-in-half)
Fly.io 2024-09-24 — 64-node hyperparameter-tuning cluster. ElixirConf 2024 keynote demo (recapped by Fly.io): Chris Grainger (Amplified) spins up a cluster of 64 L40S Fly Machines, compiles a different BERT variant on each, fine-tunes every variant on the same patent corpus, and streams per-node loss curves back to a Livebook in real time, all driven by FLAME and the Nx stack. The cluster terminates when the notebook disconnects. Fly.io's platform claim: "start a cluster of GPUs in seconds rather than minutes, and all it requires is a Docker image" (concepts/seconds-scale-gpu-cluster-boot). (Source: sources/2024-09-24-flyio-ai-gpu-clusters-from-your-laptop-with-livebook)
Fly.io 2025-02-14 — "the L40S customer segment persists." In Fly.io's We Were Wrong About GPUs retrospective, the L40S is named as the one SKU in Fly.io's GPU inventory that found a developer-shaped product-market fit. "That leaves the L40S customers. There are a bunch of these! We dropped L40S prices last year, not because we were sour on GPUs but because they're the one part we have in our inventory people seem to get a lot of use out of. We're happy with them. But they're just another kind of compute that some apps need; they're not a driver of our core business. They're not the GPU bet paying off." L40S users are the customer base that Fly.io's retrenchment protects: forward investment pauses, but existing workloads and pricing stay. (Source: sources/2025-02-14-flyio-we-were-wrong-about-gpus)
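The fan-out shape of the ElixirConf demo above (one job per model variant, results flowing back to a single notebook) can be sketched in a few lines. This is a minimal Python illustration, not the FLAME or Nx code from the keynote: local threads stand in for the 64 L40S Fly Machines, and `fine_tune`, `run_sweep`, and the toy loss formula are hypothetical names invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


# Hypothetical stand-in for "compile and fine-tune one BERT variant on one GPU node".
# No real training happens: the loss curve is a deterministic toy function.
def fine_tune(variant_id: int, learning_rate: float, steps: int = 5) -> list[float]:
    return [round(1.0 / (1.0 + learning_rate * 1000 * step), 4)
            for step in range(1, steps + 1)]


def run_sweep(learning_rates: list[float]) -> dict[int, list[float]]:
    """Fan out one job per configuration and collect loss curves as they finish."""
    curves: dict[int, list[float]] = {}
    # The pool plays the role of the cluster. Leaving the `with` block shuts it
    # down, loosely mirroring a cluster that ends when the notebook disconnects.
    with ThreadPoolExecutor(max_workers=len(learning_rates)) as pool:
        futures = {pool.submit(fine_tune, i, lr): i
                   for i, lr in enumerate(learning_rates)}
        for future in as_completed(futures):
            # Results arrive in completion order, not submission order.
            curves[futures[future]] = future.result()
    return curves


if __name__ == "__main__":
    rates = [1e-5 * (i + 1) for i in range(64)]  # 64 hypothetical variants
    results = run_sweep(rates)
    best = min(results, key=lambda i: results[i][-1])
    print(f"{len(results)} curves collected; lowest final loss from variant {best}")
```

In a real deployment each worker would be a separate GPU machine and the curves would be streamed incrementally rather than returned whole; the sketch only shows the one-job-per-variant structure.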

Why it matters¶

An A100-class inference card at a gaming-GPU cost basis. The L40S inherits the RTX 4090's core design (Ada Lovelace), so NVIDIA can manufacture at high volume against a consumer-card BOM while charging an enterprise markup — a structurally different economic shape from the HBM-based A100 / H100.
Data-centre form factor. The L40 family is designed for rack power and cooling envelopes, not tower PCs: more memory than a 4090, lower TDP, and denser packing. These are the design changes Fly.io enumerates when explaining why a 4090 "sucks in a data center rack".
Kept the rendering hardware. Unlike the A100/H100 (AI-only compute cards), the L40S retains the full rasterisation pipeline, so it can serve 3D graphics and video-processing workloads that a pure compute card cannot. Fly.io's pitch that a customer could "build the Stadia that Google couldn't pull off" rests on this retained graphics capability.
No NVLink / NVSwitch. The L40S is PCIe-only, so it cannot be ganged into the tightly coupled multi-GPU training domains that A100 / H100 SXM parts form. Inference typically needs far less inter-GPU bandwidth than training, so the missing interconnect costs little there, which is why the L40S works as the inference default while the SXM-class parts remain the training default.
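A rough way to see where the PCIe-only constraint starts to matter is to compare a model's weights-only footprint with one card's memory. The sketch below assumes NVIDIA's published 48 GB capacity for the L40S (a figure not given in the Fly.io posts) and nominal parameter counts, and it ignores KV cache and activations, so it gives only a lower bound on the number of cards. `weights_gb` and `min_cards` are hypothetical helper names.

```python
import math

# Assumption: 48 GB per L40S, taken from NVIDIA's published spec,
# not from the Fly.io posts cited on this page.
L40S_MEMORY_GB = 48


def weights_gb(params_billions: float, bytes_per_param: float) -> float:
    """Weights-only footprint in GB (1 GB = 1e9 bytes); excludes KV cache and activations."""
    return params_billions * bytes_per_param


def min_cards(params_billions: float, bytes_per_param: float,
              card_gb: float = L40S_MEMORY_GB) -> int:
    """Lower bound on cards needed just to hold the weights."""
    return math.ceil(weights_gb(params_billions, bytes_per_param) / card_gb)


if __name__ == "__main__":
    # Nominal 70B-parameter model at three precisions: 16-bit, 8-bit, 4-bit.
    for label, bytes_per_param in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
        print(label, weights_gb(70, bytes_per_param), "GB ->",
              min_cards(70, bytes_per_param), "card(s)")
```

Under these assumptions a 70B-parameter model at 2 bytes per parameter needs its weights split across at least three cards, with any inter-card traffic going over PCIe, while a 4-bit quantization of the same model fits on one.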

Architectural position (per Fly.io)¶

"Long story short, the L40S is an A100-performer that we can price for A10 customers; the Volkswagen GTI of our lineup." The pricing move is engineered to collapse the choice between "A10 or step up to something bigger" into a single default. Fly.io's broader thesis is that for inference, the load-bearing axis is compute-storage-network locality — GPU + instance RAM + Tigris object storage + Anycast network — not the GPU alone.

Related¶

systems/nvidia-a10 — price anchor; L40S now at A10 price.
systems/nvidia-a100 — AI-compute baseline the L40S claims parity with.
systems/nvidia-h100 — frontier part; L40S is an explicit downmarket-for-inference alternative.
systems/fly-machines — L40S attaches to a Fly Machine via whole-GPU passthrough.
systems/llama-3-1 — named workload the L40S serves.
concepts/inference-vs-training-workload-shape — why an inference-shaped card (PCIe, graphics-retained, modest interconnect) is the right shape.
patterns/co-located-inference-gpu-and-object-storage — Fly.io's L40S + Tigris architectural pitch.
