CONCEPT

Insurgent cloud constraints

Definition

The set of structural disadvantages that a non-hyperscaler cloud (an "insurgent cloud" — Fly.io, Render, Railway, Vultr, Hetzner in one frame; PlanetScale, Tigris, Cloudflare-vs-AWS in others) faces when trying to compete with hyperscalers (AWS, GCP, Azure) and, separately, when trying to compete with frontier-model API providers (OpenAI, Anthropic) on the LLM-serving axis. The framing is the same across these comparisons, but each one binds a different set of constraint classes.

Canonical wiki statement

Fly.io, 2025-02-14, diagnosing why the GPU bet didn't pay off:

For those developers, who probably make up most of the market, it doesn't seem plausible for an insurgent public cloud to compete with OpenAI and Anthropic. Their APIs are fast enough, and developers thinking about performance in terms of "tokens per second" aren't counting milliseconds.

(Source: sources/2025-02-14-flyio-we-were-wrong-about-gpus)

The constraint classes

1. Foundation-model access

  • Closed frontier models are not licensable. OpenAI and Anthropic do not let an insurgent cloud host their frontier models. The most valuable inference shapes (GPT-5, Claude Opus) cannot be served by the insurgent.
  • Open-weight models are available but lag. Llama, Mistral, DeepSeek, Kimi K2.5 are all hostable — but the demand-side audience (see concepts/developers-want-llms-not-gpus) has concentrated on closed frontier APIs for most "AI-enable my app" use cases.

2. Hyperscaler gravity

  • Adjacent managed services. AWS Bedrock, Azure AI Foundry, GCP Vertex — each hyperscaler has a first-party LLM-serving surface adjacent to the developer's existing compute/storage. Moving inference to an insurgent forces the developer to cross an identity / network / billing boundary per request.
  • Egress as a shaping force. Pulling gigabytes of model weights from S3 to an off-hyperscaler GPU bills the developer on both sides of the boundary. See concepts/egress-cost.
  • Credit / volume discount gravity. Enterprise AWS credits attach to AWS compute. Taking workloads off AWS forfeits the spend commitment.
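The egress point above can be made concrete with a back-of-envelope sketch. All rates and sizes below are illustrative assumptions, not quoted hyperscaler prices:

```python
# Back-of-envelope egress cost for pulling model weights out of a
# hyperscaler's object store to an off-hyperscaler GPU host.
# All numbers are illustrative assumptions, not quoted prices.

EGRESS_PER_GB = 0.09      # assumed internet-egress rate, $/GB
WEIGHTS_GB = 140          # e.g. a 70B-parameter model at fp16 (~2 bytes/param)
COLD_STARTS_PER_DAY = 20  # each cold boot re-pulls the weights

def monthly_egress_cost(weights_gb, pulls_per_day, rate_per_gb, days=30):
    """Egress billed on the hyperscaler side alone; the insurgent's
    side may add its own transfer or storage charges on top."""
    return weights_gb * pulls_per_day * rate_per_gb * days

cost = monthly_egress_cost(WEIGHTS_GB, COLD_STARTS_PER_DAY, EGRESS_PER_GB)
print(f"${cost:,.0f}/month")  # ~ $7,560/month just to move the same bytes
```

Under these assumptions the transfer bill alone is thousands of dollars a month before any GPU time is billed, which is the shaping force the bullet describes: the boundary crossing itself is priced.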

3. Hardware-supply constraints

  • Nvidia allocations skew hyperscaler. Frontier GPU supply is not evenly distributed; hyperscalers get priority allocations of H100 / Blackwell SKUs. Insurgents get what's left. Fly.io: "people doing serious AI work … want an SXM cluster of H100s" — reachable by a hyperscaler, not by Fly.
  • Vendor-driver happy-paths favor hyperscaler-shaped configurations. See concepts/nvidia-driver-happy-path. Nvidia's driver team QAs against K8s / VMware / QEMU. Micro-VM-shaped insurgents burn integration cost.

4. Developer-mindshare gravity

  • Tokens-per-second is the speed axis developers care about, not milliseconds. An insurgent cloud's compute-storage-network locality pitch is invisible when the bottleneck is the model's token generation rate, not network round-trip.
  • "Fast-enough" APIs foreclose the latency-advantage pitch. If OpenAI / Anthropic APIs are fast-enough, the insurgent's latency-advantage has no market to sell into.

5. Scale-of-investment asymmetry

  • Hyperscaler capex is order-of-magnitude larger. Building a competitive frontier-training fleet is tens of billions of dollars. Insurgents can't match it. This cuts off the "insurgent becomes an OpenAI" path.
  • Security-audit cost is fixed. An insurgent funding two external audits (Atredis + Tetrel) absorbs the cost as a meaningful fraction of engineering spend; a hyperscaler absorbs it as noise.

Where insurgents can win

Fly.io's own product thesis enumerates the remaining positions:

  • Co-located inference + object storage for the workloads that do care about compute-storage-network locality (not most workloads, per the 2025-02-14 post). L40S customers persist.
  • Developer-experience shapes hyperscalers don't ship. ms-boot VMs, disposable VMs for agentic loops, per-region edge scheduling, simple pricing.
  • Opinionated platforms. The hyperscaler's strength (every primitive, configurable) is also its weakness (the developer has to assemble them). An opinionated developer platform can out-deliver on the workload it's opinionated for.
  • Partner-managed native bindings. See patterns/partner-managed-service-as-native-binding — combining smaller specialist providers into one developer-platform-feel offering is viable when hyperscalers haven't packaged equivalent developer-UX.

Implications

  • Don't try to win the API tier against frontier providers. The capex, the model-license moats, and the developer mindshare are all on the wrong side of the comparison.
  • Don't confuse serious-AI-customer demand with developer-shaped demand. Serious-AI customers exist; they buy hyperscaler SXM capacity; they are not the insurgent's audience.
  • Asset-backed bets bound the downside — see concepts/asset-backed-bet. Fly.io's GPU inventory is liquidatable even if the GPU product isn't. That makes the bet structurally cheaper to be wrong about.
  • Course-correct without abandoning customers. See patterns/platform-retrenchment-without-customer-abandonment.

Caveats

  • The constraint set is frame-specific. "Insurgent cloud" is used differently in different framings (vs hyperscaler-IaaS; vs hyperscaler-PaaS; vs frontier-model APIs; vs GPU-specialist clouds like RunPod / Replicate). Which constraints bind depends on which frame you're in.
  • Not a permanent state. Frontier-model licensing could open up (see DeepSeek-class open-weight competitive pressure); hyperscaler egress-fee structure could change; insurgent-cloud-native features (fast boot, edge, simple pricing) could concentrate enough demand to shift balance.
  • "Fly.io-shaped" insurgent vs other shapes. A GPU- specialist insurgent (Replicate, RunPod, Lambda Labs) has different constraints than a general-compute insurgent. They out-perform Fly.io on raw GPU density and price; they under-perform on edge networking and developer- platform surface.
