CONCEPT Cited by 1 source
Concentration risk on single-cloud LLM serving¶
Definition¶
Concentration risk on single-cloud LLM serving is the structural reliability failure mode of running production LLM-powered features against a single cloud provider's serving substrate. A provider-wide incident (model deprecation, regional outage, shared-pool saturation, or a billing or auth control-plane failure) can take down all LLM-dependent features simultaneously because they have no independent fallback.
The wiki's canonical framing comes from Slack's 2026-05-28 multi-cloud retrospective (Source: sources/2026-05-28-slack-slack-ai-the-path-to-multi-cloud):
"Concentration Risk: Relying too heavily on a single provider's on-demand pool meant that any service-wide blip could have the potential to impact entire Slack AI features simultaneously."
And later, the structural argument:
"As Slack AI scaled to millions of users, we realized that true enterprise-grade reliability and a 'best-of-breed' model strategy required looking beyond any single provider. […] no matter how many failovers we engineered within a single cloud, we remained susceptible to any potential provider-wide outage."
Structural diagnosis¶
Concentration risk has three load-bearing components:
- Shared fate across features: every LLM-powered feature that calls the same provider shares every provider-wide failure mode, so customer-visible damage scales with the number of dependent features.
- Failover within one cloud is insufficient: Slack's Phase 3 model fallback hierarchy (one model → another model on the same provider) addresses model-level degradation but not provider-wide outages.
- On-demand (OD) shared-pool variability adds a second risk layer: shared on-demand pools introduce uptime variability that doesn't exist on dedicated provisioned-throughput (PT) capacity. "Service Level Variability: Unlike the dedicated nature of PT, OD operates on a shared-resource model, which typically carries different uptime characteristics."
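The shared-fate component can be illustrated with a small blast-radius calculation. This is a minimal sketch, not anything from Slack's retrospective: the feature names, the provider names, and the `features_down` helper are all hypothetical, and a feature is assumed to be down only when every provider it can call has failed.

```python
from typing import Dict, List, Set


def features_down(feature_providers: Dict[str, List[str]], failed: Set[str]) -> List[str]:
    """Return the features that have no healthy provider left (hypothetical helper)."""
    return sorted(
        name
        for name, providers in feature_providers.items()
        if all(p in failed for p in providers)
    )


# Hypothetical feature-to-provider maps; names are illustrative only.
single_cloud = {
    "summaries": ["provider_a"],
    "search_answers": ["provider_a"],
    "recap": ["provider_a"],
}
multi_cloud = {name: ["provider_a", "provider_b"] for name in single_cloud}

outage = {"provider_a"}
print(features_down(single_cloud, outage))  # ['recap', 'search_answers', 'summaries']
print(features_down(multi_cloud, outage))   # []
```

With a single provider, one outage takes down every feature at once; with a second provider per feature, the same outage leaves all of them reachable.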
Why intra-cloud failover is insufficient¶
Slack's Phase 3 architecture had model fallback hierarchies within Bedrock: when a primary model degraded, traffic rerouted to a designated backup model on the same provider. This mitigates model-level issues (throttling, latency spikes, or quality regression on one model) but does not mitigate:
- Provider control-plane outages (auth, billing, IAM, region-wide failures).
- Provider-wide quota saturation during industry-wide spikes.
- Provider-imposed model deprecations / changes.
- Provider-level shared-pool failures.
The structural answer is provider-level redundancy — multi-cloud LLM serving with cross-provider failover routing. See concepts/multi-cloud-llm-serving.
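The difference between the two failover layers can be sketched as a nested loop: an inner model fallback inside one provider, wrapped by an outer cross-provider failover. This is an illustrative sketch under stated assumptions, not Slack's implementation: `Provider`, `ModelError`, `ProviderError`, `complete`, and the stub backends are hypothetical names, and real provider SDK errors would need to be mapped onto the two exception types.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


class ModelError(Exception):
    """One model is degraded (throttling, latency spike, quality regression)."""


class ProviderError(Exception):
    """The whole provider is unavailable (control plane, quota, shared pool)."""


@dataclass
class Provider:
    name: str
    models: Sequence[str]            # ordered fallback hierarchy inside this provider
    call: Callable[[str, str], str]  # (model, prompt) -> completion


def complete(prompt: str, providers: List[Provider]) -> str:
    """Try each provider in order; inside a provider, try each model in order."""
    for provider in providers:                # cross-provider failover
        try:
            for model in provider.models:     # intra-provider model fallback
                try:
                    return provider.call(model, prompt)
                except ModelError:
                    continue                  # next model, same provider
        except ProviderError:
            continue                          # provider-wide failure: next provider
    raise RuntimeError("all providers exhausted")


# Hypothetical stub backends standing in for real provider clients.
def provider_a_down(model: str, prompt: str) -> str:
    raise ProviderError("provider_a control plane unavailable")


def provider_b_partial(model: str, prompt: str) -> str:
    if model == "b-primary":
        raise ModelError("b-primary throttled")
    return f"{model}: ok"


providers = [
    Provider("provider_a", ["a-primary", "a-backup"], provider_a_down),
    Provider("provider_b", ["b-primary", "b-backup"], provider_b_partial),
]
print(complete("hello", providers))  # b-backup: ok
```

With only `provider_a` in the list, the inner model hierarchy never gets a chance to help: the provider-level error ends the request, which is the gap that intra-cloud failover leaves open.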
Composition with neighbouring concepts¶
| Concept | Relationship |
|---|---|
| concepts/multi-cloud-llm-serving | The architectural resolution to single-cloud concentration risk. |
| concepts/single-point-of-failure (general) | LLM-serving-specialised case. |
| concepts/availability-multiplication-of-dependencies | The mathematical framing — single-provider availability becomes a hard ceiling on every dependent feature's availability. |
| concepts/provisioned-throughput-vs-on-demand-llm | Concentration risk is specifically more severe on OD (shared pool) than on PT (dedicated capacity); the trade-off compounds. |
| concepts/critical-path-dependency-minimization | Sibling at the cross-cloud altitude: avoid critical-path dependency on any single cloud's control or data plane. |
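The "hard ceiling" in the availability row can be written out as a short worked equation. This is a sketch under two assumptions that are not in the source: the feature depends on its provider in series with everything else it needs, and provider failures are statistically independent (which real providers only approximate). The symbols and the 0.999 figure are illustrative.

```latex
% Serial dependency: the provider's availability caps the feature's availability.
A_{\text{feature}} = A_{\text{provider}} \cdot A_{\text{rest}} \le A_{\text{provider}}

% Redundant providers with failover, assuming independent failures:
A_{\text{multi}} = 1 - \prod_{i=1}^{n} \bigl(1 - A_i\bigr)

% Illustrative example with n = 2 and A_1 = A_2 = 0.999:
A_{\text{multi}} = 1 - (0.001)^2 = 0.999999
```

Correlated failures, such as an industry-wide demand spike hitting several providers at once, would make the real figure lower than the independent-failure estimate.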
Distinguishing from neighbouring failure modes¶
- Single-region risk: addressed by multi-region deployment within one cloud; doesn't address provider-level outages.
- Single-model risk: addressed by a model fallback hierarchy within one cloud's catalogue; doesn't address provider-wide failures.
- Provider-imposed deprecation: a sub-case of concentration risk in which the provider unilaterally deprecates a model without a customer-controlled migration timeline.
- Industry-wide spike: a variant of concentration risk in which the provider's shared pool is saturated by other customers' traffic, not by the customer's own.
When concentration risk is acceptable¶
- Pre-launch / low-stakes workloads — internal tooling, experimental features where downtime is tolerable.
- Lock-in is contractually compensated — provider offers capacity guarantees / SLAs that compensate for the concentration risk economically.
- No production frontier-model dependencies — workloads run on stable mid-tier models where provider-level availability is high enough.
When concentration risk forces multi-cloud¶
Slack's verbatim framing for the Phase 3 → Phase 4 trigger:
"As Slack AI scaled to millions of users, we realized that true enterprise-grade reliability and a 'best-of-breed' model strategy required looking beyond any single provider."
The two combined drivers:
- Scale: millions of users multiply the customer-visible damage of any provider-wide outage.
- Best-of-breed model strategy: vendor-exclusive frontier models force multi-cloud for capability reasons even before reliability reasons; concentration risk is then an additional driver in the same direction.
Seen in¶
- sources/2026-05-28-slack-slack-ai-the-path-to-multi-cloud: canonical wiki disclosure of concentration risk on single-cloud LLM serving as the explicit reliability driver of Slack's Phase 3 → Phase 4 expansion to multi-cloud. Verbatim quotes: "any service-wide blip could have the potential to impact entire Slack AI features simultaneously" and "no matter how many failovers we engineered within a single cloud, we remained susceptible to any potential provider-wide outage."