
CONCEPT Cited by 1 source

Concentration risk on single-cloud LLM serving

Definition

Concentration risk on single-cloud LLM serving is the structural reliability failure mode of running production LLM-powered features against a single cloud provider's serving substrate. A provider-wide blip (model deprecation, regional outage, shared-pool saturation, or a billing or auth control-plane failure) can take down all LLM-dependent features simultaneously because they have no independent fallback.

The wiki's canonical framing comes from Slack's 2026-05-28 multi-cloud retrospective (Source: sources/2026-05-28-slack-slack-ai-the-path-to-multi-cloud):

"Concentration Risk: Relying too heavily on a single provider's on-demand pool meant that any service-wide blip could have the potential to impact entire Slack AI features simultaneously."

And later, the structural argument:

"As Slack AI scaled to millions of users, we realized that true enterprise-grade reliability and a 'best-of-breed' model strategy required looking beyond any single provider. […] no matter how many failovers we engineered within a single cloud, we remained susceptible to any potential provider-wide outage."

Structural diagnosis

Concentration risk has three load-bearing components:

  1. Shared fate across features — every LLM-powered feature that calls the same provider shares any provider-wide failure mode. Customer-visible damage is multiplied by feature count.
  2. Failover within one cloud is insufficient — Slack's Phase 3 model fallback hierarchy (one model → another model on the same provider) addresses model-level degradation but not provider-wide outages.
  3. On-demand (OD) shared-pool variability adds a second risk layer — shared OD pools introduce uptime variability that doesn't exist on dedicated provisioned-throughput (PT) capacity. "Service Level Variability: Unlike the dedicated nature of PT, OD operates on a shared-resource model, which typically carries different uptime characteristics."
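
The shared-fate component can be illustrated with a minimal sketch. The feature and provider names below are hypothetical placeholders, not Slack's actual feature inventory; the point is only that every feature mapped to the same provider falls inside the same blast radius.

```python
# Hypothetical feature -> provider mapping (illustrative names only).
FEATURE_PROVIDER = {
    "feature_summaries": "provider_a",
    "feature_search": "provider_a",
    "feature_recaps": "provider_a",
}


def blast_radius(feature_provider: dict, failed_provider: str) -> list:
    """Return the features that go down together when `failed_provider`
    suffers a provider-wide outage and no independent fallback exists."""
    return sorted(
        feature
        for feature, provider in feature_provider.items()
        if provider == failed_provider
    )


# With a single provider, one outage reaches every feature at once.
affected = blast_radius(FEATURE_PROVIDER, "provider_a")
```

In this sketch the customer-visible damage grows with the number of entries that point at the failed provider, which is the "multiplied by feature count" effect described in item 1.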

Why intra-cloud failover is insufficient

Slack's Phase 3 architecture had model fallback hierarchies within Bedrock: when a primary model degraded, traffic was rerouted to a designated backup model on the same provider. This mitigates model-level issues (a throttled model, a latency spike, a quality regression) but does not mitigate:

  • Provider control-plane outages (auth, billing, IAM, region-wide failures).
  • Provider-wide quota saturation during industry-wide spikes.
  • Provider-imposed model deprecations / changes.
  • Provider-level shared-pool failures.

The structural answer is provider-level redundancy — multi-cloud LLM serving with cross-provider failover routing. See concepts/multi-cloud-llm-serving.
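
A minimal sketch of the difference, assuming a hypothetical ordered route list and stub provider clients (`ProviderUnavailable`, `complete_with_failover`, and the `cloud_a` / `cloud_b` names are all illustrative, not Slack's implementation or any real SDK). During a simulated provider-wide outage, a fallback chain that stays on one provider exhausts every route, while the same chain with one cross-provider route still returns a result.

```python
from typing import Callable, Sequence


class ProviderUnavailable(Exception):
    """Raised by a (hypothetical) provider client when it cannot serve a call."""


# A route is (provider_name, model_name, call_fn); call_fn maps a prompt to text.
Route = tuple[str, str, Callable[[str], str]]


def complete_with_failover(prompt: str, routes: Sequence[Route]) -> tuple[str, str, str]:
    """Try each route in order; return (provider, model, text) from the first success."""
    errors = []
    for provider, model, call in routes:
        try:
            return provider, model, call(prompt)
        except ProviderUnavailable as exc:
            errors.append(f"{provider}/{model}: {exc}")
    raise ProviderUnavailable("all routes failed: " + "; ".join(errors))


def _cloud_a_down(prompt: str) -> str:
    # Simulates a provider-wide outage: every model on cloud_a fails.
    raise ProviderUnavailable("provider-wide outage")


def _cloud_b_ok(prompt: str) -> str:
    # Stub for an independent second provider that is still healthy.
    return f"summary of: {prompt}"


# Intra-cloud hierarchy: primary and backup share the same provider.
INTRA_CLOUD_ROUTES = [
    ("cloud_a", "primary-model", _cloud_a_down),
    ("cloud_a", "backup-model", _cloud_a_down),
]

# Cross-provider hierarchy: same chain plus one route on a second provider.
MULTI_CLOUD_ROUTES = INTRA_CLOUD_ROUTES + [("cloud_b", "alt-model", _cloud_b_ok)]
```

Calling `complete_with_failover(prompt, INTRA_CLOUD_ROUTES)` raises once both same-provider routes are exhausted, whereas `MULTI_CLOUD_ROUTES` falls through to `cloud_b`. A production router would also need health checks, timeouts, and prompt or output normalisation across models, none of which are modelled here.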

Composition with neighbouring concepts

Concept | Relationship
concepts/multi-cloud-llm-serving | The architectural resolution to single-cloud concentration risk.
concepts/single-point-of-failure | The general concept; concentration risk is its LLM-serving-specialised case.
concepts/availability-multiplication-of-dependencies | The mathematical framing: single-provider availability becomes a hard ceiling on every dependent feature's availability.
concepts/provisioned-throughput-vs-on-demand-llm | Concentration risk is more severe on OD (shared pool) than on PT (dedicated capacity); the trade-off compounds.
concepts/critical-path-dependency-minimization | Sibling at the cross-cloud altitude: avoid critical-path dependency on any single cloud's control or data plane.
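
The availability-multiplication framing can be made concrete with a short worked example. The numbers are illustrative and are not figures from the Slack retrospective, and the redundancy formula assumes provider failures are statistically independent, which real outages may not satisfy.

```python
def feature_availability(provider_availability: float, own_availability: float = 1.0) -> float:
    """Serial dependency: the feature is up only when both the provider and the
    feature's own stack are up, so the provider's availability is a ceiling."""
    return provider_availability * own_availability


def redundant_availability(availabilities: list) -> float:
    """Parallel redundancy, ASSUMING independent failures: the serving layer
    is down only when every provider is down at the same time."""
    p_all_down = 1.0
    for a in availabilities:
        p_all_down *= 1.0 - a
    return 1.0 - p_all_down


# Illustrative figures only.
single = feature_availability(0.999, own_availability=0.9995)
dual = redundant_availability([0.999, 0.999])
```

Here `single` can never exceed 0.999 no matter how reliable the feature's own code is, while `dual` is close to 0.999999 under the independence assumption. Correlated failures (for example, an industry-wide demand spike hitting both providers) would reduce that gain.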

Distinguishing from neighbouring failure modes

  • Single-region risk — addressed by multi-region within one cloud; doesn't address provider-level outages.
  • Single-model risk — addressed by model fallback hierarchy within one cloud's catalogue; doesn't address provider-wide failures.
  • Provider-imposed deprecation — sub-case of concentration risk; the provider unilaterally deprecates a model without a customer-controlled migration timeline.
  • Industry-wide spike — concentration risk variant where the provider's shared pool is saturated by other customers' traffic, not by the customer's own.
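
The distinctions above can be sketched as a small coverage check. The scope and mitigation names are hypothetical labels, and the mapping records only what the text states each mitigation addresses; multi-cloud routing may in practice also absorb narrower failures, which this sketch does not model.

```python
from enum import Enum


class FailureScope(Enum):
    MODEL = "model"        # one model throttles or regresses
    REGION = "region"      # one region of one provider fails
    PROVIDER = "provider"  # provider-wide outage, deprecation, or pool saturation


# Which failure scope each mitigation is stated to address (illustrative labels).
COVERS = {
    "model_fallback_hierarchy": {FailureScope.MODEL},
    "multi_region": {FailureScope.REGION},
    "multi_cloud": {FailureScope.PROVIDER},
}


def uncovered_scopes(mitigations: list) -> set:
    """Return the failure scopes left unaddressed by the given mitigations."""
    covered = set()
    for name in mitigations:
        covered |= COVERS[name]
    return {scope for scope in FailureScope if scope not in covered}


# Intra-cloud mitigations alone leave the provider scope exposed.
gap = uncovered_scopes(["model_fallback_hierarchy", "multi_region"])
```

With only intra-cloud mitigations, `gap` contains `FailureScope.PROVIDER`, which is the residual exposure this page calls concentration risk.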

When concentration risk is acceptable

  • Pre-launch / low-stakes workloads — internal tooling, experimental features where downtime is tolerable.
  • Lock-in is contractually compensated — provider offers capacity guarantees / SLAs that compensate for the concentration risk economically.
  • No production frontier-model dependencies — workloads run on stable mid-tier models where provider-level availability is high enough.

When concentration risk forces multi-cloud

Slack's verbatim framing for the Phase 3 → Phase 4 trigger:

"As Slack AI scaled to millions of users, we realized that true enterprise-grade reliability and a 'best-of-breed' model strategy required looking beyond any single provider."

The two combined drivers:

  • Scale — millions of users multiply the customer-visible damage of any provider-wide outage.
  • Best-of-breed model strategy — vendor-exclusive frontier models force multi-cloud for capability reasons even before reliability reasons; concentration risk is then a bonus driver in the same direction.

Seen in

  • sources/2026-05-28-slack-slack-ai-the-path-to-multi-cloud — canonical wiki disclosure of concentration risk on single-cloud LLM serving as the explicit reliability driver of Slack's Phase 3 → Phase 4 expansion to multi-cloud. Verbatim "any service-wide blip could have the potential to impact entire Slack AI features simultaneously" + "no matter how many failovers we engineered within a single cloud, we remained susceptible to any potential provider-wide outage."