PATTERN Cited by 1 source
Multi-cloud LLM serving¶
Pattern¶
Run production LLM-powered features against managed model-serving endpoints from two or more independent cloud providers, fronted by an in-house abstraction layer that unifies API shape, error codes, rate-limiting, telemetry, and authentication, and that routes requests based on metric-driven model selection, real-time health signals, and per-workload optimisation criteria.
The canonical wiki implementation: Slack's Intelligent Routing Layer spanning AWS Bedrock + GCP Vertex AI, reached via a three-year four-phase evolution (early 2023 → early 2026). (Source: sources/2026-05-28-slack-slack-ai-the-path-to-multi-cloud)
When to use it¶
- Production LLM workloads at scale — millions of users multiply the customer-visible impact of any single-cloud outage.
- Vendor-exclusive frontier models — when the state-of-the-art for a specific feature is fragmented across providers' catalogues.
- High-stakes reliability requirements — internal failover within one cloud is insufficient against provider-wide disruption.
- Per-feature optimisation matters — different features have different cost / latency / quality / reasoning profiles, and the optimal model differs per feature.
- Compliance allows cross-cloud routing — legal / FedRAMP / sovereignty constraints don't pin you to one provider.
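Per-feature optimisation can be made concrete as a weighted score over each candidate model's metrics. The following is a minimal sketch, not Slack's disclosed method: the `ModelMetrics` fields, the weights, the model names, and all numbers are hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelMetrics:
    name: str
    quality: float      # offline eval score in [0, 1], higher is better (assumed metric)
    latency_s: float    # p90 latency in seconds, lower is better
    cost_per_1k: float  # USD per 1k tokens, lower is better (made-up prices)


def score(m: ModelMetrics, weights: dict[str, float]) -> float:
    # Higher is better: latency and cost count against the model.
    return (weights["quality"] * m.quality
            - weights["latency"] * m.latency_s
            - weights["cost"] * m.cost_per_1k)


def select(candidates: list[ModelMetrics], weights: dict[str, float]):
    """Return (primary, backup): the two best-scoring models for one feature."""
    ranked = sorted(candidates, key=lambda m: score(m, weights), reverse=True)
    return ranked[0], ranked[1]


# Hypothetical candidates spread across providers.
CANDIDATES = [
    ModelMetrics("model-a", quality=0.90, latency_s=4.0, cost_per_1k=0.015),
    ModelMetrics("model-b", quality=0.80, latency_s=1.0, cost_per_1k=0.003),
    ModelMetrics("model-c", quality=0.70, latency_s=0.5, cost_per_1k=0.001),
]

# A reasoning-heavy feature weights quality; a high-velocity one weights latency.
REASONING_WEIGHTS = {"quality": 10.0, "latency": 0.1, "cost": 1.0}
LOW_LATENCY_WEIGHTS = {"quality": 1.0, "latency": 1.0, "cost": 1.0}
```

With these placeholder numbers the reasoning profile binds `model-a` as primary and the low-latency profile binds `model-c`, which shows why a single global model choice leaves per-feature value on the table.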
When NOT to use it¶
- Single-cloud is sufficient — if all required models are on one cloud at the right SLAs and per-feature optimisation isn't worth the operational tax.
- Engineering bandwidth is constrained — the four named taxes (API normalisation, monitoring, attribution, on-call expertise) require sustained investment.
- Compliance pins you to one provider — federal contracts or regulated workloads with single-cloud sovereignty requirements.
- No vendor-exclusivity pressure — every frontier model you need is available in one cloud's catalogue and your roadmap is comfortable there.
Five structural pieces¶
┌─────────────────────────────────────────────┐
│    Application features (Slack AI suite)    │
└─────────────────────┬───────────────────────┘
                      │ unified internal API
┌─────────────────────▼───────────────────────┐
│          Intelligent Routing Layer          │
│                                             │
│  1. Metric-driven model selection           │
│     (primary + designated backup per feat)  │
│                                             │
│  2. Experimental rules / A-B testing        │
│     (% traffic shaping, in-prod evals)      │
│                                             │
│  3. Automated circuit breaker               │
│     + partial-open recovery state           │
│     (TTFT, p90 latency, 5xx error rate)     │
│                                             │
│  4. API normalization layer                 │
│     (errors, rate-limits, telemetry, auth)  │
│                                             │
│  5. Secretless cross-cloud authentication   │
└────────┬───────────────┬────────────────────┘
         │               │
┌────────▼─────┐   ┌─────▼─────────┐
│ AWS Bedrock  │   │ GCP Vertex AI │
│ (PT + OD)    │   │ (multi-       │
│              │   │  provider)    │
└──────────────┘   └───────────────┘
The pattern requires:
- Abstraction layer with unified internal contract.
- Per-feature model bindings with primary + backup.
- Health-driven circuit breaker for endpoint-level degradation response.
- API normalisation for cross-provider error / rate-limit / telemetry uniformity.
- Secretless cross-cloud authentication plumbing.
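The first two requirements (a unified internal contract and per-feature bindings with a primary and a backup) can be sketched as follows. This is an illustrative assumption about shape only: `Endpoint`, `Binding`, `ProviderUnavailable`, and the `call` adapter are hypothetical names, not Slack's internal API or any provider SDK.

```python
from dataclasses import dataclass
from typing import Callable


class ProviderUnavailable(Exception):
    """Normalised error: the endpoint could not serve the request."""


@dataclass(frozen=True)
class Endpoint:
    provider: str               # e.g. "bedrock" or "vertex"
    model: str                  # placeholder model identifier
    call: Callable[[str], str]  # hypothetical adapter: prompt -> completion


@dataclass(frozen=True)
class Binding:
    primary: Endpoint
    backup: Endpoint


def complete(bindings: dict[str, Binding], feature: str, prompt: str) -> tuple[str, str]:
    """Serve `prompt` for `feature`; returns (provider_used, completion)."""
    binding = bindings[feature]
    try:
        return binding.primary.provider, binding.primary.call(prompt)
    except ProviderUnavailable:
        # Only the normalised error triggers failover; anything else propagates.
        return binding.backup.provider, binding.backup.call(prompt)
```

Callers see one function and one error type regardless of which cloud answered, which is the property the abstraction layer exists to provide.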
How to evolve to it (four phases)¶
Slack's three-year arc canonicalises the migration trajectory:
Phase 1 — Single cloud, escrow VPC¶
Hosted Anthropic models in an escrow VPC on AWS SageMaker. Multi-region within one cloud; ODCR (On-Demand Capacity Reservations) + cron-based scaling. Exposes model feature lag when the provider prioritises a different launchpad (Bedrock).
Phase 2 — Migrate to provider's primary launchpad¶
Move to fully managed Amazon Bedrock with Provisioned Throughput. Eliminates feature lag for that provider. Use the zero-incident LLM migration playbook (compliance / capacity / quality / rollout).
Phase 3 — Hybrid PT + OD with spillover¶
Add On-Demand for bursty workloads (patterns/provisioned-throughput-with-on-demand-spillover). Build an internal model fallback hierarchy on the same provider. Exposes concentration risk when single-provider failover is insufficient.
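The spillover idea in Phase 3 can be sketched as a per-window capacity counter: requests consume provisioned capacity first and overflow to On-Demand. This is a simplified assumption about the mechanism; the class name, the per-minute window, and the token estimate are hypothetical, and a production version would need thread safety and a real scheduler.

```python
class SpilloverRouter:
    """Prefer provisioned throughput (PT); send overflow to on-demand (OD)."""

    def __init__(self, pt_tokens_per_minute: int):
        self.capacity = pt_tokens_per_minute
        self.used = 0  # tokens charged to PT in the current window

    def route(self, estimated_tokens: int) -> str:
        # Admit to PT only if the whole request fits in the remaining budget.
        if self.used + estimated_tokens <= self.capacity:
            self.used += estimated_tokens
            return "provisioned"
        return "on_demand"

    def reset_window(self) -> None:
        # Intended to be called once per minute (scheduler not shown).
        self.used = 0
```

The design keeps the predictable, pre-paid capacity saturated before paying the variable On-Demand rate for bursts.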
Phase 4 — Multi-cloud expansion¶
Add a second cloud (GCP Vertex AI in Slack's case). Build the Intelligent Routing Layer with API normalisation, cross-cloud auth, model-to-feature binding, and the partial-open circuit breaker. Disclosed outcome: ~10% quality lift on complex reasoning + ~67% latency reduction on high-velocity / low-token workloads.
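Introducing a second provider usually goes through percentage-based traffic shaping before a binding is flipped. One common way to do this, shown here as an assumption rather than Slack's disclosed mechanism, is deterministic hash bucketing so the same request key always lands in the same arm. The function names and the key format are hypothetical.

```python
import hashlib


def bucket(key: str, buckets: int = 100) -> int:
    """Map a key to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def pick_arm(request_key: str, experiment: str, treatment_percent: int) -> str:
    """Send roughly `treatment_percent`% of keys to the new model."""
    # Salting with the experiment name decorrelates assignments across experiments.
    if bucket(f"{experiment}:{request_key}") < treatment_percent:
        return "treatment"
    return "control"
```

Because assignment depends only on the key and the experiment name, ramping from 5% to 20% keeps every key that was already in the treatment arm there, which keeps in-production evaluations comparable over time.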
Trade-offs¶
| Compared to… | Wins | Loses |
|---|---|---|
| Single-cloud LLM serving | Provider redundancy + best-of-breed model access + per-feature optimisation | API normalisation overhead + cross-cloud monitoring complexity + cost attribution complexity + on-call knowledge breadth |
| Multi-region single-cloud | Provider-level redundancy beyond regional outages | Same operational taxes; multi-region is cheaper if provider outages are rare |
| Multi-model single-cloud | Cross-provider model exclusivity coverage | Same operational taxes; multi-model alone doesn't address provider-wide outages |
| Self-hosted multi-cloud | Managed-service operational savings; weight distribution / GPU procurement / scaling stay the provider's problem | Maximum flexibility |
Operational taxes (Slack disclosed)¶
- API and behavioural friction — addressed by API normalisation layer.
- Operational monitoring complexity — unified dashboard pulling per-cloud telemetry.
- The attribution challenge — per-feature cost tracking when traffic shifts dynamically.
- The on-call knowledge gap — engineers can't be single-cloud specialists.
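The attribution challenge above comes down to tagging every call with its feature at the routing layer and pricing it with the rate of whichever provider actually served it. A minimal sketch, assuming a hypothetical `CostLedger` and made-up prices and model names:

```python
from collections import defaultdict

# Hypothetical prices in USD per 1,000 tokens, keyed by (provider, model).
PRICES = {
    ("bedrock", "model-a"): {"input": 0.003, "output": 0.015},
    ("vertex", "model-b"): {"input": 0.001, "output": 0.004},
}


class CostLedger:
    """Accumulates spend per feature and per (feature, provider)."""

    def __init__(self):
        self.by_feature = defaultdict(float)
        self.by_feature_provider = defaultdict(float)

    def record(self, feature, provider, model, input_tokens, output_tokens):
        price = PRICES[(provider, model)]
        cost = (input_tokens * price["input"]
                + output_tokens * price["output"]) / 1000
        self.by_feature[feature] += cost
        self.by_feature_provider[(feature, provider)] += cost
        return cost
```

Recording at the routing layer, rather than reconciling from each cloud's bill afterwards, keeps per-feature totals correct even when the circuit breaker shifts traffic mid-day.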
Composition with other patterns¶
- patterns/api-normalization-layer-cross-provider — the abstraction enabler.
- patterns/model-fallback-hierarchy-with-circuit-breaker — the resilience pattern composed at routing-layer altitude.
- patterns/provisioned-throughput-with-on-demand-spillover — composes inside one cloud as the cost / predictability trade-off; multi-cloud composes outside.
- patterns/zero-incident-llm-migration — applies to each cross-cloud and cross-substrate migration step.
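To illustrate the normalisation layer that the other patterns compose with, the sketch below maps provider-specific error codes onto one internal taxonomy. The mapping is illustrative and deliberately incomplete: the code strings follow the general style of AWS exception names and gRPC status codes, but `ErrorKind`, `normalise`, and `UNMAPPED_SEEN` are hypothetical names, and the real error catalogues should be taken from each provider's documentation.

```python
from enum import Enum


class ErrorKind(Enum):
    RATE_LIMITED = "rate_limited"
    UNAVAILABLE = "unavailable"
    INVALID_REQUEST = "invalid_request"
    UNKNOWN = "unknown"


# Illustrative subset only; not a complete catalogue for either provider.
ERROR_MAP = {
    ("bedrock", "ThrottlingException"): ErrorKind.RATE_LIMITED,
    ("bedrock", "ServiceUnavailableException"): ErrorKind.UNAVAILABLE,
    ("bedrock", "ValidationException"): ErrorKind.INVALID_REQUEST,
    ("vertex", "RESOURCE_EXHAUSTED"): ErrorKind.RATE_LIMITED,
    ("vertex", "UNAVAILABLE"): ErrorKind.UNAVAILABLE,
    ("vertex", "INVALID_ARGUMENT"): ErrorKind.INVALID_REQUEST,
}

# Codes that fell through the map; a non-empty set signals normalisation drift.
UNMAPPED_SEEN: set[tuple[str, str]] = set()


def normalise(provider: str, code: str) -> ErrorKind:
    kind = ERROR_MAP.get((provider, code))
    if kind is None:
        UNMAPPED_SEEN.add((provider, code))
        return ErrorKind.UNKNOWN
    return kind


def is_retryable(kind: ErrorKind) -> bool:
    # Retry or fail over on capacity and availability errors, never on bad input.
    return kind in (ErrorKind.RATE_LIMITED, ErrorKind.UNAVAILABLE)
```

Tracking unmapped codes explicitly turns a silent gap in the layer into something an alert or integration test can catch.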
Reflections (Slack's five takeaways verbatim)¶
- "Scaling safely requires XFN parity" — Legal / Risk / Compliance / Security alignment with Engineering as the actual unblocker.
- "The abstraction layer is a core requirement" — agility and speed to market are the competitive edge; the routing layer dominates the model choice.
- "Treat architecture as a living document" — provider-agnostic routing lets you adopt breakthroughs without a rewrite.
- "Reliability requires provider agnosticism" — internal failovers within one cloud aren't enough.
- "Redefining the meaning of 'Failure'" — soft failures (p90 spikes, feedback trends) are first-class triggers; an "LLM service that is 'up' but slow is effectively broken".
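The last takeaway, treating "up but slow" as a failure, is what the partial-open circuit breaker encodes. The sketch below is one plausible reading of that design, not Slack's implementation: the state names, thresholds, window size, cooldown, and probe fraction are all assumptions, and TTFT is omitted for brevity although the source lists it as a signal.

```python
import random
from collections import deque


class CircuitBreaker:
    """closed -> open on degraded health; open -> partial_open after a cooldown;
    partial_open -> closed once enough probe requests succeed."""

    def __init__(self, p90_limit_s=5.0, error_rate_limit=0.05, window=50,
                 cooldown_s=30.0, probe_fraction=0.1, probes_to_close=5):
        self.p90_limit_s = p90_limit_s
        self.error_rate_limit = error_rate_limit
        self.cooldown_s = cooldown_s
        self.probe_fraction = probe_fraction
        self.probes_to_close = probes_to_close
        self.samples = deque(maxlen=window)  # (latency_s, ok) pairs
        self.state = "closed"
        self.opened_at = 0.0
        self.probe_successes = 0

    def _degraded(self) -> bool:
        if len(self.samples) < self.samples.maxlen:
            return False  # wait for a full window before judging
        latencies = sorted(lat for lat, _ in self.samples)
        p90 = latencies[int(0.9 * (len(latencies) - 1))]
        error_rate = sum(1 for _, ok in self.samples if not ok) / len(self.samples)
        return p90 > self.p90_limit_s or error_rate > self.error_rate_limit

    def _trip(self, now: float) -> None:
        self.state, self.opened_at = "open", now
        self.samples.clear()

    def allow(self, now: float, rng=random.random) -> bool:
        if self.state == "open" and now - self.opened_at >= self.cooldown_s:
            self.state, self.probe_successes = "partial_open", 0
        if self.state == "closed":
            return True
        if self.state == "partial_open":
            return rng() < self.probe_fraction  # admit only a trickle of probes
        return False

    def record(self, latency_s: float, ok: bool, now: float) -> None:
        if self.state == "open":
            return  # ignore stragglers that finish after the trip
        if self.state == "partial_open":
            if ok and latency_s <= self.p90_limit_s:
                self.probe_successes += 1
                if self.probe_successes >= self.probes_to_close:
                    self.state = "closed"
            else:
                self._trip(now)  # a bad probe restarts the cooldown
            return
        self.samples.append((latency_s, ok))
        if self._degraded():
            self._trip(now)
```

Note that a window of successful but slow responses trips the breaker just as a burst of 5xx errors would, and that recovery admits only a fraction of traffic so a half-healed endpoint is not immediately flooded.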
Risks and mitigations¶
- API normalisation drift → provider releases a new error / API and the layer stops normalising. Mitigation: per-provider integration tests + provider-API change monitoring.
- Cost attribution gaps → multi-cloud billing complexity hides per-feature cost. Mitigation: deep instrumentation across billing systems.
- Cross-cloud auth credential leak → secretless auth reduces but doesn't eliminate the exposure. Mitigation: short-lived tokens + auditable federation flows.
- Model selection latency / cost overhead → routing layer becomes a hot-path bottleneck. Mitigation: cache per-feature routing decisions and re-evaluate only when health signals change.
- Compliance drift → multi-cloud expands the data-residency / privacy attack surface. Mitigation: per-cloud regional data boundaries codified as routing constraints.
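The compliance mitigation, codifying regional data boundaries as routing constraints, can be sketched as a filter applied before any health or cost logic runs. The policy labels and the `Endpoint`, `eligible`, and `route` names are hypothetical, and the region sets are examples rather than a statement of any real residency policy.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Endpoint:
    provider: str
    region: str


# Hypothetical residency policies mapped to permitted regions on each cloud.
RESIDENCY_RULES = {
    "eu_only": {"eu-central-1", "europe-west4"},
    "us_only": {"us-east-1", "us-central1"},
}


def eligible(endpoints: list[Endpoint], residency: Optional[str]) -> list[Endpoint]:
    """Keep only endpoints whose region satisfies the residency policy."""
    if residency is None:
        return list(endpoints)
    allowed = RESIDENCY_RULES[residency]
    return [e for e in endpoints if e.region in allowed]


def route(endpoints: list[Endpoint], residency: Optional[str],
          healthy: Callable[[Endpoint], bool]) -> Endpoint:
    """Pick the first healthy endpoint inside the boundary; fail closed otherwise."""
    for endpoint in eligible(endpoints, residency):
        if healthy(endpoint):
            return endpoint
    raise LookupError("no healthy endpoint satisfies the residency constraint")
```

Failing closed matters here: when every compliant endpoint is unhealthy, the request errors out instead of silently failing over to a region outside the boundary.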
Seen in¶
- sources/2026-05-28-slack-slack-ai-the-path-to-multi-cloud — canonical wiki disclosure of the multi-cloud LLM serving pattern as the architectural endpoint of Slack's three-year Slack AI evolution. Production substrate for millions of users on AWS Bedrock + GCP Vertex AI behind the Intelligent Routing Layer. Disclosed Phase 4 outcomes: ~10% quality lift on complex reasoning, ~67% latency reduction on high-velocity workloads.
Related¶
- concepts/multi-cloud-llm-serving
- concepts/concentration-risk-single-cloud-llm
- concepts/model-to-feature-binding
- concepts/api-normalization-multi-cloud-llm
- concepts/automated-circuit-breaker-with-partial-open-state
- concepts/llm-model-feature-lag
- systems/slack-ai
- systems/slack-intelligent-routing-layer
- systems/amazon-bedrock
- systems/gcp-vertex-ai
- patterns/api-normalization-layer-cross-provider
- patterns/model-fallback-hierarchy-with-circuit-breaker
- patterns/provisioned-throughput-with-on-demand-spillover
- patterns/zero-incident-llm-migration