PATTERN
Session affinity for MCP SSE¶
Shape¶
When a multitenant MCP server fleet accepts long-lived Server-Sent Events (SSE) connections from LLM clients, the routing tier must guarantee that every SSE connection from a given client lands on the same stateful MCP-server instance — the one holding that client's session state.
Client A (LLM) ── SSE stream 1 ─┐
Client A (LLM) ── SSE stream 2 ─┴──▶ MCP-server instance X (holds A's state)

Client B (LLM) ── SSE stream 1 ─┬──▶ MCP-server instance Y (holds B's state)
Client B (LLM) ── SSE stream 2 ─┘
Without the affinity guarantee, a given client's subsequent connections may hit an instance without the state, forcing either session reconstitution (expensive, often impossible) or cross-instance state sharing (expensive at the store tier).
Why long-lived SSE changes the routing contract¶
Classic HTTP request/response MCP could fan out across any instance; each request was self-contained. Modern MCP flows (long-lived SSE) carry the session on the connection. Once the first SSE connection lands on instance X, instance X owns the session's in-memory state: tool registrations, subscription lists, in-flight tool-invocation state, and LLM-client context.
Any routing decision that sends a subsequent connection from the same client to a different instance breaks the contract.
Implementation¶
Tenant-controlled routing¶
Fly.io's specific answer is tenant-controlled dynamic request routing on Fly Proxy. The tenant's MCP server app can specify, per-connection, which Fly Machine should receive the request — based on header, path, or other routable attribute. The tenant's MCP server is free to hash, look up, or otherwise route the client to a specific Machine; Fly Proxy honours the decision.
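One concrete mechanism Fly Proxy exposes for this is the `fly-replay` response header: an instance that receives a request it shouldn't handle can reply with `fly-replay: instance=<machine-id>`, and the proxy re-sends the request to the named Machine instead. A minimal sketch of the affinity decision, assuming the app tracks which Machine owns each session and reads its own identity from the `FLY_MACHINE_ID` environment variable:

```python
import os
from typing import Optional


def affinity_replay(session_owner: str) -> Optional[dict]:
    """Decide whether to serve locally or ask Fly Proxy to replay.

    session_owner is the Machine ID that holds this session's state
    (looked up however the app tracks ownership — hash, registry, etc.).
    """
    local = os.environ.get("FLY_MACHINE_ID", "local-dev")
    if session_owner == local:
        return None  # we hold the session state; serve the SSE stream here
    # Fly Proxy intercepts this response header and replays the request
    # on the named Machine, so the client transparently reaches its state.
    return {"fly-replay": f"instance={session_owner}"}
```

The HTTP framework, ownership lookup, and error handling are elided; the point is that the routing decision lives in tenant code, and Fly Proxy merely honours it.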
Canonical Fly.io framing:
More recent MCP flows involve repeated and potentially long-lived (SSE) connections. To make this work in a multitenant environment, you want these connections to hit the same (stateful) instance. So we think it's possible that the control we give over request routing is a robot attractant.
(Source: sources/2025-04-08-flyio-our-best-customers-are-now-robots)
Header-based affinity¶
A common shape is: the MCP server identifies the client session on first connection, assigns it to an instance, and returns an affinity cookie / header. Subsequent connections include the header; the routing tier hashes it to the same instance.
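One way the routing tier can turn that header value into a stable instance choice is rendezvous (highest-random-weight) hashing — a sketch, assuming the header carries an opaque session ID and instance names are stable:

```python
import hashlib


def instance_for(session_id: str, instances: list) -> str:
    """Rendezvous hashing: every router independently computes the same
    winner for a given session ID, and removing an instance only remaps
    the sessions that were pinned to it."""
    def weight(instance: str) -> int:
        digest = hashlib.sha256(f"{session_id}:{instance}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(instances, key=weight)
```

Because the choice is a pure function of (session ID, instance set), every SSE connection carrying the same header hashes back to the same instance with no shared routing state.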
Connection-level affinity¶
Some routing tiers pin at the TCP / TLS connection level — once a connection is established, all HTTP requests on that connection stay on the same backend. This is sufficient if the MCP client reuses connections; it's not sufficient if the MCP client opens fresh connections per SSE stream.
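The failure mode is easy to see in a sketch: connection-level pinning keys the backend choice on the TCP connection (here modelled as the 4-tuple), so requests multiplexed on one connection stick, but each fresh connection rolls the dice again — assumptions, not any particular load balancer's implementation:

```python
import random


def pin_backend(conn_key: tuple, backends: list, pins: dict) -> str:
    """Connection-level affinity: the backend is chosen once, when the
    TCP connection identified by conn_key is accepted, and every HTTP
    request on that connection reuses it. A new connection — e.g. a
    second SSE stream for the same session — gets an independent choice,
    which is exactly what breaks session affinity."""
    if conn_key not in pins:
        pins[conn_key] = random.choice(backends)
    return pins[conn_key]
```

A client that keeps one connection alive is safe; a client that opens a new connection per SSE stream needs a session-level key (header, cookie) rather than a connection-level one.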
Persistent-instance-per-client¶
A structural alternative: give each client its own server-instance address, so affinity is structural rather than a routing-tier decision. This is the concepts/one-to-one-agent-instance|1:1 agent-to-instance shape from Cloudflare's Agents SDK backed by Durable Objects. Each agent is a DO; the DO address is the affinity key.
Three styles of "solve this problem"¶
| Style | Routing primitive | State | Cost |
|---|---|---|---|
| Tenant-driven dynamic routing (Fly.io) | Header / path / cookie → specific Fly Machine | Per-instance in-memory | Tenant chooses |
| 1:1 agent-to-DO (Cloudflare) | Agent ID = DO address | Per-DO, structural | Per-agent storage |
| Shared-store + any-instance (classic) | Round-robin | External store | Hot path through shared store |
Fly.io's shape keeps the state cheap (in-memory per Machine) at the cost of routing complexity. Cloudflare's shape structurally eliminates the routing decision at the cost of a DO per agent. The classic shared-store shape is simplest for the routing tier but most expensive on every hot path.
Open questions¶
- Failover / rebalancing. When the instance holding a client's session dies, the routing tier has to detect it and re-route. How cleanly this is handled is deployment-specific; Fly's post doesn't engage with it.
- Connection migration. Can an SSE stream migrate to a different instance mid-flight without the client noticing? Generally no (the server state would have to migrate too), so clients typically reconnect.
- Long-idle drops. Intermediate proxies (LBs, firewalls) may kill idle SSE connections. Affinity only matters if the connection stays up; a client that reconnects is fine as long as the reconnect hashes back to the same instance.
- Multi-connection sessions. Some MCP flows open multiple SSE connections per session. Affinity has to map all of them to the same instance, typically via a session-ID-carrying header or cookie on each.
Known uses¶
- Fly.io (2025-04-08) — tenant-controlled dynamic request routing on Fly Proxy; the canonical wiki instance for MCP SSE specifically.
- Cloudflare Agents SDK — structural 1:1 via concepts/one-to-one-agent-instance; alternate shape for the same requirement.
Related¶
- concepts/mcp-long-lived-sse — the routing-contract requirement this pattern satisfies.
- concepts/one-to-one-agent-instance — structural alternative.
- concepts/robot-experience-rx — the product-design axis this pattern is an RX data point on.
- systems/model-context-protocol — the protocol at issue.
- systems/fly-proxy — Fly.io's routing tier.
- systems/fly-machines — the stateful-instance substrate Fly.io pins sessions to.
- patterns/session-affinity-header — related wiki pattern (prior instance: MongoDB predictive autoscaling for prompt-caching-aware routing).
- companies/flyio — canonical wiki source.