
PATTERN Cited by 1 source

Session affinity for MCP SSE

Shape

When a multitenant MCP server fleet accepts long-lived Server-Sent Events (SSE) connections from LLM clients, the routing tier must guarantee that every SSE connection from a given client lands on the same stateful MCP-server instance — the one holding that client's session state.

    Client A (LLM) ── SSE stream 1 ─┐
    Client A (LLM) ── SSE stream 2 ─┴─▶  MCP-server instance X  (holds A's state)
    Client B (LLM) ── SSE stream 1 ─┐
    Client B (LLM) ── SSE stream 2 ─┴─▶  MCP-server instance Y  (holds B's state)

Without the affinity guarantee, a given client's subsequent connections may land on an instance that lacks the session state, forcing either session reconstitution (expensive, often impossible) or cross-instance state sharing (expensive at the store tier).

Why long-lived SSE changes the routing contract

Classic HTTP request/response MCP could fan out across any instance; each request was self-contained. Modern MCP flows (long-lived SSE) carry the session on the connection. Once the first SSE connection lands on instance X, instance X owns the session's in-memory state: tool registrations, subscription lists, in-flight tool-invocation state, LLM-client context.

Any routing decision that sends a subsequent connection from the same client to a different instance breaks the contract.
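
The contract can be made concrete with a tiny sketch (all names hypothetical): each instance holds session state only in its own process memory, so a connection routed to the wrong instance finds nothing to resume.

```python
# Hypothetical sketch: per-instance in-memory session state. Only the
# instance that accepted the first SSE connection can serve later ones.

class McpInstance:
    def __init__(self, name: str):
        self.name = name
        self.sessions: dict[str, dict] = {}  # session-id -> in-memory state

    def open_session(self, session_id: str) -> None:
        # First SSE connection: this instance now owns the session's state
        # (tool registrations, subscriptions, in-flight invocations, ...).
        self.sessions[session_id] = {"tools": [], "subscriptions": []}

    def handle(self, session_id: str) -> dict:
        # A subsequent connection routed here only works if we hold the state.
        if session_id not in self.sessions:
            raise LookupError(f"{self.name} has no state for {session_id}")
        return self.sessions[session_id]

x, y = McpInstance("X"), McpInstance("Y")
x.open_session("client-a")
x.handle("client-a")       # same instance: resumes fine
try:
    y.handle("client-a")   # broken affinity: the state simply isn't there
except LookupError as err:
    print(err)
```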

Implementation

Tenant-controlled routing

Fly.io's specific answer is tenant-controlled dynamic request routing on Fly Proxy. The tenant's MCP server app can specify, per-connection, which Fly Machine should receive the request — based on header, path, or other routable attribute. The tenant's MCP server is free to hash, look up, or otherwise route the client to a specific Machine; Fly Proxy honours the decision.
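
A minimal sketch of what the tenant-side decision can look like, assuming Fly's `fly-replay` response header (which asks Fly Proxy to replay a request on a named Machine); the session-to-Machine lookup here is illustrative, not a Fly API:

```python
# Hypothetical sketch: if this Machine doesn't own the session, answer
# with a fly-replay header naming the Machine that does, and Fly Proxy
# re-routes the connection there. The owner directory is illustrative.

SESSION_OWNERS = {"sess-42": "machine-x", "sess-99": "machine-y"}
SELF_MACHINE_ID = "machine-x"

def route(session_id: str) -> dict:
    owner = SESSION_OWNERS.get(session_id)
    if owner is None or owner == SELF_MACHINE_ID:
        # Unknown sessions are claimed by the first Machine to see them;
        # owned sessions are served locally (the SSE stream starts here).
        return {"status": 200}
    # Ask Fly Proxy to replay this request on the owning Machine.
    return {"status": 200, "headers": {"fly-replay": f"instance={owner}"}}

print(route("sess-42"))  # served locally
print(route("sess-99"))  # replayed to machine-y
```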

Canonical Fly.io framing:

More recent MCP flows involve repeated and potentially long-lived (SSE) connections. To make this work in a multitenant environment, you want these connections to hit the same (stateful) instance. So we think it's possible that the control we give over request routing is a robot attractant.

(Source: sources/2025-04-08-flyio-our-best-customers-are-now-robots)

Header-based affinity

A common shape is: the MCP server identifies the client session on first connection, assigns it to an instance, and returns an affinity cookie / header. Subsequent connections include the header; the routing tier hashes it to the same instance.
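
One stateless way to implement the hash step is rendezvous (highest-random-weight) hashing: every routing node independently maps the same header value to the same instance, with no shared table. A sketch with hypothetical instance names:

```python
# Sketch of header-based affinity via rendezvous hashing: score each
# instance against the session ID and pick the highest scorer. Any
# routing node computes the same answer for the same header value.
import hashlib

INSTANCES = ["mcp-1", "mcp-2", "mcp-3"]

def owner(session_id: str, instances=INSTANCES) -> str:
    def score(inst: str) -> int:
        digest = hashlib.sha256(f"{inst}:{session_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(instances, key=score)

# Every connection carrying the same affinity header lands on the
# same instance, including reconnects after an idle drop.
assert owner("sess-42") == owner("sess-42")
```

A nice property of rendezvous hashing here: when one instance disappears, only the sessions it owned re-map; everyone else keeps their affinity.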

Connection-level affinity

Some routing tiers pin at the TCP / TLS connection level — once a connection is established, all HTTP requests on that connection stay on the same backend. This is sufficient if the MCP client reuses connections; it's not sufficient if the MCP client opens fresh connections per SSE stream.

Persistent-instance-per-client

A structural alternative: give each client its own server-instance address, so affinity is structural rather than a routing-tier decision. This is the 1:1 agent-to-instance shape (concepts/one-to-one-agent-instance) from Cloudflare's Agents SDK, backed by Durable Objects. Each agent is a DO; the DO address is the affinity key.
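
The structural idea can be sketched without the Workers API: derive the instance address deterministically from the agent ID, in the spirit of a Durable Object ID derived from a name, so the routing tier needs no lookup at all.

```python
# Sketch of structural affinity (not the Workers API): the agent ID *is*
# the affinity key, because the instance address is a pure function of it.
import hashlib

def agent_address(agent_id: str) -> str:
    # Same agent ID -> same address, always; analogous in spirit to
    # deriving a Durable Object ID from a name.
    return hashlib.sha256(agent_id.encode()).hexdigest()[:16]

# No routing decision to get wrong: every connection for client-a
# resolves to the same address on every routing node.
assert agent_address("client-a") == agent_address("client-a")
```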

Three styles of "solve this problem"

Style                                    Routing primitive                               State                    Cost
Tenant-driven dynamic routing (Fly.io)   Header / path / cookie → specific Fly Machine   Per-instance in-memory   Tenant chooses
1:1 agent-to-DO (Cloudflare)             Agent ID = DO address                           Per-DO, structural       Per-agent storage
Shared-store + any-instance (classic)    Round-robin                                     External store           Hot path through shared store

Fly.io's shape keeps the state cheap (in-memory per Machine) at the cost of routing complexity. Cloudflare's shape structurally eliminates the routing decision at the cost of a DO per agent. The classic shared-store shape is simplest for the routing tier but most expensive on every hot path.

Open questions

  • Failover / rebalancing. When the instance holding a client's session dies, the routing tier has to detect the failure and re-route. How cleanly this is handled is deployment-specific; Fly's post doesn't engage with it.
  • Connection migration. Can an SSE stream migrate to a different instance mid-flight without the client noticing? Generally no (the server state would have to migrate too), so clients typically reconnect.
  • Long-idle drops. Intermediate proxies (LBs, firewalls) may kill idle SSE connections. Affinity only matters if the connection stays up; a client that reconnects is fine as long as the reconnect hashes back to the same instance.
  • Multi-connection sessions. Some MCP flows open multiple SSE connections per session. Affinity has to map all of them to the same instance, typically via a session-ID-carrying header or cookie on each.

Known uses

  • Fly.io (2025-04-08) — tenant-controlled dynamic request routing on Fly Proxy; the canonical wiki instance for MCP SSE specifically.
  • Cloudflare Agents SDK — structural 1:1 via concepts/one-to-one-agent-instance; alternate shape for the same requirement.