

MCP long-lived SSE

Definition

A routing/runtime property of modern Model Context Protocol (MCP) deployments: MCP flows between an LLM client and a remote MCP server now include repeated, potentially long-lived Server-Sent Events (SSE) connections rather than one-shot POST-backed request/response calls. In a multitenant MCP deployment, every SSE connection from a given client must be routed to the same stateful MCP-server instance that holds the client's session state — i.e. the deployment needs session affinity.

Canonical wiki statement

Fly.io, 2025-04-08:

To interface with the outside world (because why not) LLMs all speak a protocol called MCP. MCP is what enables the robots to search the web, use a calculator, launch the missiles, shuffle a Spotify playlist, &c.

If you haven't played with MCP, the right way to think about it is POST-back APIs like Twilio and Stripe, where you stand up a server, register it with the API, and wait for the API to connect to you. Complicating things somewhat, more recent MCP flows involve repeated and potentially long-lived (SSE) connections. To make this work in a multitenant environment, you want these connections to hit the same (stateful) instance. So we think it's possible that the control we give over request routing is a robot attractant.

(Source: sources/2025-04-08-flyio-our-best-customers-are-now-robots)

Why SSE

Server-Sent Events is a one-way streaming protocol over HTTP: the client issues an HTTP request (a POST or a GET) with Accept: text/event-stream, the server holds the connection open, and the server pushes events as a stream of data: … lines until the connection closes. Modern MCP uses SSE for two reasons:

  1. Streaming tool invocations — long-running MCP tools (multi-second web searches, multi-step reasoning) stream partial results as they become available. SSE fits this better than one-shot request/response.
  2. Server-initiated push — MCP supports tools that emit data to the client between explicit requests (e.g. a watchdog tool that reports on subscribed events). SSE is a one-way push channel the server can write to at any time.

The result is connections that stay open for minutes to hours and carry many logical MCP messages.

Why session affinity matters

An MCP server is stateful: it holds the session's tool registry, the list of subscriptions, in-flight tool-invocation state, and often the LLM client's conversational context. Reconnecting to a different MCP-server instance is not transparent — the state isn't there. A multitenant MCP deployment has two options:

  • Share state across instances. Expensive; requires a shared store; serialises the hot path; punts the multiplexing problem to the store tier.
  • Route connections back to the owning instance. Cheaper; keeps the instance-local state fast. Requires the routing tier to implement session affinity.
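The second option can be sketched in a few lines. This is an illustrative model of what a routing tier's affinity table does, under the assumption that new sessions are placed round-robin; the class and instance names are invented for the example:

```python
# Sketch of session affinity at the routing tier: the first connection for a
# session picks an instance; every reconnect is sent back to that instance,
# where the session's state lives. All names are illustrative.

class AffinityRouter:
    def __init__(self, instances: list[str]):
        self.instances = instances
        self.bindings: dict[str, str] = {}   # session_id -> owning instance
        self.next_idx = 0                    # round-robin placement for new sessions

    def route(self, session_id: str) -> str:
        if session_id not in self.bindings:
            # First connection: place the session and remember the binding.
            instance = self.instances[self.next_idx % len(self.instances)]
            self.next_idx += 1
            self.bindings[session_id] = instance
        # Reconnects hit the same stateful instance, not the balancer's pick.
        return self.bindings[session_id]

router = AffinityRouter(["machine-a", "machine-b", "machine-c"])
first = router.route("sess-42")
assert router.route("sess-42") == first  # affinity holds across reconnects
```

A production routing tier would also handle instance failure (rebinding or failing the session) and expire bindings when sessions end; the point here is only that the binding lives in the router, not in a shared state store.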

Fly.io's framing is that its dynamic request routing — the control a tenant has over which Fly Machine a given connection lands on — is now a platform-level MCP attractant, because it solves the session-affinity requirement at the routing tier without the tenant having to share state.

Contrast with HTTP/1.1 request/response MCP

Early MCP deployments were POST-back style — stand up a server, each tool invocation is one request, each response is one reply, no long-lived connection. Session affinity was not necessary (each request could go to any instance). Modern MCP with SSE breaks that: the session is now carried by the connection, not re-established per request.

Routing-tier primitives required

To serve long-lived SSE MCP at scale, the routing tier needs to support:

  1. Connection-level affinity — once a connection lands on an instance, keep it there; don't re-balance mid-stream.
  2. Per-tenant routing control — the tenant's MCP server fleet is tenant-private; the routing tier has to know to route tenant A's clients to tenant A's fleet.
  3. Graceful handling of long idle — some SSE connections go minutes without traffic between events; the routing tier must not time them out aggressively.
  4. Backpressure awareness — if the server is slow writing SSE events, the routing tier must propagate backpressure to the HTTP client cleanly (and without head-of-line blocking across tenants).
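Primitive 3 also has a server-side counterpart worth noting: SSE defines comment frames (lines starting with ":") that carry no data but keep intermediaries from treating an idle connection as dead. A minimal asyncio sketch, assuming events arrive on a queue and a hypothetical 15-second keepalive interval tuned below the routing tier's idle timeout:

```python
# Sketch of an SSE writer that emits ": keepalive" comment frames while the
# MCP session is idle, so intermediaries don't drop the connection. The
# queue/sentinel protocol and interval are illustrative assumptions.
import asyncio

async def sse_writer(queue: asyncio.Queue, write, keepalive: float = 15.0):
    """Pull events off a queue; emit comment frames during idle gaps."""
    while True:
        try:
            event = await asyncio.wait_for(queue.get(), timeout=keepalive)
        except asyncio.TimeoutError:
            write(": keepalive\n\n")   # comment frame: no data, resets idle timers
            continue
        if event is None:              # sentinel: session closed
            return
        write(f"data: {event}\n\n")
```

Clients ignore comment frames by specification, so keepalives are invisible at the MCP layer; they exist purely to satisfy the routing tier's idle handling.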

Fly's claim is that its request-routing surface gets most of these right for reasons that have nothing to do with MCP specifically, and the MCP workload retrofitted onto that surface happens to work.

Adjacent wiki concept

  • concepts/one-to-one-agent-instance — Cloudflare's Agents SDK takes the opposite architectural position: 1:1 agent-to-Durable-Object mapping. Every agent instance is a routable address, so session affinity is structural, not a routing-tier concern. Fly.io achieves the same end via dynamic routing over shared Machines rather than via per-agent DO instances.

What this is not

  • Not a session-stickiness cookie. MCP SSE session affinity is not "the LB reads a cookie" — the binding is at the transport / connection tier, not the HTTP-request tier.
  • Not a protocol-level MCP requirement. The MCP spec does not mandate session affinity; it's a deployment consequence of using SSE in a multitenant environment. Non-multitenant MCP (single-instance) doesn't need it.
  • Not solved by HTTP/2 multiplexing. HTTP/2 multiplexes requests over one TCP connection, but SSE is one stream per logical MCP session. Multiplexing doesn't help if the MCP session state still sits on one instance.
