PATTERN Cited by 1 source

Ping/pong keepalive under NLB

Intent

Keep persistent WebSocket connections alive through a Layer-4 load balancer's non-configurable TCP idle timeout by sending periodic WebSocket ping frames at an interval well below the timeout threshold, combined with pong-based liveness detection to quickly identify dead clients.

Problem

NLB TCP passthrough provides no HTTP-level inspection or connection management. It tracks TCP flows by packet activity alone and drops flows that have been idle beyond a fixed threshold (350 seconds on AWS NLB, not configurable). If a WebSocket connection goes idle (no game messages, no chat, no user activity), the NLB silently drops the TCP flow. The server still holds an open socket; the client still thinks it's connected. The next message from either side fails silently or with a reset.

Solution

The server sends a WebSocket ping frame at a regular interval significantly shorter than the idle timeout, and the client responds with a pong. Each exchange puts packets on the TCP flow in both directions, which resets the NLB's idle timer on every cycle. If no pong arrives within the deadline, the server treats the client as unreachable and closes the connection proactively rather than waiting for the NLB to drop the flow.
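The bookkeeping described above can be sketched as a small state machine that is independent of any particular WebSocket library. This is a minimal illustration, not the implementation used by the cited source: the `Keepalive` class, the `Action` enum, and the choice to measure the pong deadline from the last pong received are all assumptions made for the example. A real server would call `tick` from a timer and perform the actual ping or close itself.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"
    SEND_PING = "send_ping"
    CLOSE = "close"


@dataclass
class Keepalive:
    """Per-connection ping/pong bookkeeping (hypothetical helper)."""

    ping_interval: float = 10.0  # seconds between pings
    pong_deadline: float = 20.0  # max seconds tolerated without a pong
    last_ping: float = 0.0       # time the most recent ping was sent
    last_pong: float = 0.0       # time the most recent pong arrived

    def start(self, now: float) -> None:
        # Treat connection setup as the first sign of life.
        self.last_ping = now
        self.last_pong = now

    def on_pong(self, now: float) -> None:
        # Call this whenever a pong frame is received from the client.
        self.last_pong = now

    def tick(self, now: float) -> Action:
        # Liveness check first: a silent client is closed, not pinged again.
        if now - self.last_pong > self.pong_deadline:
            return Action.CLOSE
        # Otherwise send a ping once a full interval has elapsed.
        if now - self.last_ping >= self.ping_interval:
            self.last_ping = now
            return Action.SEND_PING
        return Action.NONE
```

Passing the clock in as `now` keeps the logic deterministic and easy to test; in production the caller would supply a monotonic clock reading such as `time.monotonic()`.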

Canonical parameters

| Control | Value | Rationale |
|---|---|---|
| Ping interval | 10 s | Well below the 350 s idle timeout; 35× safety margin |
| Pong deadline | 20 s | Two missed pings = dead client |
| Session expiry | 7200 s | Token validated at connect time only |
| Single socket | true | New connection kills previous, prevents stale state |

(Source: sources/2026-06-29-aws-dual-token-authentication-for-nakama-game-servers)
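The relationships between the canonical parameters can be checked mechanically. The sketch below is illustrative only: `SocketSettings` and its field names are hypothetical and do not correspond to any server's configuration schema, and the three consistency rules are one reasonable reading of the rationale column rather than rules stated by the source.

```python
from dataclasses import dataclass

NLB_IDLE_TIMEOUT_S = 350  # idle timeout cited in this document for AWS NLB


@dataclass(frozen=True)
class SocketSettings:
    # Field names are illustrative, not taken from a real config schema.
    ping_interval_s: int = 10
    pong_deadline_s: int = 20
    session_expiry_s: int = 7200
    single_socket: bool = True

    def safety_margin(self, idle_timeout_s: int = NLB_IDLE_TIMEOUT_S) -> float:
        """How many ping intervals fit inside one idle-timeout window."""
        return idle_timeout_s / self.ping_interval_s

    def validate(self, idle_timeout_s: int = NLB_IDLE_TIMEOUT_S) -> list[str]:
        """Return human-readable problems; an empty list means consistent."""
        problems = []
        if self.ping_interval_s >= idle_timeout_s:
            problems.append("ping interval must be shorter than the idle timeout")
        if self.pong_deadline_s <= self.ping_interval_s:
            problems.append("pong deadline should exceed the ping interval")
        if self.pong_deadline_s >= idle_timeout_s:
            problems.append("pong deadline should be shorter than the idle timeout")
        return problems
```

With the defaults, `SocketSettings().safety_margin()` evaluates to 35.0 and `validate()` returns an empty list; a 400 s ping interval would fail the first two checks.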

Key properties

  • Transparent to NLB: the NLB sees TCP packets flowing; it doesn't know or care they're WebSocket pings. Any data packet resets the idle timer.
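  • Fast failure detection: with a 10 s ping cycle and a 20 s pong deadline, a dead client is detected in at most 30 seconds, not 350 s.
  • Bandwidth cost: a WebSocket ping or pong frame with an empty payload is 2–6 bytes at the WebSocket layer (a 2-byte header, plus a 4-byte masking key on client-to-server frames). At 10 s intervals this is under 1 byte/second per connection of WebSocket framing. TCP/IP headers add more per packet, but the total remains negligible.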
  • Generalises beyond WebSocket: any persistent-connection protocol (gRPC keepalive, custom TCP heartbeat, MQTT PINGREQ/PINGRESP) can use the same pattern behind an NLB.
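The detection and bandwidth figures in the list above follow from simple arithmetic, reproduced here as a sketch. The frame sizes assume empty ping/pong payloads under RFC 6455 framing (2-byte header, plus a 4-byte masking key on client-sent frames) and deliberately ignore TCP/IP header overhead; the function names are made up for this example.

```python
PING_INTERVAL_S = 10
PONG_DEADLINE_S = 20
IDLE_TIMEOUT_S = 350

# RFC 6455 framing with an empty payload: 2-byte header, plus a 4-byte
# masking key on frames sent by the client.
SERVER_PING_BYTES = 2
CLIENT_PONG_BYTES = 2 + 4


def worst_case_detection_s(ping_interval_s: float, pong_deadline_s: float) -> float:
    """Upper bound used in this document: one ping cycle plus the pong deadline."""
    return ping_interval_s + pong_deadline_s


def framing_bytes_per_second(ping_interval_s: float) -> float:
    """WebSocket-layer bytes per second for one ping/pong pair per interval."""
    return (SERVER_PING_BYTES + CLIENT_PONG_BYTES) / ping_interval_s


print(worst_case_detection_s(PING_INTERVAL_S, PONG_DEADLINE_S))  # 30
print(framing_bytes_per_second(PING_INTERVAL_S))                 # 0.8
print(IDLE_TIMEOUT_S / PING_INTERVAL_S)                          # 35.0
```

Counting both directions gives 0.8 bytes/second of framing per connection, so even a server holding 100,000 idle connections would spend roughly 80 kB/s on WebSocket keepalive framing before packet headers.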

When to use

  • Any real-time server (game, chat, collaboration, IoT) behind an AWS NLB with persistent connections.
  • Any Layer-4 load balancer with a non-configurable or short idle timeout.
  • Servers where idle connections are legitimate (player in lobby, user reading, IoT device sleeping between sensor reads).

Seen in