PATTERN Cited by 1 source
Ping/pong keepalive under NLB
Intent
Keep persistent WebSocket connections alive through a Layer-4 load balancer's non-configurable TCP idle timeout by sending periodic WebSocket ping frames at an interval well below the timeout threshold, combined with pong-based liveness detection to quickly identify dead clients.
Problem
NLB TCP passthrough provides no HTTP-level inspection or connection management. It tracks TCP flows by packet activity alone and drops flows that have been idle beyond a fixed threshold (350 seconds on AWS NLB, not configurable). If a WebSocket connection goes idle (no game messages, no chat, no user activity), the NLB silently drops the TCP flow. The server still holds an open socket; the client still thinks it's connected. The next message from either side fails silently or with a reset.
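The failure mode can be illustrated with a toy model of idle-flow tracking. This is a sketch only: `ToyFlowTable` and its `packet` method are hypothetical names, and the model assumes nothing about the NLB beyond what is stated above (a flow is forgotten once no packet has been seen for longer than the idle timeout).

```python
from dataclasses import dataclass, field

IDLE_TIMEOUT_S = 350.0  # idle threshold quoted above for AWS NLB


@dataclass
class ToyFlowTable:
    """Hypothetical model of a Layer-4 flow table; not real NLB behaviour."""

    idle_timeout_s: float = IDLE_TIMEOUT_S
    last_seen: dict = field(default_factory=dict)

    def packet(self, flow_id: str, now: float) -> bool:
        """Record a packet at time `now`; return False if the flow had expired."""
        previous = self.last_seen.get(flow_id)
        if previous is not None and now - previous > self.idle_timeout_s:
            # The balancer dropped the flow while it was idle. Neither endpoint
            # was told, so this packet is the first sign of trouble.
            return False
        self.last_seen[flow_id] = now
        return True


table = ToyFlowTable()
table.packet("client-1", now=0.0)               # handshake traffic
print(table.packet("client-1", now=300.0))      # True: 300 s gap is within the limit
print(table.packet("client-1", now=700.0))      # False: 400 s gap, flow was dropped
```

In this model a packet every 10 seconds never lets the gap approach the threshold, which is the property the keepalive relies on.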
Solution
The server sends a WebSocket ping frame at a regular interval significantly shorter than the idle timeout. The client responds with a pong. This resets the NLB's idle timer on every cycle. A missing pong within a deadline means the client is unreachable — the server closes the connection proactively rather than waiting for the NLB to drop it.
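A minimal sketch of the server-side timer logic, kept independent of any WebSocket library so it can be read in isolation. `KeepaliveTimer`, `Action`, and the method names are hypothetical, and the sketch assumes the pong deadline is measured from the first unanswered ping; a real server (Nakama included) may measure it differently.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    NONE = "none"
    SEND_PING = "send_ping"
    CLOSE = "close"


class KeepaliveTimer:
    """Decides, per connection, when to ping and when to give up."""

    def __init__(self, now: float, ping_interval_s: float = 10.0,
                 pong_deadline_s: float = 20.0) -> None:
        self.ping_interval_s = ping_interval_s
        self.pong_deadline_s = pong_deadline_s
        self.last_ping_at = now
        # Time of the oldest ping that has not been answered yet, if any.
        self.unanswered_since: Optional[float] = None

    def on_pong(self) -> None:
        """Call when a pong frame arrives: the client is alive."""
        self.unanswered_since = None

    def tick(self, now: float) -> Action:
        """Call periodically; returns what the server should do at `now`."""
        if (self.unanswered_since is not None
                and now - self.unanswered_since >= self.pong_deadline_s):
            return Action.CLOSE  # no pong within the deadline: close proactively
        if now - self.last_ping_at >= self.ping_interval_s:
            self.last_ping_at = now
            if self.unanswered_since is None:
                self.unanswered_since = now
            return Action.SEND_PING  # this packet also resets the NLB idle timer
        return Action.NONE
```

The caller owns the socket: on `SEND_PING` it writes a ping frame, on `CLOSE` it tears the connection down. Keeping the decision logic separate from I/O makes the timing easy to test with synthetic clock values.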
Canonical parameters
| Control | Value | Rationale |
|---|---|---|
| Ping interval | 10 s | Well below 350s idle timeout; 35× safety margin |
| Pong deadline | 20 s | Two missed pings = dead client |
| Session expiry | 7200 s | Token validated at connect time only |
| Single socket | true | New connection kills previous, prevents stale state |
(Source: sources/2026-06-29-aws-dual-token-authentication-for-nakama-game-servers)
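The relationships between these values can be checked mechanically. The sketch below is illustrative: `KeepaliveParams` and its field names are hypothetical and do not correspond to actual Nakama configuration keys. It assumes the pong deadline starts counting at the first unanswered ping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeepaliveParams:
    """Values from the table above, plus the load balancer's idle timeout."""

    ping_interval_s: int = 10
    pong_deadline_s: int = 20
    session_expiry_s: int = 7200
    single_socket: bool = True
    lb_idle_timeout_s: int = 350

    def __post_init__(self) -> None:
        # A ping interval at or above the idle timeout cannot keep the flow alive.
        if self.ping_interval_s >= self.lb_idle_timeout_s:
            raise ValueError("ping interval must be shorter than the idle timeout")

    @property
    def safety_margin(self) -> float:
        """How many ping intervals fit inside one idle-timeout window."""
        return self.lb_idle_timeout_s / self.ping_interval_s

    @property
    def worst_case_detection_s(self) -> int:
        """Longest wait before a dead client is noticed: one full ping
        interval until the next ping, then the pong deadline."""
        return self.ping_interval_s + self.pong_deadline_s


params = KeepaliveParams()
print(params.safety_margin)           # 35.0
print(params.worst_case_detection_s)  # 30
```

Rejecting an interval that is not below the timeout at construction time turns a silent misconfiguration into an immediate error.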
Key properties
- Transparent to NLB: the NLB sees TCP packets flowing; it doesn't know or care they're WebSocket pings. Any data packet resets the idle timer.
- Fast failure detection: a client that dies just after answering a ping is detected within about 30 seconds (up to 10 s until the next ping, plus the 20 s pong deadline), not 350 s.
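- Bandwidth cost: a WebSocket ping or pong frame with an empty payload is 2 to 6 bytes at the WebSocket layer (a 2-byte header, plus a 4-byte masking key on client-to-server frames). At 10 s intervals this is at most ~0.6 bytes/second per direction per connection, excluding TCP/IP header overhead, which is negligible.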
- Generalises beyond WebSocket: any persistent-connection protocol (gRPC keepalive, custom TCP heartbeat, MQTT PINGREQ/PINGRESP) can use the same pattern behind an NLB.
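The frame sizes behind the bandwidth estimate can be reproduced by encoding the frames by hand. This is a sketch following the framing rules of RFC 6455; `control_frame` is a hypothetical helper, not part of any library, and it assumes empty ping/pong payloads.

```python
import os

OPCODE_PING = 0x9
OPCODE_PONG = 0xA


def control_frame(opcode: int, payload: bytes = b"", masked: bool = False) -> bytes:
    """Encode a single WebSocket control frame (RFC 6455, section 5.2)."""
    if len(payload) > 125:
        raise ValueError("control frame payloads are limited to 125 bytes")
    first = 0x80 | opcode  # FIN bit set, followed by the opcode
    if not masked:
        # Server-to-client frames are sent unmasked.
        return bytes([first, len(payload)]) + payload
    # Client-to-server frames carry a 4-byte masking key and a masked payload.
    key = os.urandom(4)
    body = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
    return bytes([first, 0x80 | len(payload)]) + key + body


ping = control_frame(OPCODE_PING)               # server to client
pong = control_frame(OPCODE_PONG, masked=True)  # client to server

print(len(ping), len(pong))  # 2 6
print(len(pong) / 10)        # 0.6 bytes/second at a 10 s interval
```

The per-packet TCP/IP headers are considerably larger than the frames themselves, but at one small packet every 10 seconds the total remains negligible.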
When to use
- Any real-time server (game, chat, collaboration, IoT) behind an AWS NLB with persistent connections.
- Any Layer-4 load balancer with a non-configurable or short idle timeout.
- Servers where idle connections are legitimate (player in lobby, user reading, IoT device sleeping between sensor reads).
Seen in
- sources/2026-06-29-aws-dual-token-authentication-for-nakama-game-servers — Nakama 10s ping / 20s pong / 350s NLB timeout