AWS

Dual-token authentication for Nakama game servers with Amazon Cognito on AWS

Summary

An AWS Architecture Blog reference architecture for connecting two independent session systems, Amazon Cognito (managed identity, issues JWTs) and Nakama (open-source game server with its own session tokens), behind a default-closed, multi-layer routing infrastructure. A server-side Go runtime hook validates the Cognito JWT and bridges the verified identity into a Nakama session token. The infrastructure enforces a single path to the game server: Internet → CloudFront → WAF → ALB or NLB → ECS. Security-group chaining and explicit route allowlists ensure that no hop can be skipped. The WebSocket connection lifecycle is managed through a four-control model (ping/pong keepalive, pong timeout, session expiry at connect, single-socket enforcement) that works within the NLB's non-configurable 350-second TCP idle timeout.

Key takeaways

  1. Dual-token pattern: Amazon Cognito owns player identity (short-lived JWT, 1-hour default), Nakama owns game sessions (session token, 2-hour expiry). Neither system depends on the other at runtime after the initial bridge — the Go hook validates the Cognito JWT exactly once and overwrites the Nakama user ID with the verified sub claim.

  2. Default-closed ALB routing: The ALB default listener action returns 403 Forbidden. Only explicitly listed paths (/healthcheck, /v2/account/authenticate/*, /v2/*, /v1/*) reach the game server. Any unlisted path is rejected at the load-balancer layer before it reaches Nakama, which limits the attack surface to the declared API.

  3. Dual load-balancer architecture (ALB + NLB): The ALB operates at Layer 7 for HTTP API traffic (path-based routing, explicit allowlist, 403 defaults). The NLB operates at Layer 4 for WebSocket traffic (TCP passthrough, no HTTP inspection). CloudFront routes /ws* to the NLB and everything else to the ALB — each connection type gets appropriate handling behind a single HTTPS endpoint.

  4. Security-group chain: Only CloudFront's managed prefix list can reach the ALB and NLB. The ECS task security group allows inbound traffic only from the ALB and NLB security groups. As an additional application-layer check, CloudFront sends a shared secret in an X-CloudFront-Secret header, and ALB listener rules reject requests that lack the correct value.

  5. JWKS cache with thundering-herd protection: The Go hook caches Cognito's signing keys with a 1-hour TTL. A 30-second re-fetch guard (time.Since(c.fetched) < 30*time.Second) prevents multiple goroutines from hitting the JWKS endpoint simultaneously when the cache expires or a signing key rotates. The effect is similar to a mutex-guarded singleflight pattern.

  6. WebSocket lifecycle under NLB TCP passthrough: The NLB drops idle TCP flows after 350 seconds (non-configurable AWS default). Four controls manage this:

     • Ping interval: 10s. Nakama sends a WebSocket ping every 10 seconds, keeping the flow active well within the idle timeout.
     • Pong wait: 20s. If the client does not respond within 20 seconds, Nakama closes the connection.
     • Token expiry: 7200s (2 hours). Nakama validates the session token from the token query parameter at WebSocket connect time, rejecting expired tokens before processing game messages.
     • Single socket: true. A new connection from the same user closes the previous one, preventing split state across stale connections.

  7. JWT validation order: Five checks run in sequence: token format (three dot-separated parts), algorithm enforcement (RS256 only), RSA signature verification against JWKS, expiry check, and issuer + audience (client_id) matching. The hook never trusts the identity string sent in the request body; it discards that value and overwrites it with the verified sub claim.

  8. SRP (Secure Remote Password) authentication: Cognito uses the USER_SRP_AUTH flow, so the password never leaves the client device. No client secret is needed because the App Client is configured as a public client (generate_secret=false); security comes from the SRP protocol itself.

  9. Infrastructure as Code: Six Terraform modules (network, compute, auth, cdn, waf-cloudfront, ops) with a separate bootstrap module for the S3 state backend and KMS key. make deploy builds and pushes the Nakama container to ECR with auto-incrementing tags, then runs terraform apply.

  10. Generalisation: The four-control WebSocket lifecycle model (keepalive, timeout, session expiry at connect, one-connection-per-user) applies to any real-time server behind an NLB TCP passthrough, such as Colyseus, Photon, custom WebSocket backends, or any server managing persistent connections. If the server lacks built-in ping/pong, application-level heartbeat messages serve the same role.

Operational numbers

Metric                        Value
NLB TCP idle timeout          350 s (non-configurable)
Nakama ping interval          10 s
Nakama pong wait              20 s
Session token expiry          7200 s (2 hours)
Cognito JWT default expiry    3600 s (1 hour)
JWKS cache TTL                1 hour
JWKS re-fetch guard           30 s

Caveats

  • No production scale numbers: this is a reference architecture with demo Terraform, not a production retrospective. No TPS, latency distribution, or cost numbers disclosed.
  • Token in query parameter: the session token travels as ?token=... in the WebSocket upgrade URL, so it can appear in server access logs, load balancer logs, CloudFront logs, and browser history. Mitigations: TLS in transit, short-lived tokens, single-socket invalidation. The post recommends log-redaction policies.
  • Nakama-specific Go plugin constraint: the hook is implemented as a Nakama Go runtime plugin (registered through the runtime.Initializer interface), so the plugin mechanism is Nakama-specific. The JWT validation logic itself is portable to any OIDC-compliant server.
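For logs that the application itself writes, the log-redaction idea from the query-parameter caveat could look like the following sketch. The function name redactToken and the REDACTED placeholder are assumptions for illustration. Logs written by managed services are outside the reach of application code and would need their own handling.

```go
package main

import "net/url"

// redactToken (hypothetical) returns rawURL with the value of the "token"
// query parameter replaced by a placeholder, so the URL is safe to log.
func redactToken(rawURL string) string {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "[unparseable URL]" // never log input that could not be inspected
	}
	q := u.Query()
	if _, ok := q["token"]; !ok {
		return rawURL // nothing to redact
	}
	q.Set("token", "REDACTED")
	u.RawQuery = q.Encode() // Encode sorts parameters by key
	return u.String()
}
```

Calling redactToken("/ws?token=abc.def.ghi&format=json") returns "/ws?format=json&token=REDACTED". The parameter order changes because the query string is re-encoded in sorted key order.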
