gRPC¶
gRPC is a Google-originated (now CNCF) high-performance RPC framework built on HTTP/2, with Protocol Buffers as the default serialization. The key design choice relevant to system design is long-lived, multiplexed HTTP/2 connections — many concurrent requests flow over one TCP connection instead of opening and closing a connection per call.
Why gRPC breaks L4 load balancers¶
HTTP/2's long-lived connection is a performance win for the protocol (no per-request TCP + TLS handshake) but a problem for Layer 4 load balancers (e.g. systems/kube-proxy):
- L4 LB picks a backend once per TCP connection.
- gRPC keeps that connection alive for many requests.
- → All requests from that client go to the same backend.
- → Traffic skew: across a fleet, some pods hot-spot while others idle.
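The skew mechanism above can be sketched with a toy simulation (pod names and request counts are invented; the L4 side models "pick a backend once per connection" rather than any real kube-proxy behavior):

```python
import random
from collections import Counter

BACKENDS = ["pod-a", "pod-b", "pod-c", "pod-d"]  # hypothetical fleet

def l4_pick_per_connection(num_clients, requests_per_client, rng):
    """L4 LB: one backend chosen per TCP connection; every request on that
    long-lived HTTP/2 connection then lands on the same pod."""
    load = Counter()
    for _ in range(num_clients):
        backend = rng.choice(BACKENDS)  # decision made once, at connect time
        load[backend] += requests_per_client
    return load

def l7_pick_per_request(num_clients, requests_per_client):
    """L7: the picker runs per request (round-robin here), so load spreads
    evenly regardless of connection lifetimes."""
    load = Counter()
    for i in range(num_clients * requests_per_client):
        load[BACKENDS[i % len(BACKENDS)]] += 1
    return load

rng = random.Random(42)
skewed = l4_pick_per_connection(num_clients=8, requests_per_client=1000, rng=rng)
even = l7_pick_per_request(num_clients=8, requests_per_client=1000)
```

With only 8 long-lived connections spread over 4 pods, the L4 distribution comes in lumps of 1000 requests per connection, while the per-request picker lands exactly 2000 on each pod.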
The fix is to move load-balancing decisions up to Layer 7, per request:
- concepts/client-side-load-balancing — the client library picks a backend per call.
- systems/envoy / service-mesh sidecar — an L7 proxy parses HTTP/2 frames and picks a backend per request.
gRPC itself has a built-in load-balancing API (the `xds:` resolver, the `round_robin` policy, etc.) that lets the client library subscribe to an xDS control plane and do per-request routing. That's the mechanism Databricks uses via their Armeria-based Scala client on top of a custom xDS server (systems/databricks-endpoint-discovery-service).
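As a minimal sketch of how the policy is selected: clients take a JSON service config, either set locally or delivered by the resolver/control plane. The snippet below just builds that JSON (the comment about the `xds:` target is illustrative; the service name is a placeholder):

```python
import json

# gRPC service config selecting the round_robin LB policy. In Python this
# JSON string can be passed as the "grpc.service_config" channel option;
# with a target like "xds:///orders.internal" (placeholder), the xds
# resolver instead fetches both endpoints and policy from the control plane.
service_config = {
    "loadBalancingConfig": [{"round_robin": {}}],
}

service_config_json = json.dumps(service_config)
```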
Seen in¶
- sources/2025-10-01-databricks-intelligent-kubernetes-load-balancing — gRPC over HTTP/2 is explicitly the workload that motivated Databricks to build client-side L7 load balancing, because kube-proxy's per-connection selection caused persistent traffic skew.
- sources/2024-10-28-dropbox-robinhood-in-house-load-balancing — Dropbox extended its service-discovery system to speak xDS (per gRFC A27) so Robinhood can drive gRPC clients too. Since upstream gRPC didn't support endpoint-weight-aware weighted round-robin as of the post's date, Dropbox wrote a custom weighted-RR picker based on earliest-deadline-first (EDF) scheduling. Generalizable note: EDF yields weighted RR with lower variance than ticket-bucket approaches when the client library has to implement its own picker.
- sources/2026-04-08-aws-build-a-multi-tenant-configuration-system-with-tagged-storage-patterns — gRPC is used in two distinct roles in the multi-tenant config architecture: (1) the internal Order Service → Config Service call path (picked over REST for "high-performance, type-safe communication ... where compatibility with web browsers isn't a requirement"); (2) the refresh RPC endpoint on every Config Service instance, called by the invalidator Lambda to push fresh configuration into in-memory caches (the "zero-downtime configuration updates with gRPC streaming" cited as a headline design decision). A streaming-capable transport makes per-instance push invalidation tractable without polling.
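The EDF idea behind Dropbox's custom picker can be sketched in a few lines — this is a generic EDF weighted-RR scheduler under invented names, not Dropbox's actual Robinhood code: each backend carries a virtual deadline advanced by 1/weight on every pick, so higher-weight backends are picked proportionally more often and picks interleave smoothly instead of bursting:

```python
import heapq

class EdfWrrPicker:
    """Weighted round-robin via earliest-deadline-first scheduling.

    A backend with weight 2 is picked twice as often as one with weight 1,
    and picks interleave (a, b, a, a, b, a, ...) rather than arriving in
    bursts as ticket-bucket schemes can produce.
    """

    def __init__(self, weights):
        # Heap entries: (deadline, tie_breaker, backend, weight).
        # The tie_breaker keeps ordering deterministic on equal deadlines.
        self._heap = [(1.0 / w, i, b, w)
                      for i, (b, w) in enumerate(weights.items())]
        heapq.heapify(self._heap)
        self._seq = len(self._heap)

    def pick(self):
        deadline, _, backend, weight = heapq.heappop(self._heap)
        # Reschedule: next deadline is 1/weight further into virtual time.
        heapq.heappush(self._heap,
                       (deadline + 1.0 / weight, self._seq, backend, weight))
        self._seq += 1
        return backend

picker = EdfWrrPicker({"pod-a": 2, "pod-b": 1})
picks = [picker.pick() for _ in range(6)]
# picks == ["pod-a", "pod-b", "pod-a", "pod-a", "pod-b", "pod-a"]
```

Over any long run the counts track the weights exactly (2:1 here), which is the low-variance property the Dropbox post relies on.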
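The AWS post's push-invalidation path can be sketched as follows — class and function names are invented, and plain method calls stand in for the gRPC refresh RPC and the invalidator Lambda:

```python
class ConfigCache:
    """Per-instance in-memory config cache, refreshed by push rather than
    polling. `load_fn` stands in for a read from the backing config store."""

    def __init__(self, load_fn):
        self._load = load_fn
        self._config = load_fn()

    def get(self, key):
        return self._config.get(key)

    def refresh(self):
        # Body of the refresh RPC handler: reload from the store.
        self._config = self._load()

def invalidate_fleet(instances):
    """What the invalidator does: call refresh on every Config Service
    instance, so caches update immediately instead of on a poll cycle."""
    for inst in instances:
        inst.refresh()

store = {"rate_limit": 100}
fleet = [ConfigCache(lambda: dict(store)) for _ in range(3)]
store["rate_limit"] = 250   # config changes at the source...
invalidate_fleet(fleet)     # ...and is pushed to every instance at once
```

The design point is that the push happens once per change rather than once per instance per poll interval, which is what makes per-instance in-memory caches viable.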