SYSTEM Cited by 7 sources
Redis
Redis is an in-memory data-structure store: key-value on the outside, but with first-class server-side support for lists, hashes, sets, sorted sets, streams, and pub/sub. Persistence is optional (RDB snapshots and/or the append-only file, AOF). It is typically deployed as a cache, a fast serving tier for precomputed artifacts, a lightweight message broker, or a rate-limiter / counter store. Managed offerings (AWS ElastiCache, Google Memorystore, Redis Cloud) are the common production deployment.
Properties relevant to system design
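- Single-threaded command execution (the optional I/O threads added in Redis 6 do not change this): each command runs atomically with no in-process locking, and MULTI/EXEC transactions and Lua scripts extend that atomicity to multi-step operations.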
- Sub-millisecond in-memory reads when the dataset fits in RAM.
- Replication + cluster sharding for scale; read replicas for read fan-out.
- TTL on keys for cache-with-expiry as a first-class primitive.
- Not a source of truth. Durability is best-effort; treat as a cache / derived read model and keep the authoritative copy elsewhere.
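The TTL and not-a-source-of-truth properties usually combine into the cache-aside pattern. A minimal sketch, assuming a hypothetical in-memory `TTLCache` standing in for Redis (`set(..., ex=...)` mirrors `SET key value EX seconds`, `get` mirrors `GET key`) and a caller-supplied loader that reads the authoritative store:

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Hypothetical in-memory stand-in for Redis SET ... EX / GET.

    Expired entries are dropped lazily on read, loosely mirroring Redis's
    passive expiry (Redis also runs an active expiry cycle, not modeled here).
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: Dict[str, Tuple[str, float]] = {}
        self._clock = clock

    def set(self, key: str, value: str, ex: float) -> None:
        # Store the value together with its absolute expiry time.
        self._data[key] = (value, self._clock() + ex)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: behave as a miss
            return None
        return value


def cache_aside_get(
    cache: TTLCache,
    key: str,
    load_from_source: Callable[[str], str],
    ttl: float,
) -> str:
    """Serve from the cache; on a miss, read the source of truth and repopulate."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_source(key)  # the authoritative copy lives elsewhere
    cache.set(key, value, ex=ttl)
    return value
```

Because every entry expires and can be rebuilt from the loader, losing the cache costs latency, not data.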
Seen in
- sources/2026-06-02-redpanda-how-omninode-uses-redpanda-to-scale-ai-agent-workflows: 2026-06-02 OmniNode → Redpanda migration disclosure (guest post on the Redpanda Blog by founder Jonah Gray). Canonicalises Redis Streams' scale ceiling at the crossing point from 5 to 12 repos and 100+ event types: "we outgrew Redis Streams not because of throughput, but because coordination itself became difficult." The trigger was five Kafka-shaped capabilities that Redis Streams doesn't offer at the same fidelity: consumer groups, partition-level parallelism, durable replay semantics, topic introspection, and programmatic provisioning. Disclosed migration path: XADD/XREADGROUP behind a transport-layer abstraction → Redpanda. "Apache Kafka was explicitly deferred in the roadmap because the system was still small."
- sources/2025-12-18-mongodb-token-count-based-batching-faster-cheaper-embedding-inference: Voyage AI by MongoDB uses Redis as the queue substrate for token-count-based batching on the query side of embedding inference. Each request is enqueued on a Redis list with an attached token_count; model servers run an atomic Lua script that pops items from the list until the total token count reaches the model-and-hardware-specific optimal batch size (~600 tokens for voyage-3 on A100) and sets per-item TTLs in the same atomic call. Because Redis executes each script atomically on its single command thread, no two model-server workers can race on the same items. Canonical wiki instance of patterns/atomic-conditional-batch-claim. Caveat: "the probability of Redis losing data is very low. In the rare case that it does happen, users may receive 503 Service Unavailable errors and can simply retry." Redis was chosen specifically for atomic peek-and-claim via Lua, trading durability for a batching primitive that RabbitMQ and Kafka don't natively offer. This enables Voyage AI's 50% reduction in GPU inference latency with 3× fewer GPUs on voyage-3-large.
- sources/2024-12-10-canva-routing-print-orders: Canva Print Routing stores precomputed routing graphs per destination region in ElastiCache/Redis: 6 ms retrieval in most regions, 20 ms for the largest, and 99.999% availability (with read replicas). The routing graphs are rebuilt asynchronously from a relational source of truth, so a Redis outage can be recovered from without data loss; the authoritative copy lives in the relational store.
- sources/2026-04-21-figma-figcache-next-generation-data-caching-platform: Figma FigCache fronts a fleet of ElastiCache Redis clusters with an in-house proxy that speaks the RESP wire protocol. Context: at Figma's scale, Redis evolved from a non-critical component into a critical-path dependency, and its connection limits became load-bearing. Rapid client-fleet scale-ups triggered thundering herds of new connections that bottlenecked Redis I/O and degraded availability. Also: Redis Cluster's CROSSSLOT error on multi-key operations and pipelines whose keys span hash slots is an application-visible footgun; FigCache's fanout engine transparently resolves the read-only cases as parallel scatter-gather. After the FigCache rollout, connection counts on Redis clusters dropped by an order of magnitude across the board and became much less volatile despite unchanged diurnal traffic patterns; node failovers, cluster scaling, and transient connectivity errors were downgraded from high-severity incidents to zero-downtime background events. Shard failovers now run liberally and frequently across Figma's entire Redis footprint as live resiliency exercises.
- sources/2023-07-16-highscalability-gossip-protocol-explained: Redis Cluster named as a canonical production gossip-protocol deployment: "Redis cluster uses the gossip protocol to propagate the node metadata." Redis Cluster's cluster bus is the gossip channel: each node periodically exchanges pings/pongs with random peers, carrying slot-ownership, epoch, and failure-state information. Third-party, explainer-level citation; useful as the definitional pointer to Redis Cluster's distributed-membership layer for readers coming from the gossip-protocol concept page.
- sources/2026-01-06-lyft-feature-store-architecture-optimization-and-evolution: fork-aware reference. Lyft's Feature Store names its write-through LRU cache as Valkey, not Redis. This matters for the Redis page because (a) Redis Ltd.'s 2024 license change drove the industry fork, and (b) production usage patterns carry over one-to-one between Redis and Valkey at the protocol / data-structure level. Lyft is a data point on the Redis API surface becoming the stable primitive while the underlying implementation bifurcates.
- sources/2025-07-08-planetscale-caching — Ben Dicken names Redis (alongside Memcached and CloudFront) as the canonical in-memory recent-content cache sitting in front of slow object storage like S3: "These websites store much of their content (email, images, documents, videos, etc) in 'slow' storage (like Amazon S3 or similar), but cache recent content in faster, in-memory stores (like CloudFront, Redis or Memcached)." Canonical wiki positioning of Redis as the application-tier recency-bias cache, complementary to the per-company production deployments covered by the other Seen-in entries (Figma FigCache, Voyage AI batching, Canva print routing, Lyft Feature Store).
- sources/2026-04-28-expedia-expedias-service-telemetry-analyzer: Expedia STAR uses Redis as both Celery broker and result backend for its async RCA workflow queue: "we moved to Celery with Redis acting as the broker and result backend to store the state and results of tasks." Canonical wiki instance of Redis as an async request-response queue substrate, as opposed to a streaming platform like Kafka (STAR explicitly rejects Kafka because its traffic shape is request-response). Adds the Celery broker + result backend role for LLM-era async task pipelines to Redis's catalogued roles on the wiki (cache, recency-bias store, atomic-claim batching, gossip-based cluster membership).
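The Voyage AI claim step can be modeled in-process to show why atomicity matters. A minimal sketch, assuming a hypothetical `TokenBudgetQueue` in which a `threading.Lock` stands in for Redis's atomic script execution; the "stop before exceeding the budget, but always take at least one item" rule and the `_claimed_until` TTL bookkeeping are assumptions, since the source says only that items are popped until the total reaches the optimal batch size and that per-item TTLs are set. In production this logic would be a Lua script run via EVAL/EVALSHA; this is not Voyage AI's code.

```python
import threading
from collections import deque
from typing import Deque, Dict, List, Tuple

Item = Tuple[str, int]  # (request_id, token_count)


class TokenBudgetQueue:
    """Hypothetical model of a Redis list plus an atomic claim script.

    The lock plays the role of Redis's single-threaded script execution:
    peek and pop happen as one indivisible step, so concurrent workers
    never receive the same item.
    """

    def __init__(self) -> None:
        self._items: Deque[Item] = deque()
        # Assumed stand-in for the per-item TTLs set inside the script.
        self._claimed_until: Dict[str, float] = {}
        self._lock = threading.Lock()

    def enqueue(self, request_id: str, token_count: int) -> None:
        with self._lock:
            self._items.append((request_id, token_count))

    def claim_batch(self, token_budget: int, now: float, ttl: float) -> List[Item]:
        with self._lock:  # one atomic "script" invocation
            batch: List[Item] = []
            total = 0
            while self._items:
                request_id, tokens = self._items[0]  # peek at the head
                # Stop before exceeding the budget, but always take at least one
                # item so an oversized request cannot block the queue (assumption).
                if batch and total + tokens > token_budget:
                    break
                self._items.popleft()
                batch.append((request_id, tokens))
                total += tokens
                self._claimed_until[request_id] = now + ttl
            return batch
```

Holding the lock across the whole peek-and-claim is the point: a separate peek followed by a separate pop would let two workers see the same head item.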
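The CROSSSLOT footgun follows from how Redis Cluster places keys: each key maps to one of 16384 slots via CRC16 (XMODEM variant) of the key, or of its non-empty `{hash tag}` if present, and a multi-key command must touch a single slot. A minimal sketch of the read-only scatter-gather fix, assuming a hypothetical `mget_one_slot` callback that sends a single-slot MGET to the owning node; the function names are illustrative and this is not FigCache's code.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional

CLUSTER_SLOTS = 16384


def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT, XMODEM variant (poly 0x1021, init 0), as used for key slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: str) -> int:
    """Map a key to a slot, hashing only a non-empty {hash tag} if one exists."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end != -1 and end != start + 1:  # non-empty tag: hash only the tag
            raw = raw[start + 1 : end]
    return crc16_xmodem(raw) % CLUSTER_SLOTS


def group_keys_by_slot(keys: List[str]) -> Dict[int, List[str]]:
    groups: Dict[int, List[str]] = defaultdict(list)
    for key in keys:
        groups[key_hash_slot(key)].append(key)
    return dict(groups)


def scatter_gather_mget(
    keys: List[str],
    mget_one_slot: Callable[[List[str]], List[Optional[str]]],
) -> List[Optional[str]]:
    """Split a cross-slot MGET into per-slot MGETs and reassemble in request order.

    Calls run sequentially here; a fanout engine would issue them in parallel.
    """
    values: Dict[str, Optional[str]] = {}
    for slot_keys in group_keys_by_slot(keys).values():
        for key, value in zip(slot_keys, mget_one_slot(slot_keys)):
            values[key] = value
    return [values[key] for key in keys]
```

Only reads are safe to split this way: a cross-slot write split into per-slot commands would lose the single-command atomicity the caller expected.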
Failure modes at scale (Figma FigCache retrospective)
The FigCache rearchitecture documents Redis failure modes worth naming:
- Connection-volume saturation. Even before reaching Redis's hard connection limit, growing fleet-wide connection counts degrade I/O throughput and increase tail latency.
- Thundering-herd on scale-up. Elastic client fleets open many new TCP+TLS connections simultaneously; the handshake burst bottlenecks Redis for existing clients.
- Client-ecosystem fragmentation. Different client libraries have inconsistent Redis Cluster awareness, retry/timeout behavior, and observability, which makes fleet-wide guarantees about client-side state correctness during failovers impossible.
The canonical remedy is a stateless proxy tier in front of Redis that multiplexes many client connections onto a small, stable set of upstream connections (concepts/connection-multiplexing). See systems/figcache.
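The connection-count property of a proxy tier shows up even in a toy model. A minimal sketch, assuming hypothetical `UpstreamConnection` and `MultiplexingProxy` classes: the proxy opens a fixed pool of upstream connections once and lends one per request, so the number of connections Redis sees stays at the pool size however many clients arrive. A production proxy such as FigCache speaks RESP and may pipeline or interleave requests rather than lend a whole connection per request; this sketch ignores that and is not FigCache's design.

```python
import queue
import threading


class UpstreamConnection:
    """Hypothetical stand-in for one long-lived proxy-to-Redis TCP+TLS connection."""

    opened = 0  # how many upstream connections have ever been opened

    def __init__(self) -> None:
        UpstreamConnection.opened += 1

    def execute(self, command: str) -> str:
        # A real connection would send RESP bytes to Redis; here we just echo.
        return f"OK:{command}"


class MultiplexingProxy:
    """Many client requests share a fixed pool of upstream connections."""

    def __init__(self, pool_size: int) -> None:
        self._pool: "queue.Queue[UpstreamConnection]" = queue.Queue()
        for _ in range(pool_size):  # opened once, at proxy start-up
            self._pool.put(UpstreamConnection())
        self._stats_lock = threading.Lock()
        self._in_use = 0
        self.max_in_use = 0  # peak number of upstream connections busy at once

    def handle(self, command: str) -> str:
        conn = self._pool.get()  # blocks until an upstream connection is free
        with self._stats_lock:
            self._in_use += 1
            self.max_in_use = max(self.max_in_use, self._in_use)
        try:
            return conn.execute(command)
        finally:
            with self._stats_lock:
                self._in_use -= 1
            self._pool.put(conn)  # hand it back for the next client request
```

In this model, a burst of new clients queues on the pool instead of opening new connections to Redis, which is the thundering-herd shape the proxy tier is meant to absorb.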
Related
- aws-elasticache — AWS's managed Redis/Memcached
- systems/figcache — Figma's in-house proxy tier in front of ElastiCache Redis
- systems/celery — Python task queue that uses Redis as the canonical broker + result backend (Expedia STAR instance)
- systems/expedia-star — canonical wiki instance of Redis as Celery broker + result backend for an LLM RCA pipeline
- patterns/caching-proxy-tier — the architectural pattern for fronting Redis at scale
- concepts/connection-multiplexing — the reason to put a proxy there
- concepts/gossip-protocol — Redis Cluster's cluster-bus protocol
- systems/canva-print-routing
- patterns/async-projected-read-model
- patterns/atomic-conditional-batch-claim — Redis + Lua as the canonical native substrate for peek + atomic-claim-up-to-budget batching
- concepts/token-count-based-batching — Voyage AI's application of the pattern to GPU embedding inference
- systems/voyage-ai, systems/vllm — Voyage AI's serving stack consuming batches from Redis