Figma — The Search for Speed in Figma (OpenSearch)
Summary
Figma's search team spent several months debugging and re-tuning the search path after migrating from Elasticsearch to AWS's managed OpenSearch Service in late 2023. The headline symptom: DataDog reported an 8 ms "average search" while the service's p99 was almost a second — a >100× gap that turned out to be a metric-granularity mismatch (per-shard vs coordinator-level timing). Reconciling observability first, then attacking pre/post-processing, connection-pool starvation, and shard sizing produced a ~60% API-latency reduction, ≥50% throughput headroom, and a >50% cost cut — with no single magic bullet.
Key takeaways
- The "8 ms average" was per-shard, not per-query. The only true overall-query latency OpenSearch exposes is the `took` field in the search API response; its metrics and logs track only per-shard time. With up to 500 per-shard queries fanning out from a coordinator node, the coordinator-visible latency was ~150 ms avg / 200–400 ms p99. Reading the wrong metric masked the real bottleneck for months. (This is the canonical concepts/metric-granularity-mismatch instance.)
- Pre- and post-processing ate more time than the search itself. Less than 30% of total API time was spent waiting on OpenSearch. Pre-processing built a permissions-aware filter clause; post-processing re-checked permissions on each result. Reordering the permission-evaluation sequence (guided by statistical analysis) and disabling Ruby's intrusive run-time type-safety checks in the permission path gave substantial speedups.
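The mismatch is easy to see in a raw `_search?profile=true` response. A minimal Python sketch, using a hand-trimmed response shaped like OpenSearch's documented profile output (the sample values here are illustrative, not Figma's): `took` at the top level is coordinator time in ms, while everything under `profile` is per-shard.

```python
# Hypothetical trimmed _search?profile=true response. `took` (top level,
# in ms) is the only coordinator-level latency OpenSearch reports;
# everything under `profile` is per-shard timing.
sample_response = {
    "took": 150,  # end-to-end query time at the coordinator, in ms
    "profile": {
        "shards": [
            {"id": "[node1][idx][0]",
             "searches": [{"query": [{"time_in_nanos": 8_000_000}]}]},
            {"id": "[node2][idx][1]",
             "searches": [{"query": [{"time_in_nanos": 6_500_000}]}]},
        ]
    },
}

def coordinator_vs_shard_ms(resp):
    """Return (coordinator `took` in ms, mean per-shard query time in ms)."""
    shard_ms = [
        q["time_in_nanos"] / 1e6
        for shard in resp["profile"]["shards"]
        for search in shard["searches"]
        for q in search["query"]
    ]
    return resp["took"], sum(shard_ms) / len(shard_ms)

took_ms, avg_shard_ms = coordinator_vs_shard_ms(sample_response)
# A single-digit-ms per-shard average can coexist with a 150 ms query:
# the coordinator pays fan-out, queueing, and merge costs on top.
```

A monitoring integration that averages only the per-shard numbers will report the small figure and never surface the large one.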
- Thread-local connection-pool starvation manifested as tens-of-ms DB latency. Slow-trace analysis found DB queries taking tens of ms even though the DB and its load-balancer proxy were fast. Root cause: the connection pool was too small, so new threads paid an expensive per-thread connect/teardown on every query. The fix sped up search and everything else at Figma, and retroactively made parallel-DB-read experiments viable — previous "threads rarely help" results had been measuring the pool bug, not the parallelism ceiling.
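A minimal sketch of the failure mode, under toy assumptions (all names hypothetical): with a fixed-size pool, any request beyond the pool size pays the full connect cost instead of reusing a warm connection.

```python
import queue

class Pool:
    """Toy fixed-size connection pool. The `connect` callable stands in
    for an expensive TCP + TLS + auth handshake."""
    def __init__(self, size, connect):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())
        self._connect = connect

    def acquire(self, timeout=0.01):
        try:
            # Fast path: reuse a pooled, already-open connection.
            return self._idle.get(timeout=timeout), True
        except queue.Empty:
            # Starved pool: pay the full connect cost on this request.
            return self._connect(), False

    def release(self, conn):
        self._idle.put(conn)

connects = []  # count how many real "connects" happened
pool = Pool(2, connect=lambda: connects.append(1) or object())
a, hit_a = pool.acquire()
b, hit_b = pool.acquire()
c, hit_c = pool.acquire()  # third concurrent user: pool exhausted
# hit_a and hit_b are True (reused); hit_c is False (fresh connect)
```

With an undersized pool every extra concurrent request looks like a "slow DB query" in traces, even though the database itself is fast — which matches the slow-trace signature described above.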
- Their queries were fine; their index data and shard count weren't. The OpenSearch query profiler showed the filters already eliminated the vast majority of docs per shard — only a few hundred docs survived per shard per query — so the query optimizer was doing its job. But the raw index was bloated: trimming unused fields cut it by 50%, then by a further 90%, with no measurable relevancy impact. The win came from fitting the live set into the OS disk cache, which made all performance more predictable.
- Fewer, larger shards beat the AWS guideline for latency-sensitive workloads. AWS's sizing advice (keep shards <50 GB, ~1 shard per 1.5 CPUs) is built around log-style throughput workloads. For latency-sensitive document search with aggressive pre-filters, the coordinator's cost of managing fan-out dominates: going from 450 to 180 shards (−60%) boosted max query rate ≥50% and decreased p50 latency. This is a pattern: patterns/fewer-larger-shards-for-latency.
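The coordinator waits for the slowest shard, so fan-out width amplifies per-shard tail probability. A back-of-envelope sketch (the 1% slow-shard rate is an illustrative assumption, not a Figma number):

```python
def p_any_slow(n_shards, p_slow_shard):
    """Probability that at least one shard lands in its slow tail.
    Since the coordinator waits for all shards, this is roughly the
    probability the end-to-end query is slow."""
    return 1 - (1 - p_slow_shard) ** n_shards

# Assume each per-shard query blows its latency budget 1% of the time.
before = p_any_slow(450, 0.01)  # ~0.989: almost every query hits a slow shard
after = p_any_slow(180, 0.01)   # ~0.836: still high, but meaningfully lower
```

Under this model, cutting fan-out is one of the few levers that lowers the coordinator-visible tail without making any individual shard faster — consistent with p50 dropping after the 450 → 180 reduction.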
- Right-sizing nodes: less CPU, more RAM, half the price. After the earlier wins they were CPU-idle and RAM-pressured, so they moved to nodes with 1/3 the CPU and 25% more RAM at ~1/2 the price — a slight performance gain at half the cost. (Related: concepts/cache-locality — the extra RAM protected the disk-cache hit rate.)
- `opensearch-benchmark` was not the right tool; they wrote their own. The vendor benchmark is built for OpenSearch-development regression testing, not for running huge randomized query loads against existing clusters; it also doesn't use the server-side `took` field, so its measurements are client-contaminated. They wrote a custom Go load generator in an afternoon and got consistent results. Pattern: patterns/custom-benchmarking-harness.
- Concurrent segment search and zstd compression were neutral-to-negative. Despite CPU headroom, concurrent segment search added latency even at low QPS and degraded faster under load. zstd was a wash. Counterintuitive: the default knobs were close to right for this shape of workload.
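A custom harness mostly needs to collect server-side `took` values and summarize them. A minimal nearest-rank percentile sketch (sample values illustrative; Figma's actual tool was written in Go):

```python
import math

def percentile(samples, q):
    """Nearest-rank percentile for q in [0, 100] over a list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]

# Pretend these are server-side `took` values (ms) collected by a load
# generator; summarizing `took` keeps client and network noise out of
# the measurement, unlike wall-clock timing on the client.
took_ms = [12, 15, 14, 18, 150, 16, 13, 17, 400, 14]
p50 = percentile(took_ms, 50)  # median is small...
p99 = percentile(took_ms, 99)  # ...while the tail is dominated by outliers
```

The design choice that matters is the input, not the math: percentiles over `took` answer "how fast is the cluster", while percentiles over client wall-clock answer "how fast is the cluster plus my benchmark machine and network".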
Numbers
- 8 ms — per-shard average latency reported by DataDog's OpenSearch integration.
- 150 ms avg / 200–400 ms p99 / 40 ms min — coordinator-view API latency Figma actually saw. Min > DataDog's "max" → red flag that metric semantics differed.
- 500 — potential per-shard queries fanned out per user query in the initial configuration.
- ~1 s p99 — total search API latency pre-optimization.
- ~30% — share of total API time actually spent inside OpenSearch (i.e. pre+post-processing was the majority).
- 50% + 90% — two successive index-size reductions with no measurable relevancy impact; the second one made the working set fit in OS disk cache.
- 450 → 180 shards (−60%) — shard-count reduction for the main index; ≥50% boost in max QPS before excess latency; p50 also decreased (counterintuitive).
- 1/3 CPU, 25% more RAM, ≈1/2 price/node — the node-type swap, slight perf gain.
- ~60% API latency reduction / ≥50% max QPS increase / >50% total cost reduction — end-to-end impact.
- 139 — number of OpenSearch instance types AWS lists ($0.02–$17/hr) that the team had to choose from.
Architectural shape
```
User → Search API (Ruby)
  │
  ├── pre-processing (permissions → filter clause) ───┐
  │                                                   │
  ├── OpenSearch coordinator node                     │ bigger than
  │     └── fan-out to ~500 per-shard queries         │ the search itself
  │                                                   │
  └── post-processing (per-result permission check) ──┘
```
- OpenSearch coordinator ↔ workers: "query" phase (one query per shard, parallel-ish) + "fetch" phase (coordinator picks winners, re-asks top shards for full docs).
- The only overall query latency emitted by OpenSearch is the `took` field in the search response — everything else (metrics, logs, the DataDog integration) is per-shard. Queueing-theory framing applies: fan-out width × per-shard tail latency dominates the coordinator's end-to-end answer (see concepts/tail-latency-at-scale).
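The query/fetch split above can be sketched as a toy two-phase search (data and names hypothetical): shards return only ids and scores in the query phase, and the coordinator fetches full documents only from the winning shards.

```python
# Toy query-then-fetch sketch, under simplified assumptions: each shard
# holds (doc_id, score) pairs; full documents live in a separate store.
shards = {
    0: {"a": 0.9, "b": 0.2},
    1: {"c": 0.8, "d": 0.7},
    2: {"e": 0.1},
}
docs = {"a": "doc A", "b": "doc B", "c": "doc C", "d": "doc D", "e": "doc E"}

def search(k=2):
    # Query phase: every shard returns only its local top-k ids + scores.
    candidates = []
    for shard_id, hits in shards.items():
        top = sorted(hits.items(), key=lambda kv: -kv[1])[:k]
        candidates += [(score, doc_id, shard_id) for doc_id, score in top]
    # Coordinator merges all candidates and picks the global winners.
    winners = sorted(candidates, reverse=True)[:k]
    # Fetch phase: re-ask only the winning shards for the full documents.
    return [(doc_id, docs[doc_id]) for _, doc_id, _ in winners]

result = search(k=2)  # global top-2 across all three shards
```

Even in this toy form, the coordinator's work (collect from every shard, merge, re-fetch) scales with shard count, which is why fan-out width shows up in the end-to-end latency.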
Caveats
- Not a migration post. The ES→OpenSearch move happened earlier (late 2023); this post is the post-migration perf debug.
- Ruby-specific wins. The run-time type-safety-check removal in the permission system is specific to the Ruby runtime; it won't translate verbatim.
- Relevancy is orthogonal. Index trimming is reported as having no measurable relevancy impact, but the relevancy-evaluation methodology isn't detailed; the claim holds within Figma's own eval framework.
- Single-shard numbers are still useful. The 8 ms figure wasn't wrong, just misread. For capacity planning of fan-out systems, the coordinator-view `took` is the metric that matters.
- Not the AI-search post. A sibling piece (sources/2026-04-21-figma-the-infrastructure-behind-ai-search-in-figma) covers Figma AI Search's vector-search path on the same OpenSearch substrate — this one is about traditional full-text search.
Raw
- URL: https://www.figma.com/blog/the-search-for-speed-in-figma-opensearch/
- Raw file: raw/figma/2026-04-21-the-search-for-speed-in-figma-7548dc56.md