Figma — The Search for Speed in Figma (OpenSearch)
Summary
Figma's search team spent several months debugging and re-tuning the search path after migrating from Elasticsearch to AWS's managed OpenSearch Service in late 2023. The headline symptom: DataDog reported an 8 ms "average search" while the service's p99 was almost a second — a >100× gap that turned out to be a metric-granularity mismatch (per-shard vs coordinator-level timing). Reconciling observability first, then attacking pre/post-processing, connection-pool starvation, and shard sizing produced a ~60% API-latency reduction, ≥50% throughput headroom, and a >50% cost cut — with no single magic bullet.
Key takeaways
- The "8 ms average" was per-shard, not per-query. The only true overall-query latency OpenSearch exposes is the `took` field in the search API response; its metrics and logs track only per-shard time. With up to 500 per-shard queries fanning out from a coordinator node, the coordinator-visible latency was ~150 ms avg / 200–400 ms p99. Reading the wrong metric masked the real bottleneck for months. (This is the canonical concepts/metric-granularity-mismatch instance.)
- Pre- and post-processing ate more time than the search itself. Less than 30% of total API time was spent waiting on OpenSearch. Pre-processing built a permissions-aware filter clause; post-processing re-checked permissions on each result. Reordering the permission-evaluation sequence (guided by statistical analysis) and disabling Ruby's intrusive run-time type-safety checks in the permission path gave substantial speedups.
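The mismatch is easy to see in a raw `_search?profile=true` response. A minimal Python sketch, using a hand-trimmed response shaped like OpenSearch's documented profile output (the sample values here are illustrative, not Figma's): `took` at the top level is coordinator time in ms, while everything under `profile` is per-shard.

```python
# Hypothetical trimmed _search?profile=true response. `took` (top level,
# in ms) is the only coordinator-level latency OpenSearch reports;
# everything under `profile` is per-shard timing.
sample_response = {
    "took": 150,  # end-to-end query time at the coordinator, in ms
    "profile": {
        "shards": [
            {"id": "[node1][idx][0]",
             "searches": [{"query": [{"time_in_nanos": 8_000_000}]}]},
            {"id": "[node2][idx][1]",
             "searches": [{"query": [{"time_in_nanos": 6_500_000}]}]},
        ]
    },
}

def coordinator_vs_shard_ms(resp):
    """Return (coordinator `took` in ms, mean per-shard query time in ms)."""
    shard_ms = [
        q["time_in_nanos"] / 1e6
        for shard in resp["profile"]["shards"]
        for search in shard["searches"]
        for q in search["query"]
    ]
    return resp["took"], sum(shard_ms) / len(shard_ms)

took_ms, avg_shard_ms = coordinator_vs_shard_ms(sample_response)
# A single-digit-ms per-shard average can coexist with a 150 ms query:
# the coordinator pays fan-out, queueing, and merge costs on top.
```

A monitoring integration that averages only the per-shard numbers will report the small figure and never surface the large one.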
- Thread-local connection-pool starvation manifested as tens-of-ms DB latency. Slow-trace analysis found DB queries taking tens of ms even though the DB and its load-balancer proxy were fast. Root cause: the connection pool was too small, so new threads paid an expensive per-thread connect/teardown on every query. The fix sped up search and everything else at Figma, and retroactively made parallel-DB-read experiments viable — previous "threads rarely help" results had been measuring the pool bug, not the parallelism ceiling.
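A minimal sketch of the failure mode, under toy assumptions (all names hypothetical): with a fixed-size pool, any request beyond the pool size pays the full connect cost instead of reusing a warm connection.

```python
import queue

class Pool:
    """Toy fixed-size connection pool. The `connect` callable stands in
    for an expensive TCP + TLS + auth handshake."""
    def __init__(self, size, connect):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())
        self._connect = connect

    def acquire(self, timeout=0.01):
        try:
            # Fast path: reuse a pooled, already-open connection.
            return self._idle.get(timeout=timeout), True
        except queue.Empty:
            # Starved pool: pay the full connect cost on this request.
            return self._connect(), False

    def release(self, conn):
        self._idle.put(conn)

connects = []  # count how many real "connects" happened
pool = Pool(2, connect=lambda: connects.append(1) or object())
a, hit_a = pool.acquire()
b, hit_b = pool.acquire()
c, hit_c = pool.acquire()  # third concurrent user: pool exhausted
# hit_a and hit_b are True (reused); hit_c is False (fresh connect)
```

With an undersized pool every extra concurrent request looks like a "slow DB query" in traces, even though the database itself is fast — which matches the slow-trace signature described above.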
- Their queries were fine; their index data and shard count weren't. The OpenSearch query profiler showed the filters already eliminated the vast majority of docs per shard — only a few hundred docs survived per shard per query — so the query optimizer was doing its job. But the raw index was bloated: trimming unused fields cut it by 50%, then by a further 90%, with no measurable relevancy impact. The win came from fitting the live set into the OS disk cache, which made all performance more predictable.
- Fewer, larger shards beat the AWS guideline for latency-sensitive workloads. AWS's sizing advice (keep shards <50 GB, ~1 shard per 1.5 CPUs) is built around log-style throughput workloads. For latency-sensitive document search with aggressive pre-filters, the coordinator's cost of managing fan-out dominates: going from 450 to 180 shards (−60%) boosted max query rate ≥50% and decreased p50 latency. This is a pattern: patterns/fewer-larger-shards-for-latency.
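The coordinator waits for the slowest shard, so fan-out width amplifies per-shard tail probability. A back-of-envelope sketch (the 1% slow-shard rate is an illustrative assumption, not a Figma number):

```python
def p_any_slow(n_shards, p_slow_shard):
    """Probability that at least one shard lands in its slow tail.
    Since the coordinator waits for all shards, this is roughly the
    probability the end-to-end query is slow."""
    return 1 - (1 - p_slow_shard) ** n_shards

# Assume each per-shard query blows its latency budget 1% of the time.
before = p_any_slow(450, 0.01)  # ~0.989: almost every query hits a slow shard
after = p_any_slow(180, 0.01)   # ~0.836: still high, but meaningfully lower
```

Under this model, cutting fan-out is one of the few levers that lowers the coordinator-visible tail without making any individual shard faster — consistent with p50 dropping after the 450 → 180 reduction.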
- Right-sizing nodes: less CPU, more RAM, half the price. After the earlier wins they were CPU-idle and RAM-pressured, so they moved to nodes with 1/3 the CPU and 25% more RAM at ~1/2 the price — a slight performance gain at half the cost. (Related: concepts/cache-locality — the extra RAM protected the disk-cache hit rate.)
- `opensearch-benchmark` was not the right tool; they wrote their own. The vendor benchmark is built for OpenSearch-development regression testing, not for running huge randomized query loads against existing clusters; it also doesn't use the server-side `took` field, so its measurements are client-contaminated. They wrote a custom Go load generator in an afternoon and got consistent results. Pattern: patterns/custom-benchmarking-harness.
- Concurrent segment search and zstd compression were neutral-to-negative. Despite CPU headroom, concurrent segment search added latency even at low QPS and degraded faster under load. zstd was a wash. Counterintuitive: the default knobs were close to right for this shape of workload.
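A custom harness mostly needs to collect server-side `took` values and summarize them. A minimal nearest-rank percentile sketch (sample values illustrative; Figma's actual tool was written in Go):

```python
import math

def percentile(samples, q):
    """Nearest-rank percentile for q in [0, 100] over a list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]

# Pretend these are server-side `took` values (ms) collected by a load
# generator; summarizing `took` keeps client and network noise out of
# the measurement, unlike wall-clock timing on the client.
took_ms = [12, 15, 14, 18, 150, 16, 13, 17, 400, 14]
p50 = percentile(took_ms, 50)  # median is small...
p99 = percentile(took_ms, 99)  # ...while the tail is dominated by outliers
```

The design choice that matters is the input, not the math: percentiles over `took` answer "how fast is the cluster", while percentiles over client wall-clock answer "how fast is the cluster plus my benchmark machine and network".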
Numbers
- 8 ms — per-shard average latency reported by DataDog's OpenSearch integration.
- 150 ms avg / 200–400 ms p99 / 40 ms min — coordinator-view API latency Figma actually saw. Min > DataDog's "max" → red flag that metric semantics differed.
- 500 — potential per-shard queries fanned out per user query in the initial configuration.
- ~1 s p99 — total search API latency pre-optimization.
- ~30% — share of total API time actually spent inside OpenSearch (i.e. pre+post-processing was the majority).
- 50% + 90% — two successive index-size reductions with no measurable relevancy impact; the second one made the working set fit in OS disk cache.
- 450 → 180 shards (−60%) — shard-count reduction for the main index; ≥50% boost in max QPS before excess latency; p50 also decreased (counterintuitive).
- 1/3 CPU, 25% more RAM, ≈1/2 price/node — the node-type swap, slight perf gain.
- ~60% API latency reduction / ≥50% max QPS increase / >50% total cost reduction — end-to-end impact.
- 139 — number of OpenSearch instance types AWS lists ($0.02–$17/hr) that the team had to choose from.
Architectural shape
```
User → Search API (Ruby)
  │
  ├── pre-processing (permissions → filter clause) ───┐
  │                                                   │
  ├── OpenSearch coordinator node                     │ bigger than
  │     └── fan-out to ~500 per-shard queries         │ the search itself
  │                                                   │
  └── post-processing (per-result permission check) ──┘
```
- OpenSearch coordinator ↔ workers: "query" phase (one query per shard, parallel-ish) + "fetch" phase (coordinator picks winners, re-asks top shards for full docs).
- The only overall query latency emitted by OpenSearch is the `took` field in the search response — everything else (metrics, logs, the DataDog integration) is per-shard. Queueing-theory framing applies: fan-out width × per-shard tail latency dominates the coordinator's end-to-end answer (see concepts/tail-latency-at-scale).
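The query/fetch split above can be sketched as a toy two-phase search (data and names hypothetical): shards return only ids and scores in the query phase, and the coordinator fetches full documents only from the winning shards.

```python
# Toy query-then-fetch sketch, under simplified assumptions: each shard
# holds (doc_id, score) pairs; full documents live in a separate store.
shards = {
    0: {"a": 0.9, "b": 0.2},
    1: {"c": 0.8, "d": 0.7},
    2: {"e": 0.1},
}
docs = {"a": "doc A", "b": "doc B", "c": "doc C", "d": "doc D", "e": "doc E"}

def search(k=2):
    # Query phase: every shard returns only its local top-k ids + scores.
    candidates = []
    for shard_id, hits in shards.items():
        top = sorted(hits.items(), key=lambda kv: -kv[1])[:k]
        candidates += [(score, doc_id, shard_id) for doc_id, score in top]
    # Coordinator merges all candidates and picks the global winners.
    winners = sorted(candidates, reverse=True)[:k]
    # Fetch phase: re-ask only the winning shards for the full documents.
    return [(doc_id, docs[doc_id]) for _, doc_id, _ in winners]

result = search(k=2)  # global top-2 across all three shards
```

Even in this toy form, the coordinator's work (collect from every shard, merge, re-fetch) scales with shard count, which is why fan-out width shows up in the end-to-end latency.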
Caveats
- Not a migration post. The ES→OpenSearch move happened earlier (late 2023); this post is the post-migration perf debug.
- Ruby-specific wins. The run-time type-safety-check removal in the permission system is specific to the Ruby runtime; it won't translate verbatim.
- Relevancy is orthogonal. Index trimming is reported as having no measurable relevancy impact, but the relevancy-evaluation methodology isn't detailed; the claim holds within Figma's own eval framework.
- Single-shard numbers are still useful. The 8 ms figure wasn't wrong, just misread. For capacity planning of fan-out systems, the coordinator-view `took` is the metric that matters.
- Not the AI-search post. A sibling piece (sources/2026-04-21-figma-the-infrastructure-behind-ai-search-in-figma) covers Figma AI Search's vector-search path on the same OpenSearch substrate — this one is about traditional full-text search.
Raw
- URL: https://www.figma.com/blog/the-search-for-speed-in-figma-opensearch/
- Raw file: raw/figma/2026-04-21-the-search-for-speed-in-figma-7548dc56.md