Figma — The Infrastructure Behind AI Search in Figma

Summary

Infrastructure companion to Figma's earlier product-narrative post on AI-powered search. Where the product post (see sources/2026-04-21-figma-how-we-built-ai-powered-search-in-figma) framed why search and what to index, this post documents how the pipeline is actually built: embeddings from open-source CLIP (a multimodal model producing image and text embeddings into the same vector space, so a string query and an image query can hit the same index); systems/dynamodb for frame metadata + embeddings KV store; systems/aws-s3 for thumbnail storage; AWS SageMaker for batched embedding inference; and systems/amazon-opensearch-service k-NN plugin for the actual vector-search index with metadata-filter-capable faceted search. The indexing pipeline is decomposed into four discrete queued jobs (identify-and-thumbnail → generate-embeddings → persist-to-OpenSearch → stop) so batching and retry behavior can be tuned per stage (patterns/pipeline-stage-as-discrete-job). Identifying indexable frames required a headless server-side C++ build of the Figma editor in an async job. Two cost optimizations dominated: (1) Ruby → C++ rewrite of the frame-enumeration-and-thumbnailing path, eliminating the intermediate full-file-JSON serialization that was "extremely slow and memory intensive"; (2) GPU → CPU software rendering via llvmpipe on newer-generation instance types — cheaper per-instance and faster per workload. Freshness was debounced to 4h (matches patterns/edit-quiescence-indexing), quantified here: "we would only have to process 12% of the data." Corpus further cut in half by removing draft files, within-file duplicates, and unmodified file copies. Query path is hybrid lexical+vector — two independent OpenSearch indexes queried simultaneously, scores min-max normalized, exact lexical matches boosted, results interleaved. Vector quantization (OpenSearch k-NN) compresses embeddings in-index, trading a small recall hit for memory.
Two OpenSearch-internal bugs are reported candidly: (1) periodic non-determinism on segment-replication replicas from a "Reader cannot be cast to class SegmentReader" error in the delete path — fixed upstream in k-NN PR #1808; (2) a _source slimming footgun — vectors removed from _source for size, but OpenSearch's update path diffs against _source to rewrite each document, so updates to unrelated fields silently wiped the embedding. Fix: re-fetch embeddings from DynamoDB on update (patterns/source-field-slimming-with-external-refetch). The indexing scope motivator: at Figma's scale "a small percentage of users" requires indexing most of the fleet because teams are small and numerous — rollouts converge on full-corpus indexing quickly.

Key takeaways

  1. Multimodal embeddings collapse text-query and image-query to one index. Figma uses OpenAI's open-source CLIP model. CLIP is multimodal: text and images embed into the same vector space so the embedding for the string "cat" is close to the embedding for an image of a cat. Net architectural effect — one vector index serves both query modes (screenshot, selection, text). The post also notes Figma initially experimented with embedding a textual JSON representation of the selection; image embeddings produced better results and crucially share the code path with screenshot queries, so JSON-route was dropped. (Source: sources/2026-04-21-figma-the-infrastructure-behind-ai-search-in-figma)
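
  The single-index effect can be sketched with toy vectors. Everything below — the frame names, the embeddings, the `search` helper — is a hypothetical stand-in for CLIP outputs, not Figma's code; the point is only that one index of image embeddings can answer queries embedded from either modality:

  ```python
  import math

  def cosine(a, b):
      """Cosine similarity between two embedding vectors."""
      dot = sum(x * y for x, y in zip(a, b))
      na = math.sqrt(sum(x * x for x in a))
      nb = math.sqrt(sum(x * x for x in b))
      return dot / (na * nb)

  # Hypothetical CLIP-style embeddings: text and images share one space,
  # so a single index of frame-image embeddings serves both query modes.
  frame_index = {
      "frame/cat-illustration": [0.9, 0.1, 0.2],   # image embedding
      "frame/login-screen":     [0.1, 0.8, 0.3],
  }

  def search(query_embedding, k=1):
      """Rank indexed frames by cosine similarity — regardless of whether
      the query embedding came from text, a screenshot, or a selection."""
      ranked = sorted(frame_index.items(),
                      key=lambda kv: cosine(query_embedding, kv[1]),
                      reverse=True)
      return [name for name, _ in ranked[:k]]

  text_query_cat = [0.85, 0.15, 0.25]   # stand-in for embed_text("cat")
  print(search(text_query_cat))          # → ['frame/cat-illustration']
  ```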

  2. Pipeline-stage-as-discrete-job, explicit rationale. The indexing path is decomposed into four discrete queued jobs rather than one long job. Figma's stated reason: "separating the individual steps of the pipeline into discrete jobs gives us more precise control over batching and retry behavior." Each stage queues the next on successful termination. Canonical patterns/pipeline-stage-as-discrete-job.
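
  A minimal sketch of the decomposition, assuming an in-process queue and invented per-stage retry limits (the real system presumably runs on a distributed job queue; the stage names paraphrase the post's pipeline). Each stage enqueues the next only on success, and a failure retries that stage alone under its own policy:

  ```python
  from collections import deque

  # Hypothetical per-stage tuning knobs — the discrete-job split is what
  # lets batching and retry policy differ per stage.
  STAGES = {
      "identify_and_thumbnail": {"next": "generate_embeddings", "max_retries": 3},
      "generate_embeddings":    {"next": "persist_to_opensearch", "max_retries": 5},
      "persist_to_opensearch":  {"next": None, "max_retries": 5},
  }

  queue = deque()

  def enqueue(stage, payload, attempt=0):
      queue.append((stage, payload, attempt))

  def run_stage(stage, payload):
      # A real implementation would thumbnail, call SageMaker, or write to
      # OpenSearch; here each stage just records itself on the payload.
      payload.setdefault("completed", []).append(stage)

  def drain():
      while queue:
          stage, payload, attempt = queue.popleft()
          try:
              run_stage(stage, payload)
          except Exception:
              if attempt + 1 < STAGES[stage]["max_retries"]:
                  enqueue(stage, payload, attempt + 1)   # retry this stage only
              continue
          nxt = STAGES[stage]["next"]
          if nxt:
              enqueue(nxt, payload)   # queue the next stage on success

  job = {"file": "file-key-123"}
  enqueue("identify_and_thumbnail", job)
  drain()
  print(job["completed"])
  # → ['identify_and_thumbnail', 'generate_embeddings', 'persist_to_opensearch']
  ```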

  3. Indexable-frame enumeration requires running the editor headless server-side. "Unpublished frames within a Figma file are not readily enumerable" — so Figma runs a headless server-side C++ build of the Figma editor in an async job to walk the file and emit indexable-frame identity + thumbnail. This is an unusually heavy step, but necessary because the file format's frame tree is opaque without the editor's runtime.

  4. Ruby → C++ rewrite of the enumeration + thumbnail path. First implementation serialized the entire Figma file as JSON and parsed that in Ruby — "extremely slow and memory intensive." Rewriting in C++ eliminated both the serialization and the Ruby parse: "huge runtime improvements and memory reductions." This is the thumbnailing step, not the embedding-model step — cheap compute turned out to dominate the cost structure, not inference.

  5. GPU → CPU rendering via llvmpipe was a big cost saver. Thumbnail rendering moved off GPU on older AWS instance types and onto CPU-based software rendering via llvmpipe on newer instance types. Two wins named: (a) CPU instances are cheaper; (b) newer instances are faster, so the workload completes in less time. Inverts the usual "GPU is always cheaper per rendered frame" intuition — for bulk thumbnail workloads the instance-generation economics flipped.

  6. Edit-quiescence debounce: 4h → process 12% of the data. Figma's edit-quiescence policy (patterns/edit-quiescence-indexing) is quantified here for the first time. "If we debounced indexing to be at most every four hours, we would only have to process 12% of the data" — an ~8× load reduction just from the quiescence window. Diurnal cluster autoscaling further trimmed off-peak compute.
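
  The debounce policy can be sketched as a per-file gate (a minimal illustration — Figma's scheduling machinery isn't described, and a production debouncer would typically also schedule a trailing-edge job so the final edits in a burst get indexed):

  ```python
  from datetime import datetime, timedelta

  DEBOUNCE = timedelta(hours=4)

  class IndexDebouncer:
      """Coalesce edits so a file is reindexed at most once per window."""
      def __init__(self):
          self.last_indexed = {}

      def should_index(self, file_key, now):
          last = self.last_indexed.get(file_key)
          if last is not None and now - last < DEBOUNCE:
              return False            # still inside the quiescence window
          self.last_indexed[file_key] = now
          return True

  d = IndexDebouncer()
  t0 = datetime(2024, 1, 1, 9, 0)
  edits = [t0, t0 + timedelta(minutes=30), t0 + timedelta(hours=2),
           t0 + timedelta(hours=5)]
  processed = sum(d.should_index("file-a", t) for t in edits)
  print(processed)   # 2 of 4 edits trigger indexing; the rest coalesce
  ```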

  7. Corpus halved by excluding drafts / within-file duplicates / unmodified copies. OpenSearch memory was the second-biggest cost driver. Figma "took a deeper look at what we considered indexable" — removing draft files, duplicate designs within files, and files that were copies without new changes cut the index in half. These exclusions were also product-UX wins (patterns/selective-indexing-heuristics — the same pattern Figma already documents, here with the "cut in half" quantification).

  8. Vector quantization in OpenSearch k-NN — recall vs memory trade. By default OpenSearch k-NN stores each embedding dimension as a 4-byte float. Vector quantization compresses the representation at the cost of a "small reduction in nearest neighbor search accuracy." Figma applies this to shrink the index footprint in memory. Canonical concepts/vector-quantization.
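
  Figma doesn't name which k-NN quantization mode it uses; as an illustration of the memory-vs-accuracy trade, here is the simplest scheme — min-max scalar quantization to one byte per dimension instead of the default 4-byte float, with reconstruction error bounded by one quantization step:

  ```python
  def quantize(vec):
      """Map float components to int codes in 0..255 (1 byte/dim vs 4)."""
      lo, hi = min(vec), max(vec)
      scale = (hi - lo) / 255 or 1.0
      codes = [round((x - lo) / scale) for x in vec]
      return codes, lo, scale

  def dequantize(codes, lo, scale):
      """Approximate reconstruction used at search time."""
      return [lo + c * scale for c in codes]

  vec = [0.12, -0.48, 0.91, 0.05]
  codes, lo, scale = quantize(vec)
  approx = dequantize(codes, lo, scale)
  max_err = max(abs(a - b) for a, b in zip(vec, approx))
  print(max_err < scale)   # error stays within one quantization step
  ```

  The bounded per-dimension error is where the "small reduction in nearest neighbor search accuracy" comes from: distances computed over `approx` differ slightly from distances over `vec`, occasionally reordering near-tied neighbors.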

  9. Hybrid lexical + vector retrieval with min-max score normalization and exact-match boost. Lexical search (fuzzy string match over component names and descriptions) predates AI-powered search and was "safely" preserved. Queries run simultaneously against both indexes. Because raw scores aren't comparable across independent OpenSearch indexes, Figma rescores each result set via min-max normalization, boosts exact lexical matches, then interleaves by the updated scores. Canonical patterns/hybrid-lexical-vector-interleaving. Worked example (Figma): "mouse" returns both the icon specifically titled "Mouse" and cursor-adjacent icons.
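
  The fusion mechanics can be sketched as follows. The raw scores and the 0.3 exact-match boost are invented for illustration (Figma doesn't disclose its boost factor), and the "mouse" example mirrors the post's worked example:

  ```python
  def min_max(scores):
      """Rescale raw scores into [0, 1] so lexical and vector results
      become comparable across independent indexes."""
      lo, hi = min(scores.values()), max(scores.values())
      span = (hi - lo) or 1.0
      return {doc: (s - lo) / span for doc, s in scores.items()}

  def hybrid(lexical, vector, query, exact_boost=0.3):
      """Normalize each result set, boost exact lexical matches, and
      interleave by the updated scores (hypothetical fusion parameters)."""
      fused = {}
      for doc, s in min_max(lexical).items():
          bonus = exact_boost if doc.lower() == query.lower() else 0.0
          fused[doc] = max(fused.get(doc, 0.0), s + bonus)
      for doc, s in min_max(vector).items():
          fused[doc] = max(fused.get(doc, 0.0), s)
      return sorted(fused, key=fused.get, reverse=True)

  lexical = {"Mouse": 9.1, "Mousepad": 7.4}               # raw BM25-ish scores
  vector  = {"Cursor": 0.82, "Pointer": 0.79, "Mouse": 0.75}
  print(hybrid(lexical, vector, "mouse"))
  # → ['Mouse', 'Cursor', 'Pointer', 'Mousepad']
  ```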

  10. _source slimming + vector removal → bug: updates wipe the embedding. To save storage and query latency, Figma removed embedding vectors from OpenSearch's _source (the original document returned on responses). This interacts with OpenSearch's update path: _source is the input OpenSearch uses to reconstruct the updated document, so an update to the file name silently dropped the embedding from the reindexed doc. Fix: on update, re-fetch the embedding from DynamoDB and re-inject it, preserving the _source slimming optimization on the read path. Canonical patterns/source-field-slimming-with-external-refetch.
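
  A toy reproduction of the footgun and its fix, with plain dicts standing in for DynamoDB, the searchable index, and the slimmed _source (all names here are illustrative):

  ```python
  dynamo = {"doc-1": [0.1, 0.2, 0.3]}                     # embeddings KV store
  index  = {"doc-1": {"name": "Login screen",
                      "embedding": [0.1, 0.2, 0.3]}}      # searchable doc
  source = {"doc-1": {"name": "Login screen"}}            # slimmed _source

  def update_document(doc_id, fields):
      # OpenSearch-style update: rebuild the doc from _source + the diff.
      doc = {**source[doc_id], **fields}
      # The bug: `doc` has no embedding, because it was slimmed out of
      # _source — so an update to an unrelated field would silently wipe
      # the vector. The fix: re-fetch from the KV store and re-inject.
      doc["embedding"] = dynamo[doc_id]
      index[doc_id] = doc
      # Keep _source slim on the read path.
      source[doc_id] = {k: v for k, v in doc.items() if k != "embedding"}

  update_document("doc-1", {"name": "Sign-in screen"})
  print(index["doc-1"]["embedding"])   # vector survives the unrelated update
  ```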

  11. Segment-replication replica non-determinism bug — upstream fixed. Periodic non-determinism in end-to-end search tests traced to replica queries returning different results than primary queries. Root cause: Reader cannot be cast to class SegmentReader error in the delete path, affecting replicas on clusters using segment replication. Partnered with the AWS OpenSearch team; fix shipped in k-NN PR #1808. Clean example of production-scale consumer surfacing an upstream OSS bug, and of the "search works differently depending which replica answered your query" class of failures.

  12. The "small percentage of users forces full-fleet indexing" observation. Indexing cost matters more because of a workload shape: to give any user full search, their whole team must be indexed. Figma's teams are small and many, so "with even a small percentage of users onboarded, we'd quickly converge on having to index almost all teams at Figma." This means backfill economics, not marginal-onboard economics, dominate the design — motivating every cost optimisation above.

Reported results / numbers

  • Edit-quiescence debounce = 4h → only 12% of data to process (~8.3× reduction).
  • Corpus exclusion set (draft files + in-file duplicates + unmodified file copies) = ~50% index size reduction.
  • CPU rendering on newer instances: stated "much cheaper … newer … get through our workload more quickly" — no absolute number.
  • Batch-size tuning on SageMaker embedding: past some threshold, latency grows linearly with batch size, so the batching benefit is sub-linear and has a sweet spot.
  • OpenSearch PR #1808 — upstream fix shipped.
  • No embedding dimensionality, NDCG, recall, corpus cardinality, or cost dollars disclosed.
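
The batch-size sweet spot can be modeled with a hypothetical latency curve — roughly flat while the accelerator still has parallel headroom, then linear with batch size past saturation, matching the post's qualitative claim. All numbers below are invented:

```python
OVERHEAD_MS, PER_ITEM_MS = 200.0, 8.0   # illustrative, not measured values

def latency_ms(batch):
    # Below saturation the batch is absorbed in parallel for a roughly
    # fixed cost; past it, latency grows linearly with batch size.
    return max(OVERHEAD_MS, PER_ITEM_MS * batch)

# Throughput (items per ms) rises until the saturation threshold, then
# plateaus — so the sweet spot is the smallest saturating batch size.
best = max(range(1, 129), key=lambda b: b / latency_ms(b))
print(best)   # → 25, i.e. OVERHEAD_MS / PER_ITEM_MS under this model
```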

Caveats

  • Product-tier / architectural-tier split. This post is the companion to an earlier product-tier post (sources/2026-04-21-figma-how-we-built-ai-powered-search-in-figma). The product post has the user-research 75% statistic; the infrastructure post has the systems-tier detail. Cross-cite rather than duplicate.
  • CLIP is cited by name and arXiv link — "Figma currently uses the open source CLIP model." No fine-tune specifics for the designs-side model disclosed. The components-side model is stated fine-tuned specifically on publicly available Community UI kits. No private Figma files or customer data used for training (stated explicitly).
  • Quantization params not disclosed. Figma says "vector quantization" without naming the k-NN mode (scalar int8, PQ, binary) or any recall-impact measurement.
  • Index sharding / replica topology not disclosed. The segment-replication bug confirms segment replication is in use, but cluster shape and shard count aren't.
  • Hybrid-scoring coefficients not disclosed. "Exact lexical matches receiving a boost" is qualitative; the actual boost factor is not shown.
  • "Billions of entries" is framed as the scale motivator but without split by corpus type (frames vs components) or ingest-rate QPS.
  • llvmpipe on CPU is an unusual choice for design rendering — post doesn't discuss quality differences (anti-aliasing, font rasterisation) vs GPU rendering. Treat as "good enough for thumbnails that feed the embedding model", not "equivalent to GPU-rendered preview."

Introduces / extends

New wiki pages introduced by this source:

  • systems/clip-embedding-model — OpenAI's open-source CLIP multimodal text+image embedding model; text and images embed into the same vector space, enabling single-index multimodal retrieval; Figma's canonical production instance.
  • concepts/vector-quantization — compression of embedding vectors at the cost of a small recall hit; OpenSearch k-NN plugin exposes it via its knn-vector-quantization feature; Figma canonical instance.
  • patterns/pipeline-stage-as-discrete-job — decomposing an indexing (or any multi-step backend) pipeline into discrete queued jobs so each stage's batching and retry policy can be tuned independently; Figma canonical instance.
  • patterns/hybrid-lexical-vector-interleaving — specific hybrid-retrieval tactic of running two independent OpenSearch indexes (lexical + vector), applying min-max score normalization per index, boosting exact lexical matches, and interleaving by updated score; Figma canonical instance; distinct from general hybrid retrieval by its specific score-fusion mechanics.
  • patterns/source-field-slimming-with-external-refetch — the OpenSearch-specific gotcha of removing vectors from _source for size, breaking update paths, and the fix of re-fetching from an external KV store on update; Figma canonical instance.

Extends:

  • systems/figma-ai-search — adds a full "Infrastructure" section (model = CLIP, storage = DynamoDB + S3 + OpenSearch k-NN, inference = SageMaker batched, enumeration = headless server-side C++ editor, rendering = llvmpipe CPU), the hybrid lexical+vector query path, and the two named OpenSearch bugs.
  • systems/amazon-opensearch-service — adds k-NN-as-production-vector-index role, segment-replication replica bug fixed upstream, _source slimming gotcha, vector quantization as the memory mitigator.
  • systems/aws-sagemaker-endpoint — adds "embedding inference from batched image inputs" role (parallelised image download + resize + normalise inside the container; batch-size sweet spot below latency-linear threshold).
  • systems/dynamodb — adds the "metadata + embedding KV store for an ML-search pipeline; read on OpenSearch update to re-inject slimmed _source fields" role.
  • systems/aws-s3 — thumbnail-asset bucket for the embedding pipeline.
  • concepts/vector-embedding — adds CLIP as canonical multimodal- embedding example (text and image in the same space); adds the "image embedding beats JSON-textual-representation" evaluation result.
  • concepts/hybrid-retrieval-bm25-vectors — adds Figma's min-max-normalized + exact-match-boosted + interleaved instance as a second canonical implementation alongside Dash's BM25+vector combo.
  • patterns/edit-quiescence-indexing — quantifies the workload- reduction of the 4h debounce: 12% of data.
  • patterns/selective-indexing-heuristics — adds the "cut index in half by excluding drafts + in-file dup + unmodified copies" quantification; previously only qualitative.
  • companies/figma — new Recent articles entry.

Citation anchors

  • "Figma currently uses the open source CLIP model, which is what is known as a multimodal embedding model. The model can take multiple forms of inputs (image and text) and output embeddings that are in the same space."
  • "Early on in the project, we experimented with generating an embedding from a textual representation of your selection (e.g. JSON), but we found that generating embeddings via images returned better results and ensured we were sharing code paths when searching via a screenshot."
  • "Separating the individual steps of the pipeline into discrete jobs gives us more precise control over batching and retry behavior."
  • "Unpublished frames within a Figma file are not readily enumerable. To identify them, we run a headless server-side version of the C++ Figma editor in an asynchronous job."
  • "Rewriting this logic from Ruby to C++ and eliminating any intermediate serialization yielded huge runtime improvements and memory reductions."
  • "We moved our thumbnailing from GPU-based rendering on an older AWS instance type to CPU-based rendering with llvmpipe on a newer instance type."
  • "If we debounced indexing to be at most every four hours, we would only have to process 12% of the data!"
  • "Vector quantization is a technique to compress the size of embeddings to reduce the memory required to store and search them, at the cost of a small reduction in nearest neighbor search accuracy."
  • "Queries routed to replicas in OpenSearch were returning non-deterministic results compared to those routed to primaries… Reader cannot be cast to class SegmentReader … shipped a fix here: https://github.com/opensearch-project/k-NN/pull/1808"
  • "After removing embeddings from _source, everytime we tried to update a document … we were accidentally wiping the embedding off of the document … to fix this while retaining our _source optimization, we re-fetch the embeddings from DynamoDB on updates."
  • "Paradoxically, with even a small percentage of users onboarded, we'd quickly converge on having to index almost all teams at Figma."
