Figma — The Infrastructure Behind AI Search in Figma¶
Summary¶
Infrastructure companion to Figma's earlier product-narrative post on
AI-powered search. Where the product post (see
sources/2026-04-21-figma-how-we-built-ai-powered-search-in-figma)
framed why search and what to index, this post documents how the
pipeline is actually built: embeddings from the open-source
CLIP model (a multimodal model that embeds
images and text into the same vector space, so a
string query and an image query can hit the same index);
systems/dynamodb for frame metadata + embeddings KV store;
systems/aws-s3 for thumbnail storage;
AWS SageMaker for batched
embedding inference; and systems/amazon-opensearch-service k-NN
plugin for the actual vector-search index with
metadata-filter-capable faceted search. The indexing pipeline is
decomposed into four
discrete queued jobs (identify-and-thumbnail → generate-embeddings →
persist-to-OpenSearch → stop) so batching and retry behaviour can be
tuned per stage
(patterns/pipeline-stage-as-discrete-job). Identifying indexable
frames required a headless server-side C++ build of the Figma
editor in an async job. Two cost optimizations dominated: (1)
Ruby → C++ rewrite of the frame-enumeration-and-thumbnailing path,
eliminating the intermediate full-file-JSON serialization that was
"extremely slow and memory intensive"; (2) GPU → CPU software
rendering via llvmpipe on newer-generation instance types —
cheaper per-instance and faster per workload. Freshness was debounced
to 4h (matches patterns/edit-quiescence-indexing), quantified
here: "we would only have to process 12% of the data." Corpus
further cut in half by removing draft files, within-file duplicates,
and unmodified file copies. Query path is
hybrid
lexical+vector — two independent OpenSearch indexes queried
simultaneously, scores min-max normalised, exact-lexical matches
boosted, results interleaved. Vector quantization (OpenSearch
k-NN) compresses embeddings in-index trading a small recall hit for
memory. Two OpenSearch-internal bugs are reported candidly: (1)
periodic non-determinism on segment-replication replicas from a
`Reader cannot be cast to SegmentReader` error in the delete path — fixed
upstream in
k-NN PR #1808;
(2) a _source slimming footgun — vectors removed from _source
for size, but OpenSearch's update path diffs against _source to
rewrite each document, so updates to unrelated fields silently wiped
the embedding. Fix: re-fetch embeddings from DynamoDB on update
(patterns/source-field-slimming-with-external-refetch). The
indexing scope motivator: at Figma's scale "a small percentage of
users" requires indexing most of the fleet because teams are small and
numerous — rollouts converge on full-corpus indexing quickly.
Key takeaways¶
- Multimodal embeddings collapse text-query and image-query to one index. Figma uses OpenAI's open-source CLIP model. CLIP is multimodal: text and images embed into the same vector space, so the embedding for the string `"cat"` is close to the embedding for an image of a cat. Net architectural effect: one vector index serves both query modes (screenshot, selection, text). The post also notes that Figma initially experimented with embedding a textual JSON representation of the selection; image embeddings produced better results and, crucially, share the code path with screenshot queries, so the JSON route was dropped. (Source: sources/2026-04-21-figma-the-infrastructure-behind-ai-search-in-figma)
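The shared-vector-space property can be sketched with a toy stand-in for CLIP. The embedder below is a hypothetical hash-based stub (not CLIP, and not Figma's code); it only illustrates the architectural point that one index of image embeddings can answer both text and screenshot queries:

```python
import zlib

import numpy as np

DIM = 8

def fake_clip_embed(label: str, modality: str) -> np.ndarray:
    """Hypothetical stand-in for CLIP. Real CLIP maps a text string and
    an image of the same concept to nearby points in one vector space;
    this stub fakes that by deriving the vector from the concept label
    plus small modality-specific noise."""
    base = np.random.default_rng(zlib.crc32(label.encode())).standard_normal(DIM)
    noise_seed = zlib.crc32((label + "/" + modality).encode())
    noise = np.random.default_rng(noise_seed).standard_normal(DIM)
    vec = base + 0.05 * noise
    return vec / np.linalg.norm(vec)

# One index holds the *image* embeddings of indexable frames...
index = {
    "frame-cat-photo": fake_clip_embed("cat", "image"),
    "frame-dog-photo": fake_clip_embed("dog", "image"),
}

def knn(query_vec: np.ndarray, k: int = 1) -> list:
    # Cosine similarity == dot product, since all vectors are unit-norm.
    ranked = sorted(index, key=lambda doc: float(query_vec @ index[doc]),
                    reverse=True)
    return ranked[:k]

# ...and serves both query modes against that single index.
text_hits = knn(fake_clip_embed("cat", "text"))
screenshot_hits = knn(fake_clip_embed("cat", "image"))
```

A real deployment would replace `fake_clip_embed` with CLIP's text and image encoders; nothing else about the single-index query path changes.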
- Pipeline-stage-as-discrete-job, explicit rationale. The indexing path is decomposed into four discrete queued jobs rather than one long job. Figma's stated reason: "separating the individual steps of the pipeline into discrete jobs gives us more precise control over batching and retry behavior." Each stage queues the next on successful termination. Canonical patterns/pipeline-stage-as-discrete-job.
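A minimal sketch of that decomposition — the stage names follow the post, but the queue mechanics and the per-stage numbers are illustrative assumptions, not Figma's:

```python
from collections import deque

# The four discrete queued jobs from the post, in order.
PIPELINE = ["identify_and_thumbnail", "generate_embeddings",
            "persist_to_opensearch", "stop"]

# The stated rationale: each stage's batching and retry behaviour is an
# independent knob instead of one setting for a monolithic job.
# (Values here are invented for illustration.)
STAGE_CONFIG = {
    "identify_and_thumbnail": {"batch_size": 1,   "max_retries": 2},
    "generate_embeddings":    {"batch_size": 64,  "max_retries": 5},
    "persist_to_opensearch":  {"batch_size": 500, "max_retries": 3},
    "stop":                   {"batch_size": 1,   "max_retries": 0},
}

queue: deque = deque()

def run_job(stage: str, file_id: str, log: list) -> None:
    # (Real work elided.) Record the stage, then enqueue the successor —
    # each stage queues the next one on successful termination.
    log.append((stage, file_id))
    nxt = PIPELINE.index(stage) + 1
    if nxt < len(PIPELINE):
        queue.append((PIPELINE[nxt], file_id))

def index_file(file_id: str) -> list:
    log: list = []
    queue.append((PIPELINE[0], file_id))
    while queue:
        stage, fid = queue.popleft()
        run_job(stage, fid, log)
    return log
```

Running `index_file("file-1")` walks all four stages in order; a production worker pool would pull from a durable queue and apply each stage's retry policy independently.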
- Indexable-frame enumeration requires running the editor headless server-side. "Unpublished frames within a Figma file are not readily enumerable" — so Figma runs a headless server-side C++ build of the Figma editor in an async job to walk the file and emit indexable-frame identity + thumbnail. This is an unusually heavy step, but necessary because the file format's frame tree is opaque without the editor's runtime.
- Ruby → C++ rewrite of the enumeration + thumbnail path. The first implementation serialized the entire Figma file as JSON and parsed that in Ruby — "extremely slow and memory intensive." Rewriting in C++ eliminated both the serialization and the Ruby parse: "huge runtime improvements and memory reductions." This is the thumbnailing step, not the embedding-model step — the nominally cheap preprocessing, not inference, turned out to dominate the cost structure.
- GPU → CPU rendering via `llvmpipe` was a big cost saver. Thumbnail rendering moved off GPU on older AWS instance types and onto CPU-based software rendering via `llvmpipe` on newer instance types. Two wins named: (a) CPU instances are cheaper; (b) newer instances are faster, so the workload completes in less time. This inverts the usual "GPU is always cheaper per rendered frame" intuition — for bulk thumbnail workloads the instance-generation economics flipped.
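Selecting `llvmpipe` is typically done through standard Mesa environment variables; the post doesn't show Figma's configuration, so the following is a generic sketch, not their actual setup:

```shell
# Hypothetical Mesa configuration for CPU software rendering (the post
# names llvmpipe but shows no config; these are standard Mesa env vars,
# not Figma's settings).
export LIBGL_ALWAYS_SOFTWARE=1   # force software GL instead of a GPU driver
export GALLIUM_DRIVER=llvmpipe   # select the llvmpipe Gallium driver
```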
- Edit-quiescence debounce: 4h → process 12% of the data. Figma's edit-quiescence policy (patterns/edit-quiescence-indexing) is quantified here for the first time. "If we debounced indexing to be at most every four hours, we would only have to process 12% of the data" — an ~8× load reduction just from the quiescence window. Diurnal cluster autoscaling further trimmed off-peak compute.
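One way such a debounce could be implemented — a minimal sketch under the assumption that "at most every four hours" means a file is reindexed at most once per window; the timestamps and their sources are hypothetical:

```python
from datetime import datetime, timedelta

DEBOUNCE = timedelta(hours=4)

def should_reindex(last_indexed_at: datetime,
                   last_edited_at: datetime,
                   now: datetime) -> bool:
    """Edit-quiescence debounce sketch (one plausible reading, not
    Figma's actual code): reindex only if the file changed since it was
    last indexed AND the debounce window has elapsed, so a file under
    active editing is reprocessed at most once per window."""
    if last_edited_at <= last_indexed_at:
        return False                      # nothing new to pick up
    return now - last_indexed_at >= DEBOUNCE
```

Edits landing inside the window are absorbed into the next pass, which is where the "only 12% of the data" reduction comes from.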
- Corpus halved by excluding drafts / within-file duplicates / unmodified copies. OpenSearch memory was the second-biggest cost driver. Figma "took a deeper look at what we considered indexable" — removing draft files, duplicate designs within files, and files that were copies without new changes cut the index in half. These exclusions were also product-UX wins (patterns/selective-indexing-heuristics — the same pattern Figma already documents, here with the "cut in half" quantification).
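The three exclusions can be sketched as a filter over candidate frames — the field names and the content-hash duplicate check are assumptions, since the post doesn't describe Figma's heuristics mechanically:

```python
def indexable_frames(file: dict) -> list:
    """Sketch of the exclusions named in the post; field names and the
    content-hash duplicate check are hypothetical, not Figma's schema."""
    if file["is_draft"]:
        return []                        # draft files excluded wholesale
    if not file["modified_since_copy"]:
        return []                        # unmodified file copies excluded
    seen, keep = set(), []
    for frame in file["frames"]:
        if frame["content_hash"] in seen:
            continue                     # duplicate design within the file
        seen.add(frame["content_hash"])
        keep.append(frame["id"])
    return keep

doc = {
    "is_draft": False,
    "modified_since_copy": True,
    "frames": [
        {"id": "f1", "content_hash": "aaa"},
        {"id": "f2", "content_hash": "aaa"},   # duplicate of f1, dropped
        {"id": "f3", "content_hash": "bbb"},
    ],
}
```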
- Vector quantization in OpenSearch k-NN — recall vs memory trade. By default OpenSearch k-NN stores each embedding dimension as a 4-byte float. Vector quantization compresses the representation at the cost of a "small reduction in nearest neighbor search accuracy." Figma applies this to shrink the index footprint in memory. Canonical concepts/vector-quantization.
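To make the trade concrete, here is a generic scalar int8 quantizer. The post doesn't name the k-NN quantization mode Figma uses, so this is one common scheme for illustration, not theirs:

```python
import numpy as np

def quantize_int8(vec: np.ndarray):
    """Scalar quantization: each 4-byte float dimension becomes one
    byte (4x less memory) at the cost of a small rounding error — the
    "small reduction in nearest neighbor search accuracy"."""
    lo, hi = float(vec.min()), float(vec.max())
    scale = (hi - lo) / 255.0 or 1.0     # guard against constant vectors
    codes = np.round((vec - lo) / scale).astype(np.uint8)
    return codes, lo, scale

def dequantize(codes: np.ndarray, lo: float, scale: float) -> np.ndarray:
    return codes.astype(np.float32) * scale + lo

vec = np.random.default_rng(0).standard_normal(512).astype(np.float32)
codes, lo, scale = quantize_int8(vec)
max_err = float(np.abs(dequantize(codes, lo, scale) - vec).max())
```

The reconstruction error is bounded by half a quantization step, which is what keeps the recall hit small relative to the 4x memory saving.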
- Hybrid lexical + vector retrieval with min-max score normalization and exact-match boost. Lexical search (fuzzy string match over component names and descriptions) predates AI-powered search and was "safely" preserved. Queries run simultaneously against both indexes. Because raw scores aren't comparable across independent OpenSearch indexes, Figma rescores each result set via min-max normalization, boosts exact lexical matches, then interleaves by updated scores. Canonical patterns/hybrid-lexical-vector-interleaving. Worked example (Figma): `"mouse"` returns both the icon specifically titled "Mouse" and cursor-adjacent icons.
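The fusion mechanics can be sketched as follows — the boost factor and the max-based merge rule are hypothetical choices, since Figma discloses neither:

```python
def min_max(scores: dict) -> dict:
    # Rescale one index's raw scores to [0, 1] so lexical and vector
    # scores become comparable across independent indexes.
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {doc: (s - lo) / span for doc, s in scores.items()}

EXACT_BOOST = 0.5   # hypothetical value — the actual factor isn't published

def hybrid_search(query: str, lexical_hits: dict, vector_hits: dict,
                  titles: dict) -> list:
    fused = {}
    for doc, s in min_max(lexical_hits).items():
        if titles[doc].lower() == query.lower():
            s += EXACT_BOOST             # boost exact lexical matches
        fused[doc] = s
    for doc, s in min_max(vector_hits).items():
        fused[doc] = max(fused.get(doc, 0.0), s)   # assumed merge rule
    # Interleave the result sets by their updated scores.
    return sorted(fused, key=fused.get, reverse=True)

# Worked example in the spirit of the post: "mouse" surfaces the icon
# titled "Mouse" first, with cursor-adjacent vector hits interleaved.
titles = {"icon-1": "Mouse", "icon-2": "Cursor", "icon-3": "Mousepad"}
ranked = hybrid_search(
    "mouse",
    lexical_hits={"icon-1": 12.0, "icon-3": 3.0},          # BM25-ish scores
    vector_hits={"icon-1": 0.91, "icon-2": 0.88, "icon-3": 0.20},
    titles=titles,
)
```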
- `_source` slimming + vector removal → bug: updates wipe the embedding. To save storage and query latency, Figma removed embedding vectors from OpenSearch's `_source` (the original document returned with responses). This interacts with OpenSearch's update path: `_source` is the input OpenSearch uses to reconstruct the updated document, so an update to the file name silently dropped the embedding from the reindexed doc. Fix: on update, re-fetch the embedding from DynamoDB and re-inject it, preserving the `_source` slimming optimisation on the read path. Canonical patterns/source-field-slimming-with-external-refetch.
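A minimal model of the bug and the fix, with plain dicts standing in for OpenSearch's `_source` storage and the DynamoDB table (the interfaces are hypothetical):

```python
# Dicts stand in for the real stores; the interfaces are illustrative.
dynamodb = {"frame-1": {"embedding": [0.1, 0.2, 0.3]}}
source_store = {"frame-1": {"name": "Login screen"}}  # slimmed _source, no vector

def update_document(doc_id: str, fields: dict) -> dict:
    """Apply a metadata update the way OpenSearch does — by rewriting
    the document from _source — and re-inject the embedding from the
    KV store so the slimmed _source doesn't silently wipe it."""
    doc = dict(source_store[doc_id])     # update path starts from _source
    doc.update(fields)
    # The fix: without this re-fetch, the reindexed document would no
    # longer contain its embedding, since the vector isn't in _source.
    doc["embedding"] = dynamodb[doc_id]["embedding"]
    # Write back: the vector reaches the index but stays out of _source.
    source_store[doc_id] = {k: v for k, v in doc.items() if k != "embedding"}
    return doc                           # the full document as reindexed
```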
- Segment-replication replica non-determinism bug — upstream fixed. Periodic non-determinism in end-to-end search tests traced to replica queries returning different results than primary queries. Root cause: a `Reader cannot be cast to class SegmentReader` error in the delete path, affecting replicas on clusters using segment replication. Figma partnered with the AWS OpenSearch team; the fix shipped in k-NN PR #1808. A clean example of a production-scale consumer surfacing an upstream OSS bug, and of the "search works differently depending which replica answered your query" class of failures.
- The "small percentage of users forces full-fleet indexing" observation. Indexing cost matters more because of the workload shape: to give any user full search, their whole team must be indexed. Figma's teams are small and many, so "with even a small percentage of users onboarded, we'd quickly converge on having to index almost all teams at Figma." This means backfill economics, not marginal-onboard economics, dominate the design — motivating every cost optimisation above.
Reported results / numbers¶
- Edit-quiescence debounce = 4h → only 12% of data to process (~8.3× reduction).
- Corpus exclusion set (draft files + in-file duplicates + unmodified file copies) = ~50% index size reduction.
- CPU rendering on newer instances: stated "much cheaper … newer … get through our workload more quickly" — no absolute number.
- Batch-size tuning on SageMaker embedding: past some threshold, latency grows linearly with batch size — the batching benefit has a sweet spot below that threshold.
- OpenSearch PR #1808 — upstream fix shipped.
- No embedding dimensionality, NDCG, recall, corpus cardinality, or cost dollars disclosed.
Caveats¶
- Product-tier / architectural-tier split. This post is the companion to an earlier product-tier post (sources/2026-04-21-figma-how-we-built-ai-powered-search-in-figma). The product post has the user-research 75% statistic; the infrastructure post has the systems-tier detail. Cross-cite rather than duplicate.
- CLIP is cited by name and arXiv link — "Figma currently uses the open source CLIP model." No fine-tune specifics for the designs-side model disclosed. The components-side model is stated fine-tuned specifically on publicly available Community UI kits. No private Figma files or customer data used for training (stated explicitly).
- Quantization params not disclosed. Figma says "vector quantization" without naming the k-NN mode (scalar int8, PQ, binary) or any recall-impact measurement.
- Index sharding / replica topology not disclosed. The segment-replication bug confirms segment replication is in use, but cluster shape and shard count aren't.
- Hybrid-scoring coefficients not disclosed. "Exact lexical matches receiving a boost" is qualitative; the actual boost factor is not shown.
- "Billions of entries" is framed as the scale motivator but without split by corpus type (frames vs components) or ingest-rate QPS.
- `llvmpipe` on CPU is an unusual choice for design rendering — the post doesn't discuss quality differences (anti-aliasing, font-rasterisation) vs GPU rendering. Treat as "good enough for thumbnails that feed the embedding model", not "equivalent to GPU-rendered preview."
Introduces / extends¶
New wiki pages introduced by this source:
- systems/clip-embedding-model — OpenAI's open-source CLIP multimodal text+image embedding model; text and images embed into the same vector space, enabling single-index multimodal retrieval; Figma's canonical production instance.
- concepts/vector-quantization — compression of embedding vectors at the cost of a small recall hit; the OpenSearch k-NN plugin exposes it via its knn-vector-quantization feature; Figma canonical instance.
- patterns/pipeline-stage-as-discrete-job — decomposing an indexing (or any multi-step backend) pipeline into discrete queued jobs so each stage's batching and retry policy can be tuned independently; Figma canonical instance.
- patterns/hybrid-lexical-vector-interleaving — the specific hybrid-retrieval tactic of running two independent OpenSearch indexes (lexical + vector), applying min-max score normalization per index, boosting exact lexical matches, and interleaving by updated score; Figma canonical instance; distinct from general hybrid retrieval by its specific score-fusion mechanics.
- patterns/source-field-slimming-with-external-refetch — the OpenSearch-specific gotcha of removing vectors from `_source` for size, breaking update paths, and the fix of re-fetching from an external KV store on update; Figma canonical instance.
Extends:
- systems/figma-ai-search — adds a full "Infrastructure" section
(model = CLIP, storage = DynamoDB + S3 + OpenSearch k-NN, inference =
SageMaker batched, enumeration = headless server-side C++ editor,
rendering = `llvmpipe` CPU), the hybrid lexical+vector query path, and the two named OpenSearch bugs.
- systems/amazon-opensearch-service — adds the k-NN-as-production-vector-index role, the segment-replication replica bug fixed upstream, the `_source` slimming gotcha, and vector quantization as the memory mitigator.
- systems/aws-sagemaker-endpoint — adds the "embedding inference from batched image inputs" role (parallelised image download + resize + normalise inside the container; batch-size sweet spot below the latency-linear threshold).
- systems/dynamodb — adds the "metadata + embedding KV store for an ML-search pipeline; read on OpenSearch update to re-inject slimmed `_source` fields" role.
- systems/aws-s3 — thumbnail-asset bucket for the embedding pipeline.
- concepts/vector-embedding — adds CLIP as the canonical multimodal-embedding example (text and image in the same space); adds the "image embedding beats JSON-textual-representation" evaluation result.
- concepts/hybrid-retrieval-bm25-vectors — adds Figma's min-max-normalized + exact-match-boosted + interleaved instance as a second canonical implementation alongside Dash's BM25+vector combo.
- patterns/edit-quiescence-indexing — quantifies the workload reduction of the 4h debounce: 12% of data.
- patterns/selective-indexing-heuristics — adds the "cut index in half by excluding drafts + in-file dup + unmodified copies" quantification; previously only qualitative.
- companies/figma — new Recent articles entry.
Citation anchors¶
- "Figma currently uses the open source CLIP model, which is what is known as a multimodal embedding model. The model can take multiple forms of inputs (image and text) and output embeddings that are in the same space."
- "Early on in the project, we experimented with generating an embedding from a textual representation of your selection (e.g. JSON), but we found that generating embeddings via images returned better results and ensured we were sharing code paths when searching via a screenshot."
- "Separating the individual steps of the pipeline into discrete jobs gives us more precise control over batching and retry behavior."
- "Unpublished frames within a Figma file are not readily enumerable. To identify them, we run a headless server-side version of the C++ Figma editor in an asynchronous job."
- "Rewriting this logic from Ruby to C++ and eliminating any intermediate serialization yielded huge runtime improvements and memory reductions."
- "We moved our thumbnailing from GPU-based rendering on an older AWS instance type to CPU-based rendering with llvmpipe on a newer instance type."
- "If we debounced indexing to be at most every four hours, we would only have to process 12% of the data!"
- "Vector quantization is a technique to compress the size of embeddings to reduce the memory required to store and search them, at the cost of a small reduction in nearest neighbor search accuracy."
- "Queries routed to replicas in OpenSearch were returning non-deterministic results compared to those routed to primaries… Reader cannot be cast to class SegmentReader… shipped a fix here: https://github.com/opensearch-project/k-NN/pull/1808"
- "After removing embeddings from _source, everytime we tried to update a document … we were accidentally wiping the embedding off of the document … to fix this while retaining our _source optimization, we re-fetch the embeddings from DynamoDB on updates."
- "Paradoxically, with even a small percentage of users onboarded, we'd quickly converge on having to index almost all teams at Figma."