Figma — The Infrastructure Behind AI Search in Figma¶
Summary¶
Infrastructure companion to Figma's earlier product-narrative post on
AI-powered search. Where the product post (see
sources/2026-04-21-figma-how-we-built-ai-powered-search-in-figma)
framed why search and what to index, this post documents how the
pipeline is actually built: embeddings from the open-source
CLIP model (a multimodal model that embeds
images and text into the same vector space, so a
string query and an image query can hit the same index);
systems/dynamodb for frame metadata + embeddings KV store;
systems/aws-s3 for thumbnail storage;
AWS SageMaker for batched
embedding inference; and systems/amazon-opensearch-service k-NN
plugin for the actual vector-search index with
metadata-filter-capable faceted search. The indexing pipeline is
decomposed into four
discrete queued jobs (identify-and-thumbnail → generate-embeddings →
persist-to-OpenSearch → stop) so batching and retry behaviour can be
tuned per stage
(patterns/pipeline-stage-as-discrete-job). Identifying indexable
frames required a headless server-side C++ build of the Figma
editor in an async job. Two cost optimizations dominated: (1)
Ruby → C++ rewrite of the frame-enumeration-and-thumbnailing path,
eliminating the intermediate full-file-JSON serialization that was
"extremely slow and memory intensive"; (2) GPU → CPU software
rendering via llvmpipe on newer-generation instance types —
cheaper per-instance and faster per workload. Freshness was debounced
to 4h (matches patterns/edit-quiescence-indexing), quantified
here: "we would only have to process 12% of the data." Corpus
further cut in half by removing draft files, within-file duplicates,
and unmodified file copies. Query path is
hybrid
lexical+vector — two independent OpenSearch indexes queried
simultaneously, scores min-max normalised, exact-lexical matches
boosted, results interleaved. Vector quantization (OpenSearch
k-NN) compresses embeddings in-index trading a small recall hit for
memory. Two OpenSearch-internal bugs are reported candidly: (1)
periodic non-determinism on segment-replication replicas from a
`Reader cannot be cast to SegmentReader` error in the delete path — fixed
upstream in
k-NN PR #1808;
(2) a _source slimming footgun — vectors removed from _source
for size, but OpenSearch's update path diffs against _source to
rewrite each document, so updates to unrelated fields silently wiped
the embedding. Fix: re-fetch embeddings from DynamoDB on update
(patterns/source-field-slimming-with-external-refetch). The
indexing scope motivator: at Figma's scale "a small percentage of
users" requires indexing most of the fleet because teams are small and
numerous — rollouts converge on full-corpus indexing quickly.
Key takeaways¶
- Multimodal embeddings collapse text-query and image-query to one index. Figma uses OpenAI's open-source CLIP model. CLIP is multimodal: text and images embed into the same vector space, so the embedding for the string `"cat"` is close to the embedding for an image of a cat. Net architectural effect: one vector index serves both query modes (screenshot, selection, text). The post also notes that Figma initially experimented with embedding a textual JSON representation of the selection; image embeddings produced better results and, crucially, share the code path with screenshot queries, so the JSON route was dropped. (Source: sources/2026-04-21-figma-the-infrastructure-behind-ai-search-in-figma)
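The shared-vector-space property can be sketched with a toy stand-in for CLIP. The embedder below is a hypothetical hash-based stub (not CLIP, and not Figma's code); it only illustrates the architectural point that one index of image embeddings can answer both text and screenshot queries:

```python
import zlib

import numpy as np

DIM = 8

def fake_clip_embed(label: str, modality: str) -> np.ndarray:
    """Hypothetical stand-in for CLIP. Real CLIP maps a text string and
    an image of the same concept to nearby points in one vector space;
    this stub fakes that by deriving the vector from the concept label
    plus small modality-specific noise."""
    base = np.random.default_rng(zlib.crc32(label.encode())).standard_normal(DIM)
    noise_seed = zlib.crc32((label + "/" + modality).encode())
    noise = np.random.default_rng(noise_seed).standard_normal(DIM)
    vec = base + 0.05 * noise
    return vec / np.linalg.norm(vec)

# One index holds the *image* embeddings of indexable frames...
index = {
    "frame-cat-photo": fake_clip_embed("cat", "image"),
    "frame-dog-photo": fake_clip_embed("dog", "image"),
}

def knn(query_vec: np.ndarray, k: int = 1) -> list:
    # Cosine similarity == dot product, since all vectors are unit-norm.
    ranked = sorted(index, key=lambda doc: float(query_vec @ index[doc]),
                    reverse=True)
    return ranked[:k]

# ...and serves both query modes against that single index.
text_hits = knn(fake_clip_embed("cat", "text"))
screenshot_hits = knn(fake_clip_embed("cat", "image"))
```

A real deployment would replace `fake_clip_embed` with CLIP's text and image encoders; nothing else about the single-index query path changes.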
- Pipeline-stage-as-discrete-job, explicit rationale. The indexing path is decomposed into four discrete queued jobs rather than one long job. Figma's stated reason: "separating the individual steps of the pipeline into discrete jobs gives us more precise control over batching and retry behavior." Each stage queues the next on successful termination. Canonical patterns/pipeline-stage-as-discrete-job.
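A minimal sketch of that decomposition — the stage names follow the post, but the queue mechanics and the per-stage numbers are illustrative assumptions, not Figma's:

```python
from collections import deque

# The four discrete queued jobs from the post, in order.
PIPELINE = ["identify_and_thumbnail", "generate_embeddings",
            "persist_to_opensearch", "stop"]

# The stated rationale: each stage's batching and retry behaviour is an
# independent knob instead of one setting for a monolithic job.
# (Values here are invented for illustration.)
STAGE_CONFIG = {
    "identify_and_thumbnail": {"batch_size": 1,   "max_retries": 2},
    "generate_embeddings":    {"batch_size": 64,  "max_retries": 5},
    "persist_to_opensearch":  {"batch_size": 500, "max_retries": 3},
    "stop":                   {"batch_size": 1,   "max_retries": 0},
}

queue: deque = deque()

def run_job(stage: str, file_id: str, log: list) -> None:
    # (Real work elided.) Record the stage, then enqueue the successor —
    # each stage queues the next one on successful termination.
    log.append((stage, file_id))
    nxt = PIPELINE.index(stage) + 1
    if nxt < len(PIPELINE):
        queue.append((PIPELINE[nxt], file_id))

def index_file(file_id: str) -> list:
    log: list = []
    queue.append((PIPELINE[0], file_id))
    while queue:
        stage, fid = queue.popleft()
        run_job(stage, fid, log)
    return log
```

Running `index_file("file-1")` walks all four stages in order; a production worker pool would pull from a durable queue and apply each stage's retry policy independently.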
- Indexable-frame enumeration requires running the editor headless server-side. "Unpublished frames within a Figma file are not readily enumerable" — so Figma runs a headless server-side C++ build of the Figma editor in an async job to walk the file and emit indexable-frame identity + thumbnail. This is an unusually heavy step, but necessary because the file format's frame tree is opaque without the editor's runtime.
- Ruby → C++ rewrite of the enumeration + thumbnail path. The first implementation serialized the entire Figma file as JSON and parsed that in Ruby — "extremely slow and memory intensive." Rewriting in C++ eliminated both the serialization and the Ruby parse: "huge runtime improvements and memory reductions." This is the thumbnailing step, not the embedding-model step — the nominally cheap preprocessing, not inference, turned out to dominate the cost structure.
- GPU → CPU rendering via `llvmpipe` was a big cost saver. Thumbnail rendering moved off GPU on older AWS instance types and onto CPU-based software rendering via `llvmpipe` on newer instance types. Two wins named: (a) CPU instances are cheaper; (b) newer instances are faster, so the workload completes in less time. This inverts the usual "GPU is always cheaper per rendered frame" intuition — for bulk thumbnail workloads the instance-generation economics flipped.
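Selecting `llvmpipe` is typically done through standard Mesa environment variables; the post doesn't show Figma's configuration, so the following is a generic sketch, not their actual setup:

```shell
# Hypothetical Mesa configuration for CPU software rendering (the post
# names llvmpipe but shows no config; these are standard Mesa env vars,
# not Figma's settings).
export LIBGL_ALWAYS_SOFTWARE=1   # force software GL instead of a GPU driver
export GALLIUM_DRIVER=llvmpipe   # select the llvmpipe Gallium driver
```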
- Edit-quiescence debounce: 4h → process 12% of the data. Figma's edit-quiescence policy (patterns/edit-quiescence-indexing) is quantified here for the first time. "If we debounced indexing to be at most every four hours, we would only have to process 12% of the data" — an ~8× load reduction just from the quiescence window. Diurnal cluster autoscaling further trimmed off-peak compute.
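One way such a debounce could be implemented — a minimal sketch under the assumption that "at most every four hours" means a file is reindexed at most once per window; the timestamps and their sources are hypothetical:

```python
from datetime import datetime, timedelta

DEBOUNCE = timedelta(hours=4)

def should_reindex(last_indexed_at: datetime,
                   last_edited_at: datetime,
                   now: datetime) -> bool:
    """Edit-quiescence debounce sketch (one plausible reading, not
    Figma's actual code): reindex only if the file changed since it was
    last indexed AND the debounce window has elapsed, so a file under
    active editing is reprocessed at most once per window."""
    if last_edited_at <= last_indexed_at:
        return False                      # nothing new to pick up
    return now - last_indexed_at >= DEBOUNCE
```

Edits landing inside the window are absorbed into the next pass, which is where the "only 12% of the data" reduction comes from.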
- Corpus halved by excluding drafts / within-file duplicates / unmodified copies. OpenSearch memory was the second-biggest cost driver. Figma "took a deeper look at what we considered indexable" — removing draft files, duplicate designs within files, and files that were copies without new changes cut the index in half. These exclusions were also product-UX wins (patterns/selective-indexing-heuristics — the same pattern Figma already documents, here with the "cut in half" quantification).
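The three exclusions can be sketched as a filter over candidate frames — the field names and the content-hash duplicate check are assumptions, since the post doesn't describe Figma's heuristics mechanically:

```python
def indexable_frames(file: dict) -> list:
    """Sketch of the exclusions named in the post; field names and the
    content-hash duplicate check are hypothetical, not Figma's schema."""
    if file["is_draft"]:
        return []                        # draft files excluded wholesale
    if not file["modified_since_copy"]:
        return []                        # unmodified file copies excluded
    seen, keep = set(), []
    for frame in file["frames"]:
        if frame["content_hash"] in seen:
            continue                     # duplicate design within the file
        seen.add(frame["content_hash"])
        keep.append(frame["id"])
    return keep

doc = {
    "is_draft": False,
    "modified_since_copy": True,
    "frames": [
        {"id": "f1", "content_hash": "aaa"},
        {"id": "f2", "content_hash": "aaa"},   # duplicate of f1, dropped
        {"id": "f3", "content_hash": "bbb"},
    ],
}
```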
- Vector quantization in OpenSearch k-NN — recall vs memory trade. By default OpenSearch k-NN stores each embedding dimension as a 4-byte float. Vector quantization compresses the representation at the cost of a "small reduction in nearest neighbor search accuracy." Figma applies this to shrink the index footprint in memory. Canonical concepts/vector-quantization.
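To make the trade concrete, here is a generic scalar int8 quantizer. The post doesn't name the k-NN quantization mode Figma uses, so this is one common scheme for illustration, not theirs:

```python
import numpy as np

def quantize_int8(vec: np.ndarray):
    """Scalar quantization: each 4-byte float dimension becomes one
    byte (4x less memory) at the cost of a small rounding error — the
    "small reduction in nearest neighbor search accuracy"."""
    lo, hi = float(vec.min()), float(vec.max())
    scale = (hi - lo) / 255.0 or 1.0     # guard against constant vectors
    codes = np.round((vec - lo) / scale).astype(np.uint8)
    return codes, lo, scale

def dequantize(codes: np.ndarray, lo: float, scale: float) -> np.ndarray:
    return codes.astype(np.float32) * scale + lo

vec = np.random.default_rng(0).standard_normal(512).astype(np.float32)
codes, lo, scale = quantize_int8(vec)
max_err = float(np.abs(dequantize(codes, lo, scale) - vec).max())
```

The reconstruction error is bounded by half a quantization step, which is what keeps the recall hit small relative to the 4x memory saving.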
- Hybrid lexical + vector retrieval with min-max score normalization and exact-match boost. Lexical search (fuzzy string match over component names and descriptions) predates AI-powered search and was "safely" preserved. Queries run simultaneously against both indexes. Because raw scores aren't comparable across independent OpenSearch indexes, Figma rescores each result set via min-max normalization, boosts exact lexical matches, then interleaves by updated scores. Canonical patterns/hybrid-lexical-vector-interleaving. Worked example (Figma): `"mouse"` returns both the icon specifically titled "Mouse" and cursor-adjacent icons.
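The fusion mechanics can be sketched as follows — the boost factor and the max-based merge rule are hypothetical choices, since Figma discloses neither:

```python
def min_max(scores: dict) -> dict:
    # Rescale one index's raw scores to [0, 1] so lexical and vector
    # scores become comparable across independent indexes.
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {doc: (s - lo) / span for doc, s in scores.items()}

EXACT_BOOST = 0.5   # hypothetical value — the actual factor isn't published

def hybrid_search(query: str, lexical_hits: dict, vector_hits: dict,
                  titles: dict) -> list:
    fused = {}
    for doc, s in min_max(lexical_hits).items():
        if titles[doc].lower() == query.lower():
            s += EXACT_BOOST             # boost exact lexical matches
        fused[doc] = s
    for doc, s in min_max(vector_hits).items():
        fused[doc] = max(fused.get(doc, 0.0), s)   # assumed merge rule
    # Interleave the result sets by their updated scores.
    return sorted(fused, key=fused.get, reverse=True)

# Worked example in the spirit of the post: "mouse" surfaces the icon
# titled "Mouse" first, with cursor-adjacent vector hits interleaved.
titles = {"icon-1": "Mouse", "icon-2": "Cursor", "icon-3": "Mousepad"}
ranked = hybrid_search(
    "mouse",
    lexical_hits={"icon-1": 12.0, "icon-3": 3.0},          # BM25-ish scores
    vector_hits={"icon-1": 0.91, "icon-2": 0.88, "icon-3": 0.20},
    titles=titles,
)
```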
- `_source` slimming + vector removal → bug: updates wipe the embedding. To save storage and query latency, Figma removed embedding vectors from OpenSearch's `_source` (the original document returned with responses). This interacts with OpenSearch's update path: `_source` is the input OpenSearch uses to reconstruct the updated document, so an update to the file name silently dropped the embedding from the reindexed doc. Fix: on update, re-fetch the embedding from DynamoDB and re-inject it, preserving the `_source` slimming optimisation on the read path. Canonical patterns/source-field-slimming-with-external-refetch.
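A minimal model of the bug and the fix, with plain dicts standing in for OpenSearch's `_source` storage and the DynamoDB table (the interfaces are hypothetical):

```python
# Dicts stand in for the real stores; the interfaces are illustrative.
dynamodb = {"frame-1": {"embedding": [0.1, 0.2, 0.3]}}
source_store = {"frame-1": {"name": "Login screen"}}  # slimmed _source, no vector

def update_document(doc_id: str, fields: dict) -> dict:
    """Apply a metadata update the way OpenSearch does — by rewriting
    the document from _source — and re-inject the embedding from the
    KV store so the slimmed _source doesn't silently wipe it."""
    doc = dict(source_store[doc_id])     # update path starts from _source
    doc.update(fields)
    # The fix: without this re-fetch, the reindexed document would no
    # longer contain its embedding, since the vector isn't in _source.
    doc["embedding"] = dynamodb[doc_id]["embedding"]
    # Write back: the vector reaches the index but stays out of _source.
    source_store[doc_id] = {k: v for k, v in doc.items() if k != "embedding"}
    return doc                           # the full document as reindexed
```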
- Segment-replication replica non-determinism bug — upstream fixed. Periodic non-determinism in end-to-end search tests traced to replica queries returning different results than primary queries. Root cause: a `Reader cannot be cast to class SegmentReader` error in the delete path, affecting replicas on clusters using segment replication. Figma partnered with the AWS OpenSearch team; the fix shipped in k-NN PR #1808. A clean example of a production-scale consumer surfacing an upstream OSS bug, and of the "search works differently depending which replica answered your query" class of failures.
- The "small percentage of users forces full-fleet indexing" observation. Indexing cost matters more because of the workload shape: to give any user full search, their whole team must be indexed. Figma's teams are small and many, so "with even a small percentage of users onboarded, we'd quickly converge on having to index almost all teams at Figma." This means backfill economics, not marginal-onboard economics, dominate the design — motivating every cost optimisation above.
Reported results / numbers¶
- Edit-quiescence debounce = 4h → only 12% of data to process (~8.3× reduction).
- Corpus exclusion set (draft files + in-file duplicates + unmodified file copies) = ~50% index size reduction.
- CPU rendering on newer instances: stated "much cheaper … newer … get through our workload more quickly" — no absolute number.
- Batch-size tuning on SageMaker embedding: past some threshold, latency grows linearly with batch size — the batching benefit has a sweet spot below that threshold.
- OpenSearch PR #1808 — upstream fix shipped.
- No embedding dimensionality, NDCG, recall, corpus cardinality, or cost dollars disclosed.
Caveats¶
- Product-tier / architectural-tier split. This post is the companion to an earlier product-tier post (sources/2026-04-21-figma-how-we-built-ai-powered-search-in-figma). The product post has the user-research 75% statistic; the infrastructure post has the systems-tier detail. Cross-cite rather than duplicate.
- CLIP is cited by name and arXiv link — "Figma currently uses the open source CLIP model." No fine-tune specifics for the designs-side model disclosed. The components-side model is stated fine-tuned specifically on publicly available Community UI kits. No private Figma files or customer data used for training (stated explicitly).
- Quantization params not disclosed. Figma says "vector quantization" without naming the k-NN mode (scalar int8, PQ, binary) or any recall-impact measurement.
- Index sharding / replica topology not disclosed. The segment-replication bug confirms segment replication is in use, but cluster shape and shard count aren't.
- Hybrid-scoring coefficients not disclosed. "Exact lexical matches receiving a boost" is qualitative; the actual boost factor is not shown.
- "Billions of entries" is framed as the scale motivator but without split by corpus type (frames vs components) or ingest-rate QPS.
- `llvmpipe` on CPU is an unusual choice for design rendering — the post doesn't discuss quality differences (anti-aliasing, font-rasterisation) vs GPU rendering. Treat as "good enough for thumbnails that feed the embedding model", not "equivalent to GPU-rendered preview."
Introduces / extends¶
New wiki pages introduced by this source:
- systems/clip-embedding-model — OpenAI's open-source CLIP multimodal text+image embedding model; text and images embed into the same vector space, enabling single-index multimodal retrieval; Figma's canonical production instance.
- concepts/vector-quantization — compression of embedding vectors at the cost of a small recall hit; the OpenSearch k-NN plugin exposes it via its knn-vector-quantization feature; Figma canonical instance.
- patterns/pipeline-stage-as-discrete-job — decomposing an indexing (or any multi-step backend) pipeline into discrete queued jobs so each stage's batching and retry policy can be tuned independently; Figma canonical instance.
- patterns/hybrid-lexical-vector-interleaving — the specific hybrid-retrieval tactic of running two independent OpenSearch indexes (lexical + vector), applying min-max score normalization per index, boosting exact lexical matches, and interleaving by updated score; Figma canonical instance; distinct from general hybrid retrieval by its specific score-fusion mechanics.
- patterns/source-field-slimming-with-external-refetch — the OpenSearch-specific gotcha of removing vectors from `_source` for size, breaking update paths, and the fix of re-fetching from an external KV store on update; Figma canonical instance.
Extends:
- systems/figma-ai-search — adds a full "Infrastructure" section
(model = CLIP, storage = DynamoDB + S3 + OpenSearch k-NN, inference =
SageMaker batched, enumeration = headless server-side C++ editor,
rendering = `llvmpipe` CPU), the hybrid lexical+vector query path, and the two named OpenSearch bugs.
- systems/amazon-opensearch-service — adds the k-NN-as-production-vector-index role, the segment-replication replica bug fixed upstream, the `_source` slimming gotcha, and vector quantization as the memory mitigator.
- systems/aws-sagemaker-endpoint — adds the "embedding inference from batched image inputs" role (parallelised image download + resize + normalise inside the container; batch-size sweet spot below the latency-linear threshold).
- systems/dynamodb — adds the "metadata + embedding KV store for an ML-search pipeline; read on OpenSearch update to re-inject slimmed `_source` fields" role.
- systems/aws-s3 — thumbnail-asset bucket for the embedding pipeline.
- concepts/vector-embedding — adds CLIP as the canonical multimodal-embedding example (text and image in the same space); adds the "image embedding beats JSON-textual-representation" evaluation result.
- concepts/hybrid-retrieval-bm25-vectors — adds Figma's min-max-normalized + exact-match-boosted + interleaved instance as a second canonical implementation alongside Dash's BM25+vector combo.
- patterns/edit-quiescence-indexing — quantifies the workload reduction of the 4h debounce: 12% of data.
- patterns/selective-indexing-heuristics — adds the "cut index in half by excluding drafts + in-file dup + unmodified copies" quantification; previously only qualitative.
- companies/figma — new Recent articles entry.
Citation anchors¶
- "Figma currently uses the open source CLIP model, which is what is known as a multimodal embedding model. The model can take multiple forms of inputs (image and text) and output embeddings that are in the same space."
- "Early on in the project, we experimented with generating an embedding from a textual representation of your selection (e.g. JSON), but we found that generating embeddings via images returned better results and ensured we were sharing code paths when searching via a screenshot."
- "Separating the individual steps of the pipeline into discrete jobs gives us more precise control over batching and retry behavior."
- "Unpublished frames within a Figma file are not readily enumerable. To identify them, we run a headless server-side version of the C++ Figma editor in an asynchronous job."
- "Rewriting this logic from Ruby to C++ and eliminating any intermediate serialization yielded huge runtime improvements and memory reductions."
- "We moved our thumbnailing from GPU-based rendering on an older AWS instance type to CPU-based rendering with llvmpipe on a newer instance type."
- "If we debounced indexing to be at most every four hours, we would only have to process 12% of the data!"
- "Vector quantization is a technique to compress the size of embeddings to reduce the memory required to store and search them, at the cost of a small reduction in nearest neighbor search accuracy."
- "Queries routed to replicas in OpenSearch were returning non-deterministic results compared to those routed to primaries… Reader cannot be cast to class SegmentReader… shipped a fix here: https://github.com/opensearch-project/k-NN/pull/1808"
- "After removing embeddings from _source, everytime we tried to update a document … we were accidentally wiping the embedding off of the document … to fix this while retaining our _source optimization, we re-fetch the embeddings from DynamoDB on updates."
- "Paradoxically, with even a small percentage of users onboarded, we'd quickly converge on having to index almost all teams at Figma."