Amazon OpenSearch Service¶
Amazon OpenSearch Service is AWS's managed service for OpenSearch (the open-source fork of Elasticsearch/Kibana). Since the addition of the k-NN plugin, OpenSearch is also a first-class vector search engine — nearest-neighbour queries over dense embeddings, with HNSW / IVF / other ANN index types.
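The k-NN plugin's request shapes can be sketched as Python dicts (the index body and search body below follow the plugin's documented DSL; the field name, dimension, and parameter values are illustrative, not from the source):

```python
# Sketch of k-NN plugin request bodies as Python dicts.
# Field name "embedding", dimension 768, and engine choice are illustrative.
index_body = {
    "settings": {"index.knn": True},  # enable the k-NN plugin on this index
    "mappings": {
        "properties": {
            "embedding": {
                "type": "knn_vector",
                "dimension": 768,
                "method": {           # ANN structure: HNSW here; IVF also exists
                    "name": "hnsw",
                    "space_type": "l2",
                    "engine": "faiss",
                },
            }
        }
    },
}

# Nearest-neighbour query: top-k documents by distance to the query vector.
query_vector = [0.1] * 768  # stand-in for a real embedding
search_body = {
    "size": 10,
    "query": {"knn": {"embedding": {"vector": query_vector, "k": 10}}},
}
```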
OpenSearch Serverless is the scale-to-demand variant that auto-provisions capacity for collections (logical groupings of indices).
This page is a stub created for the cross-reference from S3 Vectors; OpenSearch is a large product family with many features not covered here.
(Source: sources/2025-07-16-aws-amazon-s3-vectors-preview-launch)
Role as the hot tier for vectors (2025-07-16)¶
In the S3 Vectors launch, OpenSearch is the hot counterpart in AWS's cold-to-hot vector tiering story: S3 Vectors for cheap archival storage, OpenSearch Serverless k-NN for high-QPS low-latency search.
"OpenSearch's high performance (high QPS, low latency) for critical, real-time applications, such as product recommendations or fraud detection." (Channy Yun, 2025-07-16)
The S3 console exposes Advanced search export → Export to OpenSearch which creates a new OpenSearch Serverless collection and populates a k-NN index from an S3 vector index. See patterns/cold-to-hot-vector-tiering.
Caveats¶
- This page only captures the vector-search role surfaced by the S3 Vectors launch. Full OpenSearch features (full-text search, log analytics, alerting, dashboards, Kibana-compatible visualisations, security analytics) are not covered here.
Production vector-search role (Figma AI Search, 2026)¶
Figma AI Search runs its entire vector index on Amazon OpenSearch k-NN — chosen because "OpenSearch is already deployed widely across Figma for traditional search features, so it made sense to leverage OpenSearch for embedding search at Figma as well." (Source: sources/2026-04-21-figma-the-infrastructure-behind-ai-search-in-figma)
Two indexes, one cluster — hybrid lexical + vector¶
Figma keeps a lexical fuzzy-string index (predating AI-powered search) and a k-NN vector index side by side, queries both simultaneously, and fuses by per-index min-max normalization + exact-match boost + interleave (patterns/hybrid-lexical-vector-interleaving).
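The fusion step can be sketched in Python. Figma's exact boost value and tie-breaking rules aren't given, so this is a minimal interpretation of "per-index min-max normalization + exact-match boost + interleave", keyed on frame names:

```python
from itertools import zip_longest

def min_max(scores):
    """Normalize one index's scores to [0, 1] (per-index min-max)."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {doc: (s - lo) / span for doc, s in scores.items()}

def fuse(lexical, vector, query, k=10, boost=1.0):
    """Min-max normalize each index, boost exact lexical matches, interleave."""
    lex = min_max(lexical)
    vec = min_max(vector)
    # exact-match boost on the lexical side (boost value is illustrative)
    lex = {d: s + (boost if d.lower() == query.lower() else 0.0)
           for d, s in lex.items()}
    ranked_lex = sorted(lex, key=lex.get, reverse=True)
    ranked_vec = sorted(vec, key=vec.get, reverse=True)
    out, seen = [], set()
    for a, b in zip_longest(ranked_lex, ranked_vec):
        for d in (a, b):  # alternate picks from each list, skipping duplicates
            if d is not None and d not in seen:
                seen.add(d)
                out.append(d)
    return out[:k]
```

Normalizing per index matters because lexical BM25-style scores and vector similarities live on incompatible scales; interleaving then sidesteps any cross-index score comparison entirely.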
Metadata-filterable k-NN¶
In addition to the embedding, each indexed document stores frame name, containing file ID + name, project, team, and organization. This enables faceted search (filters) combined with vector nearest-neighbour — the same pre-filter / post-filter dynamic discussed in concepts/vector-similarity-search.
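The OpenSearch k-NN DSL supports a filter clause inside the knn query (the plugin's "efficient filtering" form, available on the lucene and faiss engines). A sketch, with field names that are assumptions rather than Figma's actual schema:

```python
# k-NN query combined with metadata filters. Field names (team_id, org_id)
# are illustrative stand-ins for the facets listed above.
search_body = {
    "size": 20,
    "query": {
        "knn": {
            "embedding": {
                "vector": [0.1] * 768,   # stand-in query embedding
                "k": 100,                # nearest-neighbour candidate count
                "filter": {              # applied during the ANN search,
                    "bool": {            # not as a separate post-filter pass
                        "must": [
                            {"term": {"team_id": "team-123"}},
                            {"term": {"org_id": "org-456"}},
                        ]
                    }
                },
            }
        }
    },
}
```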
Memory was the cost¶
OpenSearch k-NN keeps vectors in memory for low-latency search, so cluster RAM scales with corpus × dimensionality. Figma named this as the second-biggest cost driver after frame-enumeration-and-thumbnailing, and deployed two mitigations:
- Vector quantization (concepts/vector-quantization) — the OpenSearch k-NN plugin supports quantization (knn-vector-quantization), compressing embeddings from 4-byte floats at a "small reduction in nearest neighbor search accuracy."
- _source slimming — vectors are removed from _source so they aren't stored twice (once in the k-NN graph, once in _source) and aren't returned on search responses. See patterns/source-field-slimming-with-external-refetch.
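To see why quantization is worth a small accuracy hit, a back-of-envelope RAM estimate helps. The formula below is the rule of thumb from OpenSearch's own k-NN sizing docs (an assumption of this page, not something Figma cites), with M the HNSW graph degree:

```python
def hnsw_memory_bytes(num_vectors, dim, m=16, bytes_per_component=4):
    """Rule-of-thumb RAM for an HNSW k-NN index, per OpenSearch's sizing
    guidance: roughly 1.1 * (bytes_per_component * dim + 8 * M) per vector."""
    return int(1.1 * (bytes_per_component * dim + 8 * m) * num_vectors)

# 1B vectors at 768 dims: ~3.5 TB as 4-byte floats...
full = hnsw_memory_bytes(1_000_000_000, 768)
# ...vs ~1 TB with 1-byte quantized components (graph overhead unchanged).
quantized = hnsw_memory_bytes(1_000_000_000, 768, bytes_per_component=1)
```

Note the graph-edge term (8 * M per vector) doesn't shrink with quantization, which is why the saving is large but not a clean 4×.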
Two OpenSearch bugs candidly reported by Figma¶
Figma documented two k-NN bugs hit at production scale:
- Segment-replication replica non-determinism. Periodic non-determinism in end-to-end search tests. Root cause: replica queries returned different results than primary queries, tied to a "Reader cannot be cast to class SegmentReader" error in the delete path affecting replicas on clusters using segment replication. After a joint investigation with the AWS OpenSearch team, the fix shipped in the upstream k-NN plugin as PR #1808.
- _source update-path footgun. Because OpenSearch uses _source to diff-and-rewrite updated documents, removing the embedding from _source (for the storage optimisation) caused updates to unrelated fields (e.g. file name) to silently wipe the embedding off the re-indexed document. Figma's fix: re-fetch the embedding from DynamoDB on every update and include it in the update body, preserving the _source-slim optimisation on the read path. See patterns/source-field-slimming-with-external-refetch.
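The slimming-plus-refetch discipline can be sketched as a pair: a mapping that excludes the vector from _source (a real OpenSearch mapping option, though the source doesn't confirm this exact mechanism), and a write path that always re-includes the authoritative embedding. fetch_embedding is a hypothetical stand-in for Figma's DynamoDB lookup:

```python
# Mapping that excludes the embedding from _source (the storage win):
index_body = {
    "mappings": {
        "_source": {"excludes": ["embedding"]},  # vector not stored in _source
        "properties": {"embedding": {"type": "knn_vector", "dimension": 768}},
    }
}

# Matching write-path discipline: every update re-includes the embedding,
# re-fetched from the system of record, so rewriting the document from a
# slimmed _source can't silently drop it.
def build_update_body(doc_id, changed_fields, fetch_embedding):
    body = dict(changed_fields)
    body["embedding"] = fetch_embedding(doc_id)  # authoritative copy
    return {"doc": body}
```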
Both are good case studies of a large customer surfacing class-of-issue bugs upstream rather than working around them forever.
Performance tuning for traditional (full-text) search (Figma, 2026)¶
A sibling Figma post, "The Search for Speed in Figma" (2026-04-21), documents a months-long performance debug of the non-AI OpenSearch search path (the substrate predating Figma AI Search). The highlights cover several operational OpenSearch gotchas worth recording on this page:
Coordinator vs per-shard metrics¶
OpenSearch does not emit overall-query latency as a metric or
log field — its reported "average latency" (e.g. in the DataDog
native integration) is per-shard, between coordinator and
worker nodes. The only overall-query latency is the took
field in the query API response body. Figma's DataDog dashboard
reported an 8 ms "average search" while their wrapped client saw
150 ms avg / 200–400 ms p99; the 120× gap was because up to ~500
per-shard queries fanned out per user query (canonical instance of
concepts/metric-granularity-mismatch).
Operational rule: parse took from every response and publish
it as your latency metric — do not trust vendor integrations'
default "average query latency" for capacity planning.
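The operational rule is a one-liner in practice. A minimal sketch, assuming a generic emit callback for whatever metrics pipeline is in use (the response shape below shows only the fields touched):

```python
def record_took(response_json, emit):
    """Publish the server-side overall query latency from a search response.

    The `took` field (milliseconds) is the only end-to-end latency
    OpenSearch reports; per-shard vendor metrics understate fan-out cost.
    """
    emit("opensearch.query.took_ms", response_json["took"])

# Abbreviated example response body (only the fields used here):
response = {"took": 153, "timed_out": False, "hits": {"total": {"value": 42}}}
```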
Query phases¶
From OpenSearch's own docs:
- Query phase — coordinator fans out one query per shard to worker nodes; many (not all) in parallel.
- Fetch phase — coordinator collects per-shard results, picks winners, typically re-asks top shards for full documents.
The coordinator's cost grows with the number of shards even when each shard's work is trivially filtered down.
Shard sizing for latency-sensitive workloads¶
AWS's published sizing recommendations (shards <50 GB, ~1 shard per 1.5 CPUs) are calibrated for throughput-intensive log workloads; Figma's measurements contradicted them for latency-sensitive document search with effective pre-filters. Cutting shards 450 → 180 (−60%) gave a ≥50% max-QPS boost and decreased p50 latency (not just p99). Documented as patterns/fewer-larger-shards-for-latency.
Disk-cache residency is the latency floor¶
Two successive index-size reductions (50%, then additional 90% of unused-field data) had no measurable relevancy impact — the real win was making the live set fit in the OS disk cache. At that point all performance became consistent. Paired with a node-type swap to 1/3 CPU + 25% more RAM at ≈1/2 price — the CPU had been wasted, the RAM was the constraint. General concepts/cache-locality principle applied at the OS-page-cache layer for a search engine.
Benchmark harness¶
opensearch-benchmark (OpenSearch's own tool) was unsuitable —
built for vendor-side regression testing, hard to send huge
randomized query loads against an existing cluster, and strangely
doesn't use the server-side took field so its latencies are
client-contaminated. Figma wrote their own Go load generator in
an afternoon → patterns/custom-benchmarking-harness.
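Figma's harness was Go; the core aggregation step (collect the server-side took value per query, then report nearest-rank percentiles) can be sketched in Python, keeping with this page's other examples:

```python
import math

def percentiles(took_ms, points=(50, 99)):
    """Summarize server-side `took` samples from a load run (nearest-rank).

    Using took values rather than client-measured wall time keeps the
    numbers free of client-side contamination (connection setup, GC, etc.).
    """
    xs = sorted(took_ms)
    out = {}
    for p in points:
        # nearest-rank index for the p-th percentile, clamped to valid range
        i = min(len(xs) - 1, max(0, math.ceil(p / 100 * len(xs)) - 1))
        out[f"p{p}"] = xs[i]
    return out
```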
Knobs that were ~neutral¶
Documented for posterity: zstd compression was a wash, concurrent segment search added latency even at low QPS and degraded faster under load on their shape. "There was no single magic bullet."
End-to-end impact¶
~60% API-latency reduction, ≥50% max-QPS headroom, >50% total cost reduction. (Source: sources/2026-04-21-figma-the-search-for-speed-in-figma)
Seen in¶
- sources/2025-07-16-aws-amazon-s3-vectors-preview-launch — positioned as the DRAM/SSD hot tier for vectors, destination of the S3 Vectors export flow; named as the right tier for "product recommendations or fraud detection."
- sources/2026-04-21-figma-the-infrastructure-behind-ai-search-in-figma — full-production vector-search substrate for Figma AI Search over billions of entries; two-index hybrid (lexical fuzzy-match + k-NN); vector quantization + _source slimming; two candidly reported bugs (segment-replication replica non-determinism, fixed in k-NN PR #1808; _source update-path wipes embedding, fix via DynamoDB re-fetch).
- sources/2026-04-21-figma-the-search-for-speed-in-figma — perf-tuning retrospective on Figma's traditional full-text OpenSearch path (post-migration from Elasticsearch, late 2023). Documents: the coordinator-vs-per-shard metric-granularity gotcha, took as the only overall-latency field, shard-count reduction 450 → 180, disk-cache-residency index trimming, a cheaper RAM-heavy node mix, and a custom Go benchmark harness replacing opensearch-benchmark.
- sources/2025-12-11-aws-architecting-conversational-observability-for-cloud-applications — OpenSearch Serverless as hot-tier vector store for telemetry RAG. k-NN index of embedded logs / events / metrics streaming in via Fluent Bit → Kinesis → Lambda + Titan Embeddings v2. At query time the chatbot embeds the user's natural-language question, runs k-NN, and injects the matched telemetry snippets into the prompt. Serverless was picked explicitly to "avoid the overhead of managing infrastructure and can focus on the troubleshooting workflow itself." Canonical wiki reference for patterns/telemetry-to-rag-pipeline.