
Amazon OpenSearch Service

Amazon OpenSearch Service is AWS's managed service for OpenSearch (the open-source fork of Elasticsearch/Kibana). Since the addition of the k-NN plugin, OpenSearch is also a first-class vector search engine — nearest-neighbour queries over dense embeddings, with HNSW / IVF / other ANN index types.

OpenSearch Serverless is the scale-to-demand variant that auto-provisions capacity for collections (logical groupings of indices).

This page is a stub created for the cross-reference from S3 Vectors; OpenSearch is a large product family with many features not covered here.

(Source: sources/2025-07-16-aws-amazon-s3-vectors-preview-launch)

Role as the hot tier for vectors (2025-07-16)

In the S3 Vectors launch, OpenSearch is the hot counterpart in AWS's cold-to-hot vector tiering story: S3 Vectors for cheap archival storage, OpenSearch Serverless k-NN for high-QPS low-latency search.

"OpenSearch's high performance (high QPS, low latency) for critical, real-time applications, such as product recommendations or fraud detection." (Channy Yun, 2025-07-16)

The S3 console exposes Advanced search export → Export to OpenSearch, which creates a new OpenSearch Serverless collection and populates a k-NN index from an S3 vector index. See patterns/cold-to-hot-vector-tiering.

Caveats

  • This page only captures the vector-search role surfaced by the S3 Vectors launch. Full OpenSearch features (full-text search, log analytics, alerting, dashboards, Kibana-compatible visualisations, security analytics) are not covered here.

Production vector-search role (Figma AI Search, 2026)

Figma AI Search runs its entire vector index on Amazon OpenSearch k-NN — chosen because "OpenSearch is already deployed widely across Figma for traditional search features, so it made sense to leverage OpenSearch for embedding search at Figma as well." (Source: sources/2026-04-21-figma-the-infrastructure-behind-ai-search-in-figma)

Two indexes, one cluster — hybrid lexical + vector

Figma keeps a lexical fuzzy-string index (predating AI-powered search) and a k-NN vector index side by side, queries both simultaneously, and fuses by per-index min-max normalization + exact-match boost + interleave (patterns/hybrid-lexical-vector-interleaving).
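The fusion step can be sketched in a few lines. This is an illustrative reconstruction, not Figma's code: the boost value, the exact-match test, and the alternating interleave order are all assumptions.

```python
from itertools import zip_longest

def min_max(scores):
    """Normalize raw scores into [0, 1]; a constant list maps to all 1.0."""
    lo, hi = min(scores), max(scores)
    return [1.0] * len(scores) if hi == lo else [(s - lo) / (hi - lo) for s in scores]

def fuse(lexical_hits, vector_hits, query, exact_boost=0.5):
    """lexical_hits / vector_hits: [(name, raw_score), ...] from each index.
    Normalize per index, boost exact name matches, then interleave the ranks."""
    ranked = []
    for hits in (lexical_hits, vector_hits):
        norms = min_max([s for _, s in hits])
        scored = [(name, n + (exact_boost if name == query else 0.0))
                  for (name, _), n in zip(hits, norms)]
        ranked.append([name for name, _ in sorted(scored, key=lambda t: -t[1])])
    out, seen = [], set()
    for pair in zip_longest(*ranked):  # alternate lexical / vector, de-duped
        for name in pair:
            if name is not None and name not in seen:
                seen.add(name)
                out.append(name)
    return out
```

Per-index normalization matters because BM25-style lexical scores and cosine similarities live on incompatible scales; min-max puts both in [0, 1] before any comparison.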

Metadata-filterable k-NN

In addition to the embedding, each indexed document stores frame name, containing file ID + name, project, team, and organization. This enables faceted search (filters) combined with vector nearest-neighbour — the same pre-filter / post-filter dynamic discussed in concepts/vector-similarity-search.
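A filtered k-NN request body might look like the following sketch; the field names ("embedding", "team_id") are illustrative rather than Figma's schema, and the filter clause inside the knn query is the OpenSearch mechanism for applying filters during the ANN search rather than post-filtering the top k.

```python
def filtered_knn_query(embedding, team_id, k=50):
    """Build a k-NN search body with a metadata filter (hypothetical fields)."""
    return {
        "size": k,
        "query": {
            "knn": {
                "embedding": {
                    "vector": embedding,
                    "k": k,
                    "filter": {
                        "bool": {"filter": [{"term": {"team_id": team_id}}]}
                    },
                }
            }
        },
    }
```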

Memory was the cost

OpenSearch k-NN keeps vectors in memory for low-latency search, so cluster RAM scales with corpus × dimensionality. Figma named this as the second-biggest cost driver after frame-enumeration-and-thumbnailing, and deployed two mitigations:

  1. Vector quantization (concepts/vector-quantization) — the OpenSearch k-NN plugin supports quantization (knn-vector-quantization), compressing embeddings below their native 4-byte-float size at a "small reduction in nearest neighbor search accuracy."
  2. _source slimming. Vectors are removed from _source so they aren't stored twice (once in the k-NN graph, once in _source) and aren't returned in search responses. See patterns/source-field-slimming-with-external-refetch.
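The corpus × dimensionality scaling is easy to put numbers on. A back-of-the-envelope sketch, where the example sizes and the HNSW graph-overhead factor are assumptions, not measured OpenSearch figures:

```python
def knn_ram_gib(n_vectors, dim, bytes_per_component=4.0, graph_overhead=1.1):
    """Rough lower bound on k-NN resident memory: raw vector bytes times an
    assumed graph-overhead factor. Quantization shrinks bytes_per_component."""
    return n_vectors * dim * bytes_per_component * graph_overhead / 2**30

full = knn_ram_gib(100_000_000, 768)            # float32 embeddings (~315 GiB)
quantized = knn_ram_gib(100_000_000, 768, 1.0)  # 1 byte/component, 4x smaller
```

Even at these hypothetical sizes, the float32 index needs hundreds of GiB of cluster RAM, which is why quantization and _source slimming were worth pursuing together.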

Two OpenSearch bugs candidly reported by Figma

Figma documented two kNN bugs hit at production scale:

  1. Segment-replication replica non-determinism. Periodic non-determinism in end-to-end search tests. Root cause: replica queries returned different results than primary queries, tied to a Reader cannot be cast to class SegmentReader error in the delete path, affecting replicas on clusters using segment replication. After a joint investigation with the AWS OpenSearch team, the fix shipped in the upstream k-NN plugin as PR #1808.
  2. _source update-path footgun. Because OpenSearch uses _source to diff-and-rewrite updated documents, removing the embedding from _source (for the storage optimisation) caused updates to unrelated fields (e.g. file name) to silently wipe the embedding off the re-indexed document. Figma's fix: re-fetch the embedding from DynamoDB on every update and include it in the update body, preserving the _source-slim optimisation on the read path. See patterns/source-field-slimming-with-external-refetch.
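The shape of the second fix can be sketched as follows; the function and field names are illustrative, and the DynamoDB lookup is abstracted behind a callable.

```python
def build_update_body(doc_id, changed_fields, fetch_embedding):
    """Re-attach the externally stored embedding on every partial update, so
    rewriting a _source-slimmed document never silently drops the vector."""
    doc = dict(changed_fields)
    doc["embedding"] = fetch_embedding(doc_id)  # e.g. a DynamoDB lookup
    return {"doc": doc}
```

The cost is one extra external read per update; the benefit is that _source stays slim for the read path while updates remain safe.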

Both are good case studies of a large customer surfacing class-of-issue bugs upstream rather than working around them forever.

Performance tuning for traditional (full-text) search (Figma, 2026)

A sibling Figma post, "The Search for Speed in Figma" (2026-04-21), documents a months-long performance debug of the non-AI OpenSearch search path (the substrate predating Figma AI Search). The highlights cover several operational OpenSearch gotchas worth recording on this page:

Coordinator vs per-shard metrics

OpenSearch does not emit overall-query latency as a metric or log field — its reported "average latency" (e.g. in the DataDog native integration) is per-shard, measured between coordinator and worker nodes. The only overall-query latency is the took field in the query API response body. Figma's DataDog dashboard reported an 8 ms "average search" while their wrapped client saw 150 ms avg / 200–400 ms p99; the gap arose because up to ~500 per-shard queries fanned out per user query (a canonical instance of concepts/metric-granularity-mismatch).

Operational rule: parse took from every response and publish it as your latency metric — do not trust vendor integrations' default "average query latency" for capacity planning.
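In practice that rule is a few lines in the client wrapper. A sketch, where the metric name and emit hook are illustrative:

```python
def record_query_latency(response_json, emit_metric):
    """Pull the server-side overall latency out of every search response and
    publish it, instead of trusting per-shard integration metrics."""
    took_ms = response_json["took"]  # coordinator-measured, whole-query, in ms
    emit_metric("opensearch.query.took_ms", took_ms)
    return took_ms
```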

Query phases

From OpenSearch's own docs:

  • Query phase — coordinator fans out one query per shard to worker nodes; many (not all) in parallel.
  • Fetch phase — coordinator collects per-shard results, picks winners, typically re-asks top shards for full documents.

The coordinator's cost grows with the number of shards even when each shard's work is trivially filtered down.

Shard sizing for latency-sensitive workloads

AWS's published sizing recommendations (shards <50 GB, ~1 shard per 1.5 CPUs) are calibrated for throughput-intensive log workloads; Figma's measurements contradicted them for latency-sensitive document search with effective pre-filters. Cutting shards 450 → 180 (−60%) gave a ≥50% max-QPS boost and decreased p50 latency (not just p99). Documented as patterns/fewer-larger-shards-for-latency.

Disk-cache residency is the latency floor

Two successive index-size reductions (a 50% cut, then a further 90% cut of unused-field data) had no measurable relevancy impact — the real win was making the live set fit in the OS disk cache, at which point performance became consistent. Paired with a node-type swap to 1/3 the CPU + 25% more RAM at ≈1/2 the price — the CPU had been wasted; RAM was the constraint. The general concepts/cache-locality principle, applied at the OS-page-cache layer for a search engine.

Benchmark harness

opensearch-benchmark (OpenSearch's own tool) was unsuitable: it is built for vendor-side regression testing, makes it hard to send large randomized query loads at an existing cluster, and, strangely, doesn't use the server-side took field, so its latencies are client-contaminated. Figma wrote their own Go load generator in an afternoon → patterns/custom-benchmarking-harness.
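Figma's harness was Go, but the core loop is a few lines in any language: fire concurrent randomized queries and aggregate the server-side took values rather than client wall-clock. A Python sketch of the idea, with search_fn standing in for the real HTTP client:

```python
from concurrent.futures import ThreadPoolExecutor

def run_load(search_fn, queries, concurrency=8):
    """Fire queries concurrently and summarize server-side 'took' (ms), so
    network and client time can't contaminate the latency numbers."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        tooks = sorted(pool.map(lambda q: search_fn(q)["took"], queries))
    return {"p50": tooks[len(tooks) // 2], "p99": tooks[int(len(tooks) * 0.99)]}
```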

Knobs that were ~neutral

Documented for posterity: zstd compression was a wash, and concurrent segment search added latency even at low QPS and degraded faster under load on their workload shape. "There was no single magic bullet."

End-to-end impact

~60% API-latency reduction, ≥50% max-QPS headroom, >50% total cost reduction. (Source: sources/2026-04-21-figma-the-search-for-speed-in-figma)
