
Amazon OpenSearch Service

Amazon OpenSearch Service is AWS's managed service for OpenSearch (the open-source fork of Elasticsearch/Kibana). Since the addition of the k-NN plugin, OpenSearch is also a first-class vector search engine — nearest-neighbour queries over dense embeddings, with HNSW / IVF / other ANN index types.

OpenSearch Serverless is the scale-to-demand variant that auto-provisions capacity for collections (logical groupings of indices).

This page is a stub created for the cross-reference from S3 Vectors; OpenSearch is a large product family with many features not covered here.

(Source: sources/2025-07-16-aws-amazon-s3-vectors-preview-launch)

Role as the hot tier for vectors (2025-07-16)

In the S3 Vectors launch, OpenSearch is the hot counterpart in AWS's cold-to-hot vector tiering story: S3 Vectors for cheap archival storage, OpenSearch Serverless k-NN for high-QPS low-latency search.

"OpenSearch's high performance (high QPS, low latency) for critical, real-time applications, such as product recommendations or fraud detection." (Channy Yun, 2025-07-16)

The S3 console exposes Advanced search export → Export to OpenSearch, which creates a new OpenSearch Serverless collection and populates a k-NN index from an S3 vector index. See patterns/cold-to-hot-vector-tiering.

Caveats

  • This page only captures the vector-search role surfaced by the S3 Vectors launch. Full OpenSearch features (full-text search, log analytics, alerting, dashboards, Kibana-compatible visualisations, security analytics) are not covered here.

Production vector-search role (Figma AI Search, 2026)

Figma AI Search runs its entire vector index on Amazon OpenSearch k-NN — chosen because "OpenSearch is already deployed widely across Figma for traditional search features, so it made sense to leverage OpenSearch for embedding search at Figma as well." (Source: sources/2026-04-21-figma-the-infrastructure-behind-ai-search-in-figma)

Two indexes, one cluster — hybrid lexical + vector

Figma keeps a lexical fuzzy-string index (predating AI-powered search) and a k-NN vector index side by side, queries both simultaneously, and fuses by per-index min-max normalization + exact-match boost + interleave (patterns/hybrid-lexical-vector-interleaving).
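The fusion step can be sketched in a few lines. This is an illustrative reconstruction, not Figma's code: the boost value, the exact-match test, and the alternating interleave order are all assumptions.

```python
from itertools import zip_longest

def min_max(scores):
    """Normalize raw scores into [0, 1]; a constant list maps to all 1.0."""
    lo, hi = min(scores), max(scores)
    return [1.0] * len(scores) if hi == lo else [(s - lo) / (hi - lo) for s in scores]

def fuse(lexical_hits, vector_hits, query, exact_boost=0.5):
    """lexical_hits / vector_hits: [(name, raw_score), ...] from each index.
    Normalize per index, boost exact name matches, then interleave the ranks."""
    ranked = []
    for hits in (lexical_hits, vector_hits):
        norms = min_max([s for _, s in hits])
        scored = [(name, n + (exact_boost if name == query else 0.0))
                  for (name, _), n in zip(hits, norms)]
        ranked.append([name for name, _ in sorted(scored, key=lambda t: -t[1])])
    out, seen = [], set()
    for pair in zip_longest(*ranked):  # alternate lexical / vector, de-duped
        for name in pair:
            if name is not None and name not in seen:
                seen.add(name)
                out.append(name)
    return out
```

Per-index normalization matters because BM25-style lexical scores and cosine similarities live on incompatible scales; min-max puts both in [0, 1] before any comparison.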

Metadata-filterable k-NN

In addition to the embedding, each indexed document stores frame name, containing file ID + name, project, team, and organization. This enables faceted search (filters) combined with vector nearest-neighbour — the same pre-filter / post-filter dynamic discussed in concepts/vector-similarity-search.
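A filtered k-NN request body might look like the following sketch; the field names ("embedding", "team_id") are illustrative rather than Figma's schema, and the filter clause inside the knn query is the OpenSearch mechanism for applying filters during the ANN search rather than post-filtering the top k.

```python
def filtered_knn_query(embedding, team_id, k=50):
    """Build a k-NN search body with a metadata filter (hypothetical fields)."""
    return {
        "size": k,
        "query": {
            "knn": {
                "embedding": {
                    "vector": embedding,
                    "k": k,
                    "filter": {
                        "bool": {"filter": [{"term": {"team_id": team_id}}]}
                    },
                }
            }
        },
    }
```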

Memory was the cost

OpenSearch k-NN keeps vectors in memory for low-latency search, so cluster RAM scales with corpus × dimensionality. Figma named this as the second-biggest cost driver after frame-enumeration-and-thumbnailing, and deployed two mitigations:

  1. Vector quantization (concepts/vector-quantization) — the OpenSearch k-NN plugin supports quantization (knn-vector-quantization), compressing embeddings below their native 4-byte-float size at a "small reduction in nearest neighbor search accuracy."
  2. _source slimming. Vectors are removed from _source so they aren't stored twice (once in the k-NN graph, once in _source) and aren't returned in search responses. See patterns/source-field-slimming-with-external-refetch.
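The corpus × dimensionality scaling is easy to put numbers on. A back-of-the-envelope sketch, where the example sizes and the HNSW graph-overhead factor are assumptions, not measured OpenSearch figures:

```python
def knn_ram_gib(n_vectors, dim, bytes_per_component=4.0, graph_overhead=1.1):
    """Rough lower bound on k-NN resident memory: raw vector bytes times an
    assumed graph-overhead factor. Quantization shrinks bytes_per_component."""
    return n_vectors * dim * bytes_per_component * graph_overhead / 2**30

full = knn_ram_gib(100_000_000, 768)            # float32 embeddings (~315 GiB)
quantized = knn_ram_gib(100_000_000, 768, 1.0)  # 1 byte/component, 4x smaller
```

Even at these hypothetical sizes, the float32 index needs hundreds of GiB of cluster RAM, which is why quantization and _source slimming were worth pursuing together.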

Two OpenSearch bugs candidly reported by Figma

Figma documented two kNN bugs hit at production scale:

  1. Segment-replication replica non-determinism. Periodic non-determinism in end-to-end search tests. Root cause: replica queries returned different results than primary queries, tied to a Reader cannot be cast to class SegmentReader error in the delete path, affecting replicas on clusters using segment replication. After a joint investigation with the AWS OpenSearch team, the fix shipped in the upstream k-NN plugin as PR #1808.
  2. _source update-path footgun. Because OpenSearch uses _source to diff-and-rewrite updated documents, removing the embedding from _source (for the storage optimisation) caused updates to unrelated fields (e.g. file name) to silently wipe the embedding off the re-indexed document. Figma's fix: re-fetch the embedding from DynamoDB on every update and include it in the update body, preserving the _source-slim optimisation on the read path. See patterns/source-field-slimming-with-external-refetch.
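The shape of the second fix can be sketched as follows; the function and field names are illustrative, and the DynamoDB lookup is abstracted behind a callable.

```python
def build_update_body(doc_id, changed_fields, fetch_embedding):
    """Re-attach the externally stored embedding on every partial update, so
    rewriting a _source-slimmed document never silently drops the vector."""
    doc = dict(changed_fields)
    doc["embedding"] = fetch_embedding(doc_id)  # e.g. a DynamoDB lookup
    return {"doc": doc}
```

The cost is one extra external read per update; the benefit is that _source stays slim for the read path while updates remain safe.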

Both are good case studies of a large customer surfacing class-of-issue bugs upstream rather than working around them forever.

Performance tuning for traditional (full-text) search (Figma, 2026)

A sibling Figma post, "The Search for Speed in Figma" (2026-04-21), documents a months-long performance debug of the non-AI OpenSearch search path (the substrate predating Figma AI Search). The highlights cover several operational OpenSearch gotchas worth recording on this page:

Coordinator vs per-shard metrics

OpenSearch does not emit overall-query latency as a metric or log field — its reported "average latency" (e.g. in the DataDog native integration) is per-shard, measured between coordinator and worker nodes. The only overall-query latency is the took field in the query API response body. Figma's DataDog dashboard reported an 8 ms "average search" while their wrapped client saw 150 ms avg / 200–400 ms p99; the gap arose because up to ~500 per-shard queries fanned out per user query (a canonical instance of concepts/metric-granularity-mismatch).

Operational rule: parse took from every response and publish it as your latency metric — do not trust vendor integrations' default "average query latency" for capacity planning.
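In practice that rule is a few lines in the client wrapper. A sketch, where the metric name and emit hook are illustrative:

```python
def record_query_latency(response_json, emit_metric):
    """Pull the server-side overall latency out of every search response and
    publish it, instead of trusting per-shard integration metrics."""
    took_ms = response_json["took"]  # coordinator-measured, whole-query, in ms
    emit_metric("opensearch.query.took_ms", took_ms)
    return took_ms
```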

Query phases

From OpenSearch's own docs:

  • Query phase — coordinator fans out one query per shard to worker nodes; many (not all) in parallel.
  • Fetch phase — coordinator collects per-shard results, picks winners, typically re-asks top shards for full documents.

The coordinator's cost grows with the number of shards even when each shard's work is trivially filtered down.

Shard sizing for latency-sensitive workloads

AWS's published sizing recommendations (shards <50 GB, ~1 shard per 1.5 CPUs) are calibrated for throughput-intensive log workloads; Figma's measurements contradicted them for latency-sensitive document search with effective pre-filters. Cutting shards 450 → 180 (−60%) gave a ≥50% max-QPS boost and decreased p50 latency (not just p99). Documented as patterns/fewer-larger-shards-for-latency.

Disk-cache residency is the latency floor

Two successive index-size reductions (a 50% cut, then a further 90% cut of unused-field data) had no measurable relevancy impact — the real win was making the live set fit in the OS disk cache, at which point performance became consistent. Paired with a node-type swap to 1/3 the CPU + 25% more RAM at ≈1/2 the price — the CPU had been wasted; RAM was the constraint. The general concepts/cache-locality principle, applied at the OS-page-cache layer for a search engine.

Benchmark harness

opensearch-benchmark (OpenSearch's own tool) was unsuitable: it is built for vendor-side regression testing, makes it hard to send large randomized query loads at an existing cluster, and, strangely, doesn't use the server-side took field, so its latencies are client-contaminated. Figma wrote their own Go load generator in an afternoon → patterns/custom-benchmarking-harness.
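Figma's harness was Go, but the core loop is a few lines in any language: fire concurrent randomized queries and aggregate the server-side took values rather than client wall-clock. A Python sketch of the idea, with search_fn standing in for the real HTTP client:

```python
from concurrent.futures import ThreadPoolExecutor

def run_load(search_fn, queries, concurrency=8):
    """Fire queries concurrently and summarize server-side 'took' (ms), so
    network and client time can't contaminate the latency numbers."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        tooks = sorted(pool.map(lambda q: search_fn(q)["took"], queries))
    return {"p50": tooks[len(tooks) // 2], "p99": tooks[int(len(tooks) * 0.99)]}
```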

Knobs that were ~neutral

Documented for posterity: zstd compression was a wash, and concurrent segment search added latency even at low QPS and degraded faster under load on their workload shape. "There was no single magic bullet."

End-to-end impact

~60% API-latency reduction, ≥50% max-QPS headroom, >50% total cost reduction. (Source: sources/2026-04-21-figma-the-search-for-speed-in-figma)
