
PATTERN

Cold-to-Hot Vector Tiering

Cold-to-hot vector tiering is the operational pattern of storing the full vector corpus in a cheap, storage-optimized index (the cold tier) and selectively promoting a subset — the current working / high-QPS set — into a DRAM/SSD real-time ANN index (the hot tier) on demand.

The canonical instance (AWS, 2025-07-16 preview launch): exporting an S3 Vectors index to an OpenSearch Serverless k-NN collection via a console action.

(Source: sources/2025-07-16-aws-amazon-s3-vectors-preview-launch)

Mechanics (AWS console flow)

  1. Vectors are ingested and stored in an S3 Vectors index. Storage cost is S3-tier. Queries run with subsecond latency on demand.
  2. When a subset of vectors becomes hot (e.g. the catalog active this season, recent fraud patterns, this month's users), pick Advanced search export → Export to OpenSearch on the vector index in the S3 console.
  3. The flow lands on the OpenSearch Service Integration console with the S3 vector source pre-selected and a service access role auto-suggested.
  4. Choose Export. A new OpenSearch Serverless collection is created and a k-NN index is populated with a copy of the vector data from S3.
  5. Monitor progress in the Import history pane. Once status = Complete, query the new OpenSearch k-NN index directly for hot workloads.
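After step 5, application code holds two query paths. The sketch below builds the request payloads each tier expects; it is illustrative only — the bucket, index, and field names are made up, and the exact s3vectors request shape shown is an assumption based on the preview, not authoritative documentation.

```python
# Illustrative payload builders for the two tiers after an export.
# Assumed names: "media-vectors" bucket, "catalog-index", "embedding" field.

def cold_query_request(bucket: str, index: str, vector: list[float], k: int) -> dict:
    """Kwargs for an S3 Vectors query (cold tier, s3vectors client) —
    request shape is an assumption based on the preview launch."""
    return {
        "vectorBucketName": bucket,
        "indexName": index,
        "queryVector": {"float32": vector},
        "topK": k,
    }

def hot_query_body(field: str, vector: list[float], k: int) -> dict:
    """OpenSearch k-NN query DSL body (hot tier)."""
    return {"size": k, "query": {"knn": {field: {"vector": vector, "k": k}}}}

cold = cold_query_request("media-vectors", "catalog-index", [0.1, 0.2, 0.3], 10)
hot = hot_query_body("embedding", [0.1, 0.2, 0.3], 10)
```

In a real application these dicts would be passed to a `boto3` s3vectors client and an OpenSearch client respectively; the point here is that the two tiers speak different APIs (see the trade-offs section).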

AWS's articulation:

"OpenSearch's high performance (high QPS, low latency) for critical, real-time applications, such as product recommendations or fraud detection, while keeping less time-sensitive data in S3 Vectors."

When to apply it

Use this pattern when:

  • Cost/GB of your vector store dominates spend
  • The working set is a small fraction of the full corpus
  • Queries are a mix of archival "search my history" and real-time
  • You want to re-tier subsets over time (seasonal catalogs, etc.)

Don't bother when:

  • The corpus is small enough to fit in DRAM cheaply
  • The entire corpus is queried at high QPS
  • The workload is latency-insensitive
  • There is a single workload with a stable access pattern
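The cost condition can be made concrete with simple arithmetic: tiering pays off when cold storage for the whole corpus plus a hot copy of the working set undercuts keeping everything hot. The prices below are placeholders, not actual AWS rates.

```python
def cost_all_hot(corpus_gb: float, hot_price_per_gb: float) -> float:
    """Monthly cost of keeping the entire corpus in the hot ANN index."""
    return corpus_gb * hot_price_per_gb

def cost_tiered(corpus_gb: float, hot_gb: float,
                cold_price_per_gb: float, hot_price_per_gb: float) -> float:
    """Monthly cost of cold-everything plus a hot copy of the working set.
    The hot subset is stored twice (cold + hot), matching the
    double-storage trade-off of this pattern."""
    return corpus_gb * cold_price_per_gb + hot_gb * hot_price_per_gb

# Placeholder prices (cents per GB-month), purely illustrative:
# 1 TB corpus, 5% working set, cold at 6, hot at 24.
everything_hot = cost_all_hot(1000, 24)    # 24000
tiered = cost_tiered(1000, 50, 6, 24)      # 6000 + 1200 = 7200
```

With these made-up numbers the tiered layout is over 3x cheaper; the gap closes as the working-set fraction grows, which is exactly the "entire corpus is queried at high QPS" case in the right-hand column.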

Why it's a pattern, not just a feature

The shape — bulk-cold + hot-copy, with a managed export path — applies beyond AWS. Any vector DB that can ingest from S3 can participate in the cold side; any hot ANN engine (pgvector on Postgres, Pinecone, Weaviate, Qdrant, Elasticsearch) can be a destination. The AWS implementation happens to be a first-party one-click flow, but the architectural shape is portable.
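One way to read the portability claim: the application-facing contract is just "query a tier," so cold and hot backends can sit behind a common interface. A minimal sketch with stub backends — in practice each stub would wrap s3vectors, pgvector, Pinecone, or any of the other engines named above; the class and workload names here are illustrative.

```python
from typing import Protocol

class VectorTier(Protocol):
    """Anything that can answer an ANN query can be a tier."""
    def query(self, vector: list[float], k: int) -> str: ...

class _StubTier:
    """Stand-in for a real backend (s3vectors, pgvector, Pinecone, ...)."""
    def __init__(self, name: str):
        self.name = name
    def query(self, vector: list[float], k: int) -> str:
        return f"{self.name}:k={k}"

class TierRouter:
    """Send a query to the hot tier if its workload was promoted, else cold."""
    def __init__(self, cold: VectorTier, hot: VectorTier, hot_workloads: set[str]):
        self.cold, self.hot = cold, hot
        self.hot_workloads = set(hot_workloads)
    def query(self, workload: str, vector: list[float], k: int = 10) -> str:
        tier = self.hot if workload in self.hot_workloads else self.cold
        return tier.query(vector, k)

router = TierRouter(cold=_StubTier("s3vectors"),
                    hot=_StubTier("opensearch"),
                    hot_workloads={"recommendations"})
```

Re-tiering over time then reduces to editing `hot_workloads` and re-running the export, rather than changing application code.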

Trade-offs

  • Data freshness — hot copy is a snapshot. Keeping it in sync with cold requires either periodic re-export or a change-data flow (not in the preview-launch scope).
  • Metadata drift — exported vectors bring metadata, but updates to metadata on the cold side don't auto-propagate.
  • Double storage cost for the hot-tier subset (acceptable if the subset is small).
  • Query-API fragmentation — cold queries use the s3vectors client; hot queries use the OpenSearch k-NN API. Application code has to pick the right tier per query.
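A hedged sketch of how the freshness trade-off is commonly handled operationally: trigger a re-export when the hot snapshot is too old or too many cold-side writes have accumulated since the last export. The thresholds are illustrative, and the preview launch ships no such mechanism — this is scaffolding the operator has to build.

```python
def needs_reexport(snapshot_age_s: int, cold_writes_since_export: int,
                   max_age_s: int = 86_400, max_writes: int = 10_000) -> bool:
    """The hot copy is a snapshot: re-export on age or accumulated drift.
    Default thresholds (1 day, 10k writes) are arbitrary placeholders."""
    return snapshot_age_s > max_age_s or cold_writes_since_export > max_writes
```

A scheduled job evaluating this predicate and re-running the export is the simplest sync loop; a change-data flow replaces it when staleness budgets get tight.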

Relation to other patterns

Seen in
