PATTERN
Cold-to-Hot Vector Tiering¶
Cold-to-hot vector tiering is the operational pattern of storing the full vector corpus in a cheap, storage-optimized index (the cold tier) and selectively promoting a subset — the current working / high-QPS set — into a DRAM/SSD real-time ANN index (the hot tier) on demand.
The canonical instance (AWS, 2025-07-16 preview launch): export an S3 Vectors index to an OpenSearch Serverless k-NN collection via a console action.
(Source: sources/2025-07-16-aws-amazon-s3-vectors-preview-launch)
Mechanics (AWS console flow)¶
- Vectors are ingested and stored in an S3 Vectors index. Storage cost is S3-tier. Queries run with subsecond latency on demand.
- When a subset of vectors becomes hot (e.g. the catalog active this season, recent fraud patterns, this month's users), pick Advanced search export → Export to OpenSearch on the vector index in the S3 console.
- This lands on the OpenSearch Service Integration console with the S3 vector source pre-selected and a service access role auto-suggested.
- Choose Export. A new OpenSearch Serverless collection is created and a k-NN index is populated with a copy of the vector data from S3.
- Monitor progress in the Import history pane. Once status = Complete, query the new OpenSearch k-NN index directly for hot workloads.
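Once the export completes, application code talks to each tier through a different API. A minimal sketch of the two request shapes — the cold-side parameters assume the boto3 `s3vectors` client's `QueryVectors` call as described at the preview launch, the hot side uses the standard OpenSearch k-NN query DSL; bucket, index, and field names are hypothetical:

```python
# Toy 3-dim embedding; real workloads use the model's full dimensionality.
query_vector = [0.12, -0.07, 0.33]

# Cold tier: request shape for the s3vectors QueryVectors call (assumed
# parameter names; verify against the current SDK before relying on them).
cold_request = {
    "vectorBucketName": "my-vector-bucket",  # hypothetical bucket
    "indexName": "catalog-embeddings",       # hypothetical index
    "queryVector": {"float32": query_vector},
    "topK": 5,
    "returnMetadata": True,
    "returnDistance": True,
}
# boto3.client("s3vectors").query_vectors(**cold_request)

# Hot tier: standard OpenSearch k-NN query DSL against the exported index.
hot_request = {
    "size": 5,
    "query": {"knn": {"embedding": {"vector": query_vector, "k": 5}}},
}
# opensearch_client.search(index="catalog-embeddings-hot", body=hot_request)
```

The two shapes are deliberately shown side by side: nothing about the query payloads is shared, which is the fragmentation noted under trade-offs below.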
AWS's articulation:
"OpenSearch's high performance (high QPS, low latency) for critical, real-time applications, such as product recommendations or fraud detection, while keeping less time-sensitive data in S3 Vectors."
When to apply it¶
| Use this pattern when | Don't bother when |
|---|---|
| Cost/GB of your vector store dominates spend | Corpus is small enough to fit DRAM cheaply |
| Working set is a small fraction of the full corpus | Entire corpus is queried at high QPS |
| Queries are a mix of archival "search my history" and real-time | Latency-insensitive workload |
| You want to re-tier subsets over time (seasonal catalogs, etc.) | Single workload, stable access pattern |
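The left-hand column's cost condition can be made concrete with back-of-the-envelope arithmetic. A sketch in which every price and size is an illustrative assumption, not an actual AWS rate:

```python
# Illustrative tiering cost comparison; all numbers are assumptions.
corpus_gb = 500          # full vector corpus
hot_fraction = 0.05      # working set is 5% of the corpus
cold_price_gb = 0.03     # $/GB-month, S3-tier storage (assumed)
hot_price_gb = 0.60      # $/GB-month, DRAM/SSD ANN index (assumed)

all_hot = corpus_gb * hot_price_gb
# Tiered pays for cold storage of everything plus a hot copy of the subset.
tiered = corpus_gb * cold_price_gb + corpus_gb * hot_fraction * hot_price_gb

print(f"all-hot: ${all_hot:.2f}/month, tiered: ${tiered:.2f}/month")
```

Under these assumptions tiering is an order of magnitude cheaper; as `hot_fraction` approaches 1 the advantage disappears, which is the right-hand column's first row.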
Why it's a pattern, not just a feature¶
The shape — bulk-cold + hot-copy, with a managed export path — applies beyond AWS. Any vector DB that can ingest from S3 can participate in the cold side; any hot ANN engine (pgvector on Postgres, Pinecone, Weaviate, Qdrant, Elasticsearch) can be a destination. The AWS implementation happens to be a first-party one-click flow, but the architectural shape is portable.
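Outside the AWS one-click flow, the promotion step reduces to "iterate the cold subset, batch-upsert into the hot engine". A minimal engine-agnostic sketch, assuming a cold-side iterator and a hot-side upsert callable (both hypothetical interfaces, not any vendor's API):

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, Tuple

Vector = Tuple[str, list, dict]  # (id, embedding, metadata)

def promote(cold: Iterable[Vector],
            hot_upsert: Callable[[list], None],
            batch_size: int = 500) -> int:
    """Copy vectors from a cold iterator into a hot index in batches.

    `cold` could wrap an S3 Vectors listing or a plain S3 scan;
    `hot_upsert` could wrap pgvector INSERTs, a Pinecone upsert, or an
    OpenSearch bulk request. Returns the number of vectors promoted.
    """
    it: Iterator[Vector] = iter(cold)
    total = 0
    while batch := list(islice(it, batch_size)):
        hot_upsert(batch)
        total += len(batch)
    return total

# Usage with in-memory stand-ins for both sides:
cold_store = [(f"v{i}", [0.0, 0.1 * i], {"season": "fall"}) for i in range(1200)]
hot_index: list = []
promoted = promote(cold_store, hot_index.extend, batch_size=500)
```

Batching is the only real structure here; everything engine-specific lives behind the two injected callables, which is what makes the shape portable.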
Trade-offs¶
- Data freshness — hot copy is a snapshot. Keeping it in sync with cold requires either periodic re-export or a change-data flow (not in the preview-launch scope).
- Metadata drift — exported vectors bring metadata, but updates to metadata on the cold side don't auto-propagate.
- Double storage cost for the hot-tier subset (acceptable if the subset is small).
- Query-API fragmentation — cold queries use the `s3vectors` client; hot queries use the OpenSearch k-NN API. Application code has to pick the right tier per query.
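The fragmentation trade-off usually ends up as a small routing layer in application code. A sketch, assuming the app tracks which logical indexes currently have a hot copy; the query callables are hypothetical stand-ins for the two SDK calls:

```python
from typing import Callable

def make_router(hot_indexes: set,
                query_hot: Callable[[str, list, int], list],
                query_cold: Callable[[str, list, int], list]):
    """Route a vector query to the hot tier when a hot copy exists.

    `query_hot` would wrap the OpenSearch k-NN API; `query_cold` would
    wrap the s3vectors QueryVectors call. Both are injected, so the
    routing rule stays independent of either SDK.
    """
    def query(index: str, vector: list, k: int = 10) -> list:
        tier = query_hot if index in hot_indexes else query_cold
        return tier(index, vector, k)
    return query

# Stub tiers for illustration:
query = make_router(
    hot_indexes={"catalog-embeddings"},
    query_hot=lambda idx, v, k: [("hot", idx, k)],
    query_cold=lambda idx, v, k: [("cold", idx, k)],
)
hot_hit = query("catalog-embeddings", [0.1], k=3)    # routed hot
cold_hit = query("support-tickets", [0.1])           # routed cold
```

Keeping `hot_indexes` accurate is the operational cost: it must be updated whenever a subset is promoted or retired, which ties back to the freshness trade-off above.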
Relation to other patterns¶
- patterns/presentation-layer-over-storage — broader pattern of layering workload-specific presentations over a common durable store; cold-to-hot vector tiering is an instance (S3 as durable cold, OpenSearch as a derived hot presentation).
- concepts/compute-storage-separation — generalized principle this pattern instantiates for vector indices.
- concepts/hybrid-vector-tiering — the conceptual split this pattern operationalises.
Seen in¶
- sources/2025-07-16-aws-amazon-s3-vectors-preview-launch — launches S3 Vectors with a first-party console flow to export to OpenSearch Serverless k-NN, naming "product recommendations or fraud detection" as the hot use case and "long-term vector data" as the cold.