CONCEPT
Hybrid Vector Tiering (Cold S3 ↔ Hot OpenSearch)¶
Hybrid vector tiering is the storage-and-query pattern built on the observation that vector workloads have bimodal access profiles: a large, slow-growing archival set where storage cost dominates, and a small, high-QPS working set where query latency dominates. The pattern places each set in its cost-appropriate tier and provides a cheap migration path between them.
(Source: sources/2025-07-16-aws-amazon-s3-vectors-preview-launch)
Why the split exists¶
Different vector workloads exert different pressure on the storage system:
| Workload | What matters | Right tier |
|---|---|---|
| Semantic search over archive (historical media, slow-growing RAG corpora, agent memory) | Storage cost per vector | S3-tier (e.g. systems/s3-vectors) |
| Real-time recommendations, fraud detection | QPS + p99 latency | DRAM/SSD-tier (e.g. OpenSearch Serverless k-NN) |
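The routing decision in the table can be sketched as a small function. The QPS and latency thresholds below are illustrative assumptions, not figures from the sources — real cut-offs depend on corpus size, query mix, and the engines involved.

```python
from dataclasses import dataclass

# Assumed thresholds for illustration only.
HOT_QPS_THRESHOLD = 50    # sustained queries/sec that justify a hot tier
HOT_P99_BUDGET_MS = 100   # latency SLO tighter than this rules out cold storage

@dataclass
class VectorWorkload:
    name: str
    sustained_qps: float
    p99_budget_ms: float

def pick_tier(w: VectorWorkload) -> str:
    """Route a workload to its cost-appropriate tier."""
    if w.sustained_qps >= HOT_QPS_THRESHOLD or w.p99_budget_ms < HOT_P99_BUDGET_MS:
        return "hot"   # DRAM/SSD-backed, e.g. OpenSearch Serverless k-NN
    return "cold"      # storage-optimized, e.g. S3 Vectors

archive = VectorWorkload("rag-corpus", sustained_qps=2, p99_budget_ms=800)
fraud = VectorWorkload("fraud-detection", sustained_qps=500, p99_budget_ms=20)
print(pick_tier(archive), pick_tier(fraud))  # cold hot
```

The point is that the decision is per-workload, not per-application: one system can legitimately route different query paths to different tiers.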
AWS's 2025-07-16 launch post gives the canonical articulation:
"You can balance cost and performance by adopting a tiered strategy that stores long-term vector data cost-effectively in Amazon S3 while exporting high priority vectors to OpenSearch for real-time query performance. ... OpenSearch's high performance (high QPS, low latency) for critical, real-time applications, such as product recommendations or fraud detection, while keeping less time-sensitive data in S3 Vectors."
Structure of the tiering¶
┌─────────────────────┐
│ Ingest (Bedrock │
│ embedding models) │
└──────────┬──────────┘
▼
┌──────────────────────────┐
│ S3 Vectors (cold) │
│ storage-optimized │
│ tens-of-M to billions │
│ subsecond query │
└──────────┬───────────────┘
│ "Advanced search export →
│ Export to OpenSearch"
▼
┌──────────────────────────┐
│ OpenSearch Serverless │
│ k-NN (hot) │
│ low-latency real-time │
│ DRAM/SSD-backed │
└──────────────────────────┘
The cold tier is the durability + capacity home for vectors. The hot tier is a selective, derived view copied from the cold tier for workloads that need real-time performance.
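A minimal sketch of that export step, using in-memory dicts as stand-ins for the cold store (S3 Vectors) and the hot store (OpenSearch k-NN). The selection predicate (recency of last query) and the store interfaces are assumptions for illustration — the AWS flow is a managed export, not hand-rolled code.

```python
# Cold tier: vector_id -> (embedding, metadata). Stays canonical after export.
cold_store = {
    "v1": ([0.1, 0.2], {"last_queried_days_ago": 2}),
    "v2": ([0.3, 0.4], {"last_queried_days_ago": 400}),
}
hot_store = {}  # derived view for real-time queries

def export_working_set(max_age_days: int = 30) -> int:
    """Copy recently queried vectors to the hot tier; the cold tier keeps everything."""
    exported = 0
    for vid, (vec, meta) in cold_store.items():
        if meta["last_queried_days_ago"] <= max_age_days:
            hot_store[vid] = vec  # a copy, not a move -- cold remains the source of truth
            exported += 1
    return exported

print(export_working_set())  # 1
```

Because the hot tier is derived, it can be rebuilt or resized at any time from the cold tier, which is what keeps the migration path cheap.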
Cost asymmetry motivates the pattern¶
Warfield (2026-04-07) names the storage-economics argument explicitly:
"Customers were finding that, especially over text-based data like code or PDFs, that the vectors themselves were often more bytes than the data being indexed, stored on media many times more expensive."
For a static/slow-growing corpus with low QPS, running DRAM/SSD vector clusters pays compute + memory rent for storage you don't need at hot latency. Shipping those vectors to cold storage recovers that budget; the hot tier is then sized only for the working set.
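A back-of-envelope illustration of the byte asymmetry Warfield describes. The embedding dimensionality and chunk size below are assumptions, not figures from the source:

```python
# Assumed parameters: a common embedding size and a typical RAG text chunk.
DIM = 1024                 # embedding dimensionality (assumption)
BYTES_PER_FLOAT32 = 4
CHUNK_BYTES = 1000         # ~1 KB of UTF-8 text per chunk (assumption)

vector_bytes = DIM * BYTES_PER_FLOAT32   # bytes per stored vector
ratio = vector_bytes / CHUNK_BYTES       # vector bytes vs. source bytes
print(vector_bytes, round(ratio, 1))     # 4096 4.1
```

Under these assumptions the index is roughly 4x larger than the text it indexes, and it sits on media priced far above object storage — which is exactly the budget the cold tier recovers.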
(Source: sources/2026-04-07-allthingsdistributed-s3-files-and-the-changing-face-of-s3)
Contrast with single-tier approaches¶
- DRAM-only vector DB (historical Pinecone / Weaviate posture): best latency, worst cost/GB. Forces all vectors — active or not — into expensive storage.
- Disk-based ANN (pgvector, recent DiskANN variants): cheaper, latency dependent on SSD seek patterns; still runs on provisioned compute clusters.
- Storage-first ANN (S3 Vectors): cheapest bulk storage, elastic, "subsecond" but not microsecond; no provisioned cluster.
Hybrid tiering doesn't pick a winner — it uses the right tier per access pattern within a single application.
Related patterns¶
- patterns/cold-to-hot-vector-tiering — the operational pattern (export selected index → managed hot store).
- concepts/compute-storage-separation — the broader architectural principle hybrid vector tiering instantiates for vector indices.
Seen in¶
- sources/2025-07-16-aws-amazon-s3-vectors-preview-launch — launches S3 Vectors + export-to-OpenSearch flow as an integrated tiering product, naming "product recommendations or fraud detection" as the hot use cases and "long-term vector data" as the cold.
- sources/2026-04-07-allthingsdistributed-s3-files-and-the-changing-face-of-s3 — contributes the cost-asymmetry framing that motivates the split.