

Voyage AI

Voyage AI is MongoDB's embedding-and-reranking model line. Founded by Stanford's Tengyu Ma and team, it was acquired by MongoDB in 2025 to form the native embedding-generation and reranking layer for Atlas Vector Search.

Products

  • Embedding models — the voyage-3 family (systems/voyage-3-large, voyage-3, voyage-3-lite, voyage-3-xl, plus domain-specialised voyage-law / voyage-finance / voyage-code / voyage-multilingual), served as a hosted embedding API on the Voyage platform and integrated into MongoDB Atlas.
  • Reranking models — cross-encoder rerankers (rerank-2, rerank-2-lite and successors) designed to sit on top of hybrid-search first-stage retrieval.
  • Query-vs-document-aware serving — explicit distinction between query embeddings (short, latency-sensitive, 100–300 ms SLO) and document embeddings (long, batch-ingested) drives per-class serving optimisations.
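The query/document split above can be sketched as two serving classes with different batching and latency parameters. This is a hypothetical illustration of the idea, not Voyage AI's API: the class names, the document-side numbers, and `pick_class` are assumptions; only the 100–300 ms query SLO and the ~600-token query batch budget come from the source.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ServingClass:
    name: str
    max_batch_tokens: int  # token budget per GPU batch
    max_wait_ms: int       # how long the batcher may hold a request
    latency_slo_ms: int    # end-to-end target

# Query embeddings: short inputs, tight SLO -> tiny wait window, small batches.
QUERY = ServingClass("query", max_batch_tokens=600, max_wait_ms=5,
                     latency_slo_ms=300)

# Document embeddings: long, batch-ingested inputs -> throughput over latency
# (the document-side numbers here are illustrative assumptions).
DOCUMENT = ServingClass("document", max_batch_tokens=8192, max_wait_ms=500,
                        latency_slo_ms=60_000)

def pick_class(is_query: bool) -> ServingClass:
    """Route a request to the serving class matching its latency profile."""
    return QUERY if is_query else DOCUMENT
```

The point of the split is that each class can be tuned independently: the batcher never holds a query long enough to threaten its SLO, while document ingestion happily trades latency for full GPU batches.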

Properties relevant to system design

  • Embedding-inference serving stack — the 2025-12-18 engineering blog post documents the production stack for the query side: vLLM with padding removal as the inference engine, Redis + Lua atomic script as the batch-claim queue (patterns/atomic-conditional-batch-claim), token-count-based batching to the model-and-hardware-specific saturation point (~600 tokens for voyage-3 on A100).
  • Non-durable queue + 503 fallback — the post concedes that "the probability of Redis losing data is very low. In the rare case that it does happen, users may receive 503 Service Unavailable errors and can simply retry." Clients must therefore be idempotent.
  • Gradual model onboarding — 7+ models migrated off the legacy Hugging Face Inference no-batching pipeline onto vLLM + token-count batching over time.
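The token-count batch claim can be sketched in plain Python. Here an in-memory deque stands in for the Redis list, and atomicity is trivial because everything happens in one function; in the production pattern an atomic Lua script gives the same all-or-nothing claim semantics. The 600-token budget is the saturation point the post reports for voyage-3 on an A100; the request shape is an assumption.

```python
from collections import deque

def claim_batch(queue: deque, max_tokens: int = 600) -> list:
    """Claim requests from the head of the queue until adding the next one
    would exceed the token budget (the GPU saturation point). Always take at
    least one request so an oversized input still makes progress."""
    batch, total = [], 0
    while queue:
        req_tokens = queue[0]["tokens"]
        if batch and total + req_tokens > max_tokens:
            break
        batch.append(queue.popleft())
        total += req_tokens
    return batch

q = deque([
    {"id": "a", "tokens": 250},
    {"id": "b", "tokens": 300},
    {"id": "c", "tokens": 200},  # would push the batch past 600 tokens
])
first = claim_batch(q)  # claims a + b (550 tokens); c stays queued
```

Batching by token count rather than request count is what lets the server fill each GPU batch to its saturation point regardless of how input lengths are distributed.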
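The client side of the non-durable-queue contract is a retry loop. A minimal sketch, assuming a generic callable API client: `ServiceUnavailable`, `embed_with_retry`, and `flaky_embed` are all hypothetical names, not Voyage AI's SDK. The retry is safe only because embedding is idempotent: re-sending the same text yields the same vector, so a request dropped from the non-durable Redis queue can simply be replayed.

```python
import time

class ServiceUnavailable(Exception):
    """Stands in for an HTTP 503 from the embedding API."""

def embed_with_retry(call, retries: int = 3, backoff_s: float = 0.01):
    """Call `call()`, retrying on 503 with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return call()
        except ServiceUnavailable:
            if attempt == retries:
                raise  # budget exhausted; surface the 503 to the caller
            time.sleep(backoff_s * (2 ** attempt))

# Simulated server that loses the first two requests (queue data loss),
# then recovers.
attempts = {"n": 0}
def flaky_embed():
    attempts["n"] += 1
    if attempts["n"] <= 2:
        raise ServiceUnavailable()
    return [0.1, 0.2, 0.3]

vector = embed_with_retry(flaky_embed)
```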

Reported production numbers

From the 2025-12-18 post, for the query side of embedding serving:

  • 50 % GPU-inference-latency reduction (voyage-3-large vs old pipeline).
  • 3× fewer GPUs for the same workload.
  • Across 7+ models onboarded:
    • Up to ~20 ms GPU-inference-time drop via vLLM + padding removal.
    • Up to 8× throughput improvement via token-count batching.
    • P90 end-to-end latency drops by 60+ ms on some model servers under contention.
    • P90 more stable during traffic spikes, even with fewer GPUs.

Disclaimer: the numbers reflect Voyage AI's specific new-vs-old pipeline comparison and are, per the post, "not necessarily generalisable".

Integration with MongoDB Atlas

Following the 2025 acquisition, Voyage AI embeddings + rerankers are pitched as the native embedding + reranking layer for Atlas Vector Search + Atlas Hybrid Search. The 2025-09-30 hybrid-search post names "cross-encoders, learning-to-rank, and dynamic scoring profiles" as the emerging re-ranking layer above hybrid retrieval — an implicit pointer at the Voyage AI integration direction. The 2025-09-25 From Niche NoSQL to Enterprise Powerhouse post describes Voyage AI as "embedding-generation-as-a-service" inside MongoDB's unified developer experience.
