
voyage-3 / voyage-3-large

The voyage-3 family is Voyage AI's third-generation general-purpose embedding-model line, introduced in Voyage's 2025-01-07 voyage-3-large announcement and positioned, post-acquisition, as the default embedding tier for retrieval, search, and recommendation workloads on Atlas. The family includes voyage-3, voyage-3-lite, voyage-3-xl, and voyage-3-large, with the large variant targeted at retrieval-quality-at-scale use cases.

Properties relevant to system design

  • Serving regime — transformer-encoder inference; the query-side workload is memory-bound on common GPUs (this is the workload Voyage AI's 2025-12-18 post builds its serving infrastructure around).
  • Saturation point on an NVIDIA A100: "~600 tokens" per batch. Composing batches around this point (via token-count-based batching on vLLM with padding removal) maximises MFU without adding latency.
  • Headline serving result — voyage-3-large query serving on the new token-count-batched + vLLM pipeline, versus the old no-batching + HF Inference pipeline: a 50 % reduction in GPU inference latency and 3× fewer GPUs.
  • Query-latency SLO: typically 100–300 ms per request.
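The token-count-based batching idea above can be sketched as a greedy packer that groups incoming queries until the batch's total token count reaches the saturation budget. This is a minimal illustration, not Voyage AI's actual implementation; the `Query` type, function names, and the exact budget-handling policy are assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Approximate A100 saturation point ("~600 tokens") from the 2025-12-18 post;
# the exact value and how it is tuned per GPU are assumptions here.
TOKEN_BUDGET = 600

@dataclass
class Query:
    id: str
    num_tokens: int  # token count of the query text

def pack_batches(queries: Iterable[Query], budget: int = TOKEN_BUDGET) -> List[List[Query]]:
    """Greedily group queries so each batch's total token count stays within
    the budget. Batching by tokens rather than request count keeps GPU work
    near the saturation elbow regardless of individual query lengths."""
    batches: List[List[Query]] = []
    current: List[Query] = []
    used = 0
    for q in queries:
        # Close the current batch if adding this query would exceed the budget.
        if current and used + q.num_tokens > budget:
            batches.append(current)
            current, used = [], 0
        current.append(q)
        used += q.num_tokens
    if current:
        batches.append(current)
    return batches
```

For example, queries of 200, 250, 300, and 100 tokens pack into two batches (450 and 400 tokens) under the 600-token budget, so each forward pass stays near the MFU-optimal point without holding requests back long enough to breach a 100–300 ms SLO.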

Nuance between voyage-3 and voyage-3-large in the 2025-12-18 post

Both names appear in the post and it's worth flagging the distinction:

  • The saturation-point profiling number ("~600 tokens") is attributed to "our voyage-3 model running on A100".
  • The headline 50 % / 3×-GPU result is from "a production experiment on the Voyage-3-Large model serving".

So the saturation-point elbow is stated for voyage-3; the headline outcome is for voyage-3-large. The post doesn't further distinguish the batch-size analysis per variant.
