PATTERN
Thin library on top of OSS compute platform¶
Intent¶
Deliver ML-platform capabilities by building a thin, opinionated library on top of an off-the-shelf stack of open-source components (PyTorch + Ray + vLLM + Verl) sitting on a generic internal compute substrate — rather than building a bespoke internal-only ML platform that reinvents orchestration, storage, or serving. Concentrate engineering investment on differential-value surfaces (workload-specific performance, business-requirement integration) rather than platform plumbing.
First canonical wiki reference: sources/2026-02-13-netflix-scaling-llm-post-training-at-netflix — the explicit design philosophy Netflix articulates for its Post-Training Framework.
Problem¶
ML-platform teams face a recurring decision: how much of the stack to own end-to-end.
- Own everything: bespoke cluster manager, bespoke job scheduler, bespoke training library, bespoke inference stack, bespoke tokenizer. Maximum control, maximum headcount, and a stack that steadily drifts away from the OSS community's progress.
- Own nothing: adopt a vendor platform (SageMaker, Vertex AI, or another managed ML stack). Minimum control; no differential value possible.
Neither extreme suits a team whose value-add is specialised adaptation (e.g. Netflix's member-interaction-sequence training workloads): such a team needs both framework-level optimisations and the ability to ingest new models at community velocity.
Solution¶
A three-layer stack:
┌────────────────────────────────────────────────────┐
│ Domain library (this team's value-add)             │
│  - Data/Model/Compute/Workflow abstractions        │
│  - Internal performance optimisations              │
│  - Business-requirement integration                │
├────────────────────────────────────────────────────┤
│ OSS compute stack (unmodified or lightly wrapped)  │
│  - PyTorch + Ray + vLLM + Verl + HuggingFace       │
├────────────────────────────────────────────────────┤
│ Internal compute substrate (generic)               │
│  - GPU provisioning                                │
│  - AWS / DC networking                             │
└────────────────────────────────────────────────────┘
Netflix's framing:
"At the base is Mako, Netflix's internal ML compute platform, which provisions GPUs on AWS. On top of Mako, we run robust open-source components — PyTorch, Ray, and vLLM — largely out of the box. Our post-training framework sits above these foundations as a library: it provides reusable utilities and standardized training recipes for common workflows such as Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), Reinforcement Learning (RL), and Knowledge Distillation. Users typically express jobs as configuration files that select a recipe and plug in task-specific components." (Source: sources/2026-02-13-netflix-scaling-llm-post-training-at-netflix)
Differential value surfaces¶
The library concentrates engineering on surfaces where off-the-shelf is weakest — explicitly stated by Netflix:
"A post-training framework is only worth owning if it delivers clear value beyond assembling OSS components. We build on open source for velocity, but we invest heavily where off-the-shelf tools tend to be weakest: performance tuned to our workload characteristics, and integration with Netflix-specific model and business requirements."
Concrete examples from the source:
- Performance wins tuned to the workload: async sequence packing (up to 4.7× throughput on the most skewed dataset) and vocab padding to kernel boundaries (avoiding a 3× LM-head slowdown); see the sketch after this list.
- Non-standard transformer support: member-interaction-sequence models, custom output projection heads, bespoke RL loops integrated with custom inference engines.
- Consistent cross-cutting abstractions: MFU (model FLOPs utilization) reporting that stays accurate under custom architectures and LoRA; uniform LoRA extensibility across model families.
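A minimal sketch of the two mechanisms behind the first bullet, under stated assumptions: the 64-wide padding boundary and the first-fit-decreasing packing policy are illustrative defaults, not the framework's actual choices, and the source's packing is async while this sketch shows only the packing decision itself.

```python
# Illustrative sketches only; boundary sizes and the greedy policy are
# assumptions, not the framework's actual implementation.
from typing import List

def pad_vocab(vocab_size: int, multiple: int = 64) -> int:
    """Round the vocab size up to a kernel-friendly boundary so the LM-head
    GEMM tiles cleanly; the padded logits are masked and never sampled."""
    return ((vocab_size + multiple - 1) // multiple) * multiple

def pack_first_fit(lengths: List[int], max_len: int) -> List[List[int]]:
    """First-fit-decreasing packing of variable-length sequences into
    fixed-size rows. Skewed length distributions waste the most padding,
    which is why packing helps most on the most skewed dataset."""
    bins: List[List[int]] = []   # indices of sequences packed per row
    free: List[int] = []         # remaining space per row
    for i, n in sorted(enumerate(lengths), key=lambda p: -p[1]):
        for b in range(len(bins)):
            if n <= free[b]:
                bins[b].append(i)
                free[b] -= n
                break
        else:
            bins.append([i])     # assumes n <= max_len (truncate upstream)
            free.append(max_len - n)
    return bins

assert pad_vocab(128_001) == 128_064                          # 64-aligned
assert len(pack_first_fit([900, 100, 800, 200], 1024)) == 2   # 2 rows, not 4
```

In a real trainer the packed rows also need per-sequence position-id resets and a block-diagonal attention mask so packed neighbours cannot attend to each other.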
Delegation pattern: use OSS for commodity layers¶
Where the OSS community has converged on good-enough abstractions, use them unchanged:
- Ray for distributed workflow orchestration / actor lifecycle.
- PyTorch for model definition and distributed collectives.
- vLLM for inference (and as the tokenizer contract).
- Verl for RL-specific distributed orchestration, layered on top of Ray.
- Hugging Face AutoTokenizer as the single source of truth for tokenization.
- Hugging Face checkpoint format for interchange (patterns/huggingface-checkpoint-compat-for-internal-optimized-model).
Integrate rather than rewrite: when Verl's abstractions fit the RL-orchestration problem, use them rather than inventing a Netflix-specific equivalent.
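As a concrete illustration of integrate-rather-than-rewrite, here is a minimal sketch in which only the thin wrapper class is library code; the Ray, Hugging Face, and vLLM calls are the unmodified OSS APIs. The wrapper class and the model name are hypothetical.

```python
# Only the wrapper class below is "library" code; ray.remote, AutoTokenizer,
# and vllm.LLM/SamplingParams are unmodified OSS APIs. Model is a placeholder.
import ray
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

@ray.remote(num_gpus=1)              # Ray owns placement and actor lifecycle
class Generator:
    def __init__(self, model: str):
        # Hugging Face AutoTokenizer: single source of truth for tokenization,
        # loaded from the same checkpoint vLLM serves.
        self.tokenizer = AutoTokenizer.from_pretrained(model)
        self.llm = LLM(model=model)  # vLLM owns inference, out of the box

    def generate(self, prompts: list, max_tokens: int = 128) -> list:
        params = SamplingParams(max_tokens=max_tokens)
        return [out.outputs[0].text for out in self.llm.generate(prompts, params)]

gen = Generator.remote("example-org/example-model")
print(ray.get(gen.generate.remote(["Hello"])))
```

The point is the shape: when vLLM or Ray improves upstream, the wrapper inherits the improvement with no rewrite.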
Applicability¶
- ✅ Internal platform teams with differentiated workloads requiring framework-level performance tuning.
- ✅ Teams whose value-add is in workload-specific optimisation or business-requirement integration, not in generic orchestration.
- ✅ Ecosystems where OSS has converged on production-grade abstractions (Ray, PyTorch, vLLM) and is moving faster than any internal-only equivalent.
- ❌ Teams whose value-add IS the orchestration layer (cloud vendors building ML PaaS products).
- ❌ Use cases so thin there's no differential value to extract beyond the OSS baseline — use the OSS directly.
Trade-offs¶
| Benefit | Cost |
|---|---|
| Engineering concentrated on differential-value surface | Depends on OSS API stability |
| Move with community velocity on new models/features | Must keep up with OSS version churn |
| Library users get both OSS-ecosystem portability and internal optimisations | Users have to understand both the library API and (sometimes) the OSS layer beneath |
| Small framework team can maintain it | Bug fixes may require upstream OSS contributions |
Known uses¶
- Netflix Post-Training Framework (2026-02) — canonical instance. Thin library over PyTorch + Ray + vLLM + Verl, sitting on Mako compute.
- Foundational platform + domain libraries is the related structural pattern at the platform level. Here the library is the unit of ownership; Mako is the platform below, and PyTorch/Ray/vLLM/Verl are the layer in between.