torchtitan
torchtitan (github.com/pytorch/torchtitan) is PyTorch's reference implementation for scalable distributed training. It is one of the three OSS reference projects Netflix credits as informing the design of its internal Post-Training Framework (the others being systems/torchtune and systems/verl). First canonical wiki reference: sources/2026-02-13-netflix-scaling-llm-post-training-at-netflix.
Role
- Canonical reference for PyTorch distributed training patterns: FSDP, tensor parallelism, 3D parallelism, activation checkpointing.
- Cited as prior art for the pattern Netflix plans to adopt for its fallback HF-backend: "users will be able to run training directly on native transformers models for rapid exploration of novel architectures."
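To make the 3D parallelism item above concrete, here is a minimal pure-Python sketch of how a flat rank can be mapped onto a three-axis grid of data-parallel, pipeline-parallel, and tensor-parallel coordinates. It assumes a row-major layout with tensor parallelism as the fastest-varying axis, which is a common convention but is not confirmed here as torchtitan's layout. The names `MeshCoords`, `rank_to_coords`, and `tp_group` are hypothetical and are not part of torchtitan's API.

```python
from typing import NamedTuple


class MeshCoords(NamedTuple):
    """Position of one rank in a hypothetical (dp, pp, tp) grid."""
    dp: int  # data-parallel replica index
    pp: int  # pipeline stage index
    tp: int  # tensor-parallel shard index


def rank_to_coords(rank: int, dp_degree: int, pp_degree: int, tp_degree: int) -> MeshCoords:
    """Map a flat rank to grid coordinates (assumed row-major, tp fastest)."""
    world_size = dp_degree * pp_degree * tp_degree
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} is outside world size {world_size}")
    tp = rank % tp_degree
    pp = (rank // tp_degree) % pp_degree
    dp = rank // (tp_degree * pp_degree)
    return MeshCoords(dp=dp, pp=pp, tp=tp)


def tp_group(rank: int, dp_degree: int, pp_degree: int, tp_degree: int) -> list[int]:
    """Return the ranks that share this rank's dp and pp coordinates."""
    coords = rank_to_coords(rank, dp_degree, pp_degree, tp_degree)
    base = rank - coords.tp  # first rank of the tensor-parallel group
    return [base + i for i in range(tp_degree)]


if __name__ == "__main__":
    # 8 ranks arranged as 2 (dp) x 2 (pp) x 2 (tp)
    for r in range(8):
        print(r, rank_to_coords(r, 2, 2, 2), tp_group(r, 2, 2, 2))
```

With this layout, ranks in the same tensor-parallel group are adjacent in rank order. Adjacent ranks are typically placed on the same node, where interconnect bandwidth is highest. A real training stack would build process groups from such a mapping rather than compute it by hand.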
Relationship to Netflix's framework
Not a direct dependency (Netflix built its own Data/Model/Compute/Workflow surface), but design patterns from torchtitan informed scalable training recipe structure and distributed execution decisions.