SYSTEM Cited by 1 source

torchtitan¶

torchtitan (github.com/pytorch/torchtitan) is PyTorch's reference implementation for scalable distributed training. It is one of the three OSS reference projects Netflix credits as informing the design of its internal Post-Training Framework (the others being systems/torchtune and systems/verl). First canonical wiki reference: sources/2026-02-13-netflix-scaling-llm-post-training-at-netflix.

Role¶

Canonical reference for PyTorch distributed training patterns — FSDP, tensor parallelism, 3D parallelism, activation checkpointing.
Cited as prior art for the pattern Netflix plans to adopt for its fallback Hugging Face (HF) backend: "users will be able to run training directly on native transformers models for rapid exploration of novel architectures."
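To make the "3D parallelism" item above concrete, here is a minimal sketch of how a flat set of worker ranks can be factored into data-parallel, pipeline-parallel, and tensor-parallel coordinates. The function names (`rank_to_coords`, `tp_group`) and the axis ordering (tensor-parallel index varying fastest) are assumptions made for illustration; they are not taken from torchtitan or from Netflix's framework, which may lay out ranks differently.

```python
def rank_to_coords(rank: int, dp: int, pp: int, tp: int) -> tuple[int, int, int]:
    """Map a flat rank to (dp_idx, pp_idx, tp_idx).

    Assumed layout (hypothetical): tp varies fastest, then pp, then dp.
    """
    world_size = dp * pp * tp
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} outside world size {world_size}")
    dp_idx, rem = divmod(rank, pp * tp)
    pp_idx, tp_idx = divmod(rem, tp)
    return dp_idx, pp_idx, tp_idx


def tp_group(rank: int, dp: int, pp: int, tp: int) -> list[int]:
    """Return the ranks that share this rank's dp and pp coordinates.

    These are the ranks that would jointly hold one tensor-parallel shard set.
    """
    dp_idx, pp_idx, _ = rank_to_coords(rank, dp, pp, tp)
    base = (dp_idx * pp + pp_idx) * tp
    return list(range(base, base + tp))


if __name__ == "__main__":
    # 8 workers split as 2 (data) x 2 (pipeline) x 2 (tensor).
    for r in range(8):
        print(r, rank_to_coords(r, dp=2, pp=2, tp=2), tp_group(r, dp=2, pp=2, tp=2))
```

Placing the tensor-parallel axis innermost keeps each tensor-parallel group on adjacent ranks, which is a common choice because those ranks communicate most often; treat it here as an illustrative default rather than a requirement.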

Relationship to Netflix's framework¶

torchtitan is not a direct dependency: Netflix built its own Data/Model/Compute/Workflow surface. Design patterns from torchtitan did, however, inform the structure of Netflix's scalable training recipes and its distributed execution decisions.
