Supervised fine-tuning¶
Definition¶
Supervised fine-tuning (SFT) is the phase of LLM adaptation in which an already-pretrained base model is trained on labelled (input, expected-output) pairs to teach it specific behaviours: following instructions, producing ranked lists, responding in a particular format, using domain vocabulary correctly. Unlike pretraining (next-token prediction on unstructured text) and continued pretraining (more next-token prediction on a different corpus), SFT's training signal is task-shaped: the loss is computed against the answer that humans or an upstream system labelled as correct.
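Concretely, the task-shaped loss is usually implemented by masking the prompt tokens out of the objective, so only the labelled answer contributes gradient. A minimal sketch with toy token ids, using the `-100` ignore-index convention common in training frameworks:

```python
IGNORE_INDEX = -100  # positions carrying this label are skipped by the loss

def build_sft_labels(prompt_ids: list[int], answer_ids: list[int]) -> tuple[list[int], list[int]]:
    """Pack one (input, expected-output) pair: the model sees prompt + answer,
    but the loss is computed only against the answer tokens."""
    input_ids = prompt_ids + answer_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + answer_ids
    return input_ids, labels

# Toy ids standing in for a tokenised instruction and its labelled answer.
input_ids, labels = build_sft_labels([11, 12, 13], [21, 22])
```

Pretraining, by contrast, would label every position; the mask is what makes the signal task-shaped rather than corpus-shaped.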
SFT sits between two adjacent techniques in the LLM-adaptation toolbox:
- Continued pretraining (CPT) — autoregressive LM loss on a new corpus; teaches the model new facts and vocabulary.
- RLHF / preference optimisation (DPO, ORPO, etc.) — trains on preference pairs or reward signals; refines tone, helpfulness, safety.
Many production adaptations sequence all three: base → CPT → SFT → RLHF.
Canonical wiki reference¶
Meta's web-monorepo RCA system (2024-06; sources/2024-08-23-meta-leveraging-ai-for-efficient-incident-response) uses a two-stage SFT pipeline:
- Mixed SFT — Llama 2's original SFT data + internal Meta context + a dedicated RCA SFT dataset of ~5,000 instruction-tuning examples. Each example contains 2-20 candidate changes + the known root cause + information available at investigation start. Teaches the model to follow RCA instructions.
- Logprob-ranking SFT — a second SFT round on examples where the expected output is "a list of potential code changes likely responsible for the issue ordered by their logprobs-ranked relevance, with the expected root cause at the start." Teaches the model to emit natively-ranked lists whose token log-probabilities can be read out as calibrated ranking scores.
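Meta does not publish the readout code; assuming per-candidate token log-probabilities can be extracted from the generated list, scoring candidates by mean token log-probability might look like:

```python
def rank_by_logprob(candidate_logprobs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Score each candidate by its mean token log-probability (higher = more
    likely under the SFT'd model) and return candidates best-first."""
    scores = {cand: sum(lps) / len(lps) for cand, lps in candidate_logprobs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical per-token log-probs for three candidate changes.
ranked = rank_by_logprob({
    "change_a": [-0.1, -0.2],
    "change_b": [-1.0, -2.0],
    "change_c": [-0.5],
})
# change_a scores highest, so it leads the ranked list.
```

The point of the second SFT round is that these scores come out calibrated: the model was explicitly trained to put the true root cause first.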
Three design choices SFT forces¶
- Dataset construction. SFT is only as good as the labels. Meta's RCA SFT set was curated from historical investigations with known root causes — a naturally-occurring label source. For tasks without ground truth, label budgets drive cost.
- Example format. The input format used at training time must be the format used at inference time. Meta's RCA inference uses the same (title, observed impact, candidates) format the SFT was trained on; deviation degrades quality.
- Data mixing. Meta mixes Llama 2's original SFT data + internal context + RCA SFT dataset in the first round. Pure domain-SFT risks catastrophic forgetting of the base model's instruction-following; too much original-SFT data dilutes the domain signal. The mixing ratio is a load-bearing hyperparameter.
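The mixing step itself is mechanically simple; the ratio is the hard part. A sketch of weighted sampling across the three sources (weights here are illustrative, not Meta's undisclosed ratio):

```python
import random

def mix_datasets(datasets: dict[str, list], weights: dict[str, float],
                 n: int, seed: int = 0) -> list:
    """Draw n SFT examples, picking the source of each draw with the given
    probability, then sampling uniformly within that source."""
    rng = random.Random(seed)
    names = list(datasets)
    probs = [weights[name] for name in names]
    mixed = []
    for _ in range(n):
        source = rng.choices(names, weights=probs, k=1)[0]
        mixed.append(rng.choice(datasets[source]))
    return mixed

# Toy stand-ins for the three sources in Meta's first SFT round.
pool = {
    "llama2_sft": ["orig_1", "orig_2"],
    "internal_context": ["meta_1"],
    "rca_sft": ["rca_1", "rca_2", "rca_3"],
}
mixed = mix_datasets(pool, {"llama2_sft": 0.5, "internal_context": 0.2, "rca_sft": 0.3}, n=10)
```

Shifting the weights toward `rca_sft` specialises harder; shifting toward `llama2_sft` protects general instruction-following. That trade-off is why the ratio is load-bearing.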
SFT vs the alternatives¶
- vs CPT alone. CPT teaches knowledge; SFT teaches format + task. A model that knows Meta's code base but hasn't seen RCA instructions won't rank changes on demand.
- vs RAG at inference time. RAG scales the context pool; SFT changes the default behaviour. Meta's RCA system uses both — the heuristic retriever provides candidate context at inference time, the SFT-trained ranker is the reasoner.
- vs LoRA / adapter-based SFT. LoRA trains a small low-rank adapter on top of a frozen base. Cheaper to train + swap; less capacity than full-weight SFT. Meta does not disclose whether the RCA ranker used full-weight SFT or LoRA.
- vs knowledge distillation. Distillation trains a student against a teacher's outputs (soft labels); SFT trains against hard ground-truth labels. Many LLM-serving deployments combine both (teacher-distilled soft-label SFT).
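The hard-label vs soft-label distinction is visible in the per-position loss. A toy sketch over raw logits (pure Python, no framework assumed):

```python
import math

def hard_label_nll(logits: list[float], target: int) -> float:
    """SFT loss at one position: negative log-probability of the single
    ground-truth token."""
    log_z = math.log(sum(math.exp(l) for l in logits))
    return -(logits[target] - log_z)

def soft_label_ce(logits: list[float], teacher_probs: list[float]) -> float:
    """Distillation loss at one position: cross-entropy against the teacher's
    full output distribution over the vocabulary."""
    log_z = math.log(sum(math.exp(l) for l in logits))
    return -sum(p * (l - log_z) for p, l in zip(teacher_probs, logits))
```

With a one-hot teacher the two coincide, which is why teacher-distilled SFT is often described as SFT with softened targets.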
Caveats¶
- Backtesting ≠ production. Meta reports 42% top-5 accuracy on backtested historical investigations. Live SFT'd rankers drift as the distribution of changes shifts; refresh cadence matters but is not disclosed.
- Small SFT sets go far. 5,000 examples is modest by pretraining standards but substantial for a task-specific SFT, and Meta's numbers show it's enough to shift Llama 2 (7B) to 42% top-5 on a real task. General rule: SFT on quality labels > SFT on volume.
- Mixing ratio is under-reported. The original-SFT:internal:RCA ratio is not disclosed; this is the single biggest knob for forgetting vs specialisation.
- Format rigidity. Models SFT'd on (title, impact, candidate-list) may degrade if the prompt format changes at inference time. Format is a first-class artefact.
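Meta's exact template is not disclosed; the principle behind this caveat is that one formatting function should be shared between training-data construction and inference so the two can never drift apart. A hypothetical sketch (field names are illustrative):

```python
def format_rca_prompt(title: str, impact: str, candidates: list[str]) -> str:
    """Render the (title, observed impact, candidate-list) triple identically
    at training time and at inference time."""
    lines = [f"Title: {title}", f"Observed impact: {impact}", "Candidate changes:"]
    lines += [f"{i}. {c}" for i, c in enumerate(candidates, start=1)]
    return "\n".join(lines)

prompt = format_rca_prompt(
    "Checkout error spike",
    "Elevated 5xx on web tier",
    ["change cache TTL", "bump shared dependency"],
)
```

Routing both the dataset builder and the serving path through the same function turns the format from an implicit convention into an enforced artefact.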
Seen in¶
- sources/2024-08-23-meta-leveraging-ai-for-efficient-incident-response — two-stage SFT pipeline on Llama 2 (7B) for RCA ranking.
Related¶
- concepts/continued-pretraining — the preceding adaptation stage; teaches knowledge.
- concepts/lora-low-rank-adaptation — the adapter-based SFT variant.
- concepts/knowledge-distillation — the soft-label sibling.
- concepts/llm-based-ranker — the architectural role Meta's SFT produces.
- systems/llama-2 — the base model.
- systems/meta-rca-system — canonical instance.
- companies/meta