SYSTEM Cited by 3 sources
Llama 3.1
Llama 3.1 is Meta's 2024 open-weights foundation-model family (8B, 70B, 405B parameters). In the context of this wiki it is notable as the adaptation base for domain-adapted enterprise LLMs.
Seen in (wiki)
- eBay e-Llama. eBay takes Llama 3.1 8B + 70B as the base for continued pretraining on 1 trillion tokens of mixed e-commerce + general data — producing e-Llama. The continued-pretraining recipe uses a max LR set to 10% of the original Llama 3.1 max LR, a 1:1 general-to-domain sampling ratio, and includes replay data from curated/public/open-source corpora to resist catastrophic forgetting. Result: ~25% gain on English e-commerce benchmarks, ~30% on non-English, with ~1% general-domain regression for the 70B model. (Source: sources/2025-01-17-ebay-scaling-large-language-models-for-e-commerce-the-development)
- Instacart Intent Engine (SRL). Instacart fine-tunes Llama-3-8B via LoRA on a curriculum dataset generated by an offline RAG "teacher" pipeline. (The post uses the shorthand "Llama3–8B"; this wiki files it under the canonical identifier Llama 3.1 8B.) The model is deployed with the LoRA adapter merged into the base weights and served on H100 at ~300 ms per query. Production SRL quality is 96.4% precision / 95.0% recall / 95.7% F1, on par with the F1 of the much larger frontier teacher model at ~2% of the serving footprint. This is the canonical wiki instance of a LoRA-adapted Llama 3 serving as the real-time tail-query student in a patterns/head-cache-plus-tail-finetuned-model architecture. (Source: sources/2025-11-13-instacart-building-the-intent-engine)
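The two numeric knobs in the eBay recipe above (a max LR at 10% of the base model's, and a 1:1 general-to-domain sampling ratio) can be illustrated with a minimal sketch. This is not eBay's training code: the function names, the placeholder base learning rate, and the choice to sample per example rather than per token are all assumptions made for illustration.

```python
import random

# Placeholder only: NOT a confirmed Llama 3.1 hyperparameter.
HYPOTHETICAL_BASE_MAX_LR = 3e-4


def continued_pretraining_max_lr(base_max_lr: float, fraction: float = 0.10) -> float:
    """Scale the base model's peak learning rate down for continued pretraining."""
    return base_max_lr * fraction


def mixed_stream(general, domain, general_weight: float = 0.5, seed: int = 0):
    """Yield (source, example) pairs, picking the general corpus with
    probability `general_weight` (0.5 gives a 1:1 mix in expectation).
    Stops as soon as the chosen corpus runs out of examples."""
    rng = random.Random(seed)
    general_it, domain_it = iter(general), iter(domain)
    while True:
        if rng.random() < general_weight:
            source, it = "general", general_it
        else:
            source, it = "domain", domain_it
        try:
            yield source, next(it)
        except StopIteration:
            return


max_lr = continued_pretraining_max_lr(HYPOTHETICAL_BASE_MAX_LR)

# Toy corpora standing in for replay (general) and e-commerce (domain) data.
general_corpus = [f"general-{i}" for i in range(5000)]
domain_corpus = [f"listing-{i}" for i in range(5000)]
batch = list(mixed_stream(general_corpus, domain_corpus))
general_share = sum(1 for source, _ in batch if source == "general") / len(batch)
```

Keeping general (replay) data at roughly half of the stream is what the source credits with resisting catastrophic forgetting. A real pipeline would more likely balance by token count and shard across workers, which this sketch does not attempt.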
Why "adapt" rather than "train from scratch"
From eBay's framing:
"Training a large-scale LLM from scratch is a very time- and resource-intensive process. In order to move fast, one could use existing pretrained models, such as Llama 3.1, for their use cases. However, these models typically lack specific knowledge, in our case about the e-commerce domain."
Llama 3.1's role in an enterprise adaptation pipeline is thus: known-capable open base → continued-pretrain with domain data + replay → fine-tune + RLHF → deploy. This trades the time cost of a from-scratch build against the ceiling imposed by starting from a model you didn't shape. Instacart shows the lighter-touch adaptation path: LoRA fine-tune + adapter merge (no continued pretraining), which retains the stock base's general-language capacity and adds Instacart-specific slot-extraction behaviour through a small number of adapter parameters.
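The "LoRA fine-tune + adapter merge" step can be made concrete with a small numerical sketch. It uses the standard LoRA formulation, in which a frozen weight W is augmented by a low-rank update scaled by alpha / rank, and checks that folding the update into W gives the same outputs as running the adapter as a separate branch. The dimensions, seed, and helper names are hypothetical, and the code is deliberately dependency-free rather than a depiction of Instacart's serving stack.

```python
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def merge_lora(W, A, B, alpha, rank):
    """Return the merged weight W + (alpha / rank) * B @ A."""
    scale = alpha / rank
    BA = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, BA)]


def forward_with_adapter(x, W, A, B, alpha, rank):
    """Unmerged path: base projection plus the scaled low-rank branch."""
    scale = alpha / rank
    base = matmul(x, transpose(W))
    low_rank = matmul(matmul(x, transpose(A)), transpose(B))
    return [[b + scale * l for b, l in zip(b_row, l_row)]
            for b_row, l_row in zip(base, low_rank)]


rng = random.Random(0)
d_out, d_in, rank, alpha = 4, 6, 2, 4.0   # toy sizes, chosen for illustration
W = rand_matrix(d_out, d_in, rng)          # frozen base weight
A = rand_matrix(rank, d_in, rng)           # trained down-projection
B = rand_matrix(d_out, rank, rng)          # trained up-projection
x = rand_matrix(3, d_in, rng)              # a batch of 3 input vectors

W_merged = merge_lora(W, A, B, alpha, rank)
unmerged_out = forward_with_adapter(x, W, A, B, alpha, rank)
merged_out = matmul(x, transpose(W_merged))

# Largest elementwise gap between the two paths (floating-point noise only).
max_diff = max(abs(u - m)
               for u_row, m_row in zip(unmerged_out, merged_out)
               for u, m in zip(u_row, m_row))

adapter_params = rank * (d_in + d_out)     # parameters actually trained
base_params = d_in * d_out                 # parameters left frozen
```

After merging, inference needs only one matrix multiply per adapted layer, which is why a merged adapter adds no serving overhead relative to the stock base. At real model sizes the adapter-to-base parameter ratio is far smaller than in this toy example.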
Related
- systems/llama-3 — the April-2024 Llama 3 (8B + 70B) predecessor family, trained on Meta's paired 24K-GPU H100 clusters (RoCE + InfiniBand). Llama 3.1 (8B + 70B + 405B, July 2024) is the successor update to Llama 3 on the same infra substrate.
- systems/e-llama — eBay's continued-pretrained derivative (8B + 70B).
- systems/instacart-intent-engine — Instacart's LoRA-fine-tuned-8B SRL student.
- concepts/continued-pretraining — the heavier adaptation technique (eBay).
- concepts/lora-low-rank-adaptation — the lighter adaptation technique (Instacart).
- concepts/adapter-merging — post-training merge that removes LoRA serving overhead.
- concepts/catastrophic-forgetting — the failure mode managed via replay-training when continued-pretraining off a base like Llama 3.1; LoRA structurally sidesteps it.
- patterns/continued-pretraining-for-domain-adaptation — the end-to-end continued-pretraining recipe.
- patterns/offline-teacher-online-student-distillation / patterns/head-cache-plus-tail-finetuned-model — deployment shapes that wrap a LoRA-adapted Llama 3 into a production serving architecture.
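As a rough illustration of the head-cache-plus-tail-finetuned-model shape listed above, the sketch below routes a query to a precomputed cache when it is a frequent ("head") query and to the fine-tuned student model otherwise. Everything here is hypothetical: the router, the normalisation rule, the cache contents, and the stub standing in for the LoRA-adapted model are illustrative assumptions, not Instacart's implementation.

```python
from typing import Callable, Dict


def normalize(query: str) -> str:
    """Lower-case and collapse whitespace so trivially different spellings share a key."""
    return " ".join(query.lower().split())


def make_intent_router(head_cache: Dict[str, dict],
                       tail_model: Callable[[str], dict]) -> Callable[[str], dict]:
    """Build a router that prefers cached answers and falls back to the model."""
    def route(query: str) -> dict:
        key = normalize(query)
        if key in head_cache:
            # Head query: answer was computed offline (e.g. by the teacher pipeline).
            return {"source": "head_cache", "result": head_cache[key]}
        # Tail query: run the fine-tuned student model in real time.
        return {"source": "tail_model", "result": tail_model(key)}
    return route


def stub_tail_model(query: str) -> dict:
    """Stand-in for the fine-tuned 8B student; returns a fixed-shape placeholder."""
    return {"query": query, "slots": {}}


# Hypothetical cache entries for frequent queries.
head_cache = {"organic milk": {"query": "organic milk",
                               "slots": {"product": "milk", "attribute": "organic"}}}

route = make_intent_router(head_cache, stub_tail_model)
head_answer = route("  Organic   MILK ")
tail_answer = route("gluten free sourdough starter")
```

The design choice this illustrates is that the expensive model is only invoked for queries too rare to precompute, so its latency budget applies to the tail of traffic rather than to every request.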
Hosted on Databricks FMAPI (2026-05-22)
Llama 3.1 8B is served on the Foundation Model APIs with implicit prompt caching enabled — both the stock 8B base and fine-tuned 8B served via PEFT serving. Caching survives the PEFT-adapter layer because the adapter is consistent across requests on the same endpoint, preserving prefix KV-cache reusability. Sibling Llama 3.3 70B is also covered by the same caching rollout. Part of the 2026-05-22 GA extension of Databricks' implicit-caching substrate to the open-weights model catalog (Source: sources/2026-05-22-databricks-accelerating-llm-inference-with-prompt-caching-for-open-source-models).
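The reasoning behind "caching survives the PEFT-adapter layer" can be sketched with a toy prefix cache. KV entries depend on the weights that produced them, so a cached prefix is only reusable when the same adapter processes the same leading tokens; if an endpoint always applies one adapter, every request meets the first condition. The class below is a simplified model of that idea and not Databricks' implementation: the block size, the class and method names, and the choice to key blocks by (adapter id, token prefix) are all assumptions.

```python
from typing import Hashable, List, Sequence, Set, Tuple

BLOCK_SIZE = 4  # hypothetical number of tokens per cached KV block


class PrefixCache:
    """Tracks which (adapter, token-prefix) blocks already have KV entries."""

    def __init__(self) -> None:
        self._blocks: Set[Tuple[Hashable, Tuple[int, ...]]] = set()

    def _block_keys(self, adapter_id: Hashable,
                    tokens: Sequence[int]) -> List[Tuple[Hashable, Tuple[int, ...]]]:
        # One key per full block; each key covers the whole prefix up to that block,
        # because a token's KV entry depends on every token before it.
        usable = len(tokens) - len(tokens) % BLOCK_SIZE
        return [(adapter_id, tuple(tokens[:end]))
                for end in range(BLOCK_SIZE, usable + 1, BLOCK_SIZE)]

    def lookup_and_insert(self, adapter_id: Hashable, tokens: Sequence[int]) -> int:
        """Return how many leading prompt tokens could reuse cached KV entries,
        then record this prompt's blocks for later requests."""
        keys = self._block_keys(adapter_id, tokens)
        reused = 0
        for index, key in enumerate(keys):
            if key not in self._blocks:
                break
            reused = (index + 1) * BLOCK_SIZE
        self._blocks.update(keys)
        return reused


cache = PrefixCache()
system_prompt = list(range(8))  # shared 8-token prefix (toy token ids)

first = cache.lookup_and_insert("adapter-a", system_prompt + [100, 101])
second = cache.lookup_and_insert("adapter-a", system_prompt + [200, 201, 202, 203])
other_adapter = cache.lookup_and_insert("adapter-b", system_prompt + [200, 201, 202, 203])
```

In this model the second request reuses the shared system-prompt blocks because it runs under the same adapter, while the request under a different adapter starts cold even though its tokens are identical.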