SYSTEM Cited by 2 sources

GPT-OSS¶

GPT-OSS is OpenAI's family of open-weight LLMs released in 2025, including Mixture-of-Experts variants. First wiki mention: sources/2026-02-13-netflix-scaling-llm-post-training-at-netflix, where it is cited alongside Qwen3, Qwen3 MoE, and Gemma3 as one of the "modern architectures + Mixture-of-Experts variants" supported by Netflix's internal Post-Training Framework.

Hosted on Databricks FMAPI (2026-05-22)¶

GPT-OSS 20B and 120B are served on the Databricks Foundation Model APIs (FMAPI) with implicit prompt caching enabled. The 2026-05-22 Databricks announcement names GPT-OSS as the first OSS-model rollout of the prompt-caching capability and discloses the following numbers from one of Databricks' large-scale production batch-inference pipelines:

+2.5× per-replica input-token throughput
3× P50 latency reduction
30% cache hit ratio (described as "relatively low")

The asymmetry between the 30% hit ratio and the 2.5× throughput gain is explained by prefill-skip economics: a cache hit skips the prefill stage entirely, so even a modest hit rate yields large per-hit savings on prefill-dominated workloads (Source: sources/2026-05-22-databricks-accelerating-llm-inference-with-prompt-caching-for-open-source-models).
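
The prefill-skip idea can be illustrated with a toy prefix cache. This is a minimal sketch under stated assumptions, not Databricks' implementation: the class name `ToyPrefixCache`, the block size, and whitespace tokenization are all hypothetical, and the sketch only counts how many tokens would need prefill, without modeling real KV-cache storage, eviction, or throughput.

```python
from dataclasses import dataclass, field

BLOCK = 4  # tokens per cache block (hypothetical value chosen for the demo)


@dataclass
class ToyPrefixCache:
    """Toy model: remembers which block-aligned token prefixes were already prefilled."""

    seen: set = field(default_factory=set)
    prefilled: int = 0  # tokens that had to go through prefill
    reused: int = 0     # tokens whose prefill was skipped via a cache hit

    def submit(self, tokens: list[str]) -> int:
        """Process one prompt and return how many leading tokens were served from cache."""
        hit = 0
        still_matching = True
        for end in range(BLOCK, len(tokens) + 1, BLOCK):
            key = tuple(tokens[:end])  # a block is identified by the whole prefix up to it
            if still_matching and key in self.seen:
                hit = end  # this block's prefill can be skipped
            else:
                still_matching = False  # first miss ends the reusable prefix
                self.seen.add(key)      # store the block for later requests
        self.reused += hit
        self.prefilled += len(tokens) - hit
        return hit

    @property
    def hit_ratio(self) -> float:
        """Token-level hit ratio across all prompts submitted so far."""
        total = self.prefilled + self.reused
        return self.reused / total if total else 0.0


# Two requests that share an 8-token system prompt (whitespace "tokens" for simplicity).
system = "you are a support agent answer briefly please".split()
cache = ToyPrefixCache()
first = cache.submit(system + "how do I reset my password".split())
second = cache.submit(system + "where is my invoice stored today".split())
print(first, second, cache.prefilled, cache.reused, round(cache.hit_ratio, 3))
```

In this toy run the first request prefills all 14 tokens, while the second reuses the 8-token shared prefix and prefills only its 6 new tokens. The hit ratio here is measured per token; the source does not say how its 30% figure is defined, so the numbers are not directly comparable.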
