SYSTEM Cited by 1 source
OpenAI API¶
Definition¶
The OpenAI API (primarily the Chat Completions endpoint) is the hosted model-inference interface for the GPT family (GPT-4, GPT-4 Turbo, GPT-4o, GPT-4o-mini, o-series), accessed over HTTPS with JSON request/response payloads. On this wiki it is a thin identity-style system that appears as the backend behind every GPT- system page. This page exists to carry the prompt-caching mechanics specifically, and to disambiguate from the model pages (which carry per-model capabilities).
Wiki anchor — prompt caching¶
From OpenAI's documentation (cited by Zalando): requests whose prefix exactly matches a previously seen request's prefix can reuse the cached KV-prefill computation for the matching portion, reducing both latency and cost. Zalando's Component Migration Toolkit is the canonical wiki instance of a migration pipeline engineered around this: "we set up a structured prompt format that maximized cache hits. The prompt was organized to have the static part like transformation examples at top and the dynamic part (the file content) and the end, ensuring caching can be leveraged while transforming different files" (Source: sources/2025-02-19-zalando-llm-powered-migration-of-ui-component-libraries).
See concepts/prompt-cache-consistency for the design-pattern framing and concepts/static-dynamic-prompt-partitioning for the ordering primitive that makes caching tractable.
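The ordering Zalando describes can be sketched in Python. This is a hedged illustration, not the Component Migration Toolkit's actual code: the function name and parameters are hypothetical, and only the static-first/dynamic-last message ordering is taken from the source.

```python
def build_messages(system_rules: str,
                   examples: list[tuple[str, str]],
                   file_content: str) -> list[dict]:
    """Order the prompt so the static prefix (rules + transformation
    examples) is identical across requests; only the final user
    message carries the file being transformed."""
    messages = [{"role": "system", "content": system_rules}]
    # Static part first: few-shot transformation examples, identical
    # for every file, so the shared prefix stays cache-eligible.
    for before, after in examples:
        messages.append({"role": "user", "content": before})
        messages.append({"role": "assistant", "content": after})
    # Dynamic part last: the file content is the only varying suffix.
    messages.append({"role": "user", "content": file_content})
    return messages
```

Because every request for a given migration shares the same system rules and examples, the prefix up to the final message is identical across files, which is exactly the condition prompt caching needs.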
Output-token limits (canonical context)¶
Per-response output is bounded by the request's max_tokens parameter, capped per model. At Zalando's 2024 timeframe on the OpenAI API, file transformations occasionally hit a 4K output-token ceiling that truncated the response mid-transformation. The recovery primitive, sending a "continue" prompt on the same conversation, only works because the API supports stateless request chaining with a client-side conversation wrapper (see systems/python-llm-library).
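The continue-on-truncation loop can be sketched as follows. This is a hypothetical stand-in, not the llm library's actual API: `send` abstracts one Chat Completions call, and the `finish_reason == "length"` truncation signal is the one the OpenAI API reports when the output hits the token cap.

```python
def transform_with_continuation(send, messages, max_rounds=5):
    """Chain "continue" requests until the response is no longer
    truncated at the output-token ceiling.

    `send(messages)` stands in for one Chat Completions call and
    must return (text, finish_reason); "length" means truncated.
    """
    parts = []
    convo = list(messages)  # client-side conversation state
    for _ in range(max_rounds):
        text, finish_reason = send(convo)
        parts.append(text)
        if finish_reason != "length":
            break
        # Stateless chaining: replay the partial answer as an
        # assistant turn, then ask the model to continue.
        convo.append({"role": "assistant", "content": text})
        convo.append({"role": "user", "content": "continue"})
    return "".join(parts)
```

The API itself keeps no conversation state between requests; the wrapper replays the whole message history on every round, which is why this recovery works at all.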
Seen in¶
- sources/2025-02-19-zalando-llm-powered-migration-of-ui-component-libraries — canonical and only wiki source. Zalando reached this API through the llm library (systems/python-llm-library).
Related¶
- systems/gpt-4 · systems/gpt-4o · systems/gpt-4o-mini — models served by this API
- systems/python-llm-library — one wrapper over this API
- concepts/prompt-cache-consistency — design pattern for prompt-cache hits
- concepts/static-dynamic-prompt-partitioning — concrete ordering primitive