
SYSTEM Cited by 1 source

OpenAI API

Definition

The OpenAI API (primarily the Chat Completions endpoint) is the hosted model-inference interface for the GPT family (GPT-4, GPT-4 Turbo, GPT-4o, GPT-4o-mini, o-series), accessed over HTTPS with JSON request/response payloads. On the wiki it functions as a thin identity-style system: the backend behind every GPT- system page. This page exists to carry the prompt-caching mechanics specifically, and to disambiguate from the model pages, which carry per-model capabilities.
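At the wire level a Chat Completions call is a JSON body POSTed over HTTPS. A minimal sketch of the payload shape, assuming the standard endpoint URL; the model name, prompt text, and `max_tokens` value are illustrative:

```python
import json

# Chat Completions endpoint (hosted inference over HTTPS)
API_URL = "https://api.openai.com/v1/chat/completions"

def build_request(model: str, system: str, user: str, max_tokens: int = 1024) -> dict:
    """Assemble the JSON request body for a Chat Completions call."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        "max_tokens": max_tokens,  # caps the per-response output
    }

body = build_request("gpt-4o-mini", "You are a code-migration assistant.", "Transform this file.")
print(json.dumps(body, indent=2))
# The actual call is an HTTPS POST with an `Authorization: Bearer <key>` header,
# e.g. requests.post(API_URL, headers=..., json=body).
```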

Wiki anchor — prompt caching

From OpenAI's documentation (cited by Zalando): requests whose prompt exactly matches the prefix of a previously seen request skip re-tokenisation and KV-prefill for the matching portion, reducing both latency and cost. Zalando's Component Migration Toolkit is the canonical wiki instance of a migration pipeline engineered around this: "we set up a structured prompt format that maximized cache hits. The prompt was organized to have the static part like transformation examples at top and the dynamic part (the file content) and the end, ensuring caching can be leveraged while transforming different files" (Source: sources/2025-02-19-zalando-llm-powered-migration-of-ui-component-libraries).
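The static-first, dynamic-last ordering Zalando describes can be sketched as a prompt builder. Function and prompt text are illustrative, not Zalando's actual code; the point is that the static prefix is byte-identical across files, so repeated requests share a cacheable prefix:

```python
def build_migration_prompt(transformation_examples: str, file_content: str) -> list[dict]:
    """Order messages so the static part (transformation examples) forms a
    stable prefix and the dynamic part (the file content) comes last,
    maximizing prompt-cache hits across different files."""
    return [
        # Static prefix: identical for every file in the migration run,
        # so the provider can reuse the cached prefill for this portion.
        {"role": "system", "content": "You migrate UI components.\n" + transformation_examples},
        # Dynamic suffix: changes per file; placing it last means it never
        # invalidates the shared prefix above.
        {"role": "user", "content": file_content},
    ]

a = build_migration_prompt("EXAMPLES...", "file_a.tsx contents")
b = build_migration_prompt("EXAMPLES...", "file_b.tsx contents")
assert a[0] == b[0]  # shared static prefix -> cache hit
assert a[1] != b[1]  # dynamic tail differs per file
```

Inverting the order (file content first) would make every request's prefix unique and defeat caching entirely, which is why concepts/static-dynamic-prompt-partitioning treats the ordering as the primitive.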

See concepts/prompt-cache-consistency for the design-pattern framing and concepts/static-dynamic-prompt-partitioning for the ordering primitive that makes caching tractable.

Output-token limits (canonical context)

Per-response output is capped by the request's max_tokens parameter, itself bounded by the model's output ceiling. At Zalando's 2024 timeframe on the OpenAI API, file transformations occasionally hit a 4K output-token ceiling that truncated the response mid-transformation. The recovery primitive — sending a "continue" prompt on the same conversation — works only because the API supports stateless request chaining with a client-side conversation wrapper (see systems/python-llm-library).

Seen in
