
SYSTEM Cited by 1 source

OpenAI API

Definition

The OpenAI API (primarily the Chat Completions endpoint) is the hosted model-inference interface for the GPT family (GPT-4, GPT-4 Turbo, GPT-4o, GPT-4o-mini, o-series), accessed over HTTPS with JSON request/response payloads. On the wiki it functions as a thin identity-style system: the backend behind every GPT- system page. This page exists to carry the prompt-caching mechanics specifically, and to disambiguate from the model pages, which carry per-model capabilities.
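At the wire level a Chat Completions call is a JSON body POSTed over HTTPS. A minimal sketch of the payload shape, assuming the standard endpoint URL; the model name, prompt text, and `max_tokens` value are illustrative:

```python
import json

# Chat Completions endpoint (hosted inference over HTTPS)
API_URL = "https://api.openai.com/v1/chat/completions"

def build_request(model: str, system: str, user: str, max_tokens: int = 1024) -> dict:
    """Assemble the JSON request body for a Chat Completions call."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        "max_tokens": max_tokens,  # caps the per-response output
    }

body = build_request("gpt-4o-mini", "You are a code-migration assistant.", "Transform this file.")
print(json.dumps(body, indent=2))
# The actual call is an HTTPS POST with an `Authorization: Bearer <key>` header,
# e.g. requests.post(API_URL, headers=..., json=body).
```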

Wiki anchor — prompt caching

From OpenAI's documentation (cited by Zalando): requests whose prompt exactly matches the prefix of a previously seen request skip re-tokenisation and KV-prefill for the matching portion, reducing both latency and cost. Zalando's Component Migration Toolkit is the canonical wiki instance of a migration pipeline engineered around this: "we set up a structured prompt format that maximized cache hits. The prompt was organized to have the static part like transformation examples at top and the dynamic part (the file content) and the end, ensuring caching can be leveraged while transforming different files" (Source: sources/2025-02-19-zalando-llm-powered-migration-of-ui-component-libraries).
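The static-first, dynamic-last ordering Zalando describes can be sketched as a prompt builder. Function and prompt text are illustrative, not Zalando's actual code; the point is that the static prefix is byte-identical across files, so repeated requests share a cacheable prefix:

```python
def build_migration_prompt(transformation_examples: str, file_content: str) -> list[dict]:
    """Order messages so the static part (transformation examples) forms a
    stable prefix and the dynamic part (the file content) comes last,
    maximizing prompt-cache hits across different files."""
    return [
        # Static prefix: identical for every file in the migration run,
        # so the provider can reuse the cached prefill for this portion.
        {"role": "system", "content": "You migrate UI components.\n" + transformation_examples},
        # Dynamic suffix: changes per file; placing it last means it never
        # invalidates the shared prefix above.
        {"role": "user", "content": file_content},
    ]

a = build_migration_prompt("EXAMPLES...", "file_a.tsx contents")
b = build_migration_prompt("EXAMPLES...", "file_b.tsx contents")
assert a[0] == b[0]  # shared static prefix -> cache hit
assert a[1] != b[1]  # dynamic tail differs per file
```

Inverting the order (file content first) would make every request's prefix unique and defeat caching entirely, which is why concepts/static-dynamic-prompt-partitioning treats the ordering as the primitive.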

See concepts/prompt-cache-consistency for the design-pattern framing and concepts/static-dynamic-prompt-partitioning for the ordering primitive that makes caching tractable.

Output-token limits (canonical context)

Per-response output is capped by the request's max_tokens parameter, itself bounded by the model's output ceiling. At Zalando's 2024 timeframe on the OpenAI API, file transformations occasionally hit a 4K output-token ceiling that truncated the response mid-transformation. The recovery primitive — sending a "continue" prompt on the same conversation — works only because the API supports stateless request chaining with a client-side conversation wrapper (see systems/python-llm-library).

Seen in
