SYSTEM Cited by 4 sources
GraphQL¶
Definition¶
GraphQL is a query language and runtime for APIs developed by Facebook (now Meta), open-sourced in 2015. Clients declaratively specify the shape of the response they want; the server responds with exactly that shape. A single GraphQL endpoint can replace many per-resource REST endpoints, and a single query can traverse multiple domains.
Design principles (from the spec)¶
The June 2018 GraphQL specification calls out two principles that shape how it gets deployed in production:
- Hierarchical: GraphQL specification recommends the language to be structured in hierarchy to be well suited for Hierarchical Views in modern frontend applications.
- Product-centric: The evolution of a GraphQL schema is directly influenced by the product/business features being developed by frontend engineers.
Both are quoted verbatim in Zalando's 2021 post on why the UBFF architecture works at their scale (Source: sources/2021-03-03-zalando-how-we-use-graphql-at-europes-largest-fashion-e-commerce-company).
Query shape¶
Canonical example from Zalando's post — fetching a product name:
The client names the field name; only that field comes
back. To add a field, the client adds it to the query — the
server doesn't change.
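The selection behaviour described above can be sketched in a few lines. This is not Zalando's example and not a real GraphQL executor: the query text, the `product` field, the `example-id` argument, and the `select_fields` helper are all hypothetical names chosen for illustration. The sketch only shows the core idea that the response mirrors the fields the client selected.

```python
# Hypothetical query text; field and argument names are illustrative only.
QUERY = """
query {
  product(id: "example-id") {
    name
  }
}
"""

# Hypothetical backend record holding more fields than the client asked for.
PRODUCT_RECORD = {
    "id": "example-id",
    "name": "Example Sneaker",
    "brand": "Example Brand",
    "price": 4999,
}


def select_fields(record, selection):
    """Return only the requested fields, recursing into nested selections.

    `selection` maps a field name to None (leaf field) or to a nested
    selection dict (object field).
    """
    result = {}
    for field_name, sub_selection in selection.items():
        value = record[field_name]
        if sub_selection is None:
            result[field_name] = value
        else:
            result[field_name] = select_fields(value, sub_selection)
    return result


# Hand-written selection equivalent to the query above: product { name }
response = {"data": {"product": select_fields(PRODUCT_RECORD, {"name": None})}}

# Adding a field is a client-side change: extend the selection, not the server.
wider = select_fields(PRODUCT_RECORD, {"name": None, "brand": None})
```

Here `response` contains only `name`, even though the backing record also carries `brand` and `price`; asking for `brand` as well only requires a wider selection.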
Why organisations adopt it¶
- Client-driven field selection eliminates over-fetching. Mobile clients get exactly what they need, reducing egress and render time.
- Single evolution timeline. Adding a feature means adding fields + resolvers; no version-bumping of client binaries.
- Schema-level deprecation tracking. Field-level usage telemetry lets you deprecate dead fields with confidence.
- Federation (Apollo Federation, Netflix DGS, etc.) lets multiple services own their slice of one schema.
Architectural patterns built on GraphQL¶
- GraphQL as unified API platform — one gateway in front of many backends; Twitter (1.5B fields/sec) and Netflix (1B daily requests) as canonical disclosures.
- Unified GraphQL BFF — one service, one schema, replacing per-surface BFFs. Zalando's UBFF is the canonical instance.
- Federated subgraph per domain — multiple team-owned subgraphs composed at a router. Yelp CHAOS + Apollo Federation is the canonical wiki instance.
Trade-offs at scale¶
- Resolvers fan out expensively. A naive GraphQL query can trigger many backend calls; protect with DataLoader-style batching + depth/complexity limits.
- Caching is harder than REST. HTTP caching doesn't map cleanly because queries are usually sent as POST requests to a single endpoint, so the URL alone doesn't identify the response. Use a GraphQL-aware cache or a persisted-query layer.
- Reference executor can be a bottleneck. Zalando built graphql-jit, a JIT-compiled executor, because the reference graphql-js interpreter was insufficient at their scale.
- Single-service gateways are single points of failure. Federation distributes ownership at the cost of runtime composition complexity.
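The resolver fan-out problem and its batching mitigation can be illustrated with a minimal sketch. It assumes an asyncio-based server and is written in the spirit of DataLoader rather than as a port of it; `BatchLoader`, `fetch_products`, and `resolve_name` are hypothetical names, and the sketch omits caching, batch-size caps, and per-key errors.

```python
import asyncio


class BatchLoader:
    """Collects keys requested in the same event-loop turn and loads them in one call."""

    def __init__(self, batch_fn):
        self._batch_fn = batch_fn  # async function: list of keys -> list of values
        self._pending = {}         # key -> Future shared by everyone waiting on that key
        self._scheduled = False

    def load(self, key):
        loop = asyncio.get_running_loop()
        if key not in self._pending:
            self._pending[key] = loop.create_future()
        if not self._scheduled:
            # Defer the dispatch so sibling resolvers can register their keys first.
            self._scheduled = True
            loop.call_soon(lambda: loop.create_task(self._dispatch()))
        return self._pending[key]

    async def _dispatch(self):
        pending, self._pending = self._pending, {}
        self._scheduled = False
        keys = list(pending)
        try:
            values = await self._batch_fn(keys)
        except Exception as exc:
            # Fail every waiter rather than leaving futures unresolved.
            for future in pending.values():
                future.set_exception(exc)
            return
        for key, value in zip(keys, values):
            pending[key].set_result(value)


backend_calls = []  # records each batched backend call for inspection


async def fetch_products(ids):
    """Hypothetical backend call; a real one would hit a service or database."""
    backend_calls.append(list(ids))
    return [{"id": i, "name": f"product-{i}"} for i in ids]


async def resolve_name(loader, product_id):
    """Hypothetical field resolver that needs one product."""
    product = await loader.load(product_id)
    return product["name"]


async def main():
    loader = BatchLoader(fetch_products)
    return await asyncio.gather(*(resolve_name(loader, i) for i in (1, 2, 1, 3)))


names = asyncio.run(main())
```

Four resolver calls result in a single backend call for the three distinct keys. A production loader would normally be created per request so that results are not shared across users.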
Error modeling¶
GraphQL's response envelope is deliberately schema-free:
```json
{
  "data": { /* schema-typed */ },
  "errors": [{ "message": "...", "path": [...], "extensions": {...} }]
}
```
This creates a schema discoverability gap — the schema doesn't describe the shape of errors. Two mechanisms close the gap at different layers:
- error.extensions: an open-ended metadata channel on each entry of response.errors. The widely-adopted convention is extensions.code (e.g. NOT_FOUND, NOT_AUTHORIZED) so clients switch on a stable code rather than parsing message.
- Schema-level Problem types: named after RFC 7807 to avoid colliding with GraphQL's reserved error, typically appearing on a [[patterns/result-union-type-for-mutation-outcome|union ...Result = Success | Problem]] returned from a mutation.
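As a rough illustration of the extensions.code convention, the sketch below shows a client branching on the code instead of parsing the message. The sample response, the `UNKNOWN` fallback, and the action names are assumptions made for the example, not part of any specification or of Zalando's implementation.

```python
# Hypothetical response; the second error has no extensions at all.
response = {
    "data": {"product": None},
    "errors": [
        {
            "message": "Product 42 does not exist",
            "path": ["product"],
            "extensions": {"code": "NOT_FOUND"},
        },
        {"message": "Something went wrong", "path": ["cart"]},
    ],
}


def error_codes(response):
    """Return extensions.code for each error, or "UNKNOWN" when it is absent."""
    codes = []
    for error in response.get("errors", []):
        extensions = error.get("extensions") or {}
        codes.append(extensions.get("code", "UNKNOWN"))
    return codes


# Hypothetical mapping from stable codes to client behaviour.
ACTIONS = {
    "NOT_FOUND": "show-not-found-page",
    "NOT_AUTHORIZED": "redirect-to-login",
}


def handle(response):
    """Pick a client action per error; unrecognised codes go to monitoring."""
    return [ACTIONS.get(code, "report-to-monitoring") for code in error_codes(response)]


codes = error_codes(response)
actions = handle(response)
```

Because the client keys on the code, the server can reword `message` freely without breaking error handling.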
Zalando's 2021 error-modeling post codifies when to use
each: classify by who can act on the failure (the
action-taker
classification). Customer-actionable → schema Problem
type; Developer- or operator-actionable → response.errors
with a code. The two axes Zalando names for making the
split are
error propagation
semantics (schema Problems don't propagate; Errors do)
and bug-vs-domain-state encapsulation (bugs must stay out
of the schema).
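The split can be sketched as follows, under stated assumptions: `Success`, `Problem`, `DeveloperError`, `place_order`, and the `BAD_INPUT` code are hypothetical names, and the response building is hand-rolled rather than produced by a GraphQL library. Customer-actionable outcomes are returned as data through the union, while developer-actionable failures are raised and surface in the errors list with a code.

```python
from dataclasses import dataclass, asdict
from typing import Union


@dataclass
class Success:
    order_id: str


@dataclass
class Problem:
    """Customer-actionable outcome, modelled in the schema as ordinary data."""
    title: str


# Mirrors a schema union such as: union PlaceOrderResult = Success | Problem
PlaceOrderResult = Union[Success, Problem]


class DeveloperError(Exception):
    """Developer- or operator-actionable failure; kept out of the schema."""

    def __init__(self, message, code):
        super().__init__(message)
        self.code = code


def place_order(cart) -> PlaceOrderResult:
    if cart is None:
        # A missing argument is a client bug: nothing the customer can fix.
        raise DeveloperError("cart is required", code="BAD_INPUT")
    if not cart["items"]:
        # An empty cart is domain state the customer can act on.
        return Problem(title="Your cart is empty")
    return Success(order_id="order-1")


def execute(cart):
    """Build a response envelope for a hypothetical placeOrder mutation."""
    try:
        result = place_order(cart)
    except DeveloperError as exc:
        # The error nulls the field and is reported alongside the data.
        return {
            "data": {"placeOrder": None},
            "errors": [
                {
                    "message": str(exc),
                    "path": ["placeOrder"],
                    "extensions": {"code": exc.code},
                }
            ],
        }
    payload = {"__typename": type(result).__name__, **asdict(result)}
    return {"data": {"placeOrder": payload}}


empty_cart = execute({"items": []})
missing_cart = execute(None)
```

In this sketch the Problem arrives as regular typed data with no entry in errors, whereas the developer-facing failure nulls the field and carries a code.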
See concepts/problem-vs-error-distinction for the naming discipline.
Notable implementations on the wiki¶
- systems/zalando-graphql-ubff — Zalando's single-service unified GraphQL, 12+ domains
- systems/apollo-federation — Apollo's federation runtime
- systems/netflix-enterprise-graphql-gateway — Netflix's one-graph in the Studio ecosystem
- systems/strawberry-graphql — Python library; Yelp uses it for federated Python subgraphs
- systems/github-graphql-api — GitHub's single GraphQL API
- systems/fly-graphql-api — Fly.io's provisioning API
Seen in¶
- sources/2021-03-03-zalando-how-we-use-graphql-at-europes-largest-fashion-e-commerce-company — Zalando's Unified BFF GraphQL architecture; the design-principle quotes are from here.
- sources/2021-04-12-zalando-modeling-errors-in-graphql — Zalando's GraphQL error-modeling discipline: Problem vs Error, action-taker classification, RFC 7807 naming.
- sources/2022-12-02-highscalability-stuff-the-internet-says-on-scalability-for-december-2nd-2022 — Twitter + Netflix production scale disclosures.
- sources/2025-07-08-yelp-exploring-chaos-building-a-backend-for-server-driven-ui — Yelp CHAOS as a federated subgraph.
Related¶
- systems/graphql-jit — Zalando's JIT executor
- systems/apollo-federation
- systems/strawberry-graphql
- systems/netflix-enterprise-graphql-gateway
- systems/zalando-graphql-ubff
- systems/rfc-7807-problem-details
- patterns/graphql-unified-api-platform
- patterns/unified-graphql-backend-for-frontend
- patterns/federated-graphql-subgraph-per-domain
- patterns/result-union-type-for-mutation-outcome
- patterns/problem-type-for-customer-actionable-errors
- patterns/error-extensions-code-for-developer-actionable-errors
- concepts/graphql-error-extensions
- concepts/graphql-error-propagation
- concepts/problem-vs-error-distinction
- concepts/schema-discoverability-gap-in-errors