PATTERN Cited by 1 source

Disable GraphQL in production

Problem

A GraphQL endpoint that accepts arbitrary, well-formed queries is, by design, open to any query shape a client cares to send. That openness is the ergonomic payoff of GraphQL at development time: clients can evolve queries without coordinating with the server. In production, the same openness becomes a liability:

  • The set of queries running against the graph is not knowable. Field-level usage, per-query p99, and "can we safely break this field?" are all approximate at best.
  • Arbitrary-query DoS (deep recursion, expensive resolvers) is an inherent attack surface.
  • Schema evolution lacks a proof obligation — "nothing breaks" has to be inferred from sampled telemetry.

Zalando's post names this framing and takes the counter-intuitive move: "to disable GraphQL in production." The production endpoint does not execute raw GraphQL. It executes only pre-registered queries referenced by ID (Source: sources/2022-02-16-zalando-graphql-persisted-queries-and-schema-stability).

Solution

Split the GraphQL surface into two regimes:

  • Build time. Developers write full GraphQL against a dev-mode endpoint. Codegen, batching, introspection, IDE support, schema exploration — all work normally.
  • Production time. The runtime endpoint accepts only {"id": "<hash>", "variables": {…}}. The query key is not accepted. Unknown IDs are rejected.

The bridge is a build-time persist step: when UI code is merged to main, the build pipeline extracts every query from the source, sends each to a persist endpoint, and receives a stable ID per query. The UI bundle ships with the IDs in place of the query text.

The result is that the production query set is the persisted-queries DB — a closed, finite, versioned catalogue rather than an open accept-anything surface.
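
As a rough illustration of the persist step, the sketch below registers each query found in the UI source and returns the ID the bundle would ship instead of the query text. Everything named here is an assumption rather than Zalando's implementation: the in-memory `PersistedQueryStore` stands in for the persisted-queries DB, the ID is taken to be a SHA-256 digest, and normalisation is reduced to collapsing whitespace.

```python
import hashlib
from typing import Dict, Optional


def normalise(query: str) -> str:
    # Assumption: collapsing whitespace is enough for this sketch. It would
    # also alter string literals, so a real pipeline would more likely parse
    # the query and re-print it.
    return " ".join(query.split())


def query_id(query: str) -> str:
    # Assumption: the ID is the SHA-256 hex digest of the normalised text.
    return hashlib.sha256(normalise(query).encode("utf-8")).hexdigest()


class PersistedQueryStore:
    """Hypothetical in-memory stand-in for the persisted-queries DB."""

    def __init__(self) -> None:
        self._by_id: Dict[str, str] = {}

    def persist(self, query: str) -> str:
        # Register the normalised query and return its stable ID.
        qid = query_id(query)
        self._by_id[qid] = normalise(query)
        return qid

    def lookup(self, qid: str) -> Optional[str]:
        return self._by_id.get(qid)


def persist_bundle(store: PersistedQueryStore, queries: Dict[str, str]) -> Dict[str, str]:
    """Build step: map each named query in the UI source to its ID."""
    return {name: store.persist(text) for name, text in queries.items()}
```

Because the ID is derived from the normalised text, reformatting a query in the UI source does not change the ID the bundle ships with, at least under this sketch's normalisation rule.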

  DEV ENDPOINT                       PROD ENDPOINT
  {                                  {
    query: "query { product {… } }"    id: "a1b2c3",
    variables: {…}                     variables: {…}
  }                                  }
   │                                   │
   │ accepts                           │ rejects unknown IDs
   │ arbitrary GraphQL                 │ executes only
   │ (for UI build & dev)              │ pre-registered queries
   ▼                                   ▼
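
The production side of the diagram can be sketched as a request handler. This is a minimal sketch under stated assumptions, not the endpoint described in the Zalando post: `handle_production_request`, the `PERSISTED` dictionary, the injected `execute` callable, and the status codes are all hypothetical names and choices made for illustration.

```python
from typing import Any, Callable, Dict

# Hypothetical persisted-queries DB: ID -> query text registered at build time.
PERSISTED: Dict[str, str] = {"a1b2c3": "query { product { name } }"}


def handle_production_request(
    body: Dict[str, Any],
    execute: Callable[[str, Dict[str, Any]], Dict[str, Any]],
    persisted: Dict[str, str] = PERSISTED,
) -> Dict[str, Any]:
    # Raw GraphQL is not executed in production: the "query" key is refused.
    if "query" in body:
        return {"status": 400, "error": "raw queries are not accepted"}

    query_id = body.get("id")
    if not isinstance(query_id, str):
        return {"status": 400, "error": "missing id"}

    query = persisted.get(query_id)
    if query is None:
        # Strict policy: there is no retry-with-query-text fallback.
        return {"status": 404, "error": "unknown query id"}

    # Only the pre-registered text is ever handed to the executor.
    return {"status": 200, "data": execute(query, body.get("variables") or {})}
```

The executor is passed in so that the sketch stays independent of any particular GraphQL server library.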

Why this is not just bandwidth reduction

Automatic Persisted Queries has a well-known cache-mode variant (Apollo's default) where unknown hashes trigger a retry with full query text. That variant is a bandwidth optimisation — the endpoint is still open to arbitrary queries. The Zalando post is explicit that it "took a different approach": the endpoint refuses unknown hashes entirely (Source: sources/2022-02-16-zalando-graphql-persisted-queries-and-schema-stability).

The difference is contractual, not mechanical. Same persist step, same hashing, but the unknown-hash policy flips a bandwidth optimisation into a schema-stability regime.
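
To make the policy flip concrete, the sketch below runs the same lookup under two unknown-hash policies. It is an illustration only: `UnknownHashPolicy` and `resolve_query` are hypothetical names, the store is a plain dictionary, and the cache branch omits details a real cache-mode server would need, such as checking that the supplied text actually matches the hash.

```python
from enum import Enum
from typing import Dict, Optional


class UnknownHashPolicy(Enum):
    CACHE = "cache"    # bandwidth optimisation: unknown hashes can be learned at runtime
    STRICT = "strict"  # closed query set: unknown hashes are refused


def resolve_query(
    store: Dict[str, str],
    query_hash: str,
    query_text: Optional[str],
    policy: UnknownHashPolicy,
) -> Optional[str]:
    """Return the query text to execute, or None if the request is refused."""
    known = store.get(query_hash)
    if known is not None:
        return known
    if policy is UnknownHashPolicy.CACHE and query_text is not None:
        # Cache mode: the client retried with the full text, so register it.
        # (Sketch only: no check that query_text really hashes to query_hash.)
        store[query_hash] = query_text
        return query_text
    # Strict mode, or cache mode without text: nothing to execute.
    return None
```

Under `CACHE` the store grows with whatever clients send, so the query set stays open. Under `STRICT` the store changes only through the build-time persist step.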

What the closed query set buys

Three capabilities Zalando calls out, each decidable by static analysis of the persisted-queries DB:

  1. Per-query monitoring. Each query has a stable identity; SLOs are per-ID.
  2. Safe breaking changes. A schema change is safe precisely when no persisted query references the field being changed. This is a proof, not an estimate — see concepts/graphql-schema-usage-observability.
  3. Directive-based field lifecycle. @draft blocks persisting queries that reference unstable fields, and @allowedFor restricts persisting to named components. These directives have teeth because the endpoint cannot bypass them with a raw query. See patterns/directive-based-field-lifecycle.
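
The second and third capabilities reduce to set operations once each persisted query is indexed by the schema fields it references. The sketch below assumes such an index already exists. `FIELD_USAGE`, the "Type.field" naming, and the function names are hypothetical, and extracting the fields from query text (which would need a GraphQL parser) is out of scope here.

```python
from typing import Dict, List, Set

# Hypothetical index built at persist time: query ID -> schema fields it references.
FIELD_USAGE: Dict[str, Set[str]] = {
    "a1b2c3": {"Query.product", "Product.name"},
    "d4e5f6": {"Query.product", "Product.price"},
}


def queries_referencing(field: str, usage: Dict[str, Set[str]]) -> List[str]:
    """IDs of every persisted query that touches the given field."""
    return sorted(qid for qid, fields in usage.items() if field in fields)


def is_safe_to_break(field: str, usage: Dict[str, Set[str]]) -> bool:
    # Safe precisely when no persisted query references the field.
    return not queries_referencing(field, usage)


def can_persist(fields: Set[str], draft_fields: Set[str]) -> bool:
    # @draft-style gate: refuse to persist a query that touches a draft field.
    return fields.isdisjoint(draft_fields)
```

For example, `is_safe_to_break("Product.price", FIELD_USAGE)` is false because query `d4e5f6` references that field, while a field that appears in no entry can be changed freely.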

When this pattern fits

  • UI bundles are under your control. Web and mobile apps you ship can be forced through the persist step at build time.
  • Query shapes are statically enumerable. Clients do not assemble queries from dynamic runtime state in ways that are not predictable ahead of time.
  • Schema stability / breaking-change safety is a high-priority goal. Otherwise the added persist-step complexity buys less than cache-mode APQ.

When this pattern does not fit

  • Third-party integrations. You need a public-facing GraphQL endpoint that external developers call with novel queries — Shopify Storefront, GitHub GraphQL API, etc. There, the API is the value proposition.
  • Highly dynamic query shapes. Dashboards that compose queries based on user-selected filters at runtime.
  • No UI build pipeline. Web components or micro-frontends that are authored outside your build infrastructure and can't be pushed through the persist step.

In these cases, Apollo-style cache mode plus field-level usage telemetry and schema-review discipline is the practical ceiling.

Operational considerations

  • Persist endpoint is the governance choke point. Who can add entries? Unspecified in the Zalando post; non-trivial question in a shared-ownership monorepo with 150+ contributors.
  • Normalisation must agree between persist and execute. If the two sides normalise a query differently, even by a single character, the IDs shipped in production bundles stop resolving.
  • Versioning / GC of the DB. When a bundle that referenced ID X is no longer deployed anywhere, is X kept, deprecated, deleted?
  • Emergency overrides. If a fix requires a new query and the persist pipeline is down, what is the escape hatch? Re-enabling raw-query execution temporarily? Unspecified.
  • Dev / staging / prod parity. The persist step must run identically in each environment, or staging traffic will test different queries than prod runs.

Seen in

  • sources/2022-02-16-zalando-graphql-persisted-queries-and-schema-stability
