
CONCEPT Cited by 1 source

GraphQL schema usage observability

Definition

GraphQL schema usage observability is the property — only cheaply available once the production query set is closed and enumerable — that for every field in the schema, one can answer the question "which queries, if any, reference this field in production?"

The property does not hold for a permissive GraphQL endpoint that accepts arbitrary queries. Such an endpoint would require field-level resolver instrumentation and aggregation over sampled traffic to approximate the answer, and the observed usage would be only a lower bound on actual usage: a field that saw no traffic during the watch period cannot be confirmed unused, because a client may still issue a query that references it later.

It holds exactly and for free when the production endpoint accepts only persisted query IDs: the set of queries the endpoint can execute is the persisted-queries DB, and static analysis of that finite DB answers the question deterministically (Source: sources/2022-02-16-zalando-graphql-persisted-queries-and-schema-stability).

Why this falls out of gate-mode persisted queries

Zalando names the property directly: "we know at any time what parts of the schema are used in production and what are not used in production." The logical chain:

  1. Production requests carry only id, not query.
  2. IDs are validated against the persisted-queries DB.
  3. Therefore the DB is the set of queries that can execute in production.
  4. Static analysis of the DB against the schema computes, for every field, the list of query IDs that reference it (possibly empty).

No field-level resolver instrumentation is required, and the answer is exact rather than sampled (Source: sources/2022-02-16-zalando-graphql-persisted-queries-and-schema-stability).
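
Step 4 of the chain above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Zalando's implementation: the schema, the query IDs, and the helper names (SCHEMA, PERSISTED_QUERIES, fields_referenced, build_usage_index) are all hypothetical, and the toy walker understands only plain nested selections (no operation names, arguments, aliases, fragments, or interfaces). A real analysis would use a proper GraphQL parser.

```python
import re

# Hypothetical toy schema: type name -> {field name -> return type name}.
SCHEMA = {
    "Query": {"product": "Product", "customer": "Customer"},
    "Product": {"id": "ID", "name": "String", "price": "Money", "legacySku": "String"},
    "Money": {"amount": "Float", "currency": "String"},
    "Customer": {"id": "ID", "email": "String"},
}

# Hypothetical persisted-queries DB: query ID -> query text.
PERSISTED_QUERIES = {
    "q1": "query { product { id name price { amount currency } } }",
    "q2": "query { product { id } customer { email } }",
}

TOKEN = re.compile(r"[A-Za-z_]\w*|[{}]")


def fields_referenced(query_text, schema=SCHEMA, root_type="Query"):
    """Return the set of 'Type.field' coordinates selected by one query.

    Simplification (assumption): anonymous queries with plain nested
    selections only.
    """
    tokens = TOKEN.findall(query_text)
    if tokens and tokens[0] == "query":
        tokens = tokens[1:]
    referenced = set()
    type_stack = []              # parent types of the enclosing selection sets
    last_field_type = root_type  # type that the next "{" opens a selection on
    for tok in tokens:
        if tok == "{":
            type_stack.append(last_field_type)
        elif tok == "}":
            type_stack.pop()
        else:
            parent = type_stack[-1]
            referenced.add(f"{parent}.{tok}")
            last_field_type = schema[parent][tok]
    return referenced


def build_usage_index(schema, persisted):
    """Map every schema field to the sorted list of query IDs referencing it."""
    usage = {f"{t}.{f}": [] for t, fields in schema.items() for f in fields}
    for query_id in sorted(persisted):
        for coord in fields_referenced(persisted[query_id], schema):
            usage[coord].append(query_id)
    return usage


usage = build_usage_index(SCHEMA, PERSISTED_QUERIES)
```

Because the index is seeded with every field in the schema before any query is scanned, unused fields appear with an empty list rather than being absent, which is what makes "referenced by nothing" an explicit answer.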

What you do with it

Zalando calls out three capabilities:

  1. Per-query monitoring and alerting. "We leverage these persisted queries for better monitoring and alerting for each individual query separately." Because each persisted query has a stable identity, p99 latency, error rate, resolver cost breakdown, and rollback signals can all be tracked per query rather than only as aggregate endpoint SLOs.
  2. Safe breaking changes. "We are also able to tell if certain fields can have a breaking change because the field is no longer used or never used in production." Removing a field, changing its type, or renaming it is safe precisely when no persisted query references it — and that is decidable.
  3. Dead-field GC. By the same analysis, fields referenced by zero persisted queries are candidates for deletion, which reduces schema surface, resolver code, and cognitive load.
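
Capabilities 2 and 3 reduce to lookups once a field-to-queries index exists. The sketch below assumes such an index has already been computed from the persisted-queries DB; the USAGE data and the function names (blocking_queries, is_safe_to_break, dead_fields) are hypothetical and only illustrate the shape of the check.

```python
# Hypothetical usage index: field coordinate -> IDs of persisted queries
# that reference it. An empty list means no production query uses the field.
USAGE = {
    "Product.id": ["q1", "q2"],
    "Product.name": ["q1"],
    "Product.legacySku": [],
    "Customer.id": [],
}


def blocking_queries(usage, coordinate):
    """Persisted query IDs that would break if this field changed."""
    return list(usage[coordinate])


def is_safe_to_break(usage, coordinate):
    """A breaking change is safe exactly when no persisted query uses the field."""
    return not usage[coordinate]


def dead_fields(usage):
    """Fields referenced by zero persisted queries: candidates for deletion."""
    return sorted(coord for coord, ids in usage.items() if not ids)
```

When a change is not safe, blocking_queries names the exact persisted queries that stand in the way, so the owners of those queries can be identified and the queries migrated before the field changes.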

The asymmetry with REST and raw GraphQL

  • REST. Endpoint-level usage is observable from access logs, but field-level / response-property-level usage is not: the server returns a response; what the client does with it is hidden. Field-level dead-code detection requires parsing client code or instrumenting client responses.
  • Raw GraphQL (permissive endpoint). Field-level usage is observable via resolver instrumentation but is approximate (sampled, watch-window-dependent). Observed usage is only a lower bound on actual usage, so "no one is using this field" can never be confirmed: an uncommon query issued once a month would be missed.
  • Persisted-queries-only GraphQL (gate mode). Field usage is determined by static analysis of a closed set. The answer is exact.

This is why Zalando describes the disable-raw-GraphQL move as a schema stability enabler — the stability comes from being able to prove fields are unused.

Preconditions

This observability requires gate-mode persisted queries (unknown hashes rejected), not cache-mode (Apollo APQ's default, which falls back to query-text if the hash is unknown). Under cache mode, an arbitrary client can still issue a novel query — so the production query set is not closed, and the observability property does not hold.
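
The difference between the two modes can be illustrated with a simplified request handler. This is a sketch under assumptions, not Apollo's or Zalando's code: resolve_query, UnknownQueryError, and the request shape (a dict with "id" and optionally "query") are hypothetical, and the cache-mode branch compresses the APQ retry handshake into a single step.

```python
class UnknownQueryError(Exception):
    """Raised when a request names a query ID the server does not know."""


def resolve_query(request, persisted, mode="gate"):
    """Return the query text the server would execute for this request.

    gate mode:  only IDs already in the persisted-queries DB are accepted.
    cache mode: an unknown ID accompanied by query text is executed and
                registered, so the set of executable queries can grow at
                runtime.
    """
    query_id = request.get("id")
    if query_id in persisted:
        return persisted[query_id]
    if mode == "cache" and "query" in request:
        # Fallback to client-supplied text: the query set is no longer closed.
        persisted[query_id] = request["query"]
        return request["query"]
    raise UnknownQueryError(f"unknown persisted query id: {query_id!r}")
```

In gate mode the persisted dict is never mutated by a request, so static analysis of it stays valid between deployments. In cache mode any client can add an entry, which is why the analysis no longer describes everything that can execute.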

See patterns/disable-graphql-in-production for the framing.

Seen in
