ZALANDO 2023-10-18

Zalando — Understanding GraphQL Directives: Practical Use-Cases at Zalando

Summary

A 2023 deep-dive by Zalando's UBFF team (author: Boopathi Rajaa) walking through the full taxonomy of GraphQL directives in production at Europe's largest fashion e-commerce platform. The post catalogues the directives Zalando actually ships — schema-side for authorization, PII redaction, endpoint partitioning, enum governance, and entity-ID conventions; query-side for observability, error-taxonomy, and batch-shape control — and pairs each with its production motivation and implementation sketch. Companion piece to the 2022-02-16 persisted-queries post, extending the UBFF directive-based governance story beyond the three lifecycle directives (@draft / @allowedFor / @component) into the full directive footprint.

Key takeaways

  1. Directives split by location into two distinct classes. GraphQL's spec groups directive locations into two kinds: type-system locations (TypeSystemDirectiveLocation, used in the schema) and executable locations (ExecutableDirectiveLocation, used in queries). The built-in directives already respect this split: @deprecated and @specifiedBy are schema directives; @skip and @include are query directives. Schema directives declare server-side behaviour (authz, validation, resolution logic); query directives attach client-side metadata to the query (observability tags, error-taxonomy, batching shape) (concepts/graphql-schema-directive, concepts/graphql-query-directive).

  2. @isAuthenticated(acrValue: ACRValue) enforces field-level step-up authentication. Zalando uses SSO + ACR values (OpenID Connect's Authentication Context Class Reference) for customer auth. The UBFF mixes public product data with confidential customer data in a single graph; the directive lets the schema declare per-field whether a caller must be authenticated and at what ACR level. Optional acrValue argument: absent ⇒ any-authenticated-user; present ⇒ step-up required (e.g. HIGH for updateCustomerInfo).

  3. @sensitive(reason: String) drives PII redaction in logs and traces. Applied to argument definitions, the directive tells the observability layer "do not emit the runtime value of this variable into structured logs or tracing spans." Implementation: a GraphQL validation visitor walks the parsed document, finds variables whose corresponding arguments carry @sensitive, and returns the set of sensitive-variable names; the telemetry layer filters those before emission.

  4. A schema linter forces @sensitive adoption by keyword match. Relying on engineers to remember @sensitive on every sensitive argument is manual and error-prone, so Zalando's schema linter fails the build when a field or argument name contains any keyword from a bootstrapped list: password, email, phone, bank, bic, account, owner, order, token, voucher, customer. The directive protects nothing until it is applied; the linter turns it into an enforced policy.

  5. @requireExplicitEndpoint(endpoints: [String!]!) partitions the graph across multiple HTTP endpoints. A vanilla GraphQL deployment collapses all operations onto a single POST /graphql endpoint, defeating edge-level tooling built for REST (rate limiting, bot protection, caching, WAF rules that discriminate by path). The directive wraps the resolver, reads the request pathname, and rejects if the pathname is not in the directive's allowlist. Example: updateDeliveryAddress is only callable via POST /customer-addresses, not via the catch-all /graphql endpoint (patterns/directive-based-http-endpoint-partitioning).

  6. @final on ENUM_VALUE is a build-time-only directive that blocks safe-but-dangerous enum evolution. Adding a new value to a GraphQL enum is not a breaking change by the spec's compatibility rules, but it is a "dangerous change": old mobile-app binaries in the wild may not have a handler for the new value and can misbehave. @final has zero runtime behaviour; it is enforced only by Zalando's GraphQL linter during builds, which rejects any PR that adds a value to a @final-marked enum. The mechanism makes dangerous changes harder on purpose: the engineer must first open a PR removing @final, justify the change in review, then open a second PR adding the value. The post's author notes "the ideal situation would be that all enums are treated as final by default, and this directive is never required in the first place" (concepts/final-enum).

  7. @extensibleEnum(values: [String!]!) on FIELD_DEFINITION is the inverse: the field is typed String in GraphQL but the directive declares an allowlist. Imports Zalando's RESTful API Guidelines x-extensible-enum convention into the GraphQL schema. The rationale is forward compatibility: clients are made aware from the type name that new values may appear; the value set can grow without being a GraphQL-level change. "We also noticed that contributors are more likely to use this directive for defining enums than the GraphQL native enum, as this directive is more explicit about the extensibility of the enum" (concepts/extensible-enum).

  8. @resolveEntityId drives TypeScript code-generation plus a runtime ID wrapper. Applied to Entity OBJECT types, it encodes Zalando's canonical ID convention — entity:<typename-lowercase>:<natural-id>, e.g. entity:product:1234 wrapping the SKU 1234. Build-time codegen emits the id resolver boilerplate and __typename resolvers; runtime code wraps returned IDs with the prefix. Optional override: String argument permits renaming (e.g. keeping a legacy ID prefix unchanged across a typename rename) (concepts/entity-id-convention, patterns/directive-driven-entity-codegen).

  9. Query-side: @tracingTag(value: String!) and @omitErrorTag are observability directives, not resolvers. @tracingTag attaches a client-defined tag to the OpenTelemetry span for the operation — clients filter traces by tag to find their own traffic. @omitErrorTag marks a field whose errors should not taint the span's overall error tag, so on-call rotations aren't woken by known-noisy fields. Implementation for query directives is different from schema directives: graphql-tools' resolver-wrapping mechanism doesn't apply. Instead, a two-step extractor walks the parsed AST — pre-execution to collect the field-paths carrying @omitErrorTag, post-execution to intersect with observed error paths.

  10. @maxCountInBatch(value: Int!) on QUERY is enforced at build/persist time, not runtime. The UBFF allows batching multiple queries per HTTP request; without caps, a malicious or buggy client could send unbounded batches. With persisted queries in production (the id-only API per patterns/disable-graphql-in-production), every query is known ahead of time; @maxCountInBatch is embedded by the client at persist time and honoured by the batcher.
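
Takeaways 2 and 5 both describe a schema directive that wraps a field resolver and rejects the call before the resolver runs. The following is a minimal, dependency-free sketch of that wrapping idea, not Zalando's implementation: the RequestContext shape, the LOW level, the ACR ordering, and the error strings are all assumptions made for illustration, and a real server would wire the wrappers in through its schema tooling rather than by hand.

```typescript
// Hypothetical types: the post does not publish the UBFF's real context shape.
type AcrValue = "LOW" | "HIGH"; // only HIGH appears in the post; LOW is assumed

interface RequestContext {
  pathname: string;          // HTTP path the operation arrived on
  user?: { acr: AcrValue };  // undefined when the caller is anonymous
}

type Resolver<T> = (args: Record<string, unknown>, ctx: RequestContext) => T;

// Assumed ordering of ACR levels, for illustration only.
const ACR_RANK: Record<AcrValue, number> = { LOW: 0, HIGH: 1 };

// @isAuthenticated(acrValue): without an argument any authenticated caller passes;
// with an argument the caller's ACR level must be at least the requested one.
function isAuthenticated<T>(resolve: Resolver<T>, acrValue?: AcrValue): Resolver<T> {
  return (args, ctx) => {
    if (!ctx.user) throw new Error("UNAUTHENTICATED");
    if (acrValue !== undefined && ACR_RANK[ctx.user.acr] < ACR_RANK[acrValue]) {
      throw new Error("STEP_UP_REQUIRED");
    }
    return resolve(args, ctx);
  };
}

// @requireExplicitEndpoint(endpoints): reject unless the request path is allowlisted.
function requireExplicitEndpoint<T>(resolve: Resolver<T>, endpoints: string[]): Resolver<T> {
  return (args, ctx) => {
    if (endpoints.indexOf(ctx.pathname) === -1) throw new Error("ENDPOINT_NOT_ALLOWED");
    return resolve(args, ctx);
  };
}

// Stand-in resolvers; the return values are placeholders.
const updateCustomerInfo = isAuthenticated(() => "updated", "HIGH");
const updateDeliveryAddress = requireExplicitEndpoint(() => "saved", ["/customer-addresses"]);
```

Because each wrapper returns another resolver, the two compose: a field carrying both directives would simply be wrapped twice.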

Systems / concepts / patterns extracted

Operational numbers

The article is a directive taxonomy + mechanism walk-through; it does not carry QPS / latency / fleet-size numbers of its own. Production-scale context is inherited from the prior Zalando UBFF sources: 12+ domains, 200+ consuming developers, 25-30 feature teams, >80% Web / >50% App coverage (as of February 2021 — sources/2021-03-03-zalando-how-we-use-graphql-at-europes-largest-fashion-e-commerce-company).

Concrete directive-level disclosures:

  • @sensitive linter keyword list (verbatim, 11 keywords): password, email, phone, bank, bic, account, owner, order, token, voucher, customer.
  • Entity ID format: entity:product:1234 where product is the lowercased typename and 1234 is the SKU (natural ID).
  • ACR value scalar: scalar ACRValue @specifiedBy(url: "https://example.com/zalando-acr-value") — the URL is a placeholder in the post, but the pattern of @specifiedBy-ing a custom scalar for an OIDC-aligned auth-level namespace is concrete.
  • Example HIGH-ACR mutation: updateCustomerInfo(...): UpdateCustomerInfoResult @isAuthenticated(acrValue: HIGH).
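
The entity ID format listed above is simple enough to sketch directly. The helpers below are hypothetical (the post shows neither function name nor signature), and treating the override argument as a replacement for the typename-derived segment is an assumption based on the "legacy ID prefix" description.

```typescript
// Build an ID in the entity:<typename-lowercase>:<natural-id> shape.
// `override` is assumed to replace the segment derived from the typename.
function toEntityId(typename: string, naturalId: string, override?: string): string {
  const segment = override !== undefined ? override : typename.toLowerCase();
  return `entity:${segment}:${naturalId}`;
}

// Inverse helper (hypothetical): split an entity ID back into its parts.
// Returns null when the string does not follow the convention.
function parseEntityId(id: string): { segment: string; naturalId: string } | null {
  const match = /^entity:([^:]+):(.+)$/.exec(id);
  if (match === null) return null;
  return { segment: match[1], naturalId: match[2] };
}

const productId = toEntityId("Product", "1234"); // "entity:product:1234"
```

Keeping the wrapping in one helper is what makes the convention suitable for code generation: the only per-type inputs are the typename and the optional override.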

Caveats

  • Zalando-specific conventions, not GraphQL-spec. Every directive in the post is a custom Zalando directive, not a built-in. Other organisations will use different names (and may already — Apollo's community has @auth, @rateLimit, etc.); the load-bearing insight is the taxonomy of directive use-cases, not the specific spelling.
  • @requireExplicitEndpoint depends on GraphQL-over-HTTP transport details. The implementation reads the request pathname from the resolver context; it would not work for non-HTTP transports (subscriptions over WebSocket, internal gRPC binding). The article does not discuss subscriptions.
  • Linter keyword list is a heuristic. It catches obviously-named fields (customerEmail) and misses oddly-named ones (contactInfo.primary); the directive must still be applied manually in the blind spots. Linter-based policy is a floor, not a proof.
  • @final is linter-only, not a runtime guard. An engineer who bypasses CI or cherry-picks can still add an enum value. The discipline survives because the linter runs in every PR, not because the enum-value set is cryptographically locked.
  • No query-directive runtime composability. The article flags explicitly that graphql-tools does not wrap query-directive behaviour the way it wraps schema-directive behaviour. The two-step AST-walk workaround for @omitErrorTag is the price paid; more invasive query-level behaviour (e.g. per-field rate limits) would be awkward with this approach.
  • Scoped to one Zalando team's view. The author notes "the use cases outlined in this blog post represent some of the ways we use GraphQL directives at Zalando. There are numerous other cases that we'll cover in future blog posts." This is not an exhaustive directive inventory.
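
The keyword-linter caveat above can be made concrete with a small sketch. It uses the keyword list quoted from the post; everything else is assumed for illustration: the ArgumentInfo shape, the case-insensitive substring match, and the rule that a name is only reported when @sensitive is missing. A real linter would walk the schema AST rather than a flat array.

```typescript
// Keyword list as quoted from the post.
const SENSITIVE_KEYWORDS = [
  "password", "email", "phone", "bank", "bic", "account",
  "owner", "order", "token", "voucher", "customer",
];

// Hypothetical, flattened view of a field or argument definition.
interface ArgumentInfo {
  name: string;
  hasSensitiveDirective: boolean;
}

// Return the names that look sensitive by keyword but lack @sensitive.
// Matching is an assumed case-insensitive substring test.
function findMissingSensitive(args: ArgumentInfo[]): string[] {
  return args
    .filter((arg) => !arg.hasSensitiveDirective)
    .filter((arg) => {
      const lowered = arg.name.toLowerCase();
      return SENSITIVE_KEYWORDS.some((keyword) => lowered.indexOf(keyword) !== -1);
    })
    .map((arg) => arg.name);
}

const violations = findMissingSensitive([
  { name: "customerEmail", hasSensitiveDirective: false }, // caught by "customer" / "email"
  { name: "primary", hasSensitiveDirective: false },       // blind spot: no keyword matches
  { name: "password", hasSensitiveDirective: true },       // already annotated
]);
```

Here only customerEmail is reported; primary passes even though it may hold contact data, which is the blind spot the caveat describes.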

Source
