PATTERN Cited by 1 source

Directive-based PII redaction

Problem

Observability pipelines (structured logs, distributed traces) typically capture request variables verbatim for reproducibility. As a result, a GraphQL mutation with an email: String argument will, by default, leak the email value into every tracing span and log line for that request. Because observability data reaches a wider audience than application data (SREs, support, long-term retention, cross-region log storage), the leak has a larger blast radius than the same value held in the underlying database. PII filtering done at the observability sink requires trust in every downstream consumer; filtering at the source avoids shipping the sensitive value at all.

The challenge: engineers will not reliably remember to filter every sensitive argument, and a pure naming-convention approach (prefixing variable names with sensitive_) conflates the sensitivity marker with the variable name that is transported.

Pattern

Two parts working together:

  1. Mark with a schema directive on ARGUMENT_DEFINITION. Zalando uses:
directive @sensitive(
  "An optional reason why the field is marked as sensitive"
  reason: String
) on ARGUMENT_DEFINITION

type Mutation {
  updateCustomerInfo(
    email: String @sensitive(reason: "Customer email address")
    phoneNumber: String @sensitive(reason: "Customer phone number")
  ): UpdateCustomerInfoResult
}

The directive does not redact — it marks the argument as sensitive. Runtime validation walks the parsed operation document and returns the set of variable names whose arguments carry @sensitive; the telemetry layer filters them before emission.

  2. Enforce with a schema linter. The directive is forgettable on its own; Zalando's linter fails the build when a field or argument name matches a bootstrapped keyword list but does not carry @sensitive: password, email, phone, bank, bic, account, owner, order, token, voucher, customer. If a developer adds a customerEmail argument without @sensitive, CI rejects the PR.

(Source: sources/2023-10-18-zalando-understanding-graphql-directives-practical-use-cases-at-zalando).
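The source does not show the linter itself, so the following is only a minimal sketch of the keyword check. It assumes the schema's arguments have already been extracted into plain objects; the function names and the { field, name, directives } shape are hypothetical, and case-insensitive substring matching is an assumption about how "matches" is defined.

```javascript
// Keyword list quoted in the text above.
const SENSITIVE_KEYWORDS = [
  "password", "email", "phone", "bank", "bic", "account",
  "owner", "order", "token", "voucher", "customer",
];

// Assumption: a name "matches" if it contains a keyword, ignoring case.
function matchesSensitiveKeyword(name) {
  const lower = name.toLowerCase();
  return SENSITIVE_KEYWORDS.some((keyword) => lower.includes(keyword));
}

// Hypothetical input shape: [{ field, name, directives: ["sensitive", ...] }].
// Returns the arguments that match a keyword but lack @sensitive.
function findUnmarkedSensitiveArgs(args) {
  return args.filter(
    (arg) =>
      matchesSensitiveKeyword(arg.name) &&
      !arg.directives.includes("sensitive")
  );
}

const violations = findUnmarkedSensitiveArgs([
  { field: "updateCustomerInfo", name: "email", directives: ["sensitive"] },
  { field: "updateCustomerInfo", name: "customerEmail", directives: [] },
  { field: "search", name: "limit", directives: [] },
]);

for (const v of violations) {
  console.error(
    `${v.field}(${v.name}) matches a sensitive keyword but lacks @sensitive`
  );
}
// A CI wrapper would exit non-zero when violations.length > 0.
```

Only customerEmail is reported here: email is already marked, and limit matches no keyword. Substring matching is deliberately coarse, so a real rule set would likely need an allow-list for names that match by accident.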

Implementation sketch

A GraphQL validator visitor, run before the request hits the telemetry layer:

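// Requires the `graphql` package (graphql-js).
const { validate } = require("graphql");

// Collects the names of variables passed to arguments marked @sensitive.
// Assumes the schema was built from SDL, so that argument definitions
// carry an astNode listing their directives.
function getSensitiveVariables(schema, document) {
  const sensitiveVariables = new Set();
  // validate() is used only to traverse the document with type
  // information; the error list it returns is ignored.
  validate(schema, document, [
    (context) => ({
      Variable(node) {
        // getArgument() yields the argument definition currently being
        // visited, or nothing when the variable is not inside an argument.
        const isSensitive = context
          .getArgument()
          ?.astNode?.directives?.some(
            (d) => d.name.value === "sensitive"
          );
        if (isSensitive) sensitiveVariables.add(node.name.value);
      },
    }),
  ]);
  // Return an array, with duplicates removed.
  return [...sensitiveVariables];
}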

The returned variable-name set plugs into the logging middleware (redact matching keys in structured logs) and the tracing middleware (drop matching keys from span attributes).
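How the middleware consumes those names is not shown in the source. As a minimal sketch, the two helpers below (both hypothetical, as is the "[REDACTED]" placeholder) take the request's variables object and the names returned by getSensitiveVariables, then either mask the values for logs or drop the keys for span attributes.

```javascript
// Returns a copy of `variables` with sensitive entries masked.
// `sensitiveNames` is the list returned by getSensitiveVariables.
function redactVariables(variables, sensitiveNames, placeholder = "[REDACTED]") {
  const sensitive = new Set(sensitiveNames);
  return Object.fromEntries(
    Object.entries(variables).map(([name, value]) =>
      sensitive.has(name) ? [name, placeholder] : [name, value]
    )
  );
}

// Returns a copy of `variables` without the sensitive entries,
// e.g. before they are set as span attributes.
function dropSensitiveVariables(variables, sensitiveNames) {
  const sensitive = new Set(sensitiveNames);
  return Object.fromEntries(
    Object.entries(variables).filter(([name]) => !sensitive.has(name))
  );
}

const variables = { email: "jane@example.com", locale: "de-DE" };
const logged = redactVariables(variables, ["email"]);
const traced = dropSensitiveVariables(variables, ["email"]);

console.log(logged); // { email: '[REDACTED]', locale: 'de-DE' }
console.log(traced); // { locale: 'de-DE' }
```

Both helpers return new objects rather than mutating the input, so the resolver still receives the original values. Masking keeps the key visible in logs, which can help debugging; dropping the key removes even the fact that the variable was supplied.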

Why the directive + linter pair is load-bearing

  • Directive alone is a marker, not a gate. A sensitive argument without @sensitive continues to leak until someone notices.
  • Linter alone is a blunt instrument. A keyword match on field names produces false positives (contactEmail used as an internal ID, not a PII value) and false negatives (names such as contactInfo.primary that carry PII but match no keyword).
  • Together they form a policy gate. The linter creates the forcing function that makes the directive universal; the directive carries the precise machine-readable mark that the observability layer acts on.

Trade-offs

  • Argument-scope, not field-scope. Zalando's directive is on ARGUMENT_DEFINITION; it targets client-supplied variables. Response-side field redaction (e.g. Customer.email returned in a query response) needs a separate mechanism, such as a schema directive on FIELD_DEFINITION with response-wrapping, or trace/log sampling at lower rates for specific response classes.
  • Linter keyword list is bootstrap-then-tune. New PII blind spots (residencyDocument, ibanSecondary) will not match the initial list; the list must evolve with the schema. Treat it as a living policy, not a frozen rule set.
  • Does not replace data-classification cataloguing. A formal PII taxonomy (per-field data-sensitivity annotations fed from a central master-data system) is a strictly stronger mechanism for large organisations; directive-based redaction is the lightweight complement.

