PATTERN Cited by 1 source
Directive-based PII redaction¶
Problem¶
Observability pipelines (structured logs, distributed
traces) typically capture request variables verbatim for
reproducibility — which means a GraphQL mutation with an
email: String argument will, by default, leak the email
value into every tracing span and log line for that
request. Because observability data flows to a wider
audience than application data (SREs, support, long-term
retention, cross-region log storage), the leak has a larger
blast radius than the same value stored in the underlying database. PII
filtering done at the observability sink requires trust
in every downstream consumer; filtering at source avoids
shipping the sensitive value at all.
The challenge: engineers will not reliably remember to
filter every sensitive argument, and a purely
naming-convention approach (prefixing variable names with
sensitive_) ties the sensitivity marking to the variable
name sent over the wire rather than keeping it in the schema.
Pattern¶
Two parts working together:
- Mark with a schema directive on
ARGUMENT_DEFINITION. Zalando uses:
    directive @sensitive(
      "An optional reason why the field is marked as sensitive"
      reason: String
    ) on ARGUMENT_DEFINITION

    type Mutation {
      updateCustomerInfo(
        email: String @sensitive(reason: "Customer email address")
        phoneNumber: String @sensitive(reason: "Customer phone number")
      ): UpdateCustomerInfoResult
    }
The directive does not redact — it marks the argument
as sensitive. Runtime validation walks the parsed
operation document and returns the set of variable names
whose arguments carry @sensitive; the telemetry layer
filters them before emission.
- Enforce with a
schema linter.
The directive is forgettable on its own; Zalando's
linter fails the build when a field or argument name
matches a bootstrapped keyword list:
password, email, phone, bank, bic, account, owner, order, token, voucher, customer. If a developer adds a customerEmail argument without @sensitive, CI rejects the PR.
(Source: sources/2023-10-18-zalando-understanding-graphql-directives-practical-use-cases-at-zalando).
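The linter half of the pattern can be sketched roughly as below. This is a minimal illustration, not Zalando's linter: the keyword list is the one quoted above, but the function names (`matchesSensitiveKeyword`, `lintSensitiveArguments`), the plain-object argument descriptors, and the case-insensitive substring matching are all assumptions made for the example. It also checks arguments only, whereas the linter described above covers field names too. A real linter would extract the descriptors from the schema AST.

```javascript
// Keyword list quoted from the text; everything else here is a hypothetical sketch.
const SENSITIVE_KEYWORDS = [
  "password", "email", "phone", "bank", "bic", "account",
  "owner", "order", "token", "voucher", "customer",
];

// Assumption: a name "matches" when it contains a keyword, case-insensitively.
function matchesSensitiveKeyword(name) {
  const lower = name.toLowerCase();
  return SENSITIVE_KEYWORDS.some((keyword) => lower.includes(keyword));
}

// args: hypothetical descriptors of the form
//   { field: "updateCustomerInfo", name: "customerEmail", directives: ["sensitive"] }
// Returns one message per keyword-matching argument that lacks @sensitive.
function lintSensitiveArguments(args) {
  return args
    .filter((arg) => matchesSensitiveKeyword(arg.name))
    .filter((arg) => !arg.directives.includes("sensitive"))
    .map(
      (arg) =>
        `${arg.field}(${arg.name}:) matches a sensitive keyword but lacks @sensitive`
    );
}

const violations = lintSensitiveArguments([
  { field: "updateCustomerInfo", name: "customerEmail", directives: [] },
  { field: "updateCustomerInfo", name: "phoneNumber", directives: ["sensitive"] },
  { field: "search", name: "query", directives: [] },
]);

// Only customerEmail is reported; a CI wrapper would exit non-zero here.
console.log(violations);
```

Substring matching is deliberately crude: a short keyword such as bic would also flag an unrelated name like cubicMeters, which is the kind of false positive the keyword approach has to tolerate or tune away.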
Implementation sketch¶
A GraphQL validator visitor, run before the request hits the telemetry layer:
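    const { validate } = require("graphql");

    // Returns the names of variables passed to arguments marked @sensitive.
    function getSensitiveVariables(schema, document) {
      const sensitiveVariables = [];
      // Run a single custom rule; validate() supplies a type-aware context.
      validate(schema, document, [
        (context) => ({
          Variable(node) {
            // getArgument() is the schema definition of the argument
            // this variable is being passed to, if there is one.
            const isSensitive = context
              .getArgument()
              ?.astNode?.directives?.some(
                (d) => d.name.value === "sensitive"
              );
            if (isSensitive) sensitiveVariables.push(node.name.value);
          },
        }),
      ]);
      return sensitiveVariables;
    }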
The returned variable-name set plugs into the logging middleware (redact matching keys in structured logs) and the tracing middleware (drop matching keys from span attributes).
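A minimal sketch of that last step, assuming the request variables are available as a plain object. The helper names `redactVariables` and `dropVariables` and the `[REDACTED]` placeholder are hypothetical, and how the result is attached to a log record or span depends on the logging and tracing libraries in use.

```javascript
// Hypothetical helper: returns a copy of the variables that is safe to log,
// replacing the value of each sensitive variable with a placeholder.
function redactVariables(variables, sensitiveNames, placeholder = "[REDACTED]") {
  const sensitive = new Set(sensitiveNames);
  return Object.fromEntries(
    Object.entries(variables).map(([name, value]) => [
      name,
      sensitive.has(name) ? placeholder : value,
    ])
  );
}

// Hypothetical helper: removes sensitive variables entirely,
// e.g. before copying variables into span attributes.
function dropVariables(variables, sensitiveNames) {
  const sensitive = new Set(sensitiveNames);
  return Object.fromEntries(
    Object.entries(variables).filter(([name]) => !sensitive.has(name))
  );
}

// sensitiveNames would come from a function like getSensitiveVariables.
const requestVariables = { email: "jane@example.com", locale: "de-DE" };
const forLogs = redactVariables(requestVariables, ["email"]);
const forSpans = dropVariables(requestVariables, ["email"]);

console.log(forLogs);  // { email: '[REDACTED]', locale: 'de-DE' }
console.log(forSpans); // { locale: 'de-DE' }
```

Both helpers return new objects and leave the original variables untouched, so the resolver still receives the real values while only the telemetry copy is filtered.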
Why the directive + linter pair is load-bearing¶
- Directive alone is a marker, not a gate. A sensitive argument without @sensitive continues to leak until someone notices.
- Linter alone is a blunt instrument. A keyword match on field names produces false positives (contactEmail used as an internal ID, not a PII value) and false negatives (contactInfo.primary, which contains no keyword).
- Together they form a policy gate. The linter creates the forcing function that makes the directive universal; the directive carries the precise machine-readable mark that the observability layer acts on.
Trade-offs¶
- Argument-scope, not field-scope. Zalando's directive is on ARGUMENT_DEFINITION; it targets client-supplied variables. Response-side field redaction (e.g. Customer.email returned in a query response) needs a separate mechanism: a schema directive on FIELD_DEFINITION with response-wrapping, or trace/log sampling at lower rates for specific response classes.
- Linter keyword list is bootstrap-then-tune. New PII blind spots (residencyDocument, ibanSecondary) will not match the initial list; the list must evolve with the schema. Treat it as a living policy, not a frozen rule set.
- Does not replace data-classification cataloguing. A formal PII taxonomy (per-field data-sensitivity annotations fed from a central master-data system) is a strictly stronger mechanism for large organisations; directive-based redaction is the lightweight complement.
Seen in¶
- Zalando UBFF — canonical instance.
@sensitive(reason: String) on ARGUMENT_DEFINITION + 11-keyword linter (password, email, phone, bank, bic, account, owner, order, token, voucher, customer) (sources/2023-10-18-zalando-understanding-graphql-directives-practical-use-cases-at-zalando, systems/zalando-graphql-ubff).
Related¶
- concepts/sensitive-field-logging-redaction — the broader concept the directive implements.
- concepts/schema-linter-enforcement — the enforcement mechanism that makes the directive universal.
- concepts/graphql-schema-directive — the primitive.
- systems/opentelemetry — the tracing layer that consumes the filtered variable set.
- patterns/directive-based-field-authorization — sibling schema-directive pattern.
- systems/graphql · systems/zalando-graphql-ubff