CONCEPT Cited by 1 source

Sensitive field logging redaction

Definition

Sensitive field logging redaction is the discipline of ensuring that designated fields or arguments (typically PII, credentials, or commercially confidential values) are never written verbatim into observability surfaces such as structured logs, tracing spans, error objects, and crash dumps. Redaction happens at the observability layer's emission boundary, not at the application data layer: the application may still read, store, and process the sensitive value normally, but the layer that emits logs or traces must filter it out before those artifacts are shipped to centralised systems.

The mechanism is distinct from data-at-rest encryption (which covers the sensitive value on disk), transport encryption (TLS, which covers it on the wire between systems), and access control (which covers who may read the value through the application). Logging redaction addresses a specific operational failure mode: observability pipelines often flow to wider audiences (SREs, support, data scientists, incident responders, long-term retention for compliance) than the application itself, so a sensitive value that leaks into a log line has leaked into a much larger trust boundary.

The Zalando GraphQL implementation

Zalando's UBFF marks sensitive argument definitions in the schema with a custom directive:

directive @sensitive(
  "An optional reason why the field is marked as sensitive"
  reason: String
) on ARGUMENT_DEFINITION

Applied to mutation arguments:

type Mutation {
  updateCustomerInfo(
    email: String @sensitive(reason: "Customer email address")
    phoneNumber: String @sensitive(reason: "Customer phone number")
  ): UpdateCustomerInfoResult
}

At observability-emission time, a custom validation rule passed to graphql-js's validate function walks the parsed document and collects the names of the variables bound to @sensitive-annotated arguments; the logging/tracing layer then filters those variables out before emission. Implementation sketch (shortened from the post):

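const { validate } = require("graphql");

// Collects the names of variables that are passed to @sensitive arguments.
function getSensitiveVariables(schema, document) {
  const sensitiveVariables = [];
  // Run a single custom rule; the errors returned by validate() are ignored.
  validate(schema, document, [
    (context) => ({
      Variable(node) {
        // getArgument() is the argument definition currently being visited,
        // if any; astNode is populated when the schema is built from SDL.
        const isSensitive = context
          .getArgument()
          ?.astNode?.directives?.some(
            (directive) => directive.name.value === "sensitive"
          );
        if (isSensitive) {
          sensitiveVariables.push(node.name.value);
        }
      },
    }),
  ]);
  return sensitiveVariables;
}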

The directive itself does not redact — it only marks. Redaction is performed by the telemetry middleware that inspects the marks.
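
The filtering step in the telemetry middleware might look roughly like the following. This is a minimal sketch, not Zalando's code: the helper name redactVariables and the "[REDACTED]" placeholder are hypothetical, and it assumes the request variables arrive as a plain object keyed by variable name, with the sensitive names supplied by a function such as getSensitiveVariables.

```javascript
// Hypothetical helper (not from the Zalando post): returns a copy of the
// request variables that is safe to attach to a log line or trace span.
function redactVariables(variables, sensitiveNames) {
  const sensitive = new Set(sensitiveNames);
  const safe = {};
  for (const [name, value] of Object.entries(variables ?? {})) {
    // Replace the whole value, so nested input objects are masked as well.
    safe[name] = sensitive.has(name) ? "[REDACTED]" : value;
  }
  return safe;
}

const variables = { email: "jane@example.com", locale: "de-DE" };
const loggable = redactVariables(variables, ["email"]);
// loggable is { email: "[REDACTED]", locale: "de-DE" }; `variables` is unchanged.
```

Keeping the key and masking the value is a choice made in this sketch so that the log still shows which variables were sent; the post only says the variables are filtered out, so dropping the key entirely would be equally consistent with it.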

Why the directive alone is insufficient — enforce via linter

The directive must be applied manually by the schema author; forgetting it on a new sensitive argument leaks the value. Zalando pairs the directive with a schema linter that fails the build when a field or argument whose name matches a bootstrapped keyword list is not annotated. The keyword list is: password, email, phone, bank, bic, account, owner, order, token, voucher, customer (verbatim from the Zalando post; Source: sources/2023-10-18-zalando-understanding-graphql-directives-practical-use-cases-at-zalando).

A field named customerEmail without @sensitive fails lint; the engineer must either add the directive, justify a lint-exception, or rename the field. This turns the directive from a marker into an enforced policy.
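
The lint check can be sketched as below. Only the keyword list comes from the post; the matching rule (case-insensitive substring), the function names, and the simplified input shape (a list of argument names with their directive names) are assumptions made for illustration, and a real linter would walk the schema AST instead.

```javascript
// Keyword list as quoted from the Zalando post.
const SENSITIVE_KEYWORDS = [
  "password", "email", "phone", "bank", "bic", "account",
  "owner", "order", "token", "voucher", "customer",
];

// Assumption: a name matches when it contains a keyword, ignoring case.
function matchesSensitiveKeyword(name) {
  const lower = name.toLowerCase();
  return SENSITIVE_KEYWORDS.some((keyword) => lower.includes(keyword));
}

// Hypothetical input shape: [{ name, directives: ["sensitive", ...] }].
// Returns the names that look sensitive but lack the @sensitive mark.
function findUnmarkedArguments(args) {
  return args
    .filter((arg) => matchesSensitiveKeyword(arg.name))
    .filter((arg) => !arg.directives.includes("sensitive"))
    .map((arg) => arg.name);
}

const violations = findUnmarkedArguments([
  { name: "customerEmail", directives: [] },
  { name: "phoneNumber", directives: ["sensitive"] },
  { name: "locale", directives: [] },
]);
// violations is ["customerEmail"]; a linter would fail the build when it is non-empty.
```

Substring matching of this kind also flags harmless names such as border (which contains order), which is one reason a lint-exception path is needed alongside the rule.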

Why this class of mechanism exists

  • Logs flow wider than application data. A log-retention cluster, a trace-sampling backend, or a crash-dump store is typically readable by many more people than the underlying database. A PII value leaked to logs has a larger blast radius than the same value in the DB.
  • Logs outlive sessions. Trace spans are retained for weeks to months; an email address in a span lives past the session that produced it.
  • Logs cross jurisdictions. Observability pipelines often flow to other regions for cost reasons. GDPR, CCPA, and similar regimes can be violated by log egress even when application-layer residency is correct.
  • Redaction-at-source is cheaper than redaction-at-sink. Filtering before emission avoids shipping sensitive data at all; filtering at the observability backend requires trust in downstream operators and tooling.

The argument-definition vs. field-definition choice

Zalando places @sensitive on ARGUMENT_DEFINITION, not FIELD_DEFINITION. The argument side is where client-supplied values enter: variable bindings that observability might otherwise log verbatim as part of request-reproducibility. Response-side field values are redaction-worthy too but are typically handled by separate mechanisms (response sampling, trace sampling, trace-data-class filters); the @sensitive directive in this post is specifically about variable redaction in traces/logs and is therefore argument-definition-scoped.

Other organisations may choose to extend a sensitive-style directive to FIELD_DEFINITION, adding response-side redaction logic in the resolver wrapping step.
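
One possible shape for such a resolver-wrapping step is sketched below. Nothing here comes from the Zalando post: wrapSensitiveResolver and recordForTelemetry are hypothetical names, and the sketch assumes the standard (parent, args, context, info) resolver signature with info.fieldName available.

```javascript
// Hypothetical wrapper (illustration only): the client still receives the
// real resolver result, while telemetry sees only a placeholder.
function wrapSensitiveResolver(resolve, recordForTelemetry) {
  return function wrappedResolver(parent, args, context, info) {
    const result = resolve(parent, args, context, info);
    // Record that the field was resolved without recording its value.
    recordForTelemetry(info.fieldName, "[REDACTED]");
    return result; // a value or a promise, passed through untouched
  };
}

const recorded = [];
const resolveEmail = wrapSensitiveResolver(
  (parent) => parent.email,
  (field, value) => recorded.push({ field, value })
);
const value = resolveEmail({ email: "jane@example.com" }, {}, {}, { fieldName: "email" });
// value is "jane@example.com"; recorded is [{ field: "email", value: "[REDACTED]" }]
```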

Seen in
