Figma Response Sampling

What it is

Figma's in-house security-detection system for sensitive data exposure in API responses. A configurable fraction of outbound responses from Figma's Ruby application server is asynchronously inspected for: (Phase 1) file identifiers that the requesting user should not have access to, and (Phase 2) any field tagged as banned_from_clients by FigTag. Runs in both staging and production as an observability layer on top of PermissionsV2 — the detection complement to prevention.

Architecture

Enforcement point: Ruby after filter + async jobs

  • Implemented as middleware in the Ruby application server, using a built-in after filter that runs once each request completes: a consistent place to inspect responses before they ship to the client.
  • Sampling is uniform-random across request paths at a configurable rate, tuned to balance coverage against overhead.
  • Non-blocking: if sampling or verification fails, the request still completes normally; errors are logged for monitoring.
  • Verification is executed in async jobs — the after filter extracts candidates synchronously, enqueues the check, and returns.
  • Rate limiting on the processing pipeline prevents resource exhaustion under surge.
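The flow above can be sketched in a few lines. This is a minimal illustration, not Figma's code: `SAMPLE_RATE`, `AUDIT_QUEUE`, and `sample_response` are invented names, and an in-memory array stands in for a real background-job queue (e.g. Sidekiq).

```ruby
# Hypothetical sketch of the sampling after filter. All names here are
# assumptions; the real system enqueues to an async job framework.
SAMPLE_RATE = 0.01 # configurable fraction of responses to inspect

# Stand-in for the async job queue.
AUDIT_QUEUE = []

def sample_response(user_id, response_body, rng: Random.new)
  # Uniform-random sampling across all request paths.
  return :skipped unless rng.rand < SAMPLE_RATE

  # Extract candidates synchronously, enqueue the expensive check,
  # and return immediately so the request is never blocked.
  AUDIT_QUEUE << { user_id: user_id, body: response_body }
  :enqueued
rescue => e
  # Non-blocking: failures are logged, never surfaced to the client.
  warn "response sampling failed: #{e.message}"
  :errored
end
```

The rescue clause is the key property: a bug in the auditing path degrades to a log line, never to a failed user request.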

Why middleware in the app server (not an Envoy proxy)

The app-server layer gives middleware direct access to:

  1. The authenticated user object — needed to evaluate permissions.
  2. The full API response body — needed to scan for sensitive identifiers/values.
  3. The internal permissions engine (PermissionsV2).

Doing this in Envoy or another proxy would require reconstructing user context and would make user-aware permission checks "significantly harder" — the three capabilities above exist together only at the application tier.

Phase 1 — Permission Auditor (file identifiers)

The bootstrapping implementation. File identifiers are the ideal starter data type because:

  • Sensitivity and access rules are already well-defined in PermissionsV2.
  • They are "high-entropy capability tokens with a known character set and consistent length" — trivial to detect in JSON bodies.

Flow:

  1. after filter parses the JSON response body.
  2. Extracts any strings matching the file-identifier shape.
  3. Enqueues an async job per identifier to re-verify user × identifier access via PermissionsV2.
  4. False-positive-suppression logic accounts for known safe cases (e.g., identifiers that are legitimately visible in a given endpoint's contract).
  5. Unexpected findings land in the analytics warehouse + triage dashboards.
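Steps 1–3 can be sketched as follows. The identifier shape (22 alphanumeric characters) and the permissions lookup are assumptions for illustration; the post does not disclose the real token format or the PermissionsV2 API.

```ruby
# Hypothetical Phase 1 sketch. A fixed-length, known-character-set token
# makes extraction a plain regex scan over the serialized JSON body.
FILE_ID_PATTERN = /\b[A-Za-z0-9]{22}\b/ # invented shape, not Figma's real one

def extract_file_ids(json_body)
  json_body.scan(FILE_ID_PATTERN).uniq
end

# Body of the async verification job: re-check user x identifier access.
# `permissions` stands in for a PermissionsV2 call.
def verify_access(user_id, file_id, permissions)
  allowed = permissions.fetch([user_id, file_id], false)
  allowed ? nil : { user_id: user_id, file_id: file_id, finding: :unexpected_exposure }
end
```

Only the cheap regex scan runs inline; the per-identifier permission re-check happens in the enqueued job.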

Findings surfaced within days: file identifiers returned unnecessarily in responses (which triggered better data filtering) and paths where files bypassed permission checks entirely (gaps since closed).

Phase 2 — Sensitive Data Analyzer ("fancy Response Sampling")

Generalizes the same pipeline to any column tagged banned_from_clients by FigTag. Rather than scanning the response body for a known pattern, Figma tracks which sensitive values were loaded during the request:

  1. FigTag annotates every DB column with a sensitivity category; annotations propagate to the data warehouse and are queryable at request time.
  2. An ActiveRecord callback fires whenever a record with a banned_from_clients column loads — on sampled requests, it records the loaded value into request-local storage. This avoids global overhead for unsampled requests.
  3. After response generation, the after filter inspects the serialized JSON and compares it against the recorded sensitive values.
  4. If any sensitive value appears in the response, a finding is logged; results flow through the same unified warehouse + dashboards as Phase 1.

Why the callback+request-local trick: alternative approaches (post-hoc DB log scraping, response-body regex) can't tell whether a matched string was actually the sensitive database value or a coincidence, and can't scope per-request. The callback pins down exactly which sensitive values this request touched, so the inspection is precise without a static schema.
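The callback + request-local mechanism can be sketched like this. `Thread.current` stands in for real request-local storage, and the hardcoded `BANNED_COLUMNS` hash stands in for the FigTag column registry; both are assumptions for illustration.

```ruby
# Hypothetical Phase 2 sketch. In reality the banned-column list comes
# from FigTag annotations and the hook is an ActiveRecord callback.
BANNED_COLUMNS = { "users" => ["password_reset_token"] }

def start_sampled_request!
  Thread.current[:sensitive_values] = []
end

# Fired from a model-load callback. On unsampled requests the bucket is
# nil, so tracking costs nothing.
def record_loaded(table, attributes)
  bucket = Thread.current[:sensitive_values]
  return unless bucket # request not sampled
  (BANNED_COLUMNS[table] || []).each do |col|
    value = attributes[col]
    bucket << value if value
  end
end

# After filter: did any value this request actually loaded reach the
# serialized response? Exact values, not patterns, so no coincidental hits.
def leaked_values(response_body)
  (Thread.current[:sensitive_values] || []).select { |v| response_body.include?(v) }
end
```

Because the comparison is against the exact values loaded during this request, a match is a real leak rather than a pattern coincidence, which is the precision argument made above.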

Cross-service integration (LiveGraph)

LiveGraph, Figma's real-time data-fetching service, submits sampled responses to an internal endpoint that funnels into the same processing pipeline. The integration keeps performance predictable:

  • Sampling in LiveGraph gated by configuration + rate limiting.
  • After LiveGraph produces a response, a lightweight API call hands the sampled data off; LiveGraph's real-time data flow is unaffected.
  • Findings share the same schema and logging path — on-call engineers interpret alerts uniformly across sources.
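The handoff shape might look like the sketch below. The endpoint handler, payload fields, and the simple token-budget limiter are all invented for illustration; the post discloses only that submissions are rate-limited and share one finding schema.

```ruby
# Hypothetical sketch of the cross-service intake endpoint.
require "json"

PIPELINE = [] # stands in for the shared async processing pipeline

class SubmissionLimiter
  def initialize(max_per_window)
    @budget = max_per_window
  end

  def allow?
    return false if @budget <= 0
    @budget -= 1
    true
  end
end

def handle_livegraph_sample(raw_json, limiter)
  # Rate limiting bounds pipeline cost under surge.
  return :rate_limited unless limiter.allow?
  payload = JSON.parse(raw_json)
  # One schema across sources, so on-call triage is uniform.
  PIPELINE << { source: "livegraph", user_id: payload["user_id"], body: payload["body"] }
  :accepted
end
```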

Allowlisting (dynamic)

A flexible allowlisting process excludes endpoints with intentional, safe exposure (e.g., an OAuth client secret returned by a dedicated credential-management endpoint to an authorized user). The same value appearing in an unrelated response is a critical finding. The allowlist is config-driven (no redeploy needed) and scoped per endpoint and per field; this is what keeps the false-positive rate low enough for engineers to trust the alerts (patterns/dynamic-allowlist-for-safe-exposure).
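Conceptually the allowlist is a per-endpoint, per-field lookup. The entries and names below are invented examples; the real config source and key shape are not disclosed.

```ruby
# Hypothetical allowlist: config-driven (reloadable without a deploy),
# keyed by (endpoint, field). Entries here are illustrative only.
ALLOWLIST = {
  # A dedicated credential-management endpoint may legitimately return
  # the client secret to an authorized user.
  ["GET /api/oauth_clients/:id/secret", "client_secret"] => true
}

def finding_severity(endpoint, field)
  # Same field in any unrelated response is a critical finding.
  ALLOWLIST[[endpoint, field]] ? :allowed : :critical
end
```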

Deployment posture

  • Staging + production concurrently — two lines of defense: early detection before release + regression monitoring in prod.
  • Asynchronous everywhere on the verification path so the user doesn't pay latency.
  • Rate-limited pipeline to bound infrastructure cost.

Impact (disclosed in post)

  • Caught long-unused data fields leaking into certain responses → targeted fix.
  • Surfaced cases where related-resource data was included without a clear need → clean-up work.
  • Highlighted responses returning a list of resources without verifying access for each item → stronger per-item permission checks.
  • Closed authorization paths that bypassed permission checks entirely for file access.

Limits (gaps in the post)

  • No disclosed sampling rate, QPS, or latency-overhead numbers.
  • No disclosed false-positive or true-positive rate.
  • No detail on async-job substrate (worker pool, retry, DLQ).
  • Phase 2 is currently scoped to columns traversed by ActiveRecord — non-ORM data paths would need a parallel hook.
  • Future work named in the post: finer-grained sampling controls, automated triage, richer trend reporting, broader PII + regulated data coverage, extension to non-API interaction channels.
