PATTERN Cited by 1 source
Consistency Checkers¶
SQL-based invariant tests that compare expected system state against data recorded across multiple sources of truth, run on a pre-defined cadence in both development and production. When an invariant is violated, the checker emits a structured alert, containing the exact rows and metadata needed to diagnose it, to the team that owns the violated invariant. Positioned as stronger than E2E testing or User Acceptance Testing (UAT) for correctness of stateful workflows where failures can be subtle, asymmetric in cost, and late-binding (e.g. billing, access control, ledger systems).
Structure¶
- Invariants as SQL: each check is a targeted query whose empty result set encodes "correct." Non-empty → violation. Explainability is built in because the returned rows are the evidence.
- Unified cross-system data plane: the checks join data from multiple sources of truth (product event logs, business-state tables, payment processor state, CRM state, …) so the invariant can express "these four systems agree" — a statement unavailable to any one source of truth in isolation.
- Two canonical flavours:
  - Data-quality checks — validate whether stored data accurately reflects the real product state. "The seat-lifecycle event log says user U was assigned a seat at T; the billing table has no matching seat for U."
  - Code-logic checks — detect when application behaviour diverges from defined business rules. "A seat upgrade event occurred but the price adjustment never materialised on the invoice line item."
- Alert routing: violations emit structured notifications (row IDs, joined-context metadata, pointer to the invariant definition) to the team owning the involved subsystem. Contrast with generic dashboards — the checker doesn't just report a number, it names the misbehaving rows.
- Runs in dev and prod: same invariant code gates pre-deploy validation (catching regressions before rollout) and continuously observes production (catching drift, race conditions, edge cases that only manifest with real data volume).
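The "invariants as SQL" mechanics above can be sketched minimally: a query whose empty result set encodes "correct," and whose returned rows are the evidence when it isn't. All table and column names below are hypothetical, not Figma's actual schemas.

```python
import sqlite3

# Hypothetical schemas: a product event log and a billing-state table
# standing in for two sources of truth that must agree.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE seat_events (user_id TEXT, event TEXT, assigned_at TEXT);
    CREATE TABLE billing_seats (user_id TEXT, seat_id TEXT);

    -- Two assignments recorded in the event log...
    INSERT INTO seat_events VALUES ('u1', 'seat_assigned', '2025-03-01');
    INSERT INTO seat_events VALUES ('u2', 'seat_assigned', '2025-03-02');
    -- ...but only one made it into billing state.
    INSERT INTO billing_seats VALUES ('u1', 's-100');
""")

# The invariant: every seat assignment in the event log has a billing row.
# Empty result set == invariant holds; any returned row is the evidence.
INVARIANT_SQL = """
    SELECT e.user_id, e.assigned_at
    FROM seat_events e
    LEFT JOIN billing_seats b ON b.user_id = e.user_id
    WHERE e.event = 'seat_assigned' AND b.user_id IS NULL
"""

violations = conn.execute(INVARIANT_SQL).fetchall()
print(violations)  # [('u2', '2025-03-02')] — the exact misbehaving row
```

The LEFT JOIN plus IS NULL filter is what lets one query say "these two systems agree": rows surviving the filter exist in one source of truth but not the other.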
Why it composes with E2E / UAT / unit tests¶
- Unit + integration tests cover code paths they name in advance; they can't assert about state combinations that arise only from production concurrency + data volume + historical data.
- E2E / UAT tests cover happy paths + a handful of designed edge cases; they're blind to emergent invariant violations.
- Consistency checkers express "this statement must be true over the full production dataset, whatever the code did to get there" — a complementary axis that the other two test categories don't cover.
Operating mechanics¶
- Authoring: checks live beside code, reviewed on PRs. Requires domain-specific translation of business rules into SQL — usually the costly prerequisite (cross-system schema knowledge, historical-data quirks, rule-exceptions for legacy contracts). Figma framed this as "building a deep understanding of how data moved through our billing systems, and where it could go wrong."
- Cadence: pre-defined (hourly / daily / per-deploy-verification). Not event-driven — runs against accumulated state, not individual transitions.
- Exposed-gap feedback: when a check fails because the data is missing the context needed to decide, the response is frequently to request new instrumentation upstream so future events capture the missing "why." This is the DS/platform feedback loop that compounds invariant coverage over time.
- Adoption path: typically starts in one high-correctness domain (Figma: Billing) and generalises to sibling domains that share the same correctness bar (Figma: product security, access/identity management, growth).
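The cadence-plus-routing mechanics can be sketched as a small runner: each registered check carries its owning team and a pointer to its definition, and a violation emits a structured alert rather than a dashboard number. All names here (`Check`, `run_checks`, the wiki URL) are illustrative assumptions, not Figma's actual framework.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable

@dataclass
class Check:
    name: str
    owner_team: str      # the team that owns the violated invariant
    sql: str             # invariant query: empty result set encodes "correct"
    definition_url: str  # pointer back to the invariant definition

def run_checks(conn: sqlite3.Connection, checks: list,
               notify: Callable[[dict], None]) -> None:
    """Run every registered check; emit one structured alert per violation."""
    for check in checks:
        rows = conn.execute(check.sql).fetchall()
        if rows:  # non-empty result set == violated invariant
            notify({
                "check": check.name,
                "team": check.owner_team,
                "definition": check.definition_url,
                "violating_rows": rows,  # the misbehaving rows, not a count
            })

# Demo: a table containing a row that violates a (contrived) invariant.
conn = sqlite3.connect(":memory:")
conn.executescript("CREATE TABLE orphan_seats (user_id TEXT); "
                   "INSERT INTO orphan_seats VALUES ('u2');")
alerts = []
run_checks(conn,
           [Check("no_orphan_seats", "billing-eng",
                  "SELECT user_id FROM orphan_seats",
                  "wiki://patterns/consistency-checkers#no_orphan_seats")],
           alerts.append)
print(alerts[0]["team"], alerts[0]["violating_rows"])
```

In a real deployment `notify` would post to a pager or chat channel and the runner would be invoked by a scheduler on the pre-defined cadence; the sketch keeps only the routing contract.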
Wiki instances¶
- Figma Billing → platform-wide (canonical wiki instance) — introduced during the 2025 billing-model refresh ("a major re-architecture of our pricing, packaging, and billing mechanics"). Checks verify seat assignments, state transitions, and invoice calculations; they unify data from product logs, billing state, payment processing, and CRM. Gave engineers "real-time observability into system behavior across both development and production environments" across the refresh → "uneventful rollout (the best kind)." Post-refresh, the framework was adopted across product security, access and identity management, and other growth teams; connected projects use consistency checkers to confirm sharing and access settings behave as expected. (Source: sources/2026-04-21-figma-redefining-impact-as-a-data-scientist)
- Slack (prior art, not yet ingested) — Figma's article explicitly links to Slack Engineering's data-consistency-checks post as the precedent: "other companies also leverage them as part of their testing strategy." Slack is a Tier-2 source in this wiki with no articles ingested yet; this post references the pattern and is a candidate for ingest if Slack's article surfaces in future feed polls.
Relationships¶
- Sibling to patterns/alert-backtesting (Airbnb) — both close the loop between "what we meant" and "what production is doing," but alert-backtesting validates alert-definition quality against historical time series, while consistency checkers validate state-invariant truth against current + historical cross-system data. Both embody the own-the-full-surface-area principle.
- Related to concepts/observability — a checker is observability that asserts, not just displays. Production dashboards show metrics; a consistency checker says "metric X should equal metric Y, here are the rows where it doesn't."
- Complementary to concepts/data-policy-separation (Figma's permissions DSL) — both separate "the rules" from "the data fetching," allowing the rules to be reasoned about, tested, and version-controlled separately. Consistency checkers are the continuous-runtime sibling of the CI static-analysis linter on permissions policies (patterns/policy-static-analysis-in-ci).
- Productization trajectory follows patterns/data-application-productization — a checker invariably starts as a bespoke debugging SQL query and graduates to a framework once multiple teams depend on the same invariant shape.
Caveats¶
- Invariants require cross-system schema mastery — ownership of the check usually requires the author to understand every upstream system contributing data. At Figma this was DS + billing engineering collaborating; at organisations without a cross-domain DS team the author-cost is higher.
- Alert fatigue: if invariants over-fire (spurious violations from eventual-consistency lag, time-zone boundaries, in-flight transactions), teams stop reading the alerts. Requires careful timing-of-evaluation design — the checker needs to know what "eventually consistent" latency envelope to tolerate before declaring violation.
- Cost: running wide-join SQL across many source-of-truth tables on a fixed cadence adds a continuous warehouse-compute cost. Figma hasn't disclosed checker volume or cost.
- Not a substitute for E2E / UAT — these assert state properties, not user-journey correctness or UI integration. Figma is explicit that consistency checkers are an additional layer.
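The alert-fatigue caveat above usually reduces to a grace window: rows younger than the tolerated consistency lag must not count as violations, because the downstream system may simply not have caught up yet. A minimal sketch, with a hypothetical 30-minute envelope and hypothetical schema:

```python
import sqlite3
import datetime

GRACE = datetime.timedelta(minutes=30)  # tolerated replication/processing lag

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seat_events (user_id TEXT, event TEXT, created_at TEXT)")
now = datetime.datetime(2025, 3, 1, 12, 0, 0)

# An in-flight assignment from 5 minutes ago: billing may not have caught up,
# so flagging it now would be a spurious violation.
conn.execute("INSERT INTO seat_events VALUES ('u1', 'seat_assigned', ?)",
             ((now - datetime.timedelta(minutes=5)).isoformat(),))
# A day-old assignment: old enough that missing billing state is a real bug.
conn.execute("INSERT INTO seat_events VALUES ('u2', 'seat_assigned', ?)",
             ((now - datetime.timedelta(days=1)).isoformat(),))

# Only rows older than the grace window are eligible to count as violations.
# (ISO-8601 timestamps compare correctly as strings.)
cutoff = (now - GRACE).isoformat()
eligible = conn.execute(
    "SELECT user_id FROM seat_events "
    "WHERE event = 'seat_assigned' AND created_at < ?",
    (cutoff,),
).fetchall()
print(eligible)  # [('u2',)] — the in-flight row is excluded
```

Choosing GRACE is the design decision the caveat points at: too short and eventual-consistency lag over-fires the check; too long and real violations surface late.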
Undisclosed (Figma)¶
- Check count, firing rate, false-positive rate, cadence distribution, underlying orchestration (scheduler? warehouse?), alert-routing mechanism, schema-change breakage rate, coverage percentage of billing correctness surface. Article is an impact-framing post, not a systems-architecture post.