Skip to content

PATTERN

Actor-tagged query observability

Pattern: capture the authenticated actor (user / service account / internal tool / API key / tenant) on every query emitted by an authenticated request, using a SQLCommenter-style comment attached to the SQL. The tag propagates through the database driver, the connection pool, and the database-observability layer unchanged — making every query stat and every error occurrence attributable to a single principal without cross-system log joins.

Shape

/* actor='alice@example.com',controller='DatabaseBranchPasswords#create' */
INSERT INTO database_branch_password (...) VALUES (...);

The SQL comment is a no-op at the database execution layer (MySQL / Postgres both strip or ignore leading comments), but:

  • The database-observability layer parses the comment and attaches the tags to the query's captured metadata.
  • Every error raised by the query inherits the tag — including errors raised by the storage engine (unique-violation, deadlock, statement-timeout) that would not otherwise see the application's authentication state.
  • The tag survives connection pooling and query rewriting because it travels on the query string itself, not in a separate channel.

Why this matters

The classic debugging problem for intermittent production errors is correlation across layers: the error is raised in the database at a point where the caller's identity has been stripped by several request hops. Application-layer error handlers could capture the identity themselves, but they see only the subset of errors that propagate back through the call stack — not the errors swallowed by retry logic, not the errors raised by library code, not the query-level statistics the database-observability layer collects.

Capturing the actor tag on the query — before it leaves the application — means the database-side observability tool has the caller identity on every query stat and every error event, by construction. The debugging workflow becomes: error spike → filter by tag → identify the caller → fix the code. No log joins, no request-ID tracing, no application-layer instrumentation per error class.

Canonical production instance

PlanetScale Query Insights tags authenticated queries with an actor tag (and others — controller, action, request-ID-equivalent). In the Rafer Hazen disclosure (Source: ), a production error class — AlreadyExists on the database_branch_password unique index — was debugged entirely through the actor tag:

  1. "Occurrences of these errors come in small batches with nearly identical timestamps" — the timestamps on each error occurrence (already on the error record) showed sub-second clustering, hypothesising a concurrency bug.
  2. "The actor tag for each batch of errors is always the same" — the tag revealed that one caller — an internal tool — was responsible for every cluster.
  3. "It was easy to look them up and determine that an internal tool was issuing multiple password create requests in parallel" — tag-to-identity lookup was trivial; the actor tag is the caller's stable identifier.
  4. "Our solution was simply to modify the script to avoid that behavior" — a one-line fix in the calling code, reached in minutes, without any database-side or application-side instrumentation.

The pattern reduces the debugging workflow from "raise a request-ID-logging ticket, wait for the next error occurrence, ship a new version that logs more, pray you catch it" to "filter the error page by actor."

Tag-capture-threshold economics

Capturing every tag on every query at scale costs storage. PlanetScale's Insights applies a tail-capture threshold: "a query pattern must have had at least one query that took more than 1 second, read more than 10k rows, or resulted in an error" for per-occurrence tag storage. Cheap-fast- successful queries are pattern-aggregated without tag granularity; the interesting tail (slow / heavy / failing) is captured with tags intact.

This is a deliberate cost/value trade-off: the queries an engineer is most likely to filter by actor are the ones that failed, were slow, or were heavy — which are the ones guaranteed to carry tags. Normal queries don't need per-occurrence attribution for the debugging path.

Implementation contract

For the pattern to work end-to-end, three layers must cooperate:

  1. Application layer (framework / middleware): captures the current authenticated principal once per request (current_user, authenticated_service, request.api_key.owner, etc.) and injects it into every query issued during that request, via a SQLCommenter-style hook or equivalent.
  2. Database driver / ORM: preserves the query comment unchanged when rewriting, caching, or batching queries. Most major drivers (ActiveRecord via the marginalia gem, Django via django- sqlcommenter, Rails-8-native SQLCommenter support, psycopg3 hooks) support this natively.
  3. Database-observability layer: parses the SQLCommenter comment and exposes the tags as searchable fields on query stats and error occurrences. The filter syntax — see concepts/query-tag-filter — is what makes the captured data actionable.

If any layer drops the tag, the pattern degrades gracefully (errors still carry other information — timestamp, SQL, rollup counts — but lose the cross-caller attribution that makes a spike debuggable in minutes).

Generalises beyond actor

The actor tag is the most universally populated because authentication state is always available at the app layer. But the same pattern supports:

  • feature:<name> — per-feature impact analysis, especially for new rollouts.
  • tenant:<id> — per-tenant noisy-neighbour diagnosis.
  • deploy:<version> — per-deploy regression hunt.
  • controller:<action> — per-endpoint cost attribution.
  • job:<job_id> — per-background-job attribution for worker-pool tracing.

All are the same pattern with a different key captured at the app layer. The filter syntax (tag:key:value) is shared; the value of the pattern scales with the diversity of keys the app layer can feed in.

Seen in

  • earliest wiki instance of the pattern in PlanetScale's own production-debugging voice (2022-06-29, Coutermarsh + Ekechukwu) — three months before the canonical actor-tag disclosure on 2022-09-27. The tag carried is job='ScheduleSnapshotJob' (per-background-job attribution) rather than actor='<user>' (per-authenticated-principal), but the structural shape is identical: tag the query at the ORM layer via Rails 7's query_log_tags + the PlanetScale-authored activerecord-sql_commenter gem for SQLCommenter format, see the tag on the Insights slow-query surface, localise the offending code without additional instrumentation. Canonical quote: "Using Insights and tags on the slow query, we were able to find exactly where this query was coming from. This enabled us to quickly find and fix the issue in our application." Narrower tag (job=...) than the later actor=... disclosure but same pattern shape; generalises the pattern beyond the authentication axis to any per-request key the ORM middleware can capture.

  • — Canonical disclosure. PlanetScale's own production debugging of a check-then-act race condition on DatabaseBranchPassword.create hinged on the actor tag being present on every error occurrence; the whole debug session stayed in the Insights UI until the final code edit.

  • — Substantial extension. The pre-existing disclosure canonicalised actor-tagged observability on the ** notable-query tail + error stream; this release extends it to the aggregate query stream (every query pattern, every 15s, subject to cardinality collapse bounds). Operational consequence: the actor tag now answers aggregate-attribution questions ("what percentage of runtime did Alice's queries consume?") in addition to per-occurrence debugging. The tag:actor:X filter (concepts/query-tag-filter) now operates across both streams. Scale concern: actor is typically low-cardinality per query pattern (most patterns run from a few callers) so it rarely trips per-pattern cardinality collapse — but a fleet with many distinct actors calling the same hot pattern can trip collapse, in which case actor-level attribution is lost until the window resets.
Last updated · 542 distilled / 1,571 read