CONCEPT Cited by 1 source

State-transition logging

Definition

State-transition logging is the reliability property of emitting a log record for every transition of a deterministic state machine — not every operation, not every event, but specifically every (state, event) → next_state step. The trajectory of every managed resource through state space becomes reconstructible after the fact from the log alone.

Canonical wiki instance: Oxla's 2026-01-27 query-manager rewrite. Verbatim: "Every transition is logged. That means when something goes wrong, you can look at the logs and see exactly where the scheduler was and what it was doing. There's no ambiguity about whether a query is running, scheduled, canceled, or done. The system always knows, and you can see it." (Source: sources/2026-01-27-redpanda-engineering-den-query-manager-implementation-demo).

The debuggability payoff

Oxla's reported impact, verbatim: "Bugs still happened, as they always do with new code, but they were much easier to track down. Being able to trace state transitions made fixes straightforward instead of exploratory." The stated result: issues "fixed in days instead of weeks".

The payoff is a categorical change in debugging workflow:

  • Without transition logging: debugging is exploratory — reproduce the bug, add instrumentation, re-run, observe, adjust. Requires the bug to be reproducible and the instrumentation to be on the right code path.
  • With transition logging: debugging is post-hoc reconstruction — pull the log for the failing resource, read the trajectory, identify where the unexpected transition happened. No reproduction needed; no additional instrumentation needed.

This mirrors the shift from "add more printfs" debugging to automated root cause analysis — the log is the ground truth, not the code.
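
The post-hoc workflow can be sketched in a few lines. This is a minimal illustration, not Oxla's tooling: the JSON-lines format, the field names (ts, resource_id, old_state, event, new_state), and the sample records are all assumptions made for the example. It groups records by resource, orders them by timestamp, and flags the first record whose old state does not match the previous record's new state.

```python
import json
from collections import defaultdict

# Hypothetical JSON-lines transition log. The field names and the sample
# data are assumptions for illustration, not Oxla's actual record shape.
LOG_LINES = """\
{"ts": 1.0, "resource_id": "q-17", "old_state": "scheduled", "event": "start", "new_state": "running"}
{"ts": 0.5, "resource_id": "q-17", "old_state": "new", "event": "submit", "new_state": "scheduled"}
{"ts": 1.2, "resource_id": "q-18", "old_state": "new", "event": "submit", "new_state": "scheduled"}
{"ts": 2.0, "resource_id": "q-17", "old_state": "scheduled", "event": "cancel", "new_state": "canceled"}
"""

def trajectories(lines):
    """Group transition records by resource and order each group by timestamp."""
    by_resource = defaultdict(list)
    for line in lines.splitlines():
        if line.strip():
            rec = json.loads(line)
            by_resource[rec["resource_id"]].append(rec)
    for recs in by_resource.values():
        recs.sort(key=lambda r: r["ts"])
    return dict(by_resource)

def first_discontinuity(records):
    """Return the first record whose old_state differs from the previous
    record's new_state, or None if the trajectory is continuous."""
    for prev, cur in zip(records, records[1:]):
        if cur["old_state"] != prev["new_state"]:
            return cur
    return None
```

In the sample data, q-17 is recorded as leaving "scheduled" twice, so the check points at the cancel record as the place to start reading. No reproduction of the failure is needed to get that far.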

What a transition log record contains

Minimum viable record per transition:

  • Resource ID — which query / request / workflow / connection is transitioning.
  • Old state — what state the resource was in.
  • Event — what event triggered the transition.
  • New state — what state the resource moved to.
  • Timestamp — for trajectory reconstruction + ordering.
  • Optional: actor (who drove the transition), causal context (upstream event IDs), side-effects (resources acquired / released).

Oxla's post doesn't disclose the exact record shape. The verbatim claim is only that every transition is logged; for that log to be useful, each record would need at least (resource_id, old_state, event, new_state, timestamp).
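
One way to express the minimum viable record is a small immutable type. The sketch below is an assumption about shape only: the class name TransitionRecord, the optional field names (actor, caused_by, side_effects), and the JSON serialisation are hypothetical and do not come from Oxla's post.

```python
import json
import time
from dataclasses import dataclass, asdict, field
from typing import Optional, Tuple

@dataclass(frozen=True)
class TransitionRecord:
    """Hypothetical record shape; every name here is illustrative."""
    # Required fields: the minimum needed to reconstruct a trajectory.
    resource_id: str
    old_state: str
    event: str
    new_state: str
    timestamp: float = field(default_factory=time.time)
    # Optional context.
    actor: Optional[str] = None            # who drove the transition
    caused_by: Tuple[str, ...] = ()        # upstream event IDs
    side_effects: Tuple[str, ...] = ()     # resources acquired / released

    def to_json(self) -> str:
        """Serialise as one JSON object, suitable for one log line."""
        return json.dumps(asdict(self), sort_keys=True)
```

Freezing the dataclass is a design choice for the sketch: a record describes something that already happened, so nothing should mutate it after it is emitted.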

Composition with state machine

State-transition logging presumes a deterministic state machine. You can't usefully log transitions of an implicit or emergent state machine — there's no enumerated (state, event) → next_state to record. The two properties go together: making the state machine explicit is a precondition for making transitions loggable.
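
A sketch of why the explicit table matters: once (state, event) → next_state is enumerated as data, a single step function is the only place a transition can happen, and therefore the only place that has to log. The state names are taken from the Oxla quote (scheduled, running, canceled, done); the events, the particular transitions, and the names TRANSITIONS and step are assumptions for illustration, not Oxla's design.

```python
import logging
from enum import Enum

class State(Enum):
    SCHEDULED = "scheduled"
    RUNNING = "running"
    CANCELED = "canceled"
    DONE = "done"

class Event(Enum):
    START = "start"
    CANCEL = "cancel"
    FINISH = "finish"

# Hypothetical transition table: the enumerated (state, event) -> next_state map.
TRANSITIONS = {
    (State.SCHEDULED, Event.START): State.RUNNING,
    (State.SCHEDULED, Event.CANCEL): State.CANCELED,
    (State.RUNNING, Event.CANCEL): State.CANCELED,
    (State.RUNNING, Event.FINISH): State.DONE,
}

log = logging.getLogger("transitions")

def step(resource_id: str, state: State, event: Event) -> State:
    """Apply one transition and log it; reject pairs missing from the table."""
    nxt = TRANSITIONS.get((state, event))
    if nxt is None:
        log.error("rejected resource=%s state=%s event=%s",
                  resource_id, state.value, event.value)
        raise ValueError(f"no transition for ({state.value}, {event.value})")
    log.info("transition resource=%s old=%s event=%s new=%s",
             resource_id, state.value, event.value, nxt.value)
    return nxt
```

Because every state change goes through step, there is no code path that changes state without producing a record. Rejected (state, event) pairs are logged too, which is often where the interesting bugs show up.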

Contrast: general-purpose logging

State-transition logging is narrower and more disciplined than general-purpose logging:

  • General logging records what the code did at various granularities (function entry/exit, error branches, metrics emission).
  • State-transition logging records what the state machine did at exactly one granularity: the transition.

The trade-off: state-transition logging gives you lossless trajectory reconstruction but doesn't tell you why a transition happened (what data the event carried, what branches the handler took). Both kinds of logging coexist in practice.

Relationship to audit trail

State-transition logging is a specialisation of audit trail where the "audited event" is specifically a state-machine transition. Audit trail is more general (any authoritative event — who did what when); transition logging is narrower (specifically the state-machine's own transitions).

For workloads where compliance or replay is required at the transition level (Oxla-in-ADP claims replayable agent interactions), transition logging is one of the substrates that makes the claim viable.
