Explainability log¶
An explainability log is the pattern of emitting a structured, per-decision record of everything a non-trivial algorithm did on a given invocation (candidates seen, choices made, rule results, state snapshots), stamping it with an ID, storing it cheaply, and attaching that ID to whatever customer-visible artifact the decision produced. When a customer or ops engineer asks "why did this go this way?", you load exactly one log and replay the reasoning.
Shape¶
Invocation begins ─► allocate log ID
│
For every decision step:
├─ record inputs, candidates, scoring/ranking result, chosen action, state diff
│
Invocation ends ─► package: actions[] + final state + metadata
│
▼
Async write to cheap KV / blob storage with TTL
│
▼
Attach log ID to the business artifact (order, trade, decision record, …)
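The shape above can be sketched end to end in a few lines. This is a minimal illustration, not a real implementation: the function name, the toy scorer, and the queue standing in for the async writer are all hypothetical.

```python
import json
import uuid
from queue import Queue

# Stand-in for the async writer that drains to cheap KV/blob storage.
write_queue: Queue = Queue()


def route_order(order: dict, candidates: list[str]) -> dict:
    log_id = str(uuid.uuid4())  # invocation begins: allocate log ID
    steps = []

    # One structured entry per decision step: candidates, scores, choice.
    scores = {c: len(c) for c in candidates}  # toy scorer for illustration
    chosen = max(scores, key=scores.get)
    steps.append({
        "candidates": candidates,
        "scores": scores,
        "action": "choose",
        "chosen": chosen,
    })

    # Invocation ends: package actions + final state, hand off asynchronously.
    write_queue.put((log_id, json.dumps({"steps": steps, "final": chosen})))

    # Attach the log ID to the business artifact itself.
    return {**order, "route": chosen, "routing_log_id": log_id}
```

The decision path returns immediately; only the ID allocation and the queue put happen on the hot path.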
Properties that make this work¶
- Per-decision granularity, not "here's the final output". The log records the path through the algorithm: which candidates existed at each step, which rules fired, what tie-breakers triggered. Knowing only the final route tells you what; the log tells you why.
- Structured, not free-text. Each step is an object with known fields so tooling can scan for patterns ("which orders had regret cycles?").
- Async write, off the hot path. The log shouldn't slow the decision. Canva stamps an ID, returns the order routing, and writes the log asynchronously to KV blob storage.
- TTL instead of retention forever. Logs are huge; bound cost with automatic expiry (Canva uses this explicitly). The explainability guarantee is "within the support window", not "forever".
- ID carried in the business artifact. Customer support opens the order, the order has the routing-log ID, they pull that log. No grep. No joins. No time windows.
What to actually log per step¶
- Current algorithm state (active path, cursor, active regrets, …).
- All candidates considered (not just chosen — the not-chosen ones are the interesting half of "why not Sydney?").
- Outputs of the ranker / scorer per candidate.
- Action taken (advance, backtrack, record regret, give up, …).
- Enough context to reconstruct the decision offline (e.g., the snapshot of the graph version used, or the graph-version ID).
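The per-step fields above map naturally onto one structured object. A sketch with hypothetical field names and values:

```python
from dataclasses import dataclass, asdict


@dataclass
class DecisionStep:
    state: dict               # algorithm state before the step (path, cursor, regrets)
    candidates: list[str]     # everything considered, not just the winner
    scores: dict[str, float]  # ranker/scorer output per candidate
    action: str               # advance / backtrack / record_regret / give_up
    graph_version: str        # enough to reconstruct the decision offline


step = DecisionStep(
    state={"path": ["factory-a"], "regrets": []},
    candidates=["factory-b", "factory-c"],
    scores={"factory-b": 0.8, "factory-c": 0.3},
    action="advance",
    graph_version="2024-12-01",
)

record = asdict(step)  # serializable, so tooling can scan for patterns
```

Because every field is named and typed, "which orders had regret cycles?" becomes a scan over `action` values rather than a grep through free text.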
Determinism multiplier¶
With patterns/deterministic-rule-ordering, the log becomes re-executable: feed the same inputs back through the decision engine and you must get the same outcome. This is how you A/B-test ranker changes — replay production logs against a candidate ranker and diff the outputs.
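A replay harness can be this small. The log structure, ranker, and tie-break rule below are all hypothetical; the point is that the logged candidate set plus deterministic ordering is enough to re-run the decision and diff it against production.

```python
def replay(log: dict, ranker) -> str:
    """Re-score the logged candidates and return the new choice."""
    step = log["steps"][0]
    scores = {c: ranker(c) for c in step["candidates"]}
    # sorted() pins the tie-break order, so replays are deterministic.
    return max(sorted(scores), key=scores.get)


# A production log captured by the pattern, with the outcome it produced.
production_log = {"steps": [{"candidates": ["sydney", "melbourne"],
                             "chosen": "sydney"}]}

old_choice = production_log["steps"][0]["chosen"]
new_choice = replay(production_log, ranker=len)  # candidate ranker under test
changed = new_choice != old_choice               # the A/B diff signal
```

Running this over a corpus of production logs gives the change-rate of a candidate ranker before it ever touches live traffic.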
Differences from regular logs / traces¶
- Regular logs: free-text, cross-request, optimised for "what happened broadly?".
- Distributed traces: span-tree, latency-focused, optimised for "where was the time spent?".
- Explainability log: a single structured object per invocation of one specific decision algorithm, optimised for "reproduce this exact outcome offline". It's closer to a decision record than a log line.
Seen in¶
- sources/2024-12-10-canva-routing-print-orders — Canva's routing log: per-iteration action objects, packaged per traversal, stored async in KV blob storage with expiry, ID attached to the customer order; explicit goal of answering "why was this order sent there?"
Related¶
- patterns/deterministic-rule-ordering — determinism + log = replayable decisions
- patterns/full-stack-instrumentation — adjacent idea for per-IO per-layer telemetry
- systems/canva-print-routing