Runtime change detection¶
Runtime change detection is the pattern of continuously observing production state and diffing it against an authoritative declared state, alerting on any delta. It is the operational answer to undocumented production changes — changes that bypass the change-management gate and that traditional CAB-style governance (patterns/cab-approval-gate) is structurally blind to.
The framing¶
From the 2023-08-16 Swedbank / change-controls post:
"We can think of software changes as streams, feeding into our environments which are lakes. Change management puts a gate in the stream to control what flows into the lake, but doesn't monitor the lake. If it is possible to make a change to production without detection, then change management only protects one source of risk. The only way to be sure you don't have undocumented production changes is with runtime monitoring."
The pattern inverts the traditional change-management posture: rather than only asking "was this change approved before it was made?", it asks "does the current production state match what we think we deployed?" — and surfaces the difference.
The canonical shape¶
- Declared state — a version-controlled, authoritative record of what production should look like. Examples: the Git repo in a GitOps setup, the release manifest in a CI/CD pipeline, the feature-flag ruleset, the IaC state file.
- Observed state — a continuously refreshed view of what production actually looks like. Sources: running process versions, file hashes, K8s `kubectl get` output, cloud API `describe-*` calls, feature-flag evaluation audit logs, DB schema introspection.
- Differ / reconciler — a service that compares the declared and observed states on a schedule and emits alerts / metrics / tickets when they diverge.
- Response — human triage or automated reconciliation: either roll the observed state back to match declared (if the change was illegitimate), or fold the observed state into declared (if the change was legitimate but captured through the wrong channel).
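The differ step can be sketched in a few lines. This is a minimal illustration, not a production reconciler: it assumes both states have already been flattened into dictionaries keyed by object name, and the `Drift` record, the `diff_state` function, and the service names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Drift:
    key: str
    kind: str       # "missing", "unexpected", or "changed"
    declared: Any   # value from the declared state, or None if absent
    observed: Any   # value from the observed state, or None if absent


def diff_state(declared: dict[str, Any], observed: dict[str, Any]) -> list[Drift]:
    """Compare declared vs. observed state and return one Drift per divergent key."""
    drifts = []
    for key in sorted(declared.keys() | observed.keys()):
        if key not in observed:
            # Declared but not running: something was removed from production.
            drifts.append(Drift(key, "missing", declared[key], None))
        elif key not in declared:
            # Running but never declared: a candidate undocumented change.
            drifts.append(Drift(key, "unexpected", None, observed[key]))
        elif declared[key] != observed[key]:
            drifts.append(Drift(key, "changed", declared[key], observed[key]))
    return drifts


# Hypothetical example data: service name -> deployed version.
declared = {"payments-api": "v1.4.2", "ledger": "v2.0.0", "gateway": "v3.1.0"}
observed = {"payments-api": "v1.4.2", "ledger": "v2.0.1", "debug-shell": "v0.1.0"}

for drift in diff_state(declared, observed):
    print(f"{drift.kind}: {drift.key} declared={drift.declared!r} observed={drift.observed!r}")
```

Each `Drift` record is what the response step would triage: roll production back to the declared value, or update the declared state to capture a legitimate change.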
Surface-by-surface examples¶
| Declared state source | Observed state source | Comparison tool |
|---|---|---|
| Git (K8s manifests) | kubectl / API server | Argo CD, Flux |
| Terraform / CDK state | Cloud provider API | terraform plan, driftctl |
| Ansible / Puppet config | Node file content | Tripwire, AIDE, Ansible --check |
| Docker image manifest | Running container digest | Kubernetes admission controllers, image-policy webhooks |
| Feature-flag ruleset | Flag evaluation audit log | See concepts/audit-trail, systems/cloudflare-flagship |
| Database schema migrations | pg_dump / mongodump introspection | sqitch verify, custom diff scripts |
| Cloud firewall rules | AWS Config / GCP Asset Inventory | Native drift detection |
Each row is the same structural pattern — declared vs. observed, continuously compared.
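As one concrete instance of that pattern, the file-content row can be approximated with a hash baseline. The sketch below is only in the spirit of file-integrity tools such as Tripwire or AIDE and does not reproduce how either works; `sha256_of`, `check_files`, and the baseline format (relative path mapped to expected SHA-256 hex digest) are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large binaries are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_files(baseline: dict[str, str], root: Path) -> dict[str, str]:
    """Return {relative_path: status} for each baseline file that deviates.

    baseline maps a path relative to root onto its expected SHA-256 hex digest
    (the declared state); the files on disk are the observed state.
    """
    findings = {}
    for rel_path, expected in baseline.items():
        target = root / rel_path
        if not target.is_file():
            findings[rel_path] = "missing"
        elif sha256_of(target) != expected:
            findings[rel_path] = "modified"
    return findings
```

This version only checks files listed in the baseline. Detecting files that appear on disk without a baseline entry would need an additional directory walk.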
Why it's harder than it looks¶
- Cardinality: a mid-sized org has O(10K–1M) comparable production objects. A differ that alerts on every small timestamp / cache-line variation produces a flood.
- Legitimate transient changes: container restarts, autoscaled instances, log-rotation, config distribution in flight. The differ must distinguish these from "real" drift.
- Authoritative state gaps: many ops surfaces (console edits, shell access, SaaS vendor UIs) don't have a declared-state record to diff against. The differ can only detect drift on surfaces wired in.
- Feedback loops with GitOps: if the declared state is generated from an automated process that itself reads production, feedback amplifies noise.
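Two common mitigations for the cardinality and transient-change problems are to strip volatile fields before comparing and to alert only when drift persists across several consecutive scans. The following is a rough sketch under those assumptions; the field names in `VOLATILE_FIELDS`, the `PersistentDriftTracker` class, and the default threshold are all hypothetical and would need tuning per surface.

```python
from collections import defaultdict

# Hypothetical fields that change constantly without indicating real drift.
VOLATILE_FIELDS = {"last_seen", "resource_version", "uptime_seconds"}


def normalize(obj: dict) -> dict:
    """Drop volatile fields so they never contribute to a diff."""
    return {k: v for k, v in obj.items() if k not in VOLATILE_FIELDS}


class PersistentDriftTracker:
    """Alert on a key only once its drift has lasted `threshold` consecutive scans."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self._streak = defaultdict(int)  # key -> consecutive drifted scans

    def scan(self, declared: dict[str, dict], observed: dict[str, dict]) -> list[str]:
        alerts = []
        for key in declared.keys() | observed.keys():
            want = declared.get(key)
            have = observed.get(key)
            drifted = want is None or have is None or normalize(want) != normalize(have)
            if drifted:
                self._streak[key] += 1
                # Fire exactly once, on the scan where the streak reaches the threshold.
                if self._streak[key] == self.threshold:
                    alerts.append(key)
            else:
                # Drift cleared (e.g. a restart finished): reset the streak.
                self._streak.pop(key, None)
        return sorted(alerts)
```

The trade-off is detection latency: with a threshold of three and a five-minute scan interval, a real change takes roughly fifteen minutes to surface, in exchange for suppressing drift that resolves itself within a scan or two.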
Why the Swedbank shape justifies the pattern¶
The regulator's finding was that "none of the bank's control mechanisms were able to capture the deviation and ensure that the process was followed." Any competent runtime-change-detection system — monitoring of core-banking-system binary hashes, configuration file checksums, or database DDL operation logs — would have caught the unapproved change before it reached customer-visible balance corruption. The SEK 850M fine is the cost of not having had one. Knight Capital (2012, see systems/knight-capital-smars) is the other commonly-cited case where runtime diff against declared deployment would have caught the partial-deployment drift before it became a $460M trading loss.
Relationship to observability¶
Runtime change detection is a specialization of concepts/observability — it is monitoring specifically of configuration / code / schema state, not of behavioural metrics (latency, error rate, throughput). A mature production environment runs both:
- Behavioural observability — metrics/logs/traces from the application, driving MTTD/MTTR for functional incidents.
- Structural observability — change-detection at the state level, driving MTTD for governance violations and latent misconfigurations.
Both feed the same incident-response workflow, and can share an audit trail (concepts/audit-trail).
Commercial landscape (as of 2023-08)¶
- GitOps: Argo CD, Flux (Kubernetes drift).
- Cloud drift: AWS Config, GCP Security Command Center, Azure Policy.
- Change-evidence / compliance: Kosli (author of the source post), JFrog Xray's policy engine, PagerDuty's change events feed.
- File-integrity: Tripwire, AIDE, osquery, Wazuh.