
CONCEPT Cited by 1 source

Silent wiring failure

Definition

Silent wiring failure is the bug class in which a producer publishes to topic A and a consumer subscribes to topic A', where A and A' are both well-formed and both accepted by the broker but are not the same topic. "Both services started cleanly, both topic names were accepted, and nothing failed — yet the routing pipeline silently stopped working."

The defining property: the silence is the failure mode. There is no exception, no error log, no failed health check at startup. The producer's writes succeed; the consumer's poll returns empty; the system gives every appearance of working. The bug shows up only downstream, as expected events that never occur: work items not picked up, pipelines not advancing, completion events never arriving.

Disclosed shape

The OmniNode source (sources/2026-06-02-redpanda-how-omninode-uses-redpanda-to-scale-ai-agent-workflows) canonicalises the bug class with a concrete example:

  Producer publishes to:   onex.evt.router.routing-complete.v1
  Consumer subscribes to:  onex.evt.router.routing_complete.v1

A hyphen versus an underscore. The broker accepts both names as legal topics; the producer creates one and writes to it, while the consumer auto-creates (or just polls) the other and reads nothing. The post names five canonical sub-shapes:

  1. Pluralization differences: order-events vs orders-events.
  2. Underscores versus hyphens: routing_complete vs routing-complete.
  3. Version-suffix mismatches: .v1 vs .v2, or .v1 vs no suffix at all.
  4. Renamed event segments: a partial rename where one repo got the new name and another didn't.
  5. Old topics left behind after refactors: the producer was renamed but the consumer is still listening on the old name. (Or vice versa: a newly renamed consumer subscribes, but no producer is publishing to the new name.)

The shape: both names are well-formed and both operations succeed.
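As a rough illustration of how these sub-shapes could be surfaced mechanically, the sketch below collapses the differences listed above (case, underscore versus hyphen, trailing version suffix, and a crude plural "s") and groups topic names that become identical afterwards. The functions `normalize` and `near_misses` are hypothetical helpers written for this page, not part of OmniNode or Redpanda, and the normalization is deliberately lossy.

```python
import re
from collections import defaultdict


def normalize(topic: str) -> str:
    """Collapse differences that commonly cause drift (illustrative and lossy)."""
    t = topic.lower().replace("_", "-")
    # Drop a trailing version suffix such as ".v1".
    t = re.sub(r"\.v\d+$", "", t)
    # Crude singularization: strip a trailing "s" from each longer word.
    # The capturing group keeps the "." and "-" separators in the result.
    parts = re.split(r"([.-])", t)
    parts = [p[:-1] if len(p) > 3 and p.endswith("s") else p for p in parts]
    return "".join(parts)


def near_misses(topics):
    """Return groups of distinct topic names that normalize to the same key."""
    groups = defaultdict(set)
    for topic in topics:
        groups[normalize(topic)].add(topic)
    return [sorted(group) for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    observed = [
        "onex.evt.router.routing-complete.v1",
        "onex.evt.router.routing_complete.v1",
        "order-events",
        "orders-events",
        "billing.v1",
    ]
    for group in near_misses(observed):
        print("possible drift:", group)
```

A check like this only flags suspects for human review. It cannot say which spelling is the intended one, and it will miss renames that share no common normalized form.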

Why classical observability misses it

Every signal a streaming-broker dashboard typically surfaces shows "green":

  • Producer's produce calls succeed (returns ACK).
  • Consumer's subscribe call succeeds.
  • Consumer's poll returns (empty result, but no error).
  • Broker shows the topic exists.

What's missing is the cross-pair check: does this consumer's topic match any producer's topic? That check requires either human inspection of both code paths or a registry-of-registries that knows what every node produces and consumes. The classical observability stack doesn't have that registry by default.
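A minimal sketch of that cross-pair check, assuming some registry can list what each node declares it produces and consumes. The `NodeDecl` type and `unmatched_topics` function are hypothetical names invented for this example; the source does not describe such an API.

```python
from dataclasses import dataclass, field


@dataclass
class NodeDecl:
    """Hypothetical record of one node's declared topics."""
    name: str
    produces: set = field(default_factory=set)
    consumes: set = field(default_factory=set)


def unmatched_topics(nodes):
    """Compare every consumed topic against every produced topic by exact name."""
    produced = set()
    consumed = set()
    for node in nodes:
        produced |= node.produces
        consumed |= node.consumes
    return {
        # Consumers polling a topic nobody writes to: the silent failure itself.
        "consumed_without_producer": sorted(consumed - produced),
        # Producers writing to a topic nobody reads: often the other half of the pair.
        "produced_without_consumer": sorted(produced - consumed),
    }


if __name__ == "__main__":
    nodes = [
        NodeDecl("router", produces={"onex.evt.router.routing-complete.v1"}),
        NodeDecl("worker", consumes={"onex.evt.router.routing_complete.v1"}),
    ]
    print(unmatched_topics(nodes))
```

In the hyphen-versus-underscore example, each name shows up in exactly one of the two lists, which is the signal that no single component's health check can produce on its own.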

Why the OmniNode contract discipline addresses it

The fix is not better observability — it's eliminating the opportunity to drift. The OmniNode response makes the topic name appear in exactly one reviewed place (the node's contract.yaml) and validates it mechanically against a regex (catches malformed) and an enum (catches non-canonical). With contract-driven provisioning + three-call-site enforcement (patterns/single-extractor-multi-call-site), a topic name "can only be wrong in the contract" — and the contract is reviewed at PR time. See patterns/contract-driven-topic-provisioning.
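The two mechanical checks can be sketched as follows. The source says only that a regex catches malformed names and an enum catches non-canonical ones; the specific pattern, the `CanonicalTopic` enum, and the `validate_topic` function below are assumptions made for illustration, not OmniNode's actual rules.

```python
import re
from enum import Enum

# Assumed naming rule (hypothetical): lowercase dot-separated segments, hyphens
# allowed inside a segment, no underscores, and a mandatory ".vN" version suffix.
TOPIC_PATTERN = re.compile(r"[a-z0-9]+(\.[a-z0-9]+(-[a-z0-9]+)*)+\.v[0-9]+")


class CanonicalTopic(Enum):
    """Hypothetical enum of the only topic names a contract may use."""
    ROUTING_COMPLETE = "onex.evt.router.routing-complete.v1"


def validate_topic(name: str) -> list:
    """Return a list of problems with a topic name; an empty list means it passes."""
    problems = []
    if TOPIC_PATTERN.fullmatch(name) is None:
        problems.append("malformed")
    if name not in {topic.value for topic in CanonicalTopic}:
        problems.append("non-canonical")
    return problems


if __name__ == "__main__":
    for candidate in (
        "onex.evt.router.routing-complete.v1",   # passes both checks
        "onex.evt.router.routing_complete.v1",   # underscore: fails both
        "onex.evt.router.routing-completed.v1",  # well-formed but not in the enum
    ):
        print(candidate, validate_topic(candidate))
```

The third candidate shows why both checks are useful: a name can satisfy the shape rule and still not be one of the agreed names.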

The architectural insight: silent wiring failure is a special case of "two copies of a value drifting apart with no enforcement that they agree." Eliminate the second copy and you eliminate the bug class. The OmniNode discipline: "there is no second operator-maintained registry, separate constant list hidden inside the runtime, or manually synchronized provisioning config."

Sibling silent-failure framings on the wiki

  • concepts/innodb-silent-cascade-in-binlog — InnoDB cascade deletes don't appear in the binlog as cascades; downstream CDC consumers see only the parent delete and miss the children. Same shape: both sides succeed, asymmetric visibility leaves consumers wrong without an error.
  • concepts/silent-hang-llm-server — Databricks LLM-serving silent hang from container OMP_NUM_THREADS misconfiguration. Same shape: nothing crashes, requests just stop completing.
  • concepts/torn-page — Postgres half-written pages on partial-write failure. Same shape: storage looks healthy, content is wrong.

The unifying property across all of these: the system is in a broken state but every local-success signal is green, because the broken-ness is a relationship between components rather than a state of any one component.

Seen in

  • sources/2026-06-02-redpanda-how-omninode-uses-redpanda-to-scale-ai-agent-workflows (2026-06-02, OmniNode founder Jonah Gray on Redpanda Blog) — canonical disclosure source. Provides the named bug class, the concrete routing-complete vs routing_complete example, and the five sub-shapes (pluralization / underscore-vs-hyphen / version-suffix / renamed-segment / orphaned-after-refactor). Explicit framing: "The silence was the failure mode." Solution framing: "We realized this was a naming problem before it was a schema problem." The OmniNode response — eliminating the duplicated-state opportunity rather than detecting drift after the fact — is the canonical architectural answer for this bug class.