CONCEPT Cited by 1 source

Error propagation

Error propagation is the deterministic downstream consequence of an upstream-stage mistake in a multi-stage pipeline, when no later stage has the ability to re-examine the original input and recover. Once stage A commits an error and A's output is the only input stage B ever sees, the best stage B can do is reason correctly from the wrong input, so the pipeline's final result is wrong anyway.

The canonical wiki-level instance is Google Research's Speech-to-Retrieval (S2R) post (2025-10-07), which names it directly as one of the two structural failure modes of the cascade ASR → text retrieval voice-search architecture:

"If the system misinterprets the audio early on, that error is passed along to the search engine, which typically lacks the ability to correct it (i.e., error propagation). As a result, the final search result may not reflect the user's intent." (Source: sources/2025-10-07-google-speech-to-retrieval-s2r-voice-search)

Structural conditions

Error propagation becomes a structural (not incidental) feature of a pipeline when two conditions both hold:

1. The upstream stage's output is the only input to the downstream stage. The downstream stage never sees the original audio, image, or other raw input; it receives only the upstream stage's post-processed version.
2. The upstream stage's post-processing is lossy. In principle, a downstream stage that received both the upstream output and the raw input could cross-check the two and recover. A lossy intermediate (see intermediate representation bottleneck) discards the information such a cross-check would need.

In voice search's cascade ASR, both conditions hold: the text retriever sees only the ASR transcript, and the transcript is single-hypothesis lossy text derived from information-rich audio.
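The two conditions can be made concrete with a minimal sketch. It assumes a toy ASR stage that emits scored transcript hypotheses and a toy keyword retriever; `toy_asr`, `top1`, `retrieve`, `cascade`, the corpus, and the scores are all hypothetical, chosen for illustration rather than taken from Google's system.

```python
from typing import Dict, List

# Hypothetical toy corpus: each keyword maps to the documents it retrieves.
CORPUS: Dict[str, List[str]] = {
    "flour": ["How to store flour", "Gluten-free flour blends"],
    "flower": ["Flower delivery near me", "Caring for cut flowers"],
}

def toy_asr(audio: str) -> Dict[str, float]:
    """Hypothetical ASR stage: returns scored transcript hypotheses.

    'audio' is a label standing in for a waveform; the made-up scores model an
    acoustically ambiguous utterance that the ASR slightly misranks.
    """
    if audio == "utterance_flour":
        return {"flour": 0.48, "flower": 0.52}
    return {audio: 1.0}

def top1(hypotheses: Dict[str, float]) -> str:
    # Condition 2 (lossy): collapse the hypothesis distribution to one string.
    return max(hypotheses, key=hypotheses.get)

def retrieve(transcript: str) -> List[str]:
    # Condition 1 (only input): the retriever sees the transcript, never the audio.
    # It is "correct": it returns exactly the documents matching what it was given.
    return CORPUS.get(transcript, [])

def cascade(audio: str) -> List[str]:
    return retrieve(top1(toy_asr(audio)))

results = cascade("utterance_flour")
print(results)  # the user meant "flour", but the flower documents come back
```

`retrieve` does nothing wrong here: it returns exactly the documents for the string it was handed. The error enters at `top1`, and because `retrieve` never receives `audio`, no later step can undo it.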

Mechanisms of propagation

Deterministic degradation: the downstream stage works correctly on the wrong input and produces a wrong result. There is no fallback, no retry, and no secondary signal.
Amplification: a small upstream error (one wrong word) can produce a large downstream error (an entirely wrong set of documents). The homophones "flour" and "flower" illustrate this: a one-word ASR error sends retrieval to a disjoint part of the corpus.
Silent failure: the downstream stage has no signal that its input was incorrect. From the retriever's perspective the query is just a query; it cannot distinguish a user who asked for "flower" from an ASR mistranscription of "flour."
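Amplification and silent failure can both be checked on a toy index. This is a minimal sketch, assuming a hypothetical `INDEX` that maps a single query word to a set of document IDs and a hypothetical `retrieve` function that looks it up.

```python
from typing import Dict, Set

# Hypothetical toy index: a single query word -> the set of document IDs it retrieves.
INDEX: Dict[str, Set[str]] = {
    "flour": {"d1", "d2"},
    "flower": {"d3", "d4"},
}

def retrieve(query: str) -> Set[str]:
    # A correct retriever: it returns exactly the documents indexed under its input.
    return INDEX.get(query, set())

intended = "flour"         # what the user said
mistranscribed = "flower"  # what a hypothetical ASR stage emitted (one wrong word)
genuine = "flower"         # what a different user actually asked for

# Amplification: a one-word input error yields results with zero overlap
# with the intended results.
overlap = retrieve(intended) & retrieve(mistranscribed)
print(overlap)  # set()

# Silent failure: the retriever's input is identical in both cases, so its output
# is identical too; nothing inside retrieve() can tell an ASR error from a real query.
print(retrieve(mistranscribed) == retrieve(genuine))  # True
```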

Design responses

Preserve signals the downstream stage can use for validation. Pass N-best hypotheses instead of top-1, pass confidence scores, or pass acoustic features alongside the text. Each weakens the "only input" premise in condition 1 above.
Allow the downstream stage to talk back. Re-interaction loops, in which a low-confidence downstream response triggers re-examination by the upstream stage, are expensive and add latency, so they are rarely deployed at query-serving scale.
Replace the cascade with a direct pipeline. If the upstream stage's lossy intermediate is the structural problem, remove the intermediate and let the downstream stage consume the raw input directly. This is the pattern S2R instantiates for voice search: audio goes straight to retrieval with no cascade boundary and no transcript, so an ASR error has no propagation path because there is no ASR stage to err.
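The first design response can be sketched as score fusion over N-best hypotheses. This is an illustrative technique chosen here, not the mechanism described in the S2R post: the index, the confidences, and the functions `retrieve_top1` and `retrieve_nbest` are hypothetical, and summing confidences is only one of several reasonable ways to combine hypotheses.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical toy index: query term -> document IDs.
INDEX: Dict[str, Set[str]] = {
    "flour": {"d1", "d2"},
    "flower": {"d3", "d4"},
}

def retrieve_top1(transcript: str) -> List[str]:
    # Cascade baseline: only the single best transcript reaches the retriever.
    return sorted(INDEX.get(transcript, set()))

def retrieve_nbest(hypotheses: List[Tuple[str, float]], k: int = 4) -> List[str]:
    """Score each document by the total confidence of the hypotheses that retrieve it.

    Passing N-best hypotheses with confidences (instead of top-1 only) weakens the
    "only input" condition: documents for a runner-up transcript stay reachable.
    """
    scores: Dict[str, float] = defaultdict(float)
    for text, confidence in hypotheses:
        for doc in INDEX.get(text, set()):
            scores[doc] += confidence
    # Highest score first; ties broken by document ID for deterministic output.
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return ranked[:k]

nbest = [("flower", 0.52), ("flour", 0.48)]  # made-up ASR output for an ambiguous utterance
print(retrieve_top1(nbest[0][0]))  # ['d3', 'd4']: the "flour" documents are unreachable
print(retrieve_nbest(nbest))       # ['d3', 'd4', 'd1', 'd2']: the "flour" documents survive
```

The ranking still favors the ASR's top choice, but the "flour" documents are no longer unreachable; a later stage with more context could promote them. The direct-pipeline response goes further by removing the transcript entirely.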

Where this shows up beyond voice search

Error propagation through a lossy intermediate is a general multi-stage-pipeline shape; some instances:

OCR → downstream text task: an OCR misrecognition of a single digit breaks every downstream calculation, no matter how correct the calculator is.
Named-entity recognition → knowledge-base lookup: a misidentified entity leads to the wrong KB entry and therefore the wrong answer.
Tokenization → model inference: for certain languages or character sets, the tokenizer's segmentation choices commit to semantics the model can't undo.
Compile time → runtime: when generics are compiled with type erasure (as in Java), type arguments are lost and the runtime can't reconstruct them; reflection-based workarounds pay extra overhead and recover only part of that information.
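The OCR case fits in a few lines. In this minimal sketch the OCR stage is a hypothetical stub `toy_ocr` that misreads one digit, and `total` stands in for a downstream calculator that is exactly correct; the invoice amounts are made up.

```python
from decimal import Decimal
from typing import List

def toy_ocr(scanned_cells: List[str]) -> List[str]:
    """Hypothetical OCR stage that misreads the first '7' in the first cell as a '1'.

    The 'scan' is modelled as the true text; the hard-coded misread stands in
    for a recognition error.
    """
    text = list(scanned_cells)
    text[0] = text[0].replace("7", "1", 1)
    return text

def total(amounts: List[str]) -> Decimal:
    # A correct downstream calculator: exact decimal addition of its inputs.
    return sum((Decimal(a) for a in amounts), Decimal("0"))

invoice = ["17.50", "4.25", "8.00"]  # made-up invoice line items
print(total(invoice))                # 29.75 (what the document says)
print(total(toy_ocr(invoice)))       # 23.75 (what the pipeline reports)
```

`total` is correct in both calls; the 6.00 discrepancy comes entirely from the misread digit it was given.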

Distinguishing from adjacent concepts

concepts/intermediate-representation-bottleneck: the bottleneck is about the expressivity of the upstream stage's output format; propagation is about the correctness of the specific output it produced. A perfect stage A over a lossy intermediate still has the bottleneck but not propagation (Cascade groundtruth in Google's S2R benchmark, which feeds the retriever ground-truth transcripts). An imperfect stage A over a lossless intermediate still has propagation whenever it makes mistakes. An imperfect stage A over a lossy intermediate has both at once (Cascade ASR in the same benchmark).
Fault propagation in distributed systems: a different beast. It concerns availability and partial failure, not the correctness of information content. The structural similarity is that one component's state is the only thing downstream components see, but the failure modes take different shapes.

Seen in

sources/2025-10-07-google-speech-to-retrieval-s2r-voice-search — canonical wiki instance; Google Research explicitly names "error propagation" as one of the two motivating failure modes of the cascade ASR → text retrieval voice-search architecture.