

Surface attribution error

Definition

Surface attribution error is a named LLM failure mode where the model "is making decisions based on surface-level clues rather than deeper meaning or causality" — it attributes a cause, role, or relationship to an entity because that entity's name appears in the input, not because the entity is actually causally involved (Source: sources/2025-09-24-zalando-dead-ends-or-data-goldmines-ai-powered-postmortem-analysis).

Zalando's framing, verbatim:

"The model makes a bias to prominent keywords staging on the surface-level instead of reasoning through context to identify the actual causal factor. For instance, it could offer a well-structured and authoritative explanation regarding the contribution of AWS S3 to an incident, even if 'S3' is merely mentioned without being causally linked."

Why it's distinct from hallucination

Hallucination is the general failure to produce factually correct output. Surface attribution error is a specific structural sub-class:

  • The claim the model makes is well-structured and authoritative.
  • Every entity the model names is actually present in the input.
  • The relationship the model asserts between the entities is fabricated — no causal, contributory, or role-bearing relationship exists in the source.

It sits alongside other hallucination sub-classes in the wiki taxonomy: training-cutoff dynamism gap (parametric staleness), library-API icon hallucination (symbols that don't exist in a churning namespace), and lost in the middle (information in the middle of a long context gets dropped). Surface attribution error is the failure mode of reading effort — the model didn't fail to see the input; it failed to reason causally over the input.

Canonical worked example (Zalando post)

A postmortem mentions "an AWS S3 bucket used for log archival" in its Impact section while the actual root cause is a Kubernetes DNS outage. The Classification stage of the Zalando pipeline "could offer a well-structured and authoritative explanation regarding the contribution of AWS S3 to an incident, even if 'S3' is merely mentioned without being causally linked" — producing a false-positive classification of the incident as an S3 incident, which then propagates into the Patterns stage as a spurious "recurring S3 issue" theme.

This is structurally misleading: the downstream Patterns stage has no way to detect the bad attribution without revisiting the source postmortem, and the end-of-pipeline Opportunity stage might then recommend investment in S3 reliability when the investment that would actually have prevented the incident is in DNS reliability.
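To make the mechanism concrete, here is a minimal sketch (ours, not Zalando's pipeline code; the postmortem text and names are hypothetical) of the surface-level heuristic the model effectively falls back on: tagging any technology whose name appears, with no causal check.

```python
# Illustrative sketch of the surface-attribution failure mode.
# All strings and names here are hypothetical, not Zalando's code.

POSTMORTEM = """
Root cause: Kubernetes cluster DNS resolution failed, so services
could not discover each other.
Impact: Alerting was delayed; an AWS S3 bucket used for log archival
filled up during the outage.
"""

TECHNOLOGIES = ["AWS S3", "Kubernetes", "DNS"]

def surface_classify(text: str) -> list[str]:
    """Mimics the failure mode: tag any technology whose name appears,
    regardless of whether it is causally linked to the incident."""
    return [tech for tech in TECHNOLOGIES if tech.lower() in text.lower()]

print(surface_classify(POSTMORTEM))
# ['AWS S3', 'Kubernetes', 'DNS'] -- S3 is tagged purely because the
# string appears in the Impact section; only the DNS outage is causal.
```

A causally correct classifier would return only DNS here; the gap between the two behaviours is exactly what the mitigations below target.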

Persistence across model scales

Zalando discloses that this failure mode is not fully solved at frontier scale:

  • Small models (3B–12B params): the root cause of ~40% of early hallucinations, mitigated to <15% by prompt hardening and curation.
  • Claude Sonnet 4 on AWS Bedrock: a residual error rate of "approximately 10% attribution, even with advanced models such as Claude Sonnet 4."

The 10% residual rate is the canonical wiki datum here: surface attribution error is a structural reasoning limitation, not a model-scale problem. It doesn't disappear with capability; it has to be engineered around.
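A back-of-envelope calculation (ours; the corpus size is a hypothetical figure, not from the post) shows why this residual forces the human gates described under Mitigation:

```python
# Illustrative arithmetic: what a 10% residual rate implies at corpus scale.
n_postmortems = 200    # hypothetical corpus size, not a Zalando figure
residual_rate = 0.10   # Zalando's reported rate on Claude Sonnet 4
sample_rate = 0.20     # upper end of the 10-20% random-sample curation

spurious = n_postmortems * residual_rate   # ~20 bad attributions slip through
caught = spurious * sample_rate            # ~4 caught by sampling alone
print(f"~{spurious:.0f} spurious attributions; "
      f"~{caught:.0f} caught by random-sample curation alone")
```

Random sampling alone leaves most spurious attributions in the corpus, which is why the final-report proofreading gate below is non-negotiable.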

Mitigation

Zalando's stack, per the post:

  • Negative prompting. The Classification stage prompt "provides negative examples" specifically targeting the surface-attribution failure mode: if a technology appears but isn't causally linked, the model must return None rather than tag the technology (see the prompt sketch after this list). Canonicalised as patterns/negative-example-prompting.
  • Strict instruction to refuse-on-ambiguity. The Summarization stage prompt requires "no guessing, no assumptions, and no speculative content. If something in the original postmortem is unclear or missing, the summary explicitly states that."
  • Per-stage human curation of the digest. The Analyzer stage's 3–5-sentence digest is the pivotal point because it's compact enough for a human to cross-check against the original postmortem. "The pivotal role of digests allowed humans to observe all incidents as a whole and precisely validate and curate the reports produced by LLMs."
  • Human proofreading at the Patterns-stage output. Even at 10–20% random-sample curation, Zalando keeps human proofreading of the final one-pager report as a non-negotiable gate because surface-attribution errors "pass through" all automated stages.
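A minimal sketch of what negative-example prompting combined with refusal-on-ambiguity could look like for the Classification stage. The post does not publish Zalando's actual prompts, so the wording and helper below are assumptions:

```python
# Hypothetical Classification-stage prompt; wording is ours, not Zalando's.
CLASSIFY_PROMPT = """You classify which technology caused an incident.

Rules:
- Tag a technology ONLY if the postmortem causally links it to the incident.
- If a technology is merely mentioned (impact, timeline, background),
  return None for it. No guessing, no assumptions, no speculation.
- If the causal factor is unclear or missing, say so explicitly.

Negative example:
  Postmortem: "Root cause: cluster DNS failure. Impact: an S3 bucket
  used for log archival filled up."
  Wrong answer: cause = S3   (S3 appears but is not causally linked)
  Right answer: cause = DNS; S3 = None

Postmortem:
{postmortem}

Answer with the causal technology, or None if no causal link is stated.
"""

def build_prompt(postmortem: str) -> str:
    """Fill the template with one postmortem's text."""
    return CLASSIFY_PROMPT.format(postmortem=postmortem)
```

The negative example bakes the canonical S3/DNS case directly into the prompt, giving the model a worked instance of "mentioned but not causal" to pattern-match against.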

None of these eliminate the 10% residual — they surface it in a human-inspectable form before it compounds.

Relation to reasoning capability

The post frames surface attribution error as dimensionally distinct from ordinary recall-style hallucination:

"Surface Attribution Error often accompanies overfitting, since both involve relying on superficial patterns from past data rather than deeper, more reliable signals. General purpose LLMs are trained on publicly available data, and struggle to identify emerging failure patterns that haven't been seen before."

Two implications:

  • Dataset shift amplifies the error. The model's attribution prior is shaped by the training distribution; on novel failure patterns from Zalando's systems that haven't appeared in public discourse, the model falls back on surface cues more readily.
  • Fine-tuning is the proposed remediation axis for domain-specific surface-attribution errors. Zalando explicitly names fine-tuning as the roadmap for handling their internal technologies (e.g. Skipper), where base models over-attribute to surface name-matches against public namespaces.

Seen in

  • sources/2025-09-24-zalando-dead-ends-or-data-goldmines-ai-powered-postmortem-analysis: the canonical wiki instance. Zalando's datastore SRE team names surface attribution error explicitly, quantifies it at ~10% even on Claude Sonnet 4, catalogues the three-part mitigation stack (negative prompting, strict refusal-on-ambiguity, per-stage human curation of digests), and identifies fine-tuning as the unshipped roadmap remediation for domain-specific cases.