PATTERN

Negative example prompting

Intent

Include explicit anti-examples in an LLM prompt — examples of the wrong pattern, paired with the correct rejection — to suppress failure modes where positive examples alone aren't sufficient. Particularly effective against surface attribution error and over-inference, where the model's default behaviour is to confidently produce plausible-but-wrong output that a purely-positive-example prompt wouldn't rule out.

Canonicalised by Zalando's 2025-09-24 postmortem analysis pipeline, where the Classification stage prompt carries explicit negative examples:

"Surface Attribution Error was an obstacle for our solution. We have to strictly prohibit inference or assumption, ensuring that only explicitly stated connections are flagged. Additionally, the prompt provides negative examples." (Source: sources/2025-09-24-zalando-dead-ends-or-data-goldmines-ai-powered-postmortem-analysis)

When to use

  • Output is easy to get wrong in a specific characterisable way. Negative examples are most useful when the failure mode is nameable (surface attribution, over-inference, hallucinated field, category-adjacent false positive).
  • Positive examples alone underconstrain the output. If you show the model N ways to do it right and it still produces a plausible-looking wrong answer, you need to show that wrong answer explicitly and reject it.
  • Binary decisions with a skewed prior. The Classification stage's output is name-of-tech or None — the negative example teaches the model when None is the right answer.
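
A minimal sketch of what that skewed-prior contract can look like at parse time, assuming a name-of-tech-or-None output like the Classification stage's; the allow-list and function name are illustrative, not drawn from the source:

```python
from typing import Optional

# Assumed allow-list of classifiable technologies; illustrative only.
KNOWN_TECH = {"aws-s3", "redis", "postgresql", "kafka"}

def parse_classification(raw: str) -> Optional[str]:
    """Enforce the name-of-tech-or-None contract on raw model output.

    Anything that is not an explicitly known technology collapses to
    None, so the skewed prior is enforced outside the model as well
    as inside the prompt.
    """
    answer = raw.strip().strip('"').lower()
    if answer in ("none", ""):
        return None
    return answer if answer in KNOWN_TECH else None
```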

When NOT to use

  • Generative / open-ended tasks where there isn't a characterisable wrong pattern. Negative examples are specific failure modes, not a general hedge.
  • When positive examples already suppress the failure mode. Adding negative examples costs prompt tokens; if the model already does the right thing, don't.
  • When failure modes are combinatorially large. Negative-example prompting targets named failure modes; it doesn't scale to arbitrary wrongness.

Structure

Typical negative-example block inside a prompt:

**Examples of what NOT to do:**

Postmortem excerpt: "The incident caused degraded read latency for downstream services including our S3-backed log archival path and our Redis cache."

WRONG classification: ["aws-s3", "redis"]
Reason: S3 and Redis are mentioned but neither is causally linked to the incident's root cause — they are downstream affected systems, not contributors.

CORRECT classification: "None" (unless the post's root-cause section explicitly names a datastore technology).

The key elements:

  • A realistic input. Drawn from the actual domain, not contrived.
  • The wrong output explicitly shown. Not just described.
  • The reason the wrong output is wrong. Explicitly states the failure mode (surface attribution, over-inference, etc.).
  • The correct output for the same input. Paired so the model learns the discrimination.
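
A sketch of how these four elements might be carried and rendered programmatically; the `NegativeExample` dataclass and the rendering format are assumptions for illustration, not Zalando's implementation:

```python
from dataclasses import dataclass

@dataclass
class NegativeExample:
    """One anti-example: realistic input, the wrong output shown
    explicitly, the named failure mode, and the paired correct output."""
    excerpt: str         # realistic input drawn from the actual domain
    wrong_output: str    # the wrong output, shown rather than described
    reason: str          # names the failure mode, e.g. surface attribution
    correct_output: str  # paired correct answer for the same input

def render_negative_block(examples: list[NegativeExample]) -> str:
    """Render the 'Examples of what NOT to do' section of a prompt."""
    parts = ["**Examples of what NOT to do:**"]
    for ex in examples:
        parts.append(
            f'\nPostmortem excerpt: "{ex.excerpt}"\n\n'
            f"WRONG classification: {ex.wrong_output}\n"
            f"Reason: {ex.reason}\n\n"
            f"CORRECT classification: {ex.correct_output}"
        )
    return "\n".join(parts)
```

Passing the S3/Redis example above through a helper like this reproduces the block shown under Structure; the point is that each negative example is a single record carrying all four elements, so none can silently go missing.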

Pairs with

  • TELeR — negative examples live in the Level-of-details axis. A TELeR-maximal prompt has both positive examples and negative examples in the examples block.
  • Strict refusal-on-ambiguity. The Summarization stage's "no guessing, no assumptions, and no speculative content" instruction works with negative examples: the instruction tells the model when to refuse, the negative examples show the model what refusal cases look like.
  • HITL sampling. Curators who flag incorrect outputs during development often supply the source material for new negative examples. The curation feedback loop writes the prompt.
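
One hypothetical shape for that feedback loop, reusing the `NegativeExample` dataclass from the sketch under Structure; the flagged-record schema is assumed:

```python
def negative_example_from_flag(record: dict) -> NegativeExample:
    """Turn a curator-flagged misclassification into a new negative
    example for the next prompt revision. Field names are assumed."""
    return NegativeExample(
        excerpt=record["input_excerpt"],         # the real input that failed
        wrong_output=record["model_output"],     # what the model actually said
        reason=record["curator_note"],           # curator names the failure mode
        correct_output=record["curator_label"],  # the corrected answer
    )
```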

Consequences

  • Prompt length grows. Each negative-example block is ~10× the size of the corresponding positive-example block (because it includes the wrong output, the reason, and the correct output). This is real cost per invocation.
  • Model tier matters. Smaller models benefit more from negative examples (they're closer to the failure mode). Frontier models benefit less, but still non-trivially — Zalando's 10% residual surface-attribution rate on Claude Sonnet 4 is after negative-example prompting.
  • Negative examples are brittle to domain shift. An example that was realistic last year may not be realistic this year if the underlying system evolved. Treat negative examples as maintained artefacts, not one-off constants.
  • Can over-suppress. Aggressive negative-example prompting can make the model refuse in cases where the correct answer is to not refuse. Monitor the false-negative rate via HITL sampling.
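
A back-of-envelope monitor for that over-suppression risk, assuming HITL samples are labelled with the model's answer and the curator's ground truth (the sample schema is an assumption):

```python
def over_suppression_rate(samples: list[dict]) -> float:
    """Share of sampled cases where the model answered None but the
    curator says a technology was causally implicated."""
    eligible = [s for s in samples if s["truth"] is not None]
    if not eligible:
        return 0.0
    over_refused = sum(1 for s in eligible if s["model"] is None)
    return over_refused / len(eligible)
```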

Known uses

  • systems/zalando-postmortem-analysis-pipeline (canonical). Classification stage uses negative examples to suppress surface attribution errors where the model would otherwise classify a postmortem as, e.g., an S3 incident because S3 is mentioned in the impact section without being causally linked to the root cause.
  • Adjacent wiki instances of related prompting disciplines:
      • patterns/in-code-annotation-as-llm-guidance — Slack's Enzyme-to-RTL conversion uses per-file in-code annotations to steer the model away from specific anti-patterns. Structurally similar "tell the model what not to do" pattern, expressed in code comments rather than prompt examples.
      • patterns/deterministic-plus-model-autofixer — Vercel v0's post-generation rewrite catches specific LLM failure modes (icon hallucination) that negative-example prompting alone couldn't eliminate.