TELeR prompt framework

Definition

TELeR is a prompt-engineering taxonomy that decomposes a prompt along four axes (Turn, Expression, Level of Details, and Role) so that complex LLM prompting strategies can be compared systematically. It was proposed in the Findings of EMNLP 2023 paper "TELeR: A General Taxonomy of LLM Prompts for Benchmarking Complex Tasks" (aclanthology.org/2023.findings-emnlp.946).

Zalando's postmortem-analysis pipeline names TELeR explicitly as the prompt technique used at the Summarization stage:

"Using a tightly scoped prompt, we have used Turn, Expression, Level of Details, Role (TELeR) techniques for prompt engineering, LLM processes each postmortem document and extracts only the most essential information across five core dimensions." (Source: sources/2025-09-24-zalando-dead-ends-or-data-goldmines-ai-powered-postmortem-analysis)

The four axes

  • Turn — single-turn vs multi-turn. Whether the prompt operates in a single LLM invocation or expects a conversational interaction with the model.
  • Expression — natural language vs explicitly structured form (bullet points, JSON schema, pseudo-code). How the task is expressed to the model.
  • Level of details — how much context, examples, and instruction the prompt carries. TELeR identifies a graded scale from minimal (just the task) to maximally detailed (task + system role + constraints + positive examples + negative examples + output format + style guidance).
  • Role — whether the prompt assigns the model an explicit identity/persona ("you are an SRE analyst") or not.
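The four axes are easiest to see as independent fields of a prompt specification. A minimal Python sketch (the record layout and the integer detail scale are illustrative assumptions, not definitions from the paper):

```python
# Sketch: the four TELeR axes as a typed record, so two prompt variants can be
# compared field by field. Names here are assumptions, not from the paper.
from dataclasses import dataclass
from typing import Literal, Optional

@dataclass(frozen=True)
class TelerSpec:
    turn: Literal["single", "multi"]              # one invocation vs conversational
    expression: Literal["natural", "structured"]  # free prose vs schema/bullets/JSON
    level_of_details: int                         # graded: 0 = bare task, higher = more context
    role: Optional[str]                           # explicit persona, or None for no role

    def describe(self) -> str:
        return (f"turn={self.turn}, expression={self.expression}, "
                f"details={self.level_of_details}, role={self.role or 'none'}")

# Two variants that differ on exactly one axis are now trivially comparable:
base = TelerSpec(turn="single", expression="structured", level_of_details=3, role=None)
with_role = TelerSpec(turn="single", expression="structured", level_of_details=3, role="SRE analyst")
print(base.describe())
print(with_role.describe())
```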

Canonical Zalando usage at Summarization stage

Zalando's Summarization-stage prompt is a TELeR-maximal configuration along all four axes:

  • Turn: single-turn. Each postmortem is processed in one LLM invocation; no iterative chat.
  • Expression: explicitly structured. Output is fixed to five named fields — Issue Summary / Root Causes / Impact / Resolution / Preventive Actions. The prompt constrains the model to emit those fields in that order.
  • Level of details: maximal. The prompt carries:
      • Task definition (condense a postmortem).
      • Field schema (the five core dimensions).
      • Constraints ("no guessing, no assumptions, and no speculative content").
      • Refusal-on-ambiguity rule ("if something in the original postmortem is unclear or missing, the summary explicitly states that").
      • Anti-examples (implicit: the output is not supposed to contain speculation, redundant phrasing, or tangential commentary).
  • Role: implied engineering audience rather than an explicit model persona. "What's preserved are the key technical and operational insights—delivered in a readable, structured format. This makes the output especially valuable for engineering leadership, reliability teams, and cross-functional reviews."
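Read together, the bullets above describe a buildable prompt. The sketch below assembles a single-turn, structured, detail-maximal prompt from that description; the exact Zalando prompt is not public, so all wording here is illustrative:

```python
# Illustrative reconstruction only, not Zalando's actual prompt text.
FIELDS = ["Issue Summary", "Root Causes", "Impact", "Resolution", "Preventive Actions"]

def build_summarization_prompt(postmortem_text: str) -> str:
    # Expression axis: fix the output to the five named fields, in order.
    schema = "\n".join(f"## {name}" for name in FIELDS)
    return (
        "Condense the postmortem below into exactly these five sections, in this order:\n"
        f"{schema}\n\n"
        # Level-of-details axis: constraints plus the refusal-on-ambiguity rule.
        "Constraints: no guessing, no assumptions, no speculative content.\n"
        "If something in the original postmortem is unclear or missing, say so "
        "explicitly instead of filling the gap.\n\n"
        f"Postmortem:\n{postmortem_text}"
    )

print(build_summarization_prompt("<postmortem document goes here>"))
```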

The TELeR-maximal choice is deliberate: the post frames it as the mechanism for accuracy and trust at scale over thousands of postmortems.

Why TELeR matters as a framework, not just a checklist

Two structural reasons the taxonomy is load-bearing rather than ornamental:

  • Reproducibility of prompt experiments. Naming the four axes lets teams compare prompt variants systematically ("this prompt differs only on Level-of-details") instead of describing a wall of text; see the ablation sketch after this list. It pairs naturally with negative-example prompting as a specific Level-of-details intensification.
  • Stage-appropriate prompt strength. Not every stage in a pipeline wants maximum TELeR detail. A Classification stage with a tiny single-word output benefits from a short prompt with negative examples; a Summarization stage synthesising a structured artefact benefits from TELeR-maximal specification. Zalando's pipeline appears to scale TELeR intensity down through Classification and up through Summarization / Analyzer, though only Summarization is explicitly labelled as TELeR-shaped.
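A sketch of what axis-wise comparison can look like in practice (a workflow assumption, not something the post describes): hold three axes fixed and sweep the fourth, so any quality difference is attributable to one named axis.

```python
# Axis-wise ablation sketch. BASE and the axis values are illustrative.
BASE = {"turn": "single", "expression": "structured", "level_of_details": 3, "role": None}

def variants_on(axis, values):
    """Prompt specs identical to BASE except on the one named axis."""
    return [{**BASE, axis: v} for v in values]

# "This prompt differs only on Level-of-details":
for spec in variants_on("level_of_details", range(5)):
    print(spec)  # e.g. render each spec to a prompt and score on a held-out set
```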

Tradeoffs / gotchas

  • Maximal TELeR ≠ best output. Over-specifying a prompt can over-constrain the model: a narrow output channel misses legitimate-but-off-format content. Zalando ran 100% human curation during development specifically to tune TELeR intensity stage by stage.
  • Anti-examples are often skipped. Surface-attribution errors aren't addressed by positive examples of correct output; they're suppressed by negative examples that show the wrong pattern and reject it. TELeR-aware prompts should carry an explicit anti-example section (see the sketch after this list).
  • Role framing can amplify confidence bias. Assigning "you are an expert X" tends to make the model more authoritative, which is the opposite of what a refuse-on-ambiguity pipeline wants. Zalando's implicit engineering-audience frame sidesteps this by positioning the reader, not the model, as the expert.
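As referenced above, a sketch of an explicit anti-example section; the wrong/right pair is invented for illustration and does not come from the Zalando post:

```python
# Negative-example prompting sketch: show the failure pattern and reject it.
ANTI_EXAMPLES = """\
Do NOT write output like this (speculation):
  Root Causes: The team probably misconfigured the load balancer.

When the source does not state a root cause, write instead:
  Root Causes: Not stated in the source postmortem.
"""

def with_anti_examples(prompt: str) -> str:
    # Appends the anti-example section to any existing prompt body.
    return prompt + "\n" + ANTI_EXAMPLES
```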
