PATTERN
# LLM output as untrusted input
Problem: An LLM exposed to attacker-controllable input (even well-isolated via patterns/untrusted-input-via-file-not-prompt) can still produce output that reflects or carries injected instructions. If downstream steps consume that output verbatim (as a shell argument, a label name, a command, a file path), the injection propagates.
Pattern: Treat the LLM's output as untrusted. Validate structure before using it; fail-closed on mismatch.
## Shape
From Datadog's 2026-03-09 reference workflow:
```yaml
- name: Read Claude's output
  id: category
  run: |
    read -r CATEGORY < category.txt || true
    if [[ "$CATEGORY" =~ ^(new-feature|bug-fix|documentation)$ ]]; then
      echo "value=$CATEGORY" >> "$GITHUB_OUTPUT"
    else
      echo "::error::Unexpected category"
      exit 1
    fi
- name: Apply label
  env:
    PR_NUMBER: ${{ github.event.pull_request.number }}
    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    CATEGORY: ${{ steps.category.outputs.value }}
  run: gh pr edit "$PR_NUMBER" --add-label "kind/$CATEGORY"
```
Three things are happening here:

- Regex validator pins the output to an enumerable, known-safe set (`^(new-feature|bug-fix|documentation)$`). Fail closed if the LLM returns anything else.
- Fresh `env:` interpolation on the downstream step: the validated output goes through the env-var pattern, not string interpolation, to compose with script-injection defences.
- Least-privilege secret placement: `GH_TOKEN` is scoped to the step that needs it, not the job/workflow.
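For contrast, a hypothetical variant of the labelling step without the `env:` indirection (not from the post). Because `${{ }}` expressions are substituted into the script text before bash ever parses it, an attacker-influenced value becomes shell syntax rather than data:

```yaml
# Anti-pattern (illustrative only): inline ${{ }} interpolation.
# The expression is expanded BEFORE the shell parses the line, so a
# crafted steps.category.outputs.value can break out of the quotes
# and run arbitrary commands in the runner.
- name: Apply label (vulnerable variant)
  run: gh pr edit "${{ github.event.pull_request.number }}" --add-label "kind/${{ steps.category.outputs.value }}"
```

The `env:` form in the reference workflow avoids this: the value arrives as an environment variable and is only ever expanded by the shell inside double quotes, as a single argument.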
## Why it matters
Datadog's framing: "Consider the LLM output as untrusted and apply similar sanitization that you would apply for untrusted user data such as a PR title." Concretely, if the LLM can be induced to output a label name like `HackerBot Claw 🦞 Reviewed 🛡️` (as hackerbot-claw attempted), and your downstream step is `gh pr edit --add-label "$CATEGORY"`, the validator is what stops the attack from succeeding even if the LLM itself was compromised.
## Sibling patterns
One of five defensive patterns from Datadog's 2026-03-09 post:
- Use recent models.
- patterns/untrusted-input-via-file-not-prompt.
- LLM output as untrusted input (this pattern).
- patterns/minimally-scoped-llm-tools.
- Keep secrets out of the LLM's step environment.
## Seen in
- sources/2026-03-09-datadog-when-an-ai-agent-came-knocking: reference-workflow snippet contains the validator regex verbatim.
## Related
- concepts/prompt-injection: the attack class whose output-side propagation this pattern blocks.
- systems/anthropics-claude-code-action: the target LLM action.
- patterns/untrusted-input-via-file-not-prompt, patterns/minimally-scoped-llm-tools: sibling patterns.