Skip to content

PATTERN Cited by 1 source

Untrusted input via file, not prompt

Problem: LLM-powered CI actions ( anthropics/claude-code-action and similar) assemble their prompt from attacker-controllable fields (issue body, PR title, diff content). Interpolating that content directly into the system prompt gives the attacker a first-class instruction channel — prompt injection.

Pattern: Write untrusted data to a file. Then instruct the LLM to read the file.

Shape

From Datadog's 2026-03-09 reference workflow:

- name: Write PR details to file
  env:
    PR_TITLE: ${{ github.event.pull_request.title }}
    PR_NUMBER: ${{ github.event.pull_request.number }}
  run: |
    jq -n \
      --arg title "$PR_TITLE" \
      --argjson number "$PR_NUMBER" \
      '{pr_number: $number, pr_title: $title}' > pr.json

- name: Categorize PR with Claude
  uses: anthropics/claude-code-action@v1
  with:
    prompt: |
      Read pr.json to get the PR title.
      Categorize the PR into exactly ONE of: new-feature,
      bug-fix, documentation.
      Write only the category (nothing else) to category.txt.
    claude_args: "--allowedTools 'Read(./pr.json),Edit(./category.txt)'"

Untrusted data (PR_TITLE, PR_NUMBER) is written to pr.json via an env-var interpolated bash step — both script injection (jq -n --arg is safe) and prompt injection defences compose here. Claude is told to read the file rather than having the data pasted into its prompt.

Why it helps

Prompt injection depends on the attacker's text being interpreted as instructions by the LLM. When the LLM reads data from a file, modern models have a stronger prior that "file content = data" than "content that follows 'here is the PR title:' in my system prompt = data." The signal-to-noise of the injection attempt drops — though this is mitigation, not elimination: a sufficiently persuasive injection inside a file read can still succeed.

Compose with siblings

This pattern is one of five defensive patterns enumerated in Datadog's 2026-03-09 post:

  1. Use recent models (lower per-attempt success rates).
  2. Untrusted input via file, not prompt — this pattern.
  3. patterns/llm-output-as-untrusted-input — sanitize the LLM's output before use.
  4. patterns/minimally-scoped-llm-tools — constrain the LLM's tool surface.
  5. Keep sensitive secrets out of the LLM step's environment.

All five compose into the canonical reference workflow on Datadog's 2026-03-09 post; each one alone is a partial defence.

Seen in

Last updated · 200 distilled / 1,178 read