
PATTERN

LLM PR code review

LLM PR code review is the pattern of running every incoming pull request through a two-stage LLM classifier that decides whether the change is benign or malicious and emits a structured rationale. Verdicts are routed into a traditional security-incident pipeline (SIEM → detection rule → case → incident).

The canonical wiki instance is BewAIre, Datadog's in-house system running at ≈10,000 PRs/week across internal and external repositories. Its first publicly disclosed production catch, the 2026-02-27 hackerbot-claw campaign, is documented in sources/2026-03-09-datadog-when-an-ai-agent-came-knocking.

Shape

GitHub events → filter (security-relevant triggers: PRs, pushes)
  → diff extract + normalize + enrich
    → LLM classifier (two stages)
      → verdict {benign | malicious, rationale}
        → SIEM ingest → detection rule → signal → SIRT case
          → (escalate) → incident
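
The verdict-to-SIEM hop in the shape above can be sketched in a few lines. This is a hypothetical illustration: the field names (`label`, `rationale`), the `route_verdict` helper, and the routing strings are assumptions for clarity, not BewAIre's published schema.

```python
from dataclasses import dataclass

# Hypothetical verdict shape: label plus structured rationale, as in the
# pipeline diagram. Datadog has not published BewAIre's actual schema.
@dataclass
class Verdict:
    pr_url: str
    label: str      # "benign" | "malicious"
    rationale: str  # structured explanation from the classifier

def route_verdict(v: Verdict) -> str:
    """Map a classifier verdict onto the pipeline's next hop (assumed names)."""
    if v.label == "malicious":
        # SIEM ingest -> detection rule fires -> signal -> SIRT case
        return "siem:signal->sirt-case"
    # Benign verdicts still land in the SIEM, but only as log context.
    return "siem:log-only"

print(route_verdict(Verdict("https://github.com/org/repo/pull/1",
                            "malicious", "adds curl|sh to CI")))
# → siem:signal->sirt-case
```

The point of the sketch is the asymmetry: only malicious verdicts generate a signal that opens a SIRT case; benign ones are retained as searchable context.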

The two stages are explicitly named in Datadog's 2026-03-09 post. The specific prompts, models, evaluation methodology, and false-positive rates are not publicly disclosed.

Why two stages

The exact rationale is not given in the Datadog post, but typical justifications for two-stage classifier designs are:

  • Stage 1: a cheap, fast filter routes out obviously benign changes (formatting, dependency bumps, small diffs to whitelisted files) to cut per-PR cost.
  • Stage 2: a deeper review, with more context, a larger model, and structured output, runs only on the reduced candidate set.

This mirrors patterns/two-stage-evaluation at a different workload — the general shape of "cheap O(1) filter protecting a rich decision stage."
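
A minimal sketch of what a stage-1 "cheap filter" could look like, assuming simple heuristics such as a file allowlist and a diff-size cap. The file names, threshold, and `stage1_needs_review` helper are invented for illustration; Datadog has not disclosed BewAIre's actual stage-1 logic.

```python
# Files whose changes are treated as routine (assumed list, for illustration).
BENIGN_ONLY_FILES = {"package-lock.json", "Cargo.lock", "CHANGELOG.md"}

def stage1_needs_review(changed_files: list[str], diff_lines: int) -> bool:
    """Return True if the PR should escalate to the expensive stage-2 model."""
    # Pure lockfile/changelog bumps with a small diff are routed out as benign.
    routine = all(f.split("/")[-1] in BENIGN_ONLY_FILES for f in changed_files)
    if routine and diff_lines < 200:
        return False
    # Everything else gets the deeper stage-2 review.
    return True

print(stage1_needs_review(["package-lock.json"], 40))  # → False (routed out)
print(stage1_needs_review(["ci/deploy.sh"], 12))       # → True (escalated)
```

The design choice is the same as in patterns/two-stage-evaluation: stage 1 only needs to be safe in one direction (never routing out anything stage 2 should see), so it can be crude as long as it is conservative.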

Production latency disclosed

Datadog's first production alert, on 2026-02-27, fired ~33 seconds after the malicious PR was opened (05:26:25 → 05:26:58). This is the only latency datum in the source.

Caveats

  • FP / FN rates not published — Datadog discloses only the two confirmed true positives. Whether analysts drown in false positives at 10K-weekly-PR scale is unanswered.
  • Classifier prompts not published — which limits reproducibility but matches standard practice for security classifiers (publishing the prompts gives attackers targeting guidance — see the hackerbot-claw system-prompt-knowledge observation under concepts/autonomous-attack-agent).
  • Model cost at 10K-PR/week volume not disclosed — commercial sensitivity.
  • No comparison to non-LLM baselines — e.g., regex + static-analysis rules — to quantify the LLM-specific lift.
