PATTERN
LLM PR code review¶
LLM PR code review is the pattern of running every incoming pull request through a two-stage LLM classifier that decides whether the change is benign or malicious and emits a structured rationale. Verdicts are routed into a traditional security-incident pipeline (SIEM → detection rule → case → incident).
The canonical wiki instance is BewAIre, Datadog's in-house system, running at ≈10,000 PRs/week across internal and external repositories. Its first publicly disclosed production catch — the 2026-02-27 hackerbot-claw campaign — is documented in sources/2026-03-09-datadog-when-an-ai-agent-came-knocking.
Shape¶
GitHub events → filter (security-relevant triggers: PRs, pushes)
→ diff extract + normalize + enrich
→ LLM classifier (two stages)
→ verdict {benign | malicious, rationale}
→ SIEM ingest → detection rule → signal → SIRT case
→ (escalate) → incident
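The verdict contract in the shape above can be sketched in Python. This is a hedged sketch, not BewAIre's implementation: `stage1`, `stage2`, and the field names are hypothetical stand-ins, since the actual prompts, models, and schemas are undisclosed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable

class Verdict(Enum):
    BENIGN = "benign"
    MALICIOUS = "malicious"

@dataclass
class ReviewResult:
    verdict: Verdict
    rationale: str  # structured rationale forwarded into the SIEM

def review_pr(diff: str,
              stage1: Callable[[str], bool],
              stage2: Callable[[str], ReviewResult]) -> ReviewResult:
    """Two-stage classification: stage1 is a cheap prefilter that
    returns True when the diff needs deeper review; stage2 is the
    expensive classifier that produces the final verdict."""
    if not stage1(diff):  # cheap filter: obviously benign, stop here
        return ReviewResult(Verdict.BENIGN, "stage-1 filter: benign")
    return stage2(diff)   # deeper review on the reduced candidate set
```

Only results with a `MALICIOUS` verdict (plus rationale) need to become SIEM signals; the structured shape is what makes the downstream detection rule trivial to write.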
Datadog's 2026-03-09 post names the two stages explicitly; the specific prompts, models, evaluation methodology, and false-positive rates are not publicly disclosed.
Why two stages¶
The exact rationale is not given in the Datadog post, but the typical justifications for two-stage classifier designs are:
- Stage 1: a cheap/fast filter routes out obviously-benign changes (formatting, dependency bumps, small diffs to whitelisted files) to keep per-PR cost down.
- Stage 2: deeper review with more context / larger model / structured output on the reduced candidate set.
This mirrors patterns/two-stage-evaluation at a different workload — the general shape of "cheap O(1) filter protecting a rich decision stage."
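The cost argument behind the cascade can be made concrete with invented numbers (Datadog discloses none of them): at 10K PRs/week, a cheap stage-1 filter that escalates only a fraction of PRs dominates running the expensive model on everything.

```python
# All figures are illustrative assumptions, not disclosed values.
prs_per_week = 10_000
stage1_cost = 0.001      # $ per PR, small/fast model
stage2_cost = 0.05       # $ per PR, larger model with full context
escalation_rate = 0.10   # fraction of PRs stage 1 passes to stage 2

# Expected weekly cost of the cascade vs running stage 2 on every PR
cascade = prs_per_week * (stage1_cost + escalation_rate * stage2_cost)
flat = prs_per_week * stage2_cost
print(f"cascade: ${cascade:.0f}/week vs single-stage: ${flat:.0f}/week")
```

Under these assumptions the cascade is roughly an order of magnitude cheaper, which is the generic shape of the "cheap filter protecting a rich decision stage" argument.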
Production latency disclosed¶
Datadog's first production alert, on 2026-02-27, fired ~33 seconds after the malicious PR was opened (05:26:25 → 05:26:58). This is the only latency datum in the source.
Related but distinct patterns¶
- patterns/coordinator-sub-reviewer-orchestration — the Cloudflare AI Code Review pattern is a peer review + coordinator-driven multi-agent design, not a binary malicious/benign classifier. LLM PR code review is narrower — detection, not review.
- patterns/ai-review-risk-tiering — gates which reviewers get involved; complementary, but operating at a different abstraction level.
- patterns/specialized-reviewer-agents — fanning out to domain-specific reviewers; orthogonal concern.
Caveats¶
- FP/FN rates not published — Datadog discloses only the two confirmed true positives. Whether analysts drown in false positives at 10K-PR/week scale is an open question.
- Classifier prompts not published — which limits reproducibility but matches standard practice for security classifiers (publishing the prompts gives attackers targeting guidance — see the hackerbot-claw system-prompt-knowledge observation under concepts/autonomous-attack-agent).
- Model cost at 10K-PR/week volume not disclosed — commercial sensitivity.
- No comparison to non-LLM baselines — e.g., regex + static-analysis rules — to quantify the LLM-specific lift.
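To make the missing-baseline caveat concrete, a minimal non-LLM baseline might look like the sketch below: regexes over added diff lines for known-bad idioms. The patterns are illustrative examples, not a real rule set, and the point of the caveat is that no such comparison has been published.

```python
import re

# Hypothetical known-bad patterns; real rule sets would be far larger.
SUSPICIOUS = [
    re.compile(r"curl\s+[^|]*\|\s*(ba)?sh"),        # pipe-to-shell install
    re.compile(r"\$\{\{\s*github\.event\..*\}\}"),  # unsanitized event data in a workflow
    re.compile(r"base64\s+(-d|--decode)"),          # decode-and-run staging
]

def baseline_flags(diff: str) -> bool:
    """Flag a unified diff if any added line matches a suspicious pattern."""
    added = [line[1:] for line in diff.splitlines() if line.startswith("+")]
    return any(p.search(line) for p in SUSPICIOUS for line in added)
```

Quantifying the LLM-specific lift would mean running something like this over the same 10K-PR/week stream and comparing catch and false-positive rates.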
Seen in¶
- sources/2026-03-09-datadog-when-an-ai-agent-came-knocking — canonical disclosure (BewAIre + Cloud SIEM pipeline + hackerbot-claw catch).
Related¶
- systems/bewaire — canonical instance.
- systems/github-actions — subset of the monitored event stream.
- concepts/prompt-injection, concepts/github-actions-script-injection — primary attack classes this pattern is designed to catch.
- concepts/autonomous-attack-agent — the adversary class driving the economic case for LLM-scale defence.
- companies/datadog — operator.