Skip to content

CONCEPT Cited by 2 sources

Parser differential

A parser differential is a security-relevant disagreement between two implementations of the same parsing contract: given the same input bytes, parser A and parser B produce different parse trees, different extracted values, or different semantic decisions. When a security-critical decision depends on more than one parser agreeing — explicitly or implicitly — the differential becomes exploitable.

The concept is broader than one file format. The canonical survey is A Survey of Parser Differential Anti-Patterns by Ali and Smith (LangSec 2023), which catalogues real-world parser differentials by root cause.

Canonical vulnerability families

Surface Attack enabled How the differential is structured
XML XML signature wrapping, SAML / XML-DSig bypass Two XML parsers disagree on which <Signature> element an XPath resolves to
URL parsing Server-Side Request Forgery (SSRF) Validator parses a URL one way (host = trusted.com); HTTP client parses it another way (host = attacker.com)
HTTP framing Request smuggling Front-end proxy uses Content-Length; back-end uses Transfer-Encoding (or vice versa)
File format Polyglot uploads, AV evasion Antivirus parses as one format; consumer parses as another
JSON Auth bypass when server validates one JSON parse but acts on another Duplicate keys, number/string coercion, UTF-8 normalisation

In every case the structural shape is identical: two parsers, same bytes, different answers, security decision straddles the gap.

Why it happens

  • Protocol complexity — XML, URLs, and HTTP have legacy / ambiguous grammars with implementation leeway.
  • Language mismatch — ruby-saml added systems/nokogiri for canonicalisation that its original systems/rexml didn't support, without removing REXML; now both run over the same input.
  • Operational contracts diverge — one parser might be hardened / strict while the other accepts malformed input silently, so some inputs parse in one and not the other.
  • Unspecified behaviour — the spec leaves a case undefined, parsers make different choices, the security layer didn't notice.

Detection signals (during code review)

The ruby-saml disclosure names a clear review-time tell: "REXML methods are prefixed with REXML::, whereas Nokogiri methods are called on document." Generalisable heuristics:

  1. Two imports of different parser libraries for the same format in the same module.
  2. Two independent extractions of what should be "the same element" — each call returns its parser's view of the input.
  3. A security decision that combines outputs from different parsers (e.g. signature from parser A, hashed bytes from parser B).
  4. Asymmetric error handling — one parser raises, the other silently annotates errors on a member (Nokogiri doc.errors).

Structural mitigation

The patterns/single-parser-for-security-boundaries pattern: one parser from authenticated bytes to enforced decision, no re-querying. Once a range of bytes has been cryptographically verified, subsequent pieces of that decision must be extracted from that same byte range by the same parser — not recovered by re-parsing the document.

Weaker mitigations exist and sometimes are all you can ship:

  • Strict mode + error checks on every parser — closes some differentials (silent-error classes) but not the general differential class; see ruby-saml: Nokogiri strict mode stops one exploit but leaves the underlying class unfixed.
  • Fuzzer-driven differential testing — feed the same input through both parsers in CI and compare; useful for regression prevention but not a complete defence.

Seen in

  • sources/2025-03-15-github-sign-in-as-anyone-bypassing-saml-sso-authentication-with-parser-differentials — Canonical case. ruby-saml uses systems/rexml to locate the <Signature> element and systems/nokogiri to canonicalise <SignedInfo> for signature verification. Attacker crafts input where REXML and Nokogiri each return a different <Signature>, one of which is valid against a captured IdP signature and the other of which contains a <DigestValue> for an attacker-fabricated assertion. Signature check passes (against parser A's view); digest check passes (against parser B's view); authentication bypassed despite both checks passing. Two independent researchers each found exploitable differentials within days (one via Mattermost-style XML roundtrips, one via ruzzy fuzzing).
  • sources/2026-01-28-meta-rust-at-scale-an-added-layer-of-security-for-whatsappMedia-file / OS-library variant of the attack class, plus the canonical app-layer defensive posture. WhatsApp's wamedia library (now Rust) parses MP4s before handing them off to OS-provided media parsers, which may be unpatched (concepts/os-library-vulnerability-ungovernable). The attack shape: attacker crafts an MP4 that wamedia accepts but that triggers a bug in a downstream OS media library — "detect files which do not adhere to the MP4 standard and might trigger bugs in a vulnerable OS library on the receiver side." Meta's mitigation is the patterns/format-aware-malware-check-before-os-handoff pattern plus the risk-indicator / spoof / dangerous-type check ensemble ( Kaleidoscope) — block the divergent input at the app layer before it reaches the ungovernable parser. Reinforces that parser-differential defense has two distinct postures: the ruby-saml case uses "one parser for security boundaries"; the WhatsApp case uses "one parser in front of many ungovernable parsers, reject divergent inputs".
Last updated · 542 distilled / 1,571 read