Skip to content

CONCEPT Cited by 1 source

Parser differential

A parser differential is a security-relevant disagreement between two implementations of the same parsing contract: given the same input bytes, parser A and parser B produce different parse trees, different extracted values, or different semantic decisions. When a security-critical decision depends on more than one parser agreeing — explicitly or implicitly — the differential becomes exploitable.

The concept is broader than one file format. The canonical survey is A Survey of Parser Differential Anti-Patterns by Ali and Smith (LangSec 2023), which catalogues real-world parser differentials by root cause.

Canonical vulnerability families

Surface Attack enabled How the differential is structured
XML XML signature wrapping, SAML / XML-DSig bypass Two XML parsers disagree on which <Signature> element an XPath resolves to
URL parsing Server-Side Request Forgery (SSRF) Validator parses a URL one way (host = trusted.com); HTTP client parses it another way (host = attacker.com)
HTTP framing Request smuggling Front-end proxy uses Content-Length; back-end uses Transfer-Encoding (or vice versa)
File format Polyglot uploads, AV evasion Antivirus parses as one format; consumer parses as another
JSON Auth bypass when server validates one JSON parse but acts on another Duplicate keys, number/string coercion, UTF-8 normalisation

In every case the structural shape is identical: two parsers, same bytes, different answers, security decision straddles the gap.

Why it happens

  • Protocol complexity — XML, URLs, and HTTP have legacy / ambiguous grammars with implementation leeway.
  • Language mismatch — ruby-saml added systems/nokogiri for canonicalisation that its original systems/rexml didn't support, without removing REXML; now both run over the same input.
  • Operational contracts diverge — one parser might be hardened / strict while the other accepts malformed input silently, so some inputs parse in one and not the other.
  • Unspecified behaviour — the spec leaves a case undefined, parsers make different choices, the security layer didn't notice.

Detection signals (during code review)

The ruby-saml disclosure names a clear review-time tell: "REXML methods are prefixed with REXML::, whereas Nokogiri methods are called on document." Generalisable heuristics:

  1. Two imports of different parser libraries for the same format in the same module.
  2. Two independent extractions of what should be "the same element" — each call returns its parser's view of the input.
  3. A security decision that combines outputs from different parsers (e.g. signature from parser A, hashed bytes from parser B).
  4. Asymmetric error handling — one parser raises, the other silently annotates errors on a member (Nokogiri doc.errors).

Structural mitigation

The patterns/single-parser-for-security-boundaries pattern: one parser from authenticated bytes to enforced decision, no re-querying. Once a range of bytes has been cryptographically verified, subsequent pieces of that decision must be extracted from that same byte range by the same parser — not recovered by re-parsing the document.

Weaker mitigations exist and sometimes are all you can ship:

  • Strict mode + error checks on every parser — closes some differentials (silent-error classes) but not the general differential class; see ruby-saml: Nokogiri strict mode stops one exploit but leaves the underlying class unfixed.
  • Fuzzer-driven differential testing — feed the same input through both parsers in CI and compare; useful for regression prevention but not a complete defence.

Seen in

  • sources/2025-03-15-github-sign-in-as-anyone-bypassing-saml-sso-authentication-with-parser-differentials — Canonical case. ruby-saml uses systems/rexml to locate the <Signature> element and systems/nokogiri to canonicalise <SignedInfo> for signature verification. Attacker crafts input where REXML and Nokogiri each return a different <Signature>, one of which is valid against a captured IdP signature and the other of which contains a <DigestValue> for an attacker-fabricated assertion. Signature check passes (against parser A's view); digest check passes (against parser B's view); authentication bypassed despite both checks passing. Two independent researchers each found exploitable differentials within days (one via Mattermost-style XML roundtrips, one via ruzzy fuzzing).
Last updated · 200 distilled / 1,178 read