CONCEPT Cited by 1 source
Parser differential¶
A parser differential is a security-relevant disagreement between two implementations of the same parsing contract: given the same input bytes, parser A and parser B produce different parse trees, different extracted values, or different semantic decisions. When a security-critical decision depends on more than one parser agreeing — explicitly or implicitly — the differential becomes exploitable.
The concept is broader than one file format. The canonical survey is A Survey of Parser Differential Anti-Patterns by Ali and Smith (LangSec 2023), which catalogues real-world parser differentials by root cause.
Canonical vulnerability families¶
| Surface | Attack enabled | How the differential is structured |
|---|---|---|
| XML | XML signature wrapping, SAML / XML-DSig bypass | Two XML parsers disagree on which <Signature> element an XPath resolves to |
| URL parsing | Server-Side Request Forgery (SSRF) | Validator parses a URL one way (host = trusted.com); HTTP client parses it another way (host = attacker.com) |
| HTTP framing | Request smuggling | Front-end proxy uses Content-Length; back-end uses Transfer-Encoding (or vice versa) |
| File format | Polyglot uploads, AV evasion | Antivirus parses as one format; consumer parses as another |
| JSON | Auth bypass when server validates one JSON parse but acts on another | Duplicate keys, number/string coercion, UTF-8 normalisation |
In every case the structural shape is identical: two parsers, same bytes, different answers, security decision straddles the gap.
Why it happens¶
- Protocol complexity — XML, URLs, and HTTP have legacy / ambiguous grammars with implementation leeway.
- Language mismatch — ruby-saml added systems/nokogiri for canonicalisation that its original systems/rexml didn't support, without removing REXML; now both run over the same input.
- Operational contracts diverge — one parser might be hardened / strict while the other accepts malformed input silently, so some inputs parse in one and not the other.
- Unspecified behaviour — the spec leaves a case undefined, parsers make different choices, the security layer didn't notice.
Detection signals (during code review)¶
The ruby-saml disclosure names a clear review-time tell: "REXML methods
are prefixed with REXML::, whereas Nokogiri methods are called on
document." Generalisable heuristics:
- Two imports of different parser libraries for the same format in the same module.
- Two independent extractions of what should be "the same element" — each call returns its parser's view of the input.
- A security decision that combines outputs from different parsers (e.g. signature from parser A, hashed bytes from parser B).
- Asymmetric error handling — one parser raises, the other
silently annotates errors on a member (Nokogiri
doc.errors).
Structural mitigation¶
The patterns/single-parser-for-security-boundaries pattern: one parser from authenticated bytes to enforced decision, no re-querying. Once a range of bytes has been cryptographically verified, subsequent pieces of that decision must be extracted from that same byte range by the same parser — not recovered by re-parsing the document.
Weaker mitigations exist and sometimes are all you can ship:
- Strict mode + error checks on every parser — closes some differentials (silent-error classes) but not the general differential class; see ruby-saml: Nokogiri strict mode stops one exploit but leaves the underlying class unfixed.
- Fuzzer-driven differential testing — feed the same input through both parsers in CI and compare; useful for regression prevention but not a complete defence.
Seen in¶
- sources/2025-03-15-github-sign-in-as-anyone-bypassing-saml-sso-authentication-with-parser-differentials —
Canonical case. ruby-saml uses systems/rexml to locate the
<Signature>element and systems/nokogiri to canonicalise<SignedInfo>for signature verification. Attacker crafts input where REXML and Nokogiri each return a different<Signature>, one of which is valid against a captured IdP signature and the other of which contains a<DigestValue>for an attacker-fabricated assertion. Signature check passes (against parser A's view); digest check passes (against parser B's view); authentication bypassed despite both checks passing. Two independent researchers each found exploitable differentials within days (one via Mattermost-style XML roundtrips, one via ruzzy fuzzing).
Related¶
- concepts/xml-signature-wrapping — the attack family enabled by XML-parser differentials inside XML-DSig verification.
- concepts/saml-authentication-bypass — the outcome category in SAML SSO.
- patterns/single-parser-for-security-boundaries — the structural fix.