CONCEPT Cited by 2 sources

Parser differential¶

A parser differential is a security-relevant disagreement between two implementations of the same parsing contract: given the same input bytes, parser A and parser B produce different parse trees, different extracted values, or different semantic decisions. When a security-critical decision depends on more than one parser agreeing — explicitly or implicitly — the differential becomes exploitable.

The concept is broader than one file format. The canonical survey is A Survey of Parser Differential Anti-Patterns by Ali and Smith (LangSec 2023), which catalogues real-world parser differentials by root cause.

Canonical vulnerability families¶

Surface	Attack enabled	How the differential is structured
XML	XML signature wrapping, SAML / XML-DSig bypass	Two XML parsers disagree on which `<Signature>` element an XPath resolves to
URL parsing	Server-Side Request Forgery (SSRF)	Validator parses a URL one way (host = `trusted.com`); HTTP client parses it another way (host = `attacker.com`)
HTTP framing	Request smuggling	Front-end proxy uses `Content-Length`; back-end uses `Transfer-Encoding` (or vice versa)
File format	Polyglot uploads, AV evasion	Antivirus parses as one format; consumer parses as another
JSON	Auth bypass when server validates one JSON parse but acts on another	Duplicate keys, number/string coercion, UTF-8 normalisation

In every case the structural shape is identical: two parsers, same bytes, different answers, security decision straddles the gap.

Why it happens¶

Protocol complexity — XML, URLs, and HTTP have legacy / ambiguous grammars with implementation leeway.
Language mismatch — ruby-saml added systems/nokogiri for canonicalisation that its original systems/rexml didn't support, without removing REXML; now both run over the same input.
Operational contracts diverge — one parser might be hardened / strict while the other accepts malformed input silently, so some inputs parse in one and not the other.
Unspecified behaviour — the spec leaves a case undefined, parsers make different choices, the security layer didn't notice.

Detection signals (during code review)¶

The ruby-saml disclosure names a clear review-time tell: "REXML methods are prefixed with REXML::, whereas Nokogiri methods are called on document." Generalisable heuristics:

Two imports of different parser libraries for the same format in the same module.
Two independent extractions of what should be "the same element" — each call returns its parser's view of the input.
A security decision that combines outputs from different parsers (e.g. signature from parser A, hashed bytes from parser B).
Asymmetric error handling — one parser raises, the other silently annotates errors on a member (Nokogiri doc.errors).

Structural mitigation¶

The patterns/single-parser-for-security-boundaries pattern: one parser from authenticated bytes to enforced decision, no re-querying. Once a range of bytes has been cryptographically verified, subsequent pieces of that decision must be extracted from that same byte range by the same parser — not recovered by re-parsing the document.

Weaker mitigations exist and sometimes are all you can ship:

Strict mode + error checks on every parser — closes some differentials (silent-error classes) but not the general differential class; see ruby-saml: Nokogiri strict mode stops one exploit but leaves the underlying class unfixed.
Fuzzer-driven differential testing — feed the same input through both parsers in CI and compare; useful for regression prevention but not a complete defence.

Seen in¶

sources/2025-03-15-github-sign-in-as-anyone-bypassing-saml-sso-authentication-with-parser-differentials — Canonical case. ruby-saml uses systems/rexml to locate the <Signature> element and systems/nokogiri to canonicalise <SignedInfo> for signature verification. Attacker crafts input where REXML and Nokogiri each return a different <Signature>, one of which is valid against a captured IdP signature and the other of which contains a <DigestValue> for an attacker-fabricated assertion. Signature check passes (against parser A's view); digest check passes (against parser B's view); authentication bypassed despite both checks passing. Two independent researchers each found exploitable differentials within days (one via Mattermost-style XML roundtrips, one via ruzzy fuzzing).
sources/2026-01-28-meta-rust-at-scale-an-added-layer-of-security-for-whatsapp — Media-file / OS-library variant of the attack class, plus the canonical app-layer defensive posture. WhatsApp's wamedia library (now Rust) parses MP4s before handing them off to OS-provided media parsers, which may be unpatched (concepts/os-library-vulnerability-ungovernable). The attack shape: attacker crafts an MP4 that wamedia accepts but that triggers a bug in a downstream OS media library — "detect files which do not adhere to the MP4 standard and might trigger bugs in a vulnerable OS library on the receiver side." Meta's mitigation is the patterns/format-aware-malware-check-before-os-handoff pattern plus the risk-indicator / spoof / dangerous-type check ensemble ( Kaleidoscope) — block the divergent input at the app layer before it reaches the ungovernable parser. Reinforces that parser-differential defense has two distinct postures: the ruby-saml case uses "one parser for security boundaries"; the WhatsApp case uses "one parser in front of many ungovernable parsers, reject divergent inputs".

concepts/xml-signature-wrapping — the attack family enabled by XML-parser differentials inside XML-DSig verification.
concepts/saml-authentication-bypass — the outcome category in SAML SSO.
patterns/single-parser-for-security-boundaries — the structural fix.
concepts/format-conformance-check — the WhatsApp-case mitigation primitive (validate input at the app layer before OS handoff).
concepts/os-library-vulnerability-ungovernable — the downstream- parser regime that motivates the WhatsApp-case posture.
patterns/format-aware-malware-check-before-os-handoff — the full app-layer pattern around the parser-differential defense.