PATTERN Cited by 1 source

Single parser for security boundaries¶

What it is¶

Inside a cryptographic verification — or any security decision that combines multiple pieces of structured input — use one parser end-to-end, and pin every subsequent extraction to the same byte range that was cryptographically authenticated. Do not re-query the document with a different parser for subsequent pieces.

The pattern is a structural response to the parser-differential vulnerability class: when two parsers return different answers for the same input, a security check that combines outputs from both can pass even when no single chain of bytes is valid end-to-end.

Why two parsers¶

Two parsers enter the same module for many reasons that are good in isolation:

Capability gap. REXML can't canonicalise XML, so ruby-saml added Nokogiri for the canonicalisation step — while keeping REXML for location / extraction work.
Language mismatch. A validation framework parses URLs one way; the HTTP client that actually makes the request parses them another (SSRF).
Operational contract. Front-end proxy parses HTTP framing lazily; back-end parses strictly — request smuggling.
Historic decision. A library accreted two parsers over its lifetime and nobody had authority to remove one.

In each case the structural property is the same: the bytes cross a security boundary twice, interpreted by two implementations, and the security check trusts both interpretations without linking them.

The rule¶

After a cryptographic verification succeeds — signature over <ds:SignedInfo>, authentication over a URL's claimed host, HMAC over a cookie — every subsequent piece of that decision must come from the same byte range that was authenticated, extracted by the same parser instance:

Don't re-parse the whole document with a different parser.
Don't extract <ds:DigestValue> with parser A after verifying <ds:SignedInfo> canonicalised by parser B.
Don't ask the HTTP client to re-parse a URL whose host was already validated.
Do operate exclusively on the byte slice the verification cryptographically bound — or on objects derived from that slice by the verifying parser.

The ruby-saml 2025 disclosure names the architectural form of the rule explicitly (and restates it as the root lesson):

"If the library had used the content of the already extracted SignedInfo to obtain the digest value, it would have been secure in this case even with two XML parsers in use."

The second-best version of the rule — "use only one parser in the module" — is the stronger structural fix but is often blocked by backward-compatibility constraints. The weaker rule — "pin extractions to the already-verified bytes" — is usually achievable without API breakage and closes the same gap.

Structure¶

Before (unsafe):

parser_A.verify_signature(doc, sig, key)      # passes on parser A's view
parser_B.hash_check(doc, digest)              # passes on parser B's view
# but the two checks ran against different parts of the document

After (safe):

signed_info_bytes = parser_A.extract_canonical_signed_info(doc)
parser_A.verify_signature(signed_info_bytes, sig, key)     # authenticates bytes

# ALL subsequent extractions come from signed_info_bytes, not from doc
digest = parser_A.extract_digest_value(signed_info_bytes)
reference_uri = parser_A.extract_reference_uri(signed_info_bytes)
assertion_bytes = parser_A.extract_referenced_element(doc, reference_uri)

# canonicalise the referenced assertion with the same parser,
# and compare against the digest from the verified SignedInfo
parser_A.hash_check(assertion_bytes, digest)

Weaker mitigations (ship these while shipping the real fix)¶

Strict-mode parsing on every parser. Nokogiri ParseOptions::STRICT | NONET + raise if doc.errors.any? closes some parser-differential exploits but not the class — presence of the two-parser seam is the bug, not any specific differential. "Checking for Nokogiri errors could not have prevented the parser differential, but could have stopped at least one practical exploitation of it."
Fuzzer-driven differential testing. Feed the same input through both parsers in CI and compare outputs; useful for regression detection but not a complete defence — fuzzers find differentials, they don't prove there aren't more.
ID-uniqueness guards. Require exactly one element with a given ID (ruby-saml added this after CVE-2024-45409). Routes around some XSW variants but not the parser-differential class — the 2025 exploit sidesteps it by placing the second signature in <StatusDetail>.

None of these are the structural fix. They buy time.

When to use¶

Any signature / MAC / JWT / SAML / XML-DSig verification path.
Any URL-parsing path where one component validates and another connects (SSRF defence).
Any HTTP-framing path where one layer parses headers and another enforces auth (request smuggling defence).
Any JSON-parsing path where one layer validates and another acts (duplicate-key confusion, number/string coercion).
Any cookie / token path that authenticates bytes and then extracts claims.

When it's hard to apply¶

Legacy callers. Ripping out a parser to consolidate to one breaks the public API. The weaker rule (pin extractions to authenticated bytes) is usually compatible.
Ecosystem parser monopolies. Some platforms force a parser choice (e.g. OS default XML libraries) at a layer the application can't replace.
Spec-level indirection. Standards like XML-DSig assume extractions are recoverable from the document by URI lookup; the safer implementation deliberately ignores the spec's affordance and pins to the authenticated byte range. The ruby-saml disclosure endorses ssoready.com's argument that secure SAML implementation requires disregarding parts of the spec.

Seen in¶

sources/2025-03-15-github-sign-in-as-anyone-bypassing-saml-sso-authentication-with-parser-differentials — canonical wiki instance. ruby-saml uses REXML to locate <ds:Signature> and Nokogiri to canonicalise <ds:SignedInfo>. The 1.18.0 fix re-uses the already-extracted <ds:SignedInfo> bytes as the source for the digest comparison — a partial application of the pattern (keeps both parsers for backward compatibility but pins subsequent extractions to authenticated bytes). The post flags full parser consolidation (PR #736) as planned for a future major release, acknowledging that the weaker rule is what shipped now and the stronger rule is the long-term target.

concepts/parser-differential — the vulnerability class this pattern structurally defuses.
concepts/xml-signature-wrapping — the XML-specific attack family the pattern prevents.
concepts/saml-authentication-bypass — the outcome the pattern blocks in SAML-SP implementations.
concepts/canonicalization-xml — the XML-DSig primitive that most often splits across two parsers.
systems/ruby-saml — the library whose disclosure named this pattern as the structural fix.
systems/saml-protocol — the spec whose extract-then-re-extract structure invites the unsafe implementation.