Sign in as anyone: Bypassing SAML SSO authentication with parser differentials

GITHUB 2025-03-15 Tier 2

Summary

Two researchers, GitHub Security Lab's Peter Stöckli and external bug-bounty participant ahacker1, independently discovered an authentication-bypass class in the ruby-saml library caused by a parser differential: ruby-saml uses two different XML parsers (REXML and Nokogiri) on the same SAML response during signature verification. By crafting an XML document that REXML and Nokogiri disagree about (each returns a different <Signature> element for the same XPath query), an attacker gets the digest and the signature each verified in isolation against pieces of two different signature elements. ruby-saml confirms a valid signature and a correct digest, but the signature covers a SignedInfo from one part of the document while the compared digest matches an assertion fabricated by the attacker. Anyone with a single valid SAML signature from the targeted organisation (obtained from any signed assertion, or in some cases from publicly published identity-provider metadata) can then forge assertions for any user: full account takeover. Two CVEs were issued, CVE-2025-25291 and CVE-2025-25292, both fixed in ruby-saml 1.18.0 (2025-03-12). The post was written by the GitHub Security Lab researcher who discovered one of the two exploit paths while GitHub was evaluating whether to re-adopt ruby-saml (GitHub had switched to a homegrown SAML implementation in 2014). An exploitable instance was also confirmed in GitLab, whose security team was notified before disclosure. Architectural lesson restated three times in the post: "relying on two different parsers in a security context can be tricky and error-prone". The structural fix is to use one parser end-to-end over the authenticated bytes, not to keep two parsers in sync.

Key takeaways

Parser differentials are a general vulnerability class, not an XML-specific quirk. Parser differentials occur "when different parsers interpret the same input in different ways"; the post points to prior XML cases (Mattermost's Juho Forsén 2021 XML-roundtrip research) and also names URL-parsing differentials behind SSRF and header-parsing differentials behind HTTP request smuggling as the same structural shape. The LangSec survey paper "A Survey of Parser Differential Anti-Patterns" by Ali and Smith is cited as the canonical taxonomy. Any system that parses the same bytes with two different implementations inside a security boundary is a candidate.
Four-stage vulnerability-exploitation flow. The disclosure walks through a general template that applies to any parser-differential security bug: (1) discover two different parsers being used on the same input during code review (trivially visible here: the REXML::XPath prefix vs the document.at_xpath idiom); (2) establish whether a differential could be exploited given how the parser outputs are combined (the hard part; in ruby-saml, the signature value is extracted with REXML but the signed <SignedInfo> bytes are canonicalised with Nokogiri, and nothing forces the two to come from the same element); (3) find an actual input that the two parsers disagree about; (4) leverage the differential into a concrete exploit. All four stages are required: the researchers note that a differential exists in most real parser pairs, but it only becomes a bug when a security decision depends on the two parsers agreeing.
The bug is a signature-wrapping attack enabled by a disconnect between what's hashed and what's signed. SAML's <ds:Signature> element has two verification paths that a correct implementation must link: (a) the <SignatureValue> is verified against the canonicalised <SignedInfo> using the IdP's public key (this proves the <SignedInfo> bytes came from the IdP), and (b) the <DigestValue> inside that <SignedInfo> must match the hash of the canonicalised <Assertion> referenced by URI=#.... ruby-saml did both, but extracted the pieces with different parsers: the <SignatureValue> and the <DigestValue> were read with REXML, while the <SignedInfo> the signature was checked against was canonicalised with Nokogiri. With a differential in play, the two parsers resolve different <Signature> elements (the described exploit places one of them inside a <StatusDetail> element): the signature check passes against a genuine <SignedInfo> the attacker copied from a real IdP signature, while the <DigestValue> comes from a second, attacker-written <Signature> whose value matches the fabricated assertion. No signature ever covered that <DigestValue>. Canonical structural fix (post quote): "If the library had used the content of the already extracted SignedInfo to obtain the digest value, it would have been secure in this case even with two XML parsers in use." In other words, once a <SignedInfo> has been verified, every subsequent extraction must come from that exact byte range rather than be re-queried from the document.
Two independent working exploits were produced within days. ahacker1 built one using the XML-roundtrip differentials technique inherited from Forsén 2021. The GitHub Security Lab author built a different exploit using a differential surfaced by ruzzy, Trail of Bits' coverage-guided Ruby fuzzer. That two different parser differentials existed and could each be turned into a bypass, by two researchers using two techniques, is itself a structural signal: the disconnect was the bug, not any specific differential, and patching a single differential would not close the class.
"Check for Nokogiri parsing errors" blocks one exploit but not the class. The authors note that Nokogiri does report errors on malformed input, but silently: the errors are collected on doc.errors rather than raised as exceptions. Strict-mode parsing (Nokogiri::XML::ParseOptions::STRICT | NONET) plus raise if doc.errors.any? stops at least one of the known exploit paths. It is a worthwhile stopgap, but the post explicitly names it as not a structural fix: "checking for Nokogiri errors could not have prevented the parser differential, but could have stopped at least one practical exploitation of it." Exploitability is differential-specific; the presence of a differential is the class.
The ruby-saml 1.18.0 fix is gadget-specific, not architectural. Rather than removing one of the XML parsers (which would break backwards compatibility for API callers), the fix patches the specific disconnect by re-using the already-extracted <SignedInfo> bytes as the source of the digest value used in the comparison, closing the specific gap the two known exploits used. The post explicitly flags that removal of one parser "was already planned for other reasons" and "will likely come as part of a major release in combination with additional improvements to strengthen the library." The pattern matches the 2024-07-22 Kafka-UI RCE disclosure (the fix bumped the Commons Collections version and banned JndiLoginModule but did not remove deserialisation of untrusted data): gadget-specific fixes in libraries where the structural fix would break callers.
Detection is ~impossible; IOC disclosure was deliberately withheld. "We are not aware of any reliable indicators of compromise. While we've found a potential indicator of compromise, it only works in debug-like environments and to publish it, we would have to reveal too many details about how to implement a working exploit." Best available defence is behavioural: "look for suspicious logins via SAML on the service provider side from IP addresses that do not align with the user's expected location." This is an unusually explicit public acknowledgement that the class of bug evades log-based detection — the service-provider side never sees anything it would flag, because from its POV both the signature and the digest verified.
Secondary lesson: disregard the SAML spec when implementing. The post endorses an external argument (ssoready.com) that implementing SAML by strictly following the specs produces insecure code: the specs describe behaviour at a level of indirection (XML documents referenced by URI, canonicalisation transforms on fragments, digests computed over the output of those transforms, signatures over canonicalised pieces) that naturally encourages the kind of "extract then re-extract" implementation pattern that was ruby-saml's downfall. "What you actually want is a direct connection between the hashed content, the hash, and the signature." Generalised: when a spec requires independently verifying N pieces of the same bytes, a secure implementation must pin all N verifications to the same byte range, not re-parse the document to recover those pieces.
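The parser-differential takeaway can be made concrete with a deliberately tiny example that is not from the post: two hypothetical extractors answer the same question ("what does the first <Signature> element contain?") and disagree, because Python's ElementTree discards XML comments by default while a naive regex has no notion of comments.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical input: the first <Signature> in the raw text sits inside a comment.
DOC = "<Root><!-- <Signature>B</Signature> --><Signature>A</Signature></Root>"

def first_signature_tree(doc: str) -> str:
    # ElementTree's default parser drops comments, so it only sees "A".
    elem = ET.fromstring(doc).find(".//Signature")
    return elem.text if elem is not None else ""

def first_signature_regex(doc: str) -> str:
    # A textual extractor knows nothing about comments and matches "B" first.
    m = re.search(r"<Signature>(.*?)</Signature>", doc)
    return m.group(1) if m else ""

print(first_signature_tree(DOC), first_signature_regex(DOC))  # A B
```

Neither function is buggy on its own; the disagreement only matters once a security decision combines their answers.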
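Stage (1) of the four-stage flow is the only mechanical one. A rough, hypothetical code-review helper can flag Ruby files that touch both parsers; the marker patterns are assumptions, not a vetted rule set, and a hit is only a lead: stages (2) to (4) decide whether it is a bug.

```python
import re

# Hypothetical heuristics for spotting each parser in Ruby source; not a vetted rule set.
PARSER_MARKERS = {
    "REXML": re.compile(r"\bREXML::"),
    "Nokogiri": re.compile(r"\bNokogiri::|\.at_xpath\(|\.xpath\("),
}

def parsers_used(source: str) -> set[str]:
    """Names of XML parsers whose idioms appear in a Ruby source string."""
    return {name for name, pattern in PARSER_MARKERS.items() if pattern.search(source)}

def flag_mixed_parser_files(files: dict[str, str]) -> list[str]:
    """Paths whose source touches more than one XML parser (stage-1 leads)."""
    return sorted(path for path, src in files.items() if len(parsers_used(src)) > 1)

files = {
    "xml_security.rb": 'sig = REXML::XPath.first(doc, "//ds:Signature")\n'
                       'noko = document.at_xpath("//ds:Signature")',
    "settings.rb": "attr_accessor :idp_cert_fingerprint",
}
print(flag_mixed_parser_files(files))  # ['xml_security.rb']
```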
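The disconnect in the signature-wrapping takeaway can be reproduced as a runnable toy, under assumptions stated here and in the code: HMAC-SHA256 replaces the IdP's RSA key, element names carry no ds: namespace, canonicalisation is the Python stdlib's C14N 2.0, and the REXML/Nokogiri differential is not reproduced but simulated by two hypothetical helpers that pick the first and the last <Signature>. This is not ruby-saml's code. `verify_vulnerable` re-queries DigestValue from the document; `verify_pinned` reads it from the canonical SignedInfo bytes the signature check just accepted, which is the shape the post's quoted fix describes.

```python
import base64
import hashlib
import hmac
import xml.etree.ElementTree as ET
from typing import Optional

# Assumptions: HMAC-SHA256 stands in for the IdP's RSA signature, names are
# un-namespaced, and there are no XML-DSig transforms beyond C14N 2.0.
IDP_KEY = b"hypothetical-idp-key"

def c14n(elem: ET.Element) -> bytes:
    """Canonical (C14N 2.0) bytes of one element."""
    return ET.canonicalize(ET.tostring(elem, encoding="unicode")).encode()

def digest_b64(elem: ET.Element) -> str:
    return base64.b64encode(hashlib.sha256(c14n(elem)).digest()).decode()

def idp_sign(signed_info: ET.Element) -> str:
    """What the simulated IdP does: sign the canonical SignedInfo bytes."""
    mac = hmac.new(IDP_KEY, c14n(signed_info), hashlib.sha256).digest()
    return base64.b64encode(mac).decode()

def signature_xml(digest: str, sig_value: str) -> str:
    return (f"<Signature><SignedInfo><DigestValue>{digest}</DigestValue></SignedInfo>"
            f"<SignatureValue>{sig_value}</SignatureValue></Signature>")

# Simulated differential: the two "parsers" resolve different <Signature> elements.
def parser_a_signature(doc: ET.Element) -> ET.Element:  # role of the canonicalising parser
    return doc.findall(".//Signature")[0]

def parser_b_signature(doc: ET.Element) -> ET.Element:  # role of the value-reading parser
    return doc.findall(".//Signature")[-1]

def verified_signed_info(doc: ET.Element) -> Optional[bytes]:
    """Check SignatureValue over canonical SignedInfo; return the signed bytes."""
    signed_info = c14n(parser_a_signature(doc).find("SignedInfo"))
    claimed = base64.b64decode(parser_b_signature(doc).findtext("SignatureValue"))
    expected = hmac.new(IDP_KEY, signed_info, hashlib.sha256).digest()
    return signed_info if hmac.compare_digest(expected, claimed) else None

def verify_vulnerable(response_xml: str) -> bool:
    doc = ET.fromstring(response_xml)
    if verified_signed_info(doc) is None:
        return False
    # Disconnect: DigestValue is re-queried from the document, not the signed bytes.
    digest = parser_b_signature(doc).findtext("SignedInfo/DigestValue")
    return hmac.compare_digest(digest_b64(doc.find(".//Assertion")), digest)

def verify_pinned(response_xml: str) -> bool:
    doc = ET.fromstring(response_xml)
    signed_info = verified_signed_info(doc)
    if signed_info is None:
        return False
    # Pinned: DigestValue is read from exactly the bytes the signature covered.
    digest = ET.fromstring(signed_info).findtext("DigestValue")
    return hmac.compare_digest(digest_b64(doc.find(".//Assertion")), digest)

# Attacker's material: one genuine signature over their own (unprivileged) assertion.
real = '<Assertion ID="a1"><User>alice</User></Assertion>'
forged = '<Assertion ID="a1"><User>admin</User></Assertion>'
real_digest = digest_b64(ET.fromstring(real))
real_sig = idp_sign(ET.fromstring(
    f"<SignedInfo><DigestValue>{real_digest}</DigestValue></SignedInfo>"))

legit = f"<Response>{signature_xml(real_digest, real_sig)}{real}</Response>"
attack = (f"<Response>{signature_xml(real_digest, real_sig)}"
          f"{signature_xml(digest_b64(ET.fromstring(forged)), real_sig)}{forged}</Response>")

print(verify_vulnerable(legit), verify_pinned(legit))    # True True
print(verify_vulnerable(attack), verify_pinned(attack))  # True False
```

`verify_pinned` still mixes the two simulated parsers for SignatureValue and SignedInfo, mirroring the post's point that pinning alone would have been secure "in this case"; the single-parser pattern removes the split entirely.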
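The two-exploits takeaway names ruzzy, a coverage-guided fuzzer. The sketch below is a much cruder, assumed stand-in (uniform random concatenation of a hypothetical token alphabet, no coverage feedback) that only shows the core loop of differential testing: feed the same input to two extractors, discard inputs the strict parser rejects, and keep those where both accept the input but answer differently.

```python
import random
import re
import xml.etree.ElementTree as ET

# Hypothetical token alphabet; real fuzzers mutate bytes under coverage feedback.
TOKENS = ["<Signature>A</Signature>", "<Signature>B</Signature>",
          "<!--", "-->", "<![CDATA[", "]]>"]

def via_tree(doc: str) -> str:
    """First <Signature> text as ElementTree sees it (comments/CDATA are not elements)."""
    elem = ET.fromstring(doc).find(".//Signature")
    return elem.text if elem is not None else ""

def via_regex(doc: str) -> str:
    """First <Signature> text as a naive textual extractor sees it."""
    m = re.search(r"<Signature>(.*?)</Signature>", doc)
    return m.group(1) if m else ""

def differential_fuzz(iterations: int, seed: int = 0) -> list[str]:
    """Return well-formed inputs on which the two extractors disagree."""
    rng = random.Random(seed)
    findings = []
    for _ in range(iterations):
        body = "".join(rng.choice(TOKENS) for _ in range(rng.randint(1, 5)))
        doc = f"<Root>{body}</Root>"
        try:
            tree_answer = via_tree(doc)
        except ET.ParseError:
            continue  # malformed for the strict parser: not a usable differential
        if tree_answer != via_regex(doc):
            findings.append(doc)
    return findings

found = differential_fuzz(5000)
print(len(found), found[:1])
```

Even this blind loop turns up comment- and CDATA-based disagreements quickly, which is consistent with the post's observation that differentials between real parser pairs are not rare.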
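The strict-parsing takeaway is Ruby-specific (Nokogiri records errors on doc.errors instead of raising), and Python's ElementTree already raises ParseError on malformed input, so what carries over is the fail-closed policy. A minimal sketch, assuming a policy of rejecting any DOCTYPE (loosely in the spirit of NONET, and stricter than it): an expat pre-pass aborts on a DOCTYPE, and every parse error becomes a hard rejection. Both passes run on expat, the library ElementTree is built on. As the post says of the Nokogiri check, this can cut off exploit paths but does nothing about a differential between two parsers that both accept the input.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat

class RejectedInput(Exception):
    """Raised instead of continuing on malformed or DTD-bearing XML."""

def _forbid_doctype(*_args: object) -> None:
    raise RejectedInput("DOCTYPE declarations are not accepted")

def parse_fail_closed(data: bytes) -> ET.Element:
    """Parse untrusted XML, turning every parser complaint into a hard reject."""
    probe = xml.parsers.expat.ParserCreate()
    probe.StartDoctypeDeclHandler = _forbid_doctype  # abort before any DTD is processed
    try:
        probe.Parse(data, True)
        return ET.fromstring(data)
    except (xml.parsers.expat.ExpatError, ET.ParseError) as exc:
        raise RejectedInput(f"malformed XML: {exc}") from exc

print(parse_fail_closed(b"<Response><Status/></Response>").tag)  # Response
for bad in (b"<Response><Status></Response>", b'<!DOCTYPE r [<!ENTITY x "y">]><r>&x;</r>'):
    try:
        parse_fail_closed(bad)
    except RejectedInput as err:
        print("rejected:", err)
```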
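The only detection the post recommends is behavioural: SAML logins from IP addresses that do not align with the user's expected location. A minimal sketch of that check, assuming the service provider already logs the authentication method and resolves each source IP to a country; the Login record, its fields, and the per-user country baseline are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Login:
    user: str
    method: str   # e.g. "saml" or "password"
    country: str  # assumed output of an IP-geolocation step not shown here

def suspicious_saml_logins(logins: list[Login],
                           usual_countries: dict[str, set[str]]) -> list[Login]:
    """Flag SAML logins whose country is outside the user's usual set."""
    return [login for login in logins
            if login.method == "saml"
            and login.country not in usual_countries.get(login.user, set())]

baseline = {"alice": {"DE"}, "bob": {"US", "CA"}}
events = [
    Login("alice", "saml", "DE"),
    Login("bob", "saml", "CA"),
    Login("bob", "saml", "BR"),        # unexpected location for bob
    Login("alice", "password", "BR"),  # not a SAML login, out of scope here
]
for hit in suspicious_saml_logins(events, baseline):
    print(hit)  # Login(user='bob', method='saml', country='BR')
```

This is a triage signal, not proof: travel and VPNs produce false positives, and an attacker logging in from the victim's usual country passes unflagged.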

Systems / concepts / patterns extracted

Systems

systems/ruby-saml — the vulnerable library (CVE-2025-25291, CVE-2025-25292, fixed 1.18.0).
systems/rexml — pure-Ruby XML parser; ruby-saml's primary parser.
systems/nokogiri — Ruby wrapper over libxml2 / libgumbo / Xerces; added to ruby-saml for canonicalisation support REXML didn't have.
systems/saml-protocol — the SAML 2.0 authentication spec and its <ds:Signature> / <DigestValue> / <SignedInfo> verification structure.
systems/gitlab — downstream consumer of ruby-saml (via omniauth-saml); confirmed exploitable, notified pre-disclosure.
systems/github — publisher; ruby-saml was being re-evaluated for adoption, triggering the bug-bounty engagement.

Concepts

concepts/parser-differential — different parsers reading the same input differently; the vulnerability class.
concepts/xml-signature-wrapping — XML-DSig attack family where signed bytes and interpreted bytes diverge; parser-differential is one mechanism among several.
concepts/canonicalization-xml — normalising an XML fragment to a byte-exact form before hashing/signing; required by XML-DSig because XML has multiple byte-level representations of the same logical tree.
concepts/saml-authentication-bypass — the outcome: attacker possessing a single valid SAML signature for an organisation can impersonate any user.
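The canonicalisation concept in miniature: two byte-different serialisations of one logical element hash differently until both are canonicalised. This sketch uses Python's `ET.canonicalize`, which implements C14N 2.0; SAML deployments typically specify Exclusive XML Canonicalization 1.0, so treat it as an illustration of the idea rather than of the exact SAML transform.

```python
import hashlib
import xml.etree.ElementTree as ET

# Two byte-different serialisations of the same logical element.
a = '<Assertion ID="a1"  Version="2.0"/>'
b = "<Assertion Version='2.0' ID='a1'></Assertion>"

ca, cb = ET.canonicalize(a), ET.canonicalize(b)
print(ca)  # <Assertion ID="a1" Version="2.0"></Assertion>
print(hashlib.sha256(a.encode()).digest() == hashlib.sha256(b.encode()).digest())    # False
print(hashlib.sha256(ca.encode()).digest() == hashlib.sha256(cb.encode()).digest())  # True
```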

Patterns

patterns/single-parser-for-security-boundaries — architectural fix: when implementing a cryptographic verification over structured input, use one parser end-to-end and pin every extraction to the authenticated byte range — do not re-query the document with a different parser for subsequent pieces.

Operational numbers

312 HN points at disclosure.
ruby-saml versions affected: up to and including 1.17.0.
Fix version: ruby-saml 1.18.0, released 2025-03-12.
CVEs: CVE-2025-25291, CVE-2025-25292.
Disclosure timeline: 2024-11-04 initial bug-bounty report (ahacker1) → 2024-11-12 second bypass found (Peter) → 2024-11-13 maintainer Sixto Martín contacted → 2024-11-14 reports filed → 2025-02-12 90-day GitHub Security Lab deadline → 2025-02-17 GitLab coordination → 2025-03-12 fix released.
Exploit precondition: attacker must possess one valid signature produced by the IdP's key. Sources: any signed assertion from any (unprivileged) user, or in some cases signed IdP metadata (publicly published).
Prior ruby-saml authentication bypass: CVE-2024-45409 (October 2024, ahacker1) — different bug, same "multi-signature SAML response" category, motivating the check-for-unique-Reference-ID guard that was in place when the two 2025 bypasses were found.
GitHub historical context: used ruby-saml until 2014, then switched to an in-house SAML implementation; was evaluating re-adoption in late 2024.
Related prior GitHub own-impl bug: CVE-2024-9487 (encrypted-assertion issue in GitHub's custom SAML).
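The list above does not spell out exactly what ruby-saml's unique-Reference-ID guard checks. A guard in the same general spirit (hypothetical, not ruby-saml's code) rejects documents in which one ID value appears on several elements, a classic signature-wrapping shape. Because ruby-saml already had its own guard in place when the 2025 bypasses were found, checks of this kind evidently do not close the parser-differential class.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def duplicate_ids(doc: str, attr: str = "ID") -> list[str]:
    """Return the values of `attr` that occur on more than one element."""
    counts = Counter(el.get(attr) for el in ET.fromstring(doc).iter()
                     if el.get(attr) is not None)
    return sorted(value for value, n in counts.items() if n > 1)

wrapped = ('<Response><Assertion ID="a1"><User>admin</User></Assertion>'
           '<Extensions><Assertion ID="a1"><User>alice</User></Assertion></Extensions></Response>')
print(duplicate_ids(wrapped))  # ['a1']
```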

Caveats / undisclosed

Exploit payloads: the specific XML roundtrip / fuzzer-found differentials that produced working bypasses are not published. Planned for later disclosure to the GitHub Security Lab repository.
Fuzzer details: ruzzy (Trail of Bits' Ruby fuzzer) named but not walked through for SAML-specific corpus generation.
GitLab impact scope (which versions, which configurations, whether multi-tenant SaaS was reachable) not disclosed in this post.
Full enumeration of other downstream consumers of ruby-saml / omniauth-saml beyond GitLab not attempted — the post explicitly directs users to audit.
Long-tail parser-differential inventory — how many other parser-differentials exist between REXML and Nokogiri beyond the two exploited here — not explored. Two independent techniques (XML roundtrips, coverage-guided fuzzing) each found exploitable ones quickly, suggesting the supply is not small.
IOC details: intentionally withheld (the post explicitly discusses the decision).

Relationship to the wider wiki

This source sits at an unusual point for this wiki: it's a security-research / CVE-disclosure post on a library vulnerability, not a production-systems retrospective. Prior GitHub Security Lab posts in this wiki's raw corpus (2024-07-22 Kafka-UI RCE, 2024-08-13 V8 RCE, 2024-12-18 GStreamer fuzzing) were skipped as out-of-scope for a distributed-systems-design wiki. This one is ingested because the architectural lesson — do not split a cryptographic verification across two parsers — is a reusable structural principle for security-boundary design that naturally abstracts above the specific XML / SAML instantiation. The companion concepts/parser-differential concept then carries naturally into other parts of the wiki's scope: URL-parsing differentials in SSRF, HTTP-header differentials in request smuggling, and more broadly any place where the same bytes cross two independent parsers inside a security-critical decision. SAML / SSO / XML-DSig pull the wiki into an identity-and-auth neighbourhood that currently only extends as far as systems/okta / systems/amazon-cognito / systems/amazon-verified-permissions / systems/cedar (runtime authorisation, not authentication protocols); this source anchors the SAML side. GitHub as a company had no company page before this ingest; this creates companies/github as a Tier-2 anchor.

Raw source

raw/github/2025-03-15-sign-in-as-anyone-bypassing-saml-sso-authentication-with-par-31e11f03.md

Source

Original: https://github.blog/security/sign-in-as-anyone-bypassing-saml-sso-authentication-with-parser-differentials/