Unlocking the Cloudflare app ecosystem with OAuth for all
Summary
Opening self-managed OAuth to all Cloudflare customers required a major upgrade of the underlying OAuth engine, Ory Hydra, from 1.x to 2.x. The work involved two sequential upgrades (first to the latest 1.x release, then to 2.x), custom schema migrations to avoid table locks, a blue-green database strategy with revocation replay via Cloudflare Queues, refresh token coalescing in a Worker proxy, and careful coordination of a multi-system cutover. Post-upgrade metrics showed large improvements: API P95 latency dropped 45%, memory usage fell by 14% (RSS) to 40% (Go heap), and CPU usage decreased 37%.
Key Takeaways
- Staged version upgrades reduce risk: Rather than jumping directly from an old 1.x release to 2.x, Cloudflare performed two sequential upgrades (latest 1.x first, then 2.x), validating each transition independently.
- Concurrent index creation avoids locking: The stock Hydra SQL migrations would have taken exclusive locks on critical tables. Cloudflare rewrote them to use `CREATE INDEX CONCURRENTLY`, preventing user-facing impact during the 1.x upgrade.
- Custom SDK to avoid `SELECT *` deserialization issues: The original Hydra version issued `SELECT *` queries, which caused deserialization failures once the schema changed. Cloudflare built a custom Hydra binary that selects explicit columns.
- Blue-green database migration for zero-downtime major upgrades: An in-place upgrade to 2.x was infeasible because of the scale of the schema changes. Cloudflare copied the production database, ran the migrations on the copy (~3 hours), then cut over to it at the same time as deploying the new Hydra service and updating the configuration of two related systems.
- Token expiry extension as a write-reduction lever: Before the upgrade window, token lifetimes were extended to several hours so apps would not need to refresh during the migration, reducing the database writes that would otherwise be lost at cutover.
- Revocation-replay queue preserves safety invariants: Revocations performed during the migration window were enqueued on Cloudflare Queues, then drained and replayed against the new database. This ensured no revoked app inadvertently retained access.
- Refresh token coalescing prevents chain invalidation: Hydra 1.x treated refresh-token reuse strictly, so a single reused refresh token invalidated the entire session chain. Cloudflare added coalescing in the Worker routing layer to cache and deduplicate refresh requests. Hydra 2.x resolves this natively with a configurable grace period.
- Corrupted migration state caused 403 cascades: A migration bug corrupted valid OAuth session state, causing Hydra and the authorization service to disagree. Fixing the resulting 403 errors required restoring data and making architectural changes that remove reliance on static policy data.
- Operational performance gains were significant: The migration itself involved ~132.5M rows updated, ~114.7M rows inserted, 136.97 GB of temp bytes, and ~22.2k transaction commits. After the upgrade, API P95 latency fell 45% (185 ms → 101 ms), RSS memory 14%, Go heap 40%, goroutines 23%, and CPU 37%.
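As a minimal sketch of the concurrent-index rewrite, assuming the migrations are plain PostgreSQL statements: the helper below (a hypothetical `make_concurrent`, applied to illustrative table and index names rather than Hydra's real migration files) turns a blocking `CREATE INDEX` into `CREATE INDEX CONCURRENTLY` and flags that the result must run outside a transaction block, which PostgreSQL requires for concurrent index builds.

```python
import re

# Matches "CREATE [UNIQUE] INDEX" only when it is not already CONCURRENTLY.
_CREATE_INDEX = re.compile(
    r"^\s*CREATE\s+(UNIQUE\s+)?INDEX(?!\s+CONCURRENTLY\b)\s+",
    re.IGNORECASE,
)


def make_concurrent(statement: str) -> tuple[str, bool]:
    """Rewrite a CREATE INDEX statement so the build does not block writes.

    Returns (sql, needs_autocommit). PostgreSQL rejects
    CREATE INDEX CONCURRENTLY inside a transaction block, so a migration
    runner must execute rewritten statements in autocommit mode.
    """
    match = _CREATE_INDEX.match(statement)
    if match is None:
        # Not an index build, or already concurrent: leave it untouched.
        return statement, False
    unique = "UNIQUE " if match.group(1) else ""
    rewritten = f"CREATE {unique}INDEX CONCURRENTLY " + statement[match.end():]
    return rewritten, True


# Illustrative migration, not taken from Hydra.
migration = [
    "CREATE INDEX sessions_client_id_idx ON sessions (client_id);",
    "ALTER TABLE clients ADD COLUMN logo_uri TEXT;",
]
for stmt in migration:
    sql, autocommit = make_concurrent(stmt)
    print(autocommit, sql)
```

The trade-off is that a concurrent build takes longer and, if it fails, can leave an invalid index behind that has to be dropped and rebuilt, which is the price of keeping the table writable.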
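The `SELECT *` failure mode can be reproduced with Python's built-in `sqlite3` standing in for Hydra's database. This sketch assumes a strict row mapper of the kind that rejects columns it does not declare; the `clients` table, the `Client` dataclass, and `load_client` are hypothetical. Once a migration adds a column, `SELECT *` hands the old code a field it cannot place, while an explicit column list keeps returning exactly the shape the old code expects.

```python
import sqlite3
from dataclasses import dataclass, fields


@dataclass
class Client:
    """Row shape compiled into the *old* binary (hypothetical)."""
    id: str
    name: str


CLIENT_COLUMNS = [f.name for f in fields(Client)]


def load_client(cursor: sqlite3.Cursor) -> Client:
    """Strict mapper: refuse any column the dataclass does not declare."""
    names = [d[0] for d in cursor.description]
    unknown = set(names) - set(CLIENT_COLUMNS)
    if unknown:
        raise ValueError(f"cannot deserialize unknown columns: {sorted(unknown)}")
    row = cursor.fetchone()
    return Client(**dict(zip(names, row)))


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE clients (id TEXT PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO clients VALUES ('c1', 'demo-app')")

# A migration on the shared database adds a column the old binary doesn't know.
db.execute("ALTER TABLE clients ADD COLUMN logo_uri TEXT")

try:
    load_client(db.execute("SELECT * FROM clients WHERE id = 'c1'"))
except ValueError as err:
    print("SELECT * failed:", err)

# Naming the columns keeps the old binary working across the schema change.
explicit = f"SELECT {', '.join(CLIENT_COLUMNS)} FROM clients WHERE id = ?"
print(load_client(db.execute(explicit, ("c1",))))
```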
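The blue-green cutover can be modeled in a few lines. This is a simplified in-memory sketch, not Cloudflare's tooling: `Database`, `Router`, and `migrate` are hypothetical, and the consumer names are placeholders for Hydra and the two related systems the source mentions without naming. It shows the two properties that matter: every consumer flips to the migrated copy at once, and any write that lands on the old database after the snapshot is missing from the new one, which is the gap the token-expiry and revocation-replay takeaways address.

```python
import copy
import threading
from dataclasses import dataclass, field


@dataclass
class Database:
    name: str
    schema_version: str
    rows: dict = field(default_factory=dict)


class Router:
    """Hypothetical config store that every consumer reads its database from."""

    def __init__(self, db: Database, consumers: list[str]) -> None:
        self._lock = threading.Lock()
        self._targets = {c: db for c in consumers}

    def target(self, consumer: str) -> Database:
        with self._lock:
            return self._targets[consumer]

    def cut_over(self, new_db: Database) -> None:
        # Flip every consumer inside one critical section, so no call to
        # target() runs while the switch is half done.
        with self._lock:
            for consumer in self._targets:
                self._targets[consumer] = new_db


def migrate(db: Database) -> None:
    # Stand-in for the multi-hour 1.x -> 2.x schema migration.
    db.schema_version = "2.x"


blue = Database("blue", "1.x", {"token:a": "active"})
router = Router(blue, ["hydra", "related-system-1", "related-system-2"])

green = copy.deepcopy(blue)      # snapshot of production
green.name = "green"
migrate(green)                   # runs while blue keeps serving traffic

blue.rows["token:b"] = "active"  # a write that lands after the snapshot
router.cut_over(green)
print(router.target("hydra").name, "token:b" in router.target("hydra").rows)
```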
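The revocation-replay idea can be sketched as follows, assuming revocations are idempotent writes. An in-process `deque` stands in for Cloudflare Queues, plain dicts stand in for the old and new databases, and `RevocationReplayer`, `revoke`, and `drain` are hypothetical names rather than Cloudflare's code.

```python
from collections import deque


class RevocationReplayer:
    """Record revocations made during a migration, then replay them.

    Hypothetical sketch: a deque stands in for Cloudflare Queues and
    dicts stand in for the old and new Hydra databases.
    """

    def __init__(self) -> None:
        self.pending: deque = deque()
        self.migrating = False

    def revoke(self, live_db: dict, token_id: str) -> None:
        live_db[token_id] = "revoked"      # apply to the database serving traffic
        if self.migrating:
            self.pending.append(token_id)  # remember it for the new database

    def drain(self, new_db: dict) -> int:
        """Replay queued revocations; returns how many were applied."""
        replayed = 0
        while self.pending:
            token_id = self.pending.popleft()
            # Idempotent write: a duplicate message would be harmless.
            new_db[token_id] = "revoked"
            replayed += 1
        return replayed


old_db = {"t1": "active", "t2": "active"}
new_db = dict(old_db)          # snapshot taken when the migration starts
replayer = RevocationReplayer()
replayer.migrating = True

replayer.revoke(old_db, "t2")  # a user revokes an app mid-migration
print(new_db["t2"])            # the copy still says "active"
print(replayer.drain(new_db), new_db["t2"])
```

Making the replay idempotent is what lets the drain step be safe even if a queue message is delivered more than once.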
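Refresh token coalescing can be sketched with `asyncio`. This is a hypothetical single-process model of the behavior described for the Worker proxy, not its code: concurrent requests carrying the same refresh token share one upstream exchange, and a repeat within a short TTL (an assumed 30 seconds) receives the cached response, so the upstream never sees the token presented twice. A production proxy would also need state shared across instances and would key on a hash of the token rather than the raw value.

```python
import asyncio
import time


class RefreshCoalescer:
    """Deduplicate refresh-token exchanges (hypothetical sketch)."""

    def __init__(self, upstream, ttl_seconds: float = 30.0) -> None:
        self._upstream = upstream  # async fn: refresh_token -> response dict
        self._ttl = ttl_seconds
        self._inflight: dict = {}  # refresh_token -> asyncio.Task
        self._cache: dict = {}     # refresh_token -> (timestamp, response)

    async def refresh(self, refresh_token: str) -> dict:
        cached = self._cache.get(refresh_token)
        if cached and time.monotonic() - cached[0] < self._ttl:
            return cached[1]       # replay the earlier response
        task = self._inflight.get(refresh_token)
        if task is not None:
            return await task      # join the exchange already in flight
        task = asyncio.create_task(self._upstream(refresh_token))
        self._inflight[refresh_token] = task
        try:
            result = await task
        finally:
            del self._inflight[refresh_token]
        self._cache[refresh_token] = (time.monotonic(), result)
        return result


async def main() -> None:
    calls = 0

    async def fake_hydra(token: str) -> dict:
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)
        return {"access_token": f"at-{calls}", "refresh_token": f"rt-{calls}"}

    coalescer = RefreshCoalescer(fake_hydra)
    results = await asyncio.gather(*(coalescer.refresh("rt-0") for _ in range(5)))
    print(calls, {r["access_token"] for r in results})


asyncio.run(main())
```

Failed exchanges are deliberately not cached, so a client can retry after an upstream error.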
Systems & Concepts Extracted
- Systems: systems/ory-hydra, systems/cloudflare-queues, systems/cloudflare-workers
- Concepts: concepts/blue-green-deployment, concepts/oauth-token-lifecycle, concepts/refresh-token-invalidation
- Patterns: patterns/blue-green-database-deployment, patterns/revocation-replay-queue, patterns/refresh-token-coalescing
Operational Numbers
| Metric | Value |
|---|---|
| Rows updated during migration | ~132.5M |
| Rows inserted during migration | ~114.7M |
| Temp bytes during migration | 136.97 GB |
| Transaction commits during migration | ~22.2k |
| API P95 latency before | 185 ms |
| API P95 latency after | 101 ms (-45%) |
| RSS memory reduction | -14% (888→763 MB) |
| Go heap alloc reduction | -40% (449→271 MB) |
| Goroutines reduction | -23% (4015→3076) |
| CPU reduction | -37% (1.07→0.67 cores) |
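The percentage changes in the table follow from each before/after pair as (before - after) / before. A quick check, using only the values listed in the table:

```python
# Before/after pairs copied from the table above.
metrics = {
    "API P95 latency (ms)": (185, 101),
    "RSS memory (MB)": (888, 763),
    "Go heap alloc (MB)": (449, 271),
    "Goroutines": (4015, 3076),
    "CPU (cores)": (1.07, 0.67),
}


def reduction_pct(before: float, after: float) -> float:
    """Relative reduction, as a percentage of the 'before' value."""
    return (before - after) / before * 100


for name, (before, after) in metrics.items():
    print(f"{name}: -{reduction_pct(before, after):.1f}%")
```

Rounded to whole percentages, these give the -45%, -14%, -40%, -23%, and -37% figures in the table.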
Source
- Original: https://blog.cloudflare.com/oauth-for-all/
- Raw markdown: raw/cloudflare/2026-06-24-unlocking-the-cloudflare-app-ecosystem-with-oauth-for-all-cc476f17.md