Aurora DSQL¶
What it is¶
Aurora DSQL is AWS's serverless distributed SQL database, announced at re:Invent 2024. It targets what prior generations of Aurora did not: a relational database that scales writes out horizontally, with automatic scaling, no infrastructure management, zero operational overhead, and multi-region support, while staying wire-compatible with PostgreSQL. It is positioned as the next step beyond Aurora (cloud-optimized storage) and Aurora Serverless (automated vertical scaling).
DSQL is a Postgres-extension product: it reuses Postgres for query processing (parser, planner) and replaces replication, concurrency control, durability, storage, and transaction-session management with its own components. (Source: sources/2025-05-27-allthingsdistributed-aurora-dsql-rust-journey.)
Architecture¶
DSQL is decomposed into "bite-sized chunks with clear interfaces and explicit contracts" — Unix-philosophy modules that jointly provide ACID. Key named components:
- Journal — append-only ordered log. DSQL writes the entire commit into a single journal, regardless of how many rows the transaction touches. This makes atomicity + durability trivial and eliminates the need for two-phase commit across journals.
- Adjudicator — sits in front of the journal; ensures only one transaction wins in the event of conflicts. Was the pilot component for the Rust migration (see patterns/pilot-component-language-migration).
- Crossbar — the subscription router that bridges journals and storage. Storage nodes subscribe to key ranges; Crossbar follows every journal (each already ordered by transaction time) and assembles the global total order, delivering updates to subscribed storage nodes. Separates write-path scaling from read-path scaling.
- Storage — subscribes to Crossbar for ranges it owns.
- Query processor — stock Postgres parser/planner, accessed via extensions.
- Control plane — drives hands-free operations: hot-cluster detection, topology changes, scaling decisions. Originally Kotlin, now all Rust.
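The Journal and Adjudicator roles above can be illustrated with a toy model. This is a minimal sketch, not DSQL's implementation: the class names, the logical clock, and the conflict rule (reject a commit if any key it writes was committed after the transaction started) are assumptions made for illustration. The source states only that the Adjudicator ensures one transaction wins on conflict and that the whole commit goes into a single journal.

```python
from dataclasses import dataclass, field


@dataclass
class Journal:
    """Append-only ordered log; one entry holds one whole commit."""
    entries: list = field(default_factory=list)

    def append(self, commit_ts, writes):
        # The full write set lands as a single entry, so the commit is
        # atomic and durable once this one append succeeds.
        self.entries.append((commit_ts, dict(writes)))


@dataclass
class Adjudicator:
    """Admits a commit only if no conflicting commit landed since it started."""
    journal: Journal
    last_commit_ts: dict = field(default_factory=dict)  # key -> commit time
    clock: int = 0  # hypothetical logical clock standing in for real time

    def begin(self):
        # A transaction remembers the time at which it started.
        return self.clock

    def try_commit(self, start_ts, writes):
        # Conflict rule (assumed): reject if any written key was committed
        # by another transaction after this one started.
        for key in writes:
            if self.last_commit_ts.get(key, -1) > start_ts:
                return None
        self.clock += 1
        commit_ts = self.clock
        self.journal.append(commit_ts, writes)
        for key in writes:
            self.last_commit_ts[key] = commit_ts
        return commit_ts


journal = Journal()
adjudicator = Adjudicator(journal)

t1 = adjudicator.begin()  # both transactions start at the same time
t2 = adjudicator.begin()
first = adjudicator.try_commit(t1, {"a": 1, "b": 2})  # wins
second = adjudicator.try_commit(t2, {"b": 9})         # conflicts on "b"
third = adjudicator.try_commit(adjudicator.begin(), {"b": 9})  # retry succeeds
```

In this sketch a multi-row transaction costs exactly one journal append, which is why no cross-journal coordination is needed on the write path.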
Why not two-phase commit¶
Conventional scale-out for writes shards rows across journals and uses 2PC for multi-shard transactions. DSQL considered and rejected this:
- Happy path fine.
- Reality: timeouts, liveness, rollbacks, coordinator-failure handling make operational complexity compound.
- For a system aiming at "multi-region, no ops overhead" the 2PC coordinator shape was the wrong foundation.
Trade-off taken: write the entire transaction into one journal. Writes become trivial. Reads get harder.
Why the Crossbar exists¶
With the single-journal-per-commit model:
- A read for key `k` now has to consider every journal, since any one of them may hold the latest write for `k`.
- Storage would need to connect to every journal directly → every new journal adds fanout to every storage node, which hits network-bandwidth limits as the system scales.
Crossbar solves this by introducing an indirection: storage subscribes to key ranges, and Crossbar (not storage) follows every journal to compose the global total order, forwarding to subscribers.
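The indirection can be sketched as a k-way merge followed by key-range routing. This is an illustrative sketch only: the journal contents, node names, and range boundaries are hypothetical, and finite in-memory lists stand in for live journal streams.

```python
import heapq

# Hypothetical per-journal streams: (commit_ts, key, value), each already
# sorted by commit time, as the text says every journal is.
journal_a = [(1, "apple", "v1"), (4, "mango", "v1"), (6, "apple", "v2")]
journal_b = [(2, "pear", "v1"), (3, "banana", "v1"), (5, "zebra", "v1")]

# Hypothetical subscriptions: storage node -> half-open key range [lo, hi).
subscriptions = {
    "storage-1": ("a", "n"),
    "storage-2": ("n", "{"),  # "{" sorts just after "z" in ASCII
}


def crossbar(journals, subscriptions):
    """Merge ordered journals into one total order and route by key range."""
    delivered = {node: [] for node in subscriptions}
    # heapq.merge performs a lazy k-way merge of already-sorted inputs.
    for commit_ts, key, value in heapq.merge(*journals):
        for node, (lo, hi) in subscriptions.items():
            if lo <= key < hi:
                delivered[node].append((commit_ts, key, value))
    return delivered


delivered = crossbar([journal_a, journal_b], subscriptions)
```

Each storage node receives only the updates for its own range, in global commit order, without connecting to any journal itself. With live streams instead of finite lists, a merge like this can only emit an entry once every input has advanced far enough to rule out an earlier one, so the slowest journal paces the whole composition.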
This creates a new problem: every transaction now depends on a composition over every journal. That made DSQL's architecture a textbook instance of concepts/tail-latency-at-scale: a single host stalling on a 1-second GC pause can block the whole composition, and the more hosts there are, the more likely it is that at least one is stalled at any moment. In a 40-host simulation that modeled occasional 1-second stalls, the system reached ~6,000 TPS against an expected ~1,000,000 TPS, and tail latency rose from an expected 1s to 10s.
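The fan-in effect can be made concrete with a short calculation. This is a back-of-the-envelope sketch that assumes stalls are independent across hosts. The per-host stall probabilities are hypothetical and are not taken from the simulation described above. Only the 40-host count comes from the text.

```python
def p_blocked(hosts, p_stall):
    """Probability that at least one of `hosts` is stalled at a given moment."""
    return 1.0 - (1.0 - p_stall) ** hosts


# Hypothetical per-host stall probabilities; the 40-host count is from the text.
for p_stall in (0.001, 0.01, 0.05):
    print(f"p_stall={p_stall:<5} -> P(blocked) = {p_blocked(40, p_stall):.3f}")
```

Under these assumptions, a host that is stalled only 1% of the time still leaves a 40-host composition blocked roughly a third of the time, which is why a rare per-host pause becomes the common case for the system as a whole.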
This is what ultimately forced the language choice.
Engineering history: JVM → Rust¶
DSQL started 100% JVM (Kotlin) and ended 100% Rust. Two pivots:
Pivot 1: data plane to Rust¶
Trigger: the 40-host Crossbar simulation (above) made clear that at scale, every transaction would be affected by the worst-case latency of some host, so GC pauses were not an optimization problem but an architectural blocker.
Options considered:
- Deep JVM / GC tuning: known path, bounded ceiling.
- C / C++: performance and control, but loses memory safety.
- Rust: predictable performance without GC, memory safety without giving up control, zero-cost abstractions.
The team picked Rust and validated via a pilot on the Adjudicator — the smallest, simplest, most isolated component, chosen because there was already a Rust client for the journal and a Kotlin version of the Adjudicator existed as a baseline. Result: 30,000 TPS from a first-cut rewrite by Java developers vs. 2,000–3,000 TPS from years of Kotlin tuning. Roughly 10×, with no perf work. After this the team "stopped asking 'should we use Rust' and started asking 'where else?'"
Pivot 2: control plane to Rust¶
Original split: Kotlin control plane, Rust data plane. It seemed like the best of both worlds: Kotlin where GC pauses are tolerable and where productivity and existing internal libraries matter, Rust on the latency-sensitive hot path.
Problem surfaced in integration: control plane shares non-trivial logic with the data plane (hot-cluster detection, topology changes). Two languages meant:
- No shared library → drift between Kotlin and Rust logic.
- No shared simulation tooling → can't co-test.
- Each misunderstanding became a debug-fix-deploy loop.
Resolution: rewrite the control plane in Rust. By this point (roughly a year later), Rust 2021 edition had landed, internal Rust library support had expanded (AWS Authentication Runtime's Rust client was outperforming Java), and API Gateway + Lambda had absorbed many integration concerns. Team wanted to move — "when can we start?" not "do we have to?"
End state: full-system Rust, p99 tracks p50 — remarkably consistent tail latency. Internal ops web UI: also Rust, via WebAssembly. (Source: sources/2025-05-27-allthingsdistributed-aurora-dsql-rust-journey.)
Why Postgres via extensions, not a fork¶
Postgres is 1986-vintage, >1M lines of C, thousands of contributors, still active. Forking it would mean either (a) constant rebase pain, or (b) drift into a maintenance nightmare. Instead DSQL uses Postgres's public extension API — its extensions run in the Postgres process but live in separate files/packages, benefiting from upstream improvements without the fork cost.
Extensions themselves are written in Rust (not C): even though the Postgres extension API is a C surface, the memory-safety win applies to the new code DSQL adds, not to existing Postgres code. See concepts/memory-safety.
Team & onboarding¶
- Two engineers with no C / C++ / Rust experience wrote the pilot Adjudicator.
- Heavy investment in ramp: "The DSQL Book" internal guide by Marc Brooker, weekly learning sessions, paper reviews, architectural deep-dives, Niko Matsakis (Rust language designer) brought in for thorny problems before code was written.
- Productivity penalty vs. Java: the team expected one, didn't see one post-ramp.
- Thanked contributors: Marc Brooker, Marc Bowes, Niko Matsakis, James Morle, Mike Hershey, Zak van der Merwe, Gourav Roy, Matthys Strydom.
Seen in¶
- sources/2025-05-27-allthingsdistributed-aurora-dsql-rust-journey — architecture retrospective and Rust journey (this source page).
- sources/2025-05-03-aws-postgresql-transaction-visibility-read-replicas: AWS's 2025-05-03 response to Jepsen's RDS-Postgres Multi-AZ analysis names DSQL (alongside systems/aurora-limitless) as one of two AWS offerings that do not exhibit the Long Fork anomaly that community Postgres and RDS Multi-AZ both exhibit. Mechanism: DSQL replaces Postgres's `ProcArray`-based visibility (where visibility order can diverge from WAL commit order; see concepts/visibility-order-vs-commit-order) with time-based MVCC. Consistent snapshots across the shards of a distributed Postgres become a clock read, not a fanned-out gather of per-node `ProcArray` contents (which AWS calls "practically infeasible"). This is the concurrency-control substrate in DSQL's "extend, don't fork" strategy: replace the visibility model, keep the parser/planner.
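The idea that a consistent snapshot becomes a clock read can be sketched with a small multi-version store. This is a generic illustration of timestamp-based MVCC, not DSQL's storage format: the class, its method names, and the integer timestamps are all hypothetical.

```python
import bisect
from collections import defaultdict


class VersionedStore:
    """Keeps every committed version of a key, ordered by commit timestamp."""

    def __init__(self):
        self._commit_times = defaultdict(list)  # key -> sorted commit times
        self._values = defaultdict(list)        # key -> values, same order

    def apply(self, commit_ts, key, value):
        # Assumes commits for a key arrive in increasing timestamp order.
        self._commit_times[key].append(commit_ts)
        self._values[key].append(value)

    def read(self, key, snapshot_ts):
        # Visible version: the latest one with commit_ts <= snapshot_ts.
        times = self._commit_times.get(key, [])
        idx = bisect.bisect_right(times, snapshot_ts)
        return self._values[key][idx - 1] if idx else None


store = VersionedStore()
store.apply(10, "x", "old")
store.apply(15, "y", "only")
store.apply(20, "x", "new")

snapshot_ts = 17  # the "clock read" that fixes the snapshot
seen = (store.read("x", snapshot_ts), store.read("y", snapshot_ts))
```

Every reader that picks the same `snapshot_ts` resolves the same versions on any node that has applied commits up to that time, so no per-node list of in-flight transactions has to be gathered to agree on visibility.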
Related¶
- systems/postgresql — query processing layer DSQL extends, not forks.
- systems/aurora-limitless — sibling Aurora horizontally-scaled Postgres offering; also uses time-based MVCC to avoid Long Fork.
- systems/aws-rds — the Multi-AZ Postgres offering that does exhibit Long Fork.
- systems/firecracker — adjacent AWS primitive (not specifically invoked for DSQL in the source, but in the same serverless-infra family).
- concepts/tail-latency-at-scale — the architectural forcing function for the Rust move.
- concepts/memory-safety — rationale for Rust over C in Postgres extensions.
- concepts/snapshot-isolation, concepts/long-fork-anomaly, concepts/visibility-order-vs-commit-order — the consistency-model cluster DSQL's time-based MVCC sidesteps.
- concepts/control-plane-data-plane-separation — DSQL is a case where the split got revisited and unified.
- patterns/pilot-component-language-migration — the Adjudicator-first strategy.
- patterns/postgres-extension-over-fork — extend Postgres via public API, don't fork.