
Aurora DSQL¶

What it is¶

Aurora DSQL is AWS's serverless distributed SQL database, announced at re:Invent 2024. It targets what prior generations of Aurora did not: a relational database that scales writes out horizontally, scales automatically with no infrastructure to manage, and supports multi-region deployments, all while staying wire-compatible with PostgreSQL. It is positioned as the next step beyond Aurora (cloud-optimized storage) and Aurora Serverless (automated vertical scaling).

DSQL is a Postgres-extension product: it reuses Postgres for query processing (parser, planner) and replaces replication, concurrency control, durability, storage, and transaction-session management with its own components. (Source: sources/2025-05-27-allthingsdistributed-aurora-dsql-rust-journey.)

Architecture¶

DSQL is decomposed into "bite-sized chunks with clear interfaces and explicit contracts" — Unix-philosophy modules that jointly provide ACID. Key named components:

Journal — append-only ordered log. DSQL writes the entire commit into a single journal, regardless of how many rows the transaction touches. This makes atomicity + durability trivial and eliminates the need for two-phase commit across journals.
Adjudicator — sits in front of the journal; ensures only one transaction wins in the event of conflicts. Was the pilot component for the Rust migration (see patterns/pilot-component-language-migration).
Crossbar — the subscription router that bridges journals and storage. Storage nodes subscribe to key ranges; Crossbar follows every journal (each already ordered by transaction time) and assembles the global total order, delivering updates to subscribed storage nodes. Separates write-path scaling from read-path scaling.
Storage — subscribes to Crossbar for ranges it owns.
Query processor — stock Postgres parser/planner, accessed via extensions.
Control plane — drives hands-free operations: hot-cluster detection, topology changes, scaling decisions. Originally Kotlin, now all Rust.
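
The Journal and Adjudicator roles listed above can be illustrated with a minimal sketch. It is not DSQL's implementation: the class and method names are hypothetical, the integer logical clock stands in for real timestamps, and the conflict rule (reject a transaction if any key it writes was committed after the transaction started) is an assumption chosen for illustration. A single journal is used to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    """One transaction's entire write set, stored as a single journal entry."""
    commit_ts: int
    writes: dict  # key -> new value


@dataclass
class Journal:
    """Append-only ordered log; each entry is one whole commit."""
    entries: list = field(default_factory=list)

    def append(self, commit: Commit) -> None:
        self.entries.append(commit)


@dataclass
class Adjudicator:
    """Sits in front of the journal and lets only one conflicting writer win.

    Assumed rule: a transaction is rejected if any key it writes was
    committed after the transaction's start timestamp.
    """
    journal: Journal
    last_write_ts: dict = field(default_factory=dict)  # key -> commit_ts
    clock: int = 0  # hypothetical logical clock

    def try_commit(self, start_ts: int, writes: dict):
        for key in writes:
            if self.last_write_ts.get(key, -1) > start_ts:
                return None  # conflict: a newer commit already touched this key
        self.clock += 1
        commit = Commit(self.clock, dict(writes))
        self.journal.append(commit)  # one append covers every row in the commit
        for key in writes:
            self.last_write_ts[key] = commit.commit_ts
        return commit


journal = Journal()
adjudicator = Adjudicator(journal)
first = adjudicator.try_commit(start_ts=0, writes={"a": 1, "b": 2})
loser = adjudicator.try_commit(start_ts=0, writes={"b": 3})  # overlaps with first
later = adjudicator.try_commit(start_ts=first.commit_ts, writes={"b": 3})
```

In this sketch the multi-row commit becomes durable through one append, so no coordination across journals is needed for atomicity. The second transaction is rejected because it started before the first one committed and writes the same key.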

Why not two-phase commit¶

Conventional scale-out for writes shards rows across journals and uses 2PC for multi-shard transactions. DSQL considered and rejected this:

The happy path is fine.
In practice, timeouts, liveness problems, rollbacks, and coordinator-failure handling compound the operational complexity.
For a system aiming at "multi-region, no ops overhead", the 2PC coordinator shape was the wrong foundation.

Trade-off taken: write the entire transaction into one journal. Writes become trivial. Reads get harder.

Why the Crossbar exists¶

With the single-journal-per-commit model:

A read for key k now has to consider every journal — any one of them may hold the latest write for k.
Storage would need to connect to every journal directly, so every new journal adds fanout to every storage node, and that fanout hits network-bandwidth limits as the system scales.

Crossbar solves this by introducing an indirection: storage subscribes to key ranges, and Crossbar (not storage) follows every journal to compose the global total order, forwarding to subscribers.
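
The indirection can be sketched as a merge of per-journal streams followed by range-based routing. This is a simplified illustration, not DSQL's design: the `Crossbar` class, its methods, and the `(commit_ts, key, value)` entry shape are hypothetical, commits are flattened to one entry per key, timestamps are assumed to be unique across journals, and the journals are finite lists rather than live streams. A live merge would have to wait for the next entry from every journal before emitting anything, which is the dependency on every journal that the single-journal-per-commit model introduces.

```python
import heapq


def total_order(journals):
    """Merge streams that are each sorted by commit_ts into one sorted stream."""
    return heapq.merge(*journals, key=lambda entry: entry[0])


class Crossbar:
    """Follows every journal and forwards updates to range subscribers."""

    def __init__(self):
        self.subscriptions = []  # list of (low_key, high_key, inbox)

    def subscribe(self, low, high, inbox):
        """Register a storage node's inbox for keys in [low, high)."""
        self.subscriptions.append((low, high, inbox))

    def run(self, journals):
        for commit_ts, key, value in total_order(journals):
            for low, high, inbox in self.subscriptions:
                if low <= key < high:
                    inbox.append((commit_ts, key, value))


# Two journals, each already ordered by commit timestamp.
journal_a = [(1, "apple", "v1"), (4, "pear", "v1")]
journal_b = [(2, "plum", "v1"), (3, "apple", "v2")]

storage_low, storage_high = [], []  # stand-ins for two storage nodes
crossbar = Crossbar()
crossbar.subscribe("a", "n", storage_low)
crossbar.subscribe("n", "~", storage_high)  # "~" sorts after lowercase letters
crossbar.run([journal_a, journal_b])
```

Each storage node holds one subscription to the Crossbar instead of one connection per journal, so adding a journal changes what the Crossbar follows but not the fanout seen by storage.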

The Crossbar's indirection creates a new problem: every transaction now depends on a composition over every journal. That made DSQL's architecture a textbook instance of concepts/tail-latency-at-scale: a single host stalling for a 1-second GC pause can block the composition. In a 40-host simulation that modeled occasional 1-second stalls, the system reached ~6,000 TPS against an expected ~1,000,000 TPS, and tail latency rose from 1 s to 10 s.

This tail-latency result is what ultimately forced the language choice.
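
The tail-latency effect can be made concrete with a short calculation. The sketch assumes hosts stall independently and that each host is stalled with some probability `p_stall` at the moment a transaction needs it. Both the independence assumption and the example probability are illustrative and do not come from the source.

```python
def p_any_stalled(hosts: int, p_stall: float) -> float:
    """Probability that at least one of `hosts` independent hosts is stalled."""
    return 1.0 - (1.0 - p_stall) ** hosts


# Hypothetical 1% per-host stall probability, for illustration only.
for hosts in (1, 10, 40):
    print(hosts, round(p_any_stalled(hosts, 0.01), 3))
```

Under these assumptions a 1% per-host stall probability means roughly a third of compositions over 40 hosts include at least one stalled host, which is why a rare per-host pause becomes a common per-transaction delay once every transaction depends on every host.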

Engineering history: JVM → Rust¶

DSQL started 100% JVM (Kotlin) and ended 100% Rust. Two pivots:

Pivot 1: data plane to Rust¶

Trigger: the 40-host Crossbar simulation (above) made clear that at scale, every transaction would be affected by the worst-case latency of some host, so GC pauses were not an optimization problem but an architectural blocker.

Options considered:

Deep JVM / GC tuning: known path, bounded ceiling.
C / C++: performance and control, but loses memory safety.
Rust: predictable performance without GC, memory safety without giving up control, zero-cost abstractions.

The team picked Rust and validated the choice with a pilot on the Adjudicator, the smallest, simplest, most isolated component. It was chosen because a Rust client for the journal already existed and a Kotlin version of the Adjudicator provided a baseline. Result: 30,000 TPS from a first-cut rewrite by Java developers vs. 2,000–3,000 TPS from years of Kotlin tuning, roughly 10× with no performance work. After this the team "stopped asking 'should we use Rust' and started asking 'where else?'"

Pivot 2: control plane to Rust¶

Original split: Kotlin control plane, Rust data plane. This seemed like the best of both worlds: Kotlin where GC pauses were tolerable and where productivity and existing internal libraries mattered, Rust on the latency-sensitive hot path.

Problem surfaced in integration: control plane shares non-trivial logic with the data plane (hot-cluster detection, topology changes). Two languages meant:

No shared library → drift between Kotlin and Rust logic.
No shared simulation tooling → can't co-test.
Each misunderstanding became a debug-fix-deploy loop.

Resolution: rewrite the control plane in Rust. By this point (roughly a year later), the Rust 2021 edition had landed, internal Rust library support had expanded (AWS Authentication Runtime's Rust client was outperforming Java), and API Gateway + Lambda had absorbed many integration concerns. The team wanted to move: the question was "when can we start?", not "do we have to?"

End state: full-system Rust, with p99 latency tracking p50, i.e. remarkably consistent tail latency. The internal ops web UI is also Rust, via WebAssembly. (Source: sources/2025-05-27-allthingsdistributed-aurora-dsql-rust-journey.)

Why Postgres via extensions, not a fork¶

Postgres dates from 1986, has more than 1M lines of C and thousands of contributors, and is still actively developed. Forking it would mean either (a) constant rebase pain or (b) drift into a maintenance nightmare. Instead, DSQL uses Postgres's public extension API: its extensions run in the Postgres process but live in separate files/packages, so DSQL benefits from upstream improvements without the fork cost.

The extensions themselves are written in Rust rather than C. Even though the Postgres API surface is C, the memory-safety win applies to the new code DSQL writes, not to existing Postgres code (see concepts/memory-safety).

Team & onboarding¶

Two engineers with no C / C++ / Rust experience wrote the pilot Adjudicator.
Heavy investment in ramp-up: "The DSQL Book" (an internal guide by Marc Brooker), weekly learning sessions, paper reviews, architectural deep-dives, and Niko Matsakis (Rust language designer) brought in to work through thorny problems before code was written.
Productivity penalty vs. Java: the team expected one but did not see one after ramp-up.
Thanked contributors: Marc Brooker, Marc Bowes, Niko Matsakis, James Morle, Mike Hershey, Zak van der Merwe, Gourav Roy, Matthys Strydom.

Seen in¶

sources/2025-05-27-allthingsdistributed-aurora-dsql-rust-journey — architecture retrospective and Rust journey (this source page).
sources/2025-05-03-aws-postgresql-transaction-visibility-read-replicas — AWS's 2025-05-03 response to Jepsen's RDS-Postgres Multi-AZ analysis names DSQL (alongside systems/aurora-limitless) as one of two AWS offerings that does not exhibit the Long Fork anomaly community Postgres + RDS Multi-AZ both exhibit. Mechanism: DSQL replaces Postgres's ProcArray-based visibility (where visibility order can diverge from WAL commit order — see concepts/visibility-order-vs-commit-order) with time-based MVCC. Consistent snapshots across the shards of a distributed Postgres become a clock read, not a fanned-out gather of per-node ProcArray contents (which AWS calls "practically infeasible"). This is the concurrency-control substrate in DSQL's "extend, don't fork" strategy — replace the visibility model, keep the parser/planner.
sources/2026-04-22-allthingsdistributed-invisible-engineering-behind-lambdas-network — first wiki disclosure that DSQL consumes AWS Lambda's networking substrate as a managed internal service, not as a copy of the stack. When the DSQL team needed "scalable Firecracker-based networking with the right security and performance characteristics," the Lambda networking team encapsulated the full networking stack into a service DSQL installs and runs on its own workers — device management, firewall rules, NAT translation, and security hygiene for network-slot reuse after release. DSQL's contract: request a network when it needs one for a VM, release it when done; Lambda owns the service and vends new versions, and "every optimization they make flows to DSQL automatically." Disclosed outcome: "saved the DSQL team months of engineering effort and gave them Lambda-grade networking density from day one." Canonical wiki instance of patterns/encapsulate-optimization-as-internal-service — the shape that lets team A's years-long optimization compound for team B without team B paying integration cost per upgrade. Extends the prior wiki "adjacent AWS primitive, not specifically invoked" note on DSQL + Firecracker to "DSQL consumes Firecracker-adjacent networking from Lambda explicitly."

systems/postgresql — query processing layer DSQL extends, not forks.
systems/aurora-limitless — sibling Aurora horizontally-scaled Postgres offering; also uses time-based MVCC to avoid Long Fork.
systems/aws-rds — the Multi-AZ Postgres offering that does exhibit Long Fork.
systems/firecracker — adjacent AWS primitive (not specifically invoked for DSQL in the source, but in the same serverless-infra family).
concepts/tail-latency-at-scale — the architectural forcing function for the Rust move.
concepts/memory-safety — rationale for Rust over C in Postgres extensions.
concepts/snapshot-isolation, concepts/long-fork-anomaly, concepts/visibility-order-vs-commit-order — the consistency-model cluster DSQL's time-based MVCC sidesteps.
concepts/control-plane-data-plane-separation — DSQL is a case where the split got revisited and unified.
patterns/pilot-component-language-migration — the Adjudicator-first strategy.
patterns/postgres-extension-over-fork — extend Postgres via public API, don't fork.