PATTERN

Storage-forwarded redo-log replication

Problem

Traditional MySQL / Postgres clusters replicate by having each replica hold its own full copy of the data and tail the primary's binlog (or WAL) to stay current. This works but has three structural costs at cloud scale:

  1. Each replica consumes its own storage footprint — N× the data size for N replicas.
  2. Starting a new read-only replica is slow — it must bootstrap from a backup and then replay the log to catch up.
  3. Durability and read-scaling are coupled — both solved by the same replica pool.

Solution

Forward the redo-log stream from the writer compute node to a dedicated distributed storage fabric, and let all read-only compute nodes in the cluster share that storage substrate directly.

This is the substrate Amazon Aurora uses internally. Brian Morrison II's canonical framing (PlanetScale, 2024-01-24):

"Instead of storing the redo log entries directly on the attached volumes, they are forwarded to dedicated Aurora storage appliances in the same availability zone as the source compute node. Data on this appliance is stored within 10 GiB segments spread across three availability zones in a given region. Before the compute node responds to the application, Aurora will ensure that at least four of the six default segments have a replicated copy of the data to ensure durability should a data center be taken offline."

"Since data is replicated on the storage level, read-only compute nodes can be started at any time in an availability zone containing a copy of the data for that node to read. For any pages that have been read to memory, the source node will directly notify any read-only nodes of updates. This causes the read-only nodes to accommodate the changed data. As a result, the risk of reading stale data is reduced, however, replication lag still needs to be considered." (Source: sources/2026-04-21-planetscale-planetscale-vs-amazon-aurora-replication)

Mechanics

  1. Writer compute node runs the SQL engine (MySQL- or Postgres-compatible), commits transactions, and forwards each redo-log entry to the storage fabric.
  2. Storage fabric is a separate service (not EBS — see systems/aws-ebs) that owns the data. Each segment is 10 GiB. Each segment has 6 copies spread across 3 AZs — see concepts/aurora-storage-quorum.
  3. The writer acknowledges the client only when ≥ 4 of the 6 segment copies have persisted the redo entry.
  4. Read-only compute nodes connect directly to the storage segments — they don't replay a binlog; they read the same pages the writer writes.
  5. Cache-coherence via writer-initiated page-update notifications: "the source node will directly notify any read-only nodes of updates" — the writer pushes cache invalidations out-of-band so reader buffer caches stay reasonably current.
  6. Storage-tier auto-expansion: "Aurora's storage appliance will automatically allocate new storage segments as needed." New segments slot into the 6-copy 3-AZ scheme.

Consequences

  • Fast read-replica startup — a new read-only compute node doesn't bootstrap from backup; it just starts reading the shared storage.
  • Decoupled read-scaling and durability — add read replicas without adding storage; durability is already 6×.
  • Cross-AZ durability is free — no customer configuration required; the quorum geometry handles it.
  • Replication lag is not zero — readers' buffer caches lag the writer. "replication lag still needs to be considered" (Morrison).
  • Not horizontally shardable — the scheme scales reads and durability but not writes; all writes go through one writer compute node. (Aurora Limitless on Postgres later addresses this for a subset of workloads.)
  • Rolling upgrades become harder — there are no peer replicas to upgrade independently and promote; the storage substrate is shared, so version changes must coordinate across the whole cluster. This is the root reason Aurora requires a maintenance window for version upgrades where traditional replicated MySQL clusters like PlanetScale can do them replica-by-replica (see concepts/rolling-upgrade and concepts/mysql-version-upgrade).
  • Restore-replay backup validation is harder — with no independent replica substrate, proving a backup is restorable by actually restoring it to a dedicated peer node doesn't fit naturally.
  • Proprietary, vendor-locked substrate — the storage fabric is not an open format; workload portability is limited to the Aurora-compatible wire protocol.
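The non-zero replication lag noted above follows from the cache-coherence mechanism in step 5: the writer pushes page-update notifications out-of-band, and each reader applies them asynchronously, so a reader's buffer cache can briefly serve a stale page. A minimal sketch under those assumptions (the `Reader` class and method names are illustrative, not Aurora internals):

```python
from collections import deque

# Hypothetical sketch of writer-initiated page-update notifications:
# the writer pushes an invalidation for each changed page, and a reader
# applies them from a queue asynchronously -- the gap between "notified"
# and "applied" is the replication lag the pattern warns about.
class Reader:
    def __init__(self):
        self.cache = {}         # page_id -> cached page version
        self.pending = deque()  # notifications not yet applied

    def notify(self, page_id, version):
        self.pending.append((page_id, version))  # out-of-band push

    def apply_pending(self):
        while self.pending:
            page_id, version = self.pending.popleft()
            self.cache[page_id] = version        # refresh the cached page

reader = Reader()
reader.cache[42] = 1       # reader has page 42, version 1, in its cache
reader.notify(42, 2)       # writer commits a change to page 42
print(reader.cache[42])    # 1 -> still stale: notification queued, not applied
reader.apply_pending()
print(reader.cache[42])    # 2 -> caught up with the writer
```

Because only in-memory pages need this treatment ("for any pages that have been read to memory"), pages a reader has never cached are simply read fresh from the shared storage and need no invalidation at all.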

Contrast: traditional binlog replication

  • PlanetScale / Vitess / stock MySQL with replicas: each replica holds its own data copy, replays the primary's binlog, supports rolling upgrades and restore-replay backup validation natively. See patterns/shared-nothing-storage-topology for the canonical counter-model.
  • Aurora cross-cluster external replication uses the binlog (the post notes: "While Aurora does use the binary log for external replication, AWS has built a closed and proprietary replication system that deviates from the traditional MySQL replication configuration for replicating within an Aurora cluster") — so Aurora is both storage-forwarded internally and binlog-based externally. The two replication schemes coexist, serving different purposes.
