
PATTERN Cited by 1 source

Block-level continuous replication

Shape

Replicate changes at the block-device layer, continuously (not on a snapshot schedule), producing a crash-consistent replica that tracks the source within seconds of each write. Recovery = launch compute against the replicated block state — which boots as if the primary had crashed and come back up.

Three properties jointly:

  1. Block-level — captures all filesystem / database state below the application layer; application-agnostic, workload-agnostic.
  2. Continuous — replication runs constantly, not on a snapshot cadence; RPO is seconds, not minutes/hours.
  3. Crash-consistent — see concepts/crash-consistent-replication — no application quiesce required; the replica is what a crash+reboot would produce.
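The three properties can be sketched together. This is a minimal, hypothetical in-memory model (`ReplicatedDevice`, `Replica`, and the 4 KiB block size are all assumptions, not any vendor's implementation): every write to the source device is streamed, in write order, to a target-side staging area. Because the replica always reflects a prefix of the source's write stream, it is exactly what a crash + reboot of the primary would produce — no application quiesce needed.

```python
BLOCK = 4096  # assumed block size

class ReplicatedDevice:
    """Source-side block device: applies each write locally, then
    ships it continuously (per write, not on a snapshot schedule)."""
    def __init__(self, blocks, ship):
        self.disk = bytearray(blocks * BLOCK)
        self.ship = ship  # callback that streams one change to the target

    def write(self, block_no, data):
        assert len(data) == BLOCK
        off = block_no * BLOCK
        self.disk[off:off + BLOCK] = data
        self.ship(block_no, bytes(data))  # continuous replication step

class Replica:
    """Target-side staging area: applies shipped blocks in arrival order,
    so its state is always some crash point of the source."""
    def __init__(self, blocks):
        self.disk = bytearray(blocks * BLOCK)

    def apply(self, block_no, data):
        off = block_no * BLOCK
        self.disk[off:off + BLOCK] = data

replica = Replica(blocks=8)
source = ReplicatedDevice(blocks=8, ship=replica.apply)
source.write(0, b"A" * BLOCK)
source.write(3, b"B" * BLOCK)
assert replica.disk == source.disk  # replica tracks source per write
```

Note the sketch never inspects what the bytes mean — that is the "block-level, application-agnostic" property: the same pipe carries ext4 metadata, database pages, or anything else.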

Why continuous vs scheduled

Scheduled snapshots (AMIs, AWS Backup plans, per-service backup cadences) are crash-consistent too but with RPO = snapshot interval. Continuous replication pushes RPO to seconds at the cost of:

  • A replication agent (or hypervisor-level intercept) on the source,
  • A network-bandwidth commitment to the replication target,
  • A staging area in the target (replicated-but-not-yet-launched state).

The break-even point vs scheduled snapshots is when RPO requirements move from "minutes acceptable" to "seconds required."
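The break-even can be made concrete with back-of-envelope arithmetic. All numbers below are illustrative assumptions, not AWS figures: worst-case data loss is roughly write rate × RPO window, where the window is the snapshot interval for scheduled snapshots and the replication lag for continuous replication.

```python
write_rate_mb_per_s = 20        # assumed sustained write rate

snapshot_interval_s = 60 * 60   # hourly snapshot schedule
continuous_lag_s = 5            # seconds-scale replication lag (assumed)

# Worst-case megabytes of writes lost at failover time:
at_risk_snapshot_mb = write_rate_mb_per_s * snapshot_interval_s
at_risk_continuous_mb = write_rate_mb_per_s * continuous_lag_s
print(at_risk_snapshot_mb, at_risk_continuous_mb)  # → 72000 100
```

Under these assumptions an hourly snapshot exposes ~72 GB of writes versus ~100 MB for continuous replication — the three cost items above buy roughly three orders of magnitude of RPO.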

Why block-level vs app-level

App-level (logical) replication (database log shipping, streaming CDC, Redis replication) offers stronger consistency guarantees per workload but requires per-workload implementation. Block-level offers a single mechanism that works for every filesystem and every storage-consuming workload — the substrate of modern server-DR primitives.

Canonical AWS primitive

AWS Elastic Disaster Recovery (DRS) is the canonical native implementation: agents on source machines stream block changes to a staging subnet in the target Region; automated server conversion launches EC2 instances from the replicated state; RPO in seconds, RTO 5–20 min. (Source: sources/2026-03-31-aws-streamlining-access-to-dr-capabilities)

Tier fit on the DR ladder

Maps naturally onto the middle tiers of the DR ladder:

  • Pilot light — the data tier is block-level-continuously-replicated into staging; compute is instantiated on failover.
  • Warm standby — the replica is also booted at reduced scale, so failover is scale-up rather than cold-start.

Above warm-standby is multi-site active-active, which requires a different replication model (bidirectional, conflict-resolving, app-aware) — block-level continuous doesn't extend there without significant additional machinery.
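A toy illustration of why unidirectional block shipping stops at warm standby (all names here are hypothetical): with two active writers, naive block-level apply is last-writer-wins — one site's update is silently discarded with no error, because raw blocks carry no notion of a conflict. Resolving that needs the app-aware, conflict-resolving machinery the paragraph above describes.

```python
# Two active sites write the same logical block concurrently:
site_a = {"blk0": "A's update"}
site_b = {"blk0": "B's update"}

# Naive bidirectional block shipping = apply whichever arrives last:
merged = {}
merged.update(site_a)
merged.update(site_b)  # B's whole block overwrites A's

assert merged["blk0"] == "B's update"  # A's write is lost, silently
```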

Tradeoffs

  • + Application-agnostic — one DR substrate for any EC2-style workload.
  • + Seconds-scale RPO without application cooperation.
  • + Minutes-scale RTO (5–20 min for DRS).
  • − Block-level replication doesn't understand transactional boundaries: you get crash-consistent, not app-consistent.
  • − Does not cover Lambda functions, Auto Scaling logic, ECS task definitions / EKS pod specs, or Route 53 / VPC / IAM config; needs full-workload orchestration layered on top (partner products like Arpio, or custom tooling).
  • − Steady-state cost of continuous replication agents + network + staging storage: not free like "we can restore from last night's backup."
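The crash-consistent-not-app-consistent tradeoff can be shown with a hypothetical two-block "transaction" (the block numbers and `ship` pipeline below are illustrative, not any real API): if the replication link is cut between the two writes, the replica holds a state the application never committed — exactly what a mid-transaction power cut leaves on disk, which is why the workload's own journal/WAL replay must run on recovery.

```python
journal = []  # replica-side stream of (block, value), in arrival order

def ship(block, value):
    journal.append((block, value))

# An application transaction spanning two blocks:
ship(7, "debit accounts")     # write 1 replicated...
# -- replication link lost here --
# ship(8, "credit accounts")  # ...write 2 never arrives

replica_state = dict(journal)
assert 7 in replica_state and 8 not in replica_state
# Block-level replication cannot see the transaction boundary; the
# filesystem or database must repair this itself on first boot.
```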
