CONCEPT Cited by 5 sources
Copy-on-write storage fork
Definition
A copy-on-write storage fork is a storage-cloning
mechanism that creates a second logical copy of a
dataset without initially duplicating the underlying
pages — the clone and the original share the same
physical pages on disk until one side writes, at which
point the modified page is copied and the two logical
copies diverge at that page only. The clone is
effectively instantaneous (no data-copy work at fork
time) and incremental in ongoing storage cost (pay
only for divergent pages). This is the storage-tier
analog of Unix fork() with copy-on-write memory
pages.
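The mechanism in the definition can be pictured with a small in-memory model. This is a minimal sketch and not any vendor's implementation: `Volume`, `fork`, `shared_pages`, and the dict-based page store are hypothetical names chosen for illustration, and copying the whole page table at fork time is a simplification of how real systems share metadata.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    """Toy logical volume: a page table pointing into a shared page store."""
    store: dict                                      # physical page id -> bytes
    page_table: dict = field(default_factory=dict)   # page number -> physical page id

    def fork(self) -> "Volume":
        # Only the page table (metadata) is copied; no page contents move.
        # Copying the whole table is a simplification for this sketch.
        return Volume(store=self.store, page_table=dict(self.page_table))

    def read(self, page_no: int) -> bytes:
        return self.store[self.page_table[page_no]]

    def write(self, page_no: int, data: bytes) -> None:
        # Copy-on-write: never overwrite in place. Allocate a fresh physical
        # page so any fork still pointing at the old one is unaffected.
        new_id = len(self.store)
        self.store[new_id] = data
        self.page_table[page_no] = new_id


def shared_pages(a: Volume, b: Volume) -> int:
    """Count page numbers that both volumes map to the same physical page."""
    return sum(1 for n, pid in a.page_table.items() if b.page_table.get(n) == pid)


store = {}
original = Volume(store)
for n in range(4):
    original.write(n, b"v0")

clone = original.fork()          # no new physical pages allocated here
pages_after_fork = len(store)

clone.write(2, b"v1")            # the two volumes diverge at page 2 only
```

In this toy model the fork allocates nothing, and a single write on the clone adds one physical page while the other three stay shared between the two volumes.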
On Amazon Aurora, copy-on-write storage forks are the substrate under blue/green deployments: the green environment is created as a storage fork of the blue environment's cluster volume, and binlog replication then keeps green in sync with transactions committed on blue.
Trade-offs
Copy-on-write forks solve the fast-clone cost problem (no duplicate storage up-front, instantaneous clone) but introduce new failure modes:
- Divergent-page cost accrues silently: every write to a still-shared page, on either side, creates a divergent page, so the two-environment storage bill grows with write volume, not fork age.
- Concurrent writes on both sides produce irreconcilable state: if both blue and green accept writes to the same row, the two sides hold divergent physical pages for the same logical data. The underlying storage layer has no knowledge of which version is "correct", so conflict resolution is deferred to the operator.
- Binlog replication doesn't round-trip copy-on-write: the clone mechanism sits at the storage layer, while cross-environment sync happens at the transaction layer via binlog replication, which has its own schema-change envelope (see [[concepts/binlog-replication]]).
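The first two trade-offs above can be made concrete with a toy accounting model. This is a sketch under stated assumptions: page tables are plain dicts from page number to a version label, and `divergent_pages` and `conflicting_pages` are hypothetical helpers, not part of Aurora or any other product. It counts distinct divergent pages only; how repeated writes to the same page are billed depends on the real storage engine.

```python
def divergent_pages(base: dict, side: dict) -> set:
    """Page numbers whose version on this side differs from the fork-time snapshot."""
    return {page for page, version in side.items() if base.get(page) != version}


def conflicting_pages(base: dict, blue: dict, green: dict) -> set:
    """Pages modified on BOTH sides since the fork.

    The storage layer only sees two different physical pages here; it has
    no information about which side's version should win.
    """
    return divergent_pages(base, blue) & divergent_pages(base, green)


# Fork-time snapshot: both sides start out pointing at the same four pages.
base = {0: "a0", 1: "b0", 2: "c0", 3: "d0"}
blue = dict(base)
green = dict(base)

blue[1] = "b1-blue"      # write on the blue side
green[1] = "b1-green"    # write to the same page on the green side
green[3] = "d1-green"    # unrelated write on the green side

# Extra pages stored beyond the shared baseline, driven purely by writes.
extra_pages = len(divergent_pages(base, blue)) + len(divergent_pages(base, green))
conflicts = conflicting_pages(base, blue, green)
```

Here three extra pages are stored after three writes, and page 1 is flagged as modified on both sides, which is the case an operator would have to reconcile by hand.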
Seen in
- sources/2026-05-29-databricks-enabling-evolutionary-database-development-database-branching-with-lakebase: Fifth canonical wiki instance, as evolutionary-database-development substrate. Databricks 2026-05-29 Tier-3 post (Part 1 of three-part series). Frames copy-on-write branching as the substrate change that finally makes Practice #4 of evolutionary database design (Fowler 2003) operationally default. Verbatim: "In 2026, copy-on-write database branching arrives in Databricks Lakebase. A one-second, zero-storage-at-creation branch of a terabyte-scale production database is now an O(1) operation. The constraint that kept Practice #4 aspirational has lifted." No new architectural disclosure on the copy-on-write mechanism itself (the LangGuard and Backstage POC sources covered the storage-mechanism level); this source's contribution is the methodology arc: naming the compensating layer (mocks, H2/SQLite, shared staging, DBA ticket queues) as substrate-driven, not principle-driven, and positioning copy-on-write branching as what lets that layer be removed. Three load-bearing properties of the resulting developer-DB instance canonicalised: fast, realistic, isolated (all three simultaneously, where each historical compensating-layer alternative violated at least one).
- sources/2026-04-30-databricks-backstage-with-lakebase: Fourth canonical wiki instance and operational-datum milestone. Thoughtworks Backstage POC discloses the first concrete data-plane timing on a Lakebase copy-on-write fork: 1.09 seconds for a 63 MB dataset (the Backstage catalog). Separates control-plane acknowledgement (instant) from data-plane clone (1.09 s), behaviour predicted by the copy-on-write architecture but previously not quantified on the wiki. Extends the concept to two distinct use-case axes on the same substrate, demonstrated in one POC: (a) forward-branching for development workflow (1.09 s for 63 MB); (b) backward-branching as PITR (3.78 s end-to-end for a 32-row incident recovery, with 12-second WAL-record snap-back). Canonicalises the architectural unification: "Branching and Point-in-Time Recovery (PITR) are essentially the same primitive: branching is just PITR with source_branch_time = now" (see patterns/branching-is-pitr-with-time-now). New pattern: patterns/database-branch-per-test-over-mocking uses cheap copy-on-write forks to deprecate 20-30% of test code (mock objects). Branch API disclosure: branch creation requires a spec-nested body with an explicit lifetime (ttl/expire_time/no_expiry).
- sources/2026-04-29-databricks-and-stripe-projects-infrastructure-built-for-agents: Copy-on-write branching as agent-operation primitive. Third canonical wiki instance of copy-on-write storage forks at Lakebase/Neon altitude (after the 2026-04-27 LangGuard governance-policy-testing instance and the Aurora blue/green baseline). Verbatim: "Using zero-copy cloning, agents can create isolated branches of production data in seconds. This allows autonomous systems to safely test code, run migrations, or experiment with new prompts against live data states without risking the primary production environment or incurring massive storage overhead." Expands the use-case surface from governance-policy testing (LangGuard, human governance team as experimenter) to agent-operation primitive: the agent itself is the experimenter, running code / schema / prompt experiments against branched production state and discarding branches at agent-cadence. Explicitly names the storage-overhead bound the mechanism provides ("without … incurring massive storage overhead") as the load-bearing property for high-fanout agent-branching workloads. Pairs with the sub-350 ms provisioning number to make branch creation a routine-per-agent-task operation rather than a per-experiment ceremony. Two-side-writeable risk from Aurora blue/green remains absent because branches are intentionally short-lived (discarded after the agent task).
- sources/2026-04-27-databricks-inside-one-of-the-first-production-deployments-of-lakebase-langguard: LangGuard/Lakebase as second canonical wiki instance of copy-on-write storage forks, at the Neon-lineage serverless Postgres altitude rather than Aurora's managed-RDS altitude. Verbatim mechanism disclosure: "When we create a branch, no data is physically copied. The branch diverges from the current database state using copy-on-write semantics, consuming storage only for new or modified data." Named use case is governance policy testing against real production trace data, a materially new axis beyond Aurora's blue/green deployment use case, because the two-side-writeable risk Morrison II canonicalised for Aurora does not apply: the branch takes no production writes (writes on the branch come only from the policy test), and the branch is discarded after the test, so divergent-page accrual stays bounded. Expands the copy-on-write-fork concept from "clone storage for blue/green" to "clone storage for isolated testing against live data". Pattern: patterns/policy-testing-via-database-branching.
- Brian Morrison II (PlanetScale, 2024-02-02). Canonical wiki disclosure of Aurora's blue/green copy-on-write storage clone and the two-side-writeable data-consistency risk. Verbatim: "Amazon's blue/green deployment initially duplicates only compute resources and clones data storage using a copy-on-write mechanism. This can help with storage costs when running parallel environments but introduces potential data inconsistencies across environments. Since writes are allowed in the green environment, the same data can technically be changed in both environments. If this happens, Amazon has no easy or automated way to reconcile which version is correct. Resolving conflicts is challenging, and the responsibility for data consistency falls on you."
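The unification quoted in the Backstage entry above ("branching is just PITR with source_branch_time = now") can be illustrated with a toy write-ahead log. This is a sketch only: the parameter name `source_branch_time` comes from the quote, while the `branch` function, the `(timestamp, key, value)` record shape, and the choice to model "now" as the latest WAL timestamp are assumptions made for illustration. A real system would point the branch at shared pages rather than replay records into a dict.

```python
from typing import Optional


def branch(wal: list, source_branch_time: Optional[int] = None) -> dict:
    """Return the key/value state visible to a branch taken at source_branch_time.

    wal is a list of (timestamp, key, value) records sorted by timestamp;
    a value of None marks a delete. Passing no time means "now", modelled
    here (an assumption) as the timestamp of the latest WAL record.
    """
    if source_branch_time is None:
        source_branch_time = wal[-1][0] if wal else 0

    state = {}
    for ts, key, value in wal:
        if ts > source_branch_time:
            break                      # records after the branch point are invisible
        if value is None:
            state.pop(key, None)       # apply a delete
        else:
            state[key] = value         # apply an insert or update
    return state


wal = [
    (1, "a", "x"),
    (2, "b", "y"),
    (3, "a", None),    # an accidental delete at t=3
]

dev_branch = branch(wal)                        # forward branch: time = now
recovered = branch(wal, source_branch_time=2)   # PITR: state just before the delete
```

Both calls go through the same code path; the only difference is the timestamp, which is the sense in which a development branch and a point-in-time recovery are the same primitive.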
Related
- concepts/blue-green-deployment — the deployment strategy on top of the copy-on-write clone.
- concepts/binlog-replication — the ongoing-sync mechanism between the two forked environments.
- systems/amazon-aurora — canonical implementation.
- patterns/blue-green-database-deployment — the Aurora-family pattern built on the clone.