Enabling Evolutionary Database Development: database branching with Lakebase, continued (Part 2)
Summary
Part 2 of Databricks' three-part series on Evolutionary Database Development revisits, twenty years on, the seven practices Martin Fowler described in 2003, and shows how Lakebase's copy-on-write database branching removes the practical constraints that kept five of them aspirational. The post expands the playbook from 7 to 11 practices, adding destructive testing as the default (Practice #8), A/B variant prototyping at the database level (Practice #9), governance inherited by branches (Practice #10), and the agent as practitioner (Practice #11). Most of the article covers the CI/CD workflow mechanics (GitHub Actions templates for per-PR branch creation, schema-diff posting, and branch cleanup on merge) and gives a worked playbook entry for each practice, with explicit anti-patterns.
Key Takeaways
- Five of the original seven practices had implementation constraints until 2026. Practices #1, #4, #5, #6, and #7 were limited by shared-database contention, provisioning cost, and DBA calendar serialization. Copy-on-write branching removes all of these at O(1) cost.
- Per-PR CI branch creation via GitHub Actions. `pr.yml` triggers on `pull_request: [opened, synchronize]`, creates `ci-pr-<N>` forked from the PR's base branch, applies migrations, runs tests against real Postgres, and posts a schema diff as a PR comment. `merge.yml` destroys the branch on merge.
- Expand-and-contract as the canonical schema migration strategy. Irreversible work should be split across migrations: add new columns first (expand), populate, swap readers, then drop old columns (contract) in a later migration after a deployment cycle confirms no live readers reference the old shape.
- Idempotent migrations are a hard requirement. Because the same migration runs against many branches over the lifecycle of a transition, non-idempotent migrations are treated as bugs. Migration frameworks (Flyway, Liquibase, Alembic, Knex) track applied state via metadata tables.
- Destructive testing becomes routine (Practice #8). When branch reset costs one second, chaos-style destructive tests (kill migration at 50%, simulate partial restore, DR runbook timing) fit inside a normal feature cycle without ops calendars or approval gates.
- A/B variant prototyping at database level (Practice #9). Two competing schema designs are built on parallel branches off the same parent. Teams measure query latency, migration time at production volume, index footprint, and lock contention, then keep the winner and document the decision.
- DBA role reframed as async PR reviewer. The DBA is a CODEOWNER on schema-affecting paths, reviews the auto-posted schema diff inline on the PR, and shifts focus from "will this break" to "is this the right design" (indexing strategy, data integrity, extensibility, maintainability).
- Database refactoring catalog (2006) as the taxonomy. 70+ named refactorings from databaserefactoring.com provide the named vocabulary. Schema changes that don't map to a known refactoring likely combine multiple refactorings and should be split.
- Branches are ephemeral; migrations are durable. The branch is the workspace, disposable by design. The migration script is the persistent artifact that carries the change forward. Treating any branch as durable beyond its purpose is an anti-pattern.
- Production-shaped data as the default test substrate. Developers fork branches off production (or staging with production-shaped data), eliminating the "works in dev, fails in prod" class of bugs from schema changes. Jen's worked example: seeding ~1% corrupted edge-case data into a branch before running the migration to validate handling.
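
The expand-and-contract and idempotency takeaways can be illustrated together with a minimal sketch. It is not taken from the post: it uses Python's built-in `sqlite3` as a stand-in for Postgres, and the `users` table, the `fullname` to `display_name` rename, the migration names, and the `schema_migrations` tracking table are all hypothetical. The tracking table plays the role of the metadata tables that migration frameworks maintain.

```python
import sqlite3

# Hypothetical expand-and-contract rename: users.fullname -> users.display_name.
# Each migration is (version, statements); names are illustrative only.
EXPAND = ("001_expand_add_display_name", [
    "ALTER TABLE users ADD COLUMN display_name TEXT",
    "UPDATE users SET display_name = fullname WHERE display_name IS NULL",
])
CONTRACT = ("002_contract_drop_fullname", [
    # Table rebuild instead of DROP COLUMN so older SQLite builds also work.
    "CREATE TABLE users_new (id INTEGER PRIMARY KEY, display_name TEXT)",
    "INSERT INTO users_new (id, display_name) SELECT id, display_name FROM users",
    "DROP TABLE users",
    "ALTER TABLE users_new RENAME TO users",
])


def apply_migration(conn, migration):
    """Apply a migration at most once; return True if it ran, False if skipped."""
    version, statements = migration
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (version TEXT PRIMARY KEY)"
    )
    seen = conn.execute(
        "SELECT 1 FROM schema_migrations WHERE version = ?", (version,)
    ).fetchone()
    if seen:
        return False  # already applied on this branch: rerun is a no-op
    conn.execute("BEGIN")  # explicit transaction so DDL and bookkeeping commit together
    try:
        for stmt in statements:
            conn.execute(stmt)
        conn.execute("INSERT INTO schema_migrations (version) VALUES (?)", (version,))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return True


def columns(conn, table):
    """Return the column names of a table, in declaration order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def demo():
    # isolation_level=None: sqlite3 issues no implicit BEGIN, we manage transactions.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, fullname TEXT)")
    conn.execute("INSERT INTO users (fullname) VALUES ('Ada Lovelace')")
    first = apply_migration(conn, EXPAND)
    second = apply_migration(conn, EXPAND)   # same migration, second run
    after_expand = columns(conn, "users")
    apply_migration(conn, CONTRACT)          # shipped later, once old readers are gone
    after_contract = columns(conn, "users")
    name = conn.execute("SELECT display_name FROM users").fetchone()[0]
    return first, second, after_expand, after_contract, name


if __name__ == "__main__":
    print(demo())
```

In this sketch the contract step is a separate migration on purpose: between `EXPAND` and `CONTRACT` both columns exist, so old and new readers can coexist for a deployment cycle. In a real project the bookkeeping would normally be left to the migration framework rather than hand-rolled.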
Operational Numbers
- Branch creation time: ~1 second regardless of parent database size (1 MB or 1 TB).
- Storage cost at creation: zero (shared pages; only divergent pages consume storage).
- Four branches per feature: per-developer, per-CI, two A/B exploration branches — all in seconds, all isolated.
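
The "zero storage at creation, pay only for divergent pages" claim can be made concrete with a toy model. This is a sketch of the general copy-on-write idea, not Lakebase's actual storage engine: the `Layer` and `Branch` classes are hypothetical, and the 8 KiB page size is chosen only because it matches the default Postgres block size.

```python
class Layer:
    """Immutable set of pages shared by every branch forked from it."""

    def __init__(self, parent, pages):
        self.parent = parent  # older shared layer, or None
        self.pages = pages    # page_id -> bytes; never mutated after creation


class Branch:
    """Toy copy-on-write branch: a private overlay on top of shared layers."""

    def __init__(self, base=None):
        self.base = base  # shared, frozen history
        self.own = {}     # pages written since the last fork

    def write(self, page_id, data):
        self.own[page_id] = data

    def read(self, page_id):
        if page_id in self.own:
            return self.own[page_id]
        layer = self.base
        while layer is not None:  # fall back through shared layers
            if page_id in layer.pages:
                return layer.pages[page_id]
            layer = layer.parent
        raise KeyError(page_id)

    def fork(self):
        # Freeze current pages into a shared layer. Nothing is copied, so the
        # cost does not depend on how many pages the parent holds.
        shared = Layer(self.base, self.own)
        self.base, self.own = shared, {}
        return Branch(base=shared)

    def divergent_bytes(self):
        """Bytes this branch has written since its most recent fork point."""
        return sum(len(data) for data in self.own.values())


PAGE = 8192  # bytes per page (assumption: Postgres default block size)

prod = Branch()
for page_id in range(1000):
    prod.write(page_id, b"x" * PAGE)

ci = prod.fork()
size_at_creation = ci.divergent_bytes()  # nothing written on the branch yet
ci.write(7, b"y" * PAGE)                 # one page diverges on the branch
size_after_write = ci.divergent_bytes()
prod.write(8, b"z" * PAGE)               # later parent writes stay invisible to ci
```

Here `size_at_creation` is 0 and `size_after_write` is one page, while `prod.read(7)` still returns the original contents, which is the isolation property the per-developer, per-CI, and A/B branches rely on.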
Systems & Concepts Extracted
| Entity | Type | Status |
|---|---|---|
| systems/lakebase | system | exists — update |
| concepts/evolutionary-database-design | concept | exists — update |
| concepts/database-branching | concept | exists — update |
| concepts/copy-on-write-storage-fork | concept | exists — update |
| patterns/ci-ephemeral-database-branch-with-schema-diff-comment | pattern | exists — update |
| patterns/per-developer-database-branch-paired-with-code-branch | pattern | exists — update |
| patterns/expand-and-contract-schema-migration | pattern | new |
| patterns/a-b-variant-prototyping-at-database-level | pattern | new |
| patterns/destructive-testing-on-ephemeral-branch | pattern | new |
| concepts/idempotent-migration | concept | new |
Caveats
- Databricks is the vendor of Lakebase; the post is authored to promote its adoption. The methodology (Fowler 2003) is vendor-neutral, but the "constraint is lifted" framing assumes a Lakebase-like substrate (Neon would also qualify).
- Practices #10 (governance) and #11 (agents) are deferred to Part 3; this post previews but does not substantiate them.
- No latency numbers for production workloads running against branches (only creation-time claims).