
CONCEPT Cited by 3 sources

Evolutionary database design

Definition

Evolutionary database design is the discipline of treating database schemas as first-class artefacts that evolve incrementally alongside application code, with the same engineering rigour as source-code refactoring. In practice that means small, named transformations applied deliberately; version-controlled migration scripts that travel with application changes; application and database evolving in lockstep through CI/CD; and, during development, "everybody gets their own database instance", so experimentation is cheap and isolated. Martin Fowler and Pramod Sadalage articulated the methodology in the 2003 essay Evolutionary Database Design, and Scott Ambler and Pramod Sadalage operationalised it in the 2006 book Refactoring Databases: Evolutionary Database Design, which catalogs 70+ named database refactorings (Split Column, Move Column, Add Lookup Table, Encapsulate Table With View, Introduce Surrogate Key, etc.) along with the transition mechanics for applying each one safely against live data.

The methodology rests on seven practices (Fowler 2003, restated in Ambler & Sadalage 2006):

  1. DBAs collaborate closely with developers. Database changes are not gated through a separate review queue: the DBA sits on the same team as the developers and reviews changes as part of the feature work.
  2. Version-control everything, including the schema. Migration scripts live in the same repo as application code and are reviewed alongside it.
  3. All changes are database refactorings. Each schema change is a small, deliberate, named transformation drawn from a catalog, not an ad-hoc ALTER TABLE.
  4. Everybody gets their own database instance. Each developer has an isolated database environment to experiment in; see concepts/practice-4-everybody-gets-their-own-database-instance for the wiki's canonical page on this practice.
  5. Developers frequently integrate to a shared master. Schema changes flow into a shared mainline through CI integration, not through long-lived development branches.
  6. A database consists of schema and test data. The "database" under version control isn't just the DDL: it's the schema plus the reference and test data the application needs to work.
  7. Automate the refactorings. Each refactoring has a script that applies it; manual schema editing is a code smell.
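The practices about named refactorings, automated application, and version control combine into one small mechanism: an ordered list of named migration scripts plus a table recording which ones have already run. The sketch below is a minimal Python illustration using the standard-library sqlite3 module so it runs without a server; the migration names, the `schema_version` table and `migrate()` are hypothetical, not the API of any particular migration tool.

```python
import sqlite3

# Hypothetical migrations: each entry is one small, named refactoring. In a real
# repository these would be version-controlled files next to the application code.
MIGRATIONS = [
    ("0001_create_customer",
     "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    ("0002_add_customer_email",
     "ALTER TABLE customer ADD COLUMN email TEXT"),
]


def migrate(conn: sqlite3.Connection, migrations=MIGRATIONS) -> list:
    """Apply every migration not yet recorded in schema_version, in order.

    Returns the names applied during this call; calling it again is a no-op.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (name TEXT PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT name FROM schema_version")}
    newly_applied = []
    for name, sql in migrations:
        if name in applied:
            continue
        # The refactoring and its bookkeeping row commit together or not at all.
        conn.execute("BEGIN")
        try:
            conn.execute(sql)
            conn.execute("INSERT INTO schema_version (name) VALUES (?)", (name,))
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")
            raise
        newly_applied.append(name)
    return newly_applied


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    print(migrate(demo))  # ['0001_create_customer', '0002_add_customer_email']
    print(migrate(demo))  # []
```

Each refactoring runs in its own transaction together with its bookkeeping row, so a failed script leaves no half-recorded state. That relies on transactional DDL, which SQLite and PostgreSQL provide; MySQL implicitly commits DDL statements, so there the two steps are not atomic.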

The twenty-year gap between methodology and substrate

Fowler (2003) and Ambler & Sadalage (2006) articulated the methodology completely; Continuous Delivery (Humble & Farley, 2010, Chapter 12, "Managing Data") then brought migration scripts into the deployment pipeline, making database-changes-as-code part of the broader CI/CD movement. What CD did not solve was per-pipeline isolation: pipelines could run migrations, but they still needed a target database, and that target was almost always shared. Practice #4 ("everybody gets their own database instance") therefore stayed aspirational on most teams, because true per-developer, production-shaped databases cost time, money, and DBA cycles.

The Databricks post of 2026-05-29 frames the gap this way:

"The methodology described in Evolutionary Database Design and operationalized in Refactoring Databases: Evolutionary Database Design has been clear for twenty years. The seven practices, the catalog of 70+ named refactorings, the transition mechanics – all of it documented, peer-reviewed, taught."

"That methodology reached CI/CD in 2010 with Continuous Delivery (Chapter 12: Managing Data). Migrations became first-class artifacts in the deployment pipeline. The discipline of database-changes-as-code reached the broader CI/CD movement. What CD didn't solve was per-pipeline isolation: pipelines could run migrations, but they still needed a target database, and that target was shared."

(Source: sources/2026-05-29-databricks-enabling-evolutionary-database-development-database-branching-with-lakebase)
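Per-pipeline isolation, the piece CD left unsolved, amounts to each pipeline run provisioning its own target database, migrating it from scratch, and discarding it afterwards. A minimal sketch, assuming a hypothetical `isolated_database()` helper and a one-statement `SCHEMA`; a temporary SQLite file stands in for the database only to keep the example runnable. On a production-shaped engine, the provisioning step inside the helper is exactly the slow, costly part this section describes.

```python
import os
import shutil
import sqlite3
import tempfile
from contextlib import contextmanager

# Hypothetical schema for the sketch; a real pipeline would run its migrations here.
SCHEMA = ["CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL)"]


@contextmanager
def isolated_database(pipeline_id: str):
    """Give one pipeline run a private database that is destroyed afterwards."""
    workdir = tempfile.mkdtemp(prefix=f"pipeline-{pipeline_id}-")
    conn = sqlite3.connect(os.path.join(workdir, "app.db"))
    try:
        for ddl in SCHEMA:
            conn.execute(ddl)
        yield conn
    finally:
        conn.close()
        shutil.rmtree(workdir)  # nothing outlives the run, so nothing is shared


if __name__ == "__main__":
    # Two concurrent runs apply the same schema without colliding.
    with isolated_database("run-1") as a, isolated_database("run-2") as b:
        a.execute("INSERT INTO orders (status) VALUES ('paid')")
        print(a.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 1
        print(b.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 0
```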

The compensating layer

Because Practice #4 stayed aspirational, teams built a compensating layer to work around it:

  • Mock objects: the database interface is faked in unit tests; query-planner, constraint-enforcement, and transaction semantics are absent.
  • In-memory database substitutes: H2 or SQLite stand in for the production database; SQL dialect drift between substitute and production produces "works on my machine, fails in staging" bugs.
  • Shared staging environments: one database serves the whole team; concurrent feature work collides over schema and data, and the dev DB becomes a scheduling problem.
  • DBA ticket queues: schema changes that require production-shaped test data go through the DBA, who serialises requests through their calendar.
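To make the dialect-drift point concrete: SQLite treats declared column types as affinities rather than constraints, so an insert that PostgreSQL rejects succeeds against the in-memory substitute. The table and values below are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (amount INTEGER)")  # hypothetical table

# SQLite column types are affinities, not constraints: both inserts succeed.
conn.execute("INSERT INTO payment (amount) VALUES ('42')")   # coerced to integer 42
conn.execute("INSERT INTO payment (amount) VALUES ('abc')")  # silently stored as text

stored = conn.execute(
    "SELECT amount, typeof(amount) FROM payment ORDER BY rowid").fetchall()
print(stored)  # [(42, 'integer'), ('abc', 'text')]
```

Against PostgreSQL, the second insert fails with `invalid input syntax for type integer`, so a test suite that passes on SQLite can still fail in staging. SQLite 3.37 added opt-in `STRICT` tables that enforce declared types, but a substitute only helps if it is configured to behave like production.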

The compensating layer became foundational methodology by default, not by design. Whole bookshelves of advice on testing patterns, mock hierarchies, dialect-translation libraries, and staging-database hygiene exist because Practice #4 was unaffordable, not because the methodology required them.

Database refactoring catalog

Ambler and Sadalage's Refactoring Databases defines a catalog of 70+ named database refactorings, each with a problem statement, mechanics for applying the transformation safely against live data, transition-period strategies (e.g. maintaining the old and new schemas side by side during migration), and rollback considerations. Examples:

| Refactoring | What it does |
|---|---|
| Split Column | One column with composite content → multiple typed columns. (Fowler 2003 worked example: Jen splits inventory_code into location_code, batch_number, serial_number.) |
| Move Column | Column moves between tables to follow normalisation or access patterns. |
| Encapsulate Table With View | Table accessed via a view, allowing future restructuring without breaking consumers. |
| Introduce Surrogate Key | Replace a natural key with an auto-generated surrogate. |
| Add Lookup Table | Inline values get factored into a separate reference table. |
| Replace LOB With Table | LOB column → child table for relational access. |
| Merge Columns | Inverse of Split Column: combine related columns into one. |
| Migrate Method From Database | Move stored-procedure logic into application code. |
| Insert Trigger | Add a trigger to maintain an invariant during transition. |
| Drop Column | Remove a column after consumers have migrated off it. |
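The Split Column row above can be sketched with its transition period made explicit: add the new columns, backfill them, and install triggers so legacy writers that only know `inventory_code` keep the new columns consistent. This is a minimal sketch, not the catalog's published mechanics; it assumes a hypothetical `inventory` table with an `id` primary key and a hypothetical fixed-width code format `LLL-BBBB-SSSSSS`, and uses SQLite so it runs as-is.

```python
import sqlite3


def _split_assignments(src: str) -> str:
    """SET clause parsing the hypothetical fixed-width code 'LLL-BBBB-SSSSSS'."""
    return (f"location_code = substr({src}, 1, 3), "
            f"batch_number = CAST(substr({src}, 5, 4) AS INTEGER), "
            f"serial_number = CAST(substr({src}, 10, 6) AS INTEGER)")


SPLIT_COLUMN_STEPS = [
    # 1. Add the new typed columns alongside the old composite one.
    "ALTER TABLE inventory ADD COLUMN location_code TEXT",
    "ALTER TABLE inventory ADD COLUMN batch_number INTEGER",
    "ALTER TABLE inventory ADD COLUMN serial_number INTEGER",
    # 2. Backfill existing rows from the composite column.
    f"UPDATE inventory SET {_split_assignments('inventory_code')}",
    # 3. Transition triggers: writers that still set only inventory_code keep
    #    the new columns in sync until every consumer has migrated.
    f"""CREATE TRIGGER inventory_split_on_insert AFTER INSERT ON inventory
        WHEN NEW.location_code IS NULL
        BEGIN
            UPDATE inventory SET {_split_assignments('NEW.inventory_code')}
            WHERE id = NEW.id;
        END""",
    f"""CREATE TRIGGER inventory_split_on_update
        AFTER UPDATE OF inventory_code ON inventory
        BEGIN
            UPDATE inventory SET {_split_assignments('NEW.inventory_code')}
            WHERE id = NEW.id;
        END""",
]


def apply_split_column(conn: sqlite3.Connection) -> None:
    """Apply every step in one transaction so a failure leaves the schema unchanged."""
    conn.execute("BEGIN")
    try:
        for sql in SPLIT_COLUMN_STEPS:
            conn.execute(sql)
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Once every consumer reads and writes the new columns, a later refactoring drops the triggers and the old column (Drop Column in the table above), which ends the transition period.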

The full catalog is hosted at databaserefactoring.com, maintained by Sadalage as a living reference.
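Encapsulate Table With View, from the same catalog, can be sketched as two steps: first put a view with the table's original name and shape in front of it, then restructure the underlying table while only the view definition changes. Table and column names are hypothetical, the helper `run_atomically()` is illustrative, and `RENAME COLUMN` assumes SQLite 3.25 or later.

```python
import sqlite3


def run_atomically(conn: sqlite3.Connection, statements: list) -> None:
    """Hypothetical helper: apply a refactoring's statements in one transaction."""
    conn.execute("BEGIN")
    try:
        for sql in statements:
            conn.execute(sql)
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


# Step 1, Encapsulate Table With View: consumers keep querying "customer",
# which is now a view with the original shape over the renamed base table.
ENCAPSULATE = [
    "ALTER TABLE customer RENAME TO customer_base",
    "CREATE VIEW customer AS SELECT id, name FROM customer_base",
]

# Step 2, later: restructure the base table; only the view definition changes.
RESTRUCTURE = [
    "DROP VIEW customer",
    "ALTER TABLE customer_base RENAME COLUMN name TO full_name",
    "CREATE VIEW customer AS SELECT id, full_name AS name FROM customer_base",
]
```

The consumer query `SELECT id, name FROM customer` returns the same rows before encapsulation, after it, and after the column rename, which is the property the refactoring exists to protect.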

Why the substrate matters

The methodology is substrate-independent — the seven practices and the refactoring catalog work on Postgres, MySQL, Oracle, SQL Server, or any other relational engine. What changes with the substrate is how affordable Practice #4 is.

| Substrate | Practice #4 cost | Compensating layer needed? |
|---|---|---|
| 2003 mainstream (commercial RDBMS, pre-cloud) | Per-developer DB = full provisioning ticket, weeks of DBA work | Yes: heavy mocks + shared staging |
| 2010 cloud-VM era | Per-developer DB = pg_dump + EC2 → still hours, still stale | Yes: moderate mocks + staging |
| 2020 container era | Per-developer DB = docker run postgres + seed → minutes, but no production-shaped data | Yes: H2 / SQLite + sometimes staging |
| 2026 copy-on-write era (systems/lakebase / Neon, PlanetScale) | Per-developer DB = sub-second branch from production storage; production-shaped, isolated, governance-propagated | No: the compensating layer becomes obsolete |

The 2026 substrate change is what makes Practice #4 the operational default rather than an aspiration. As the Databricks 2026-05-29 post argues, the methodology hasn't changed; the capability under it has.
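Sub-second branching is affordable because a copy-on-write branch copies nothing when it is created: it records a pointer to its parent and a point in the parent's history, and stores only its own subsequent writes. The toy model below illustrates that idea over an in-memory key-value store; the `Branch` class is hypothetical and makes no claim about how Lakebase, Neon or PlanetScale implement branching internally.

```python
from typing import Dict, List, Optional, Tuple


class Branch:
    """Toy copy-on-write branch over a key-value store (illustration only)."""

    def __init__(self, name: str, parent: Optional["Branch"] = None,
                 base_version: int = 0) -> None:
        self.name = name
        self.parent = parent
        self.base_version = base_version  # parent's version when we branched
        self.version = 0
        self._log: Dict[str, List[Tuple[int, object]]] = {}  # this branch's own writes

    def branch(self, name: str) -> "Branch":
        # O(1) regardless of data size: record a parent pointer and a point in
        # the parent's history; nothing is copied.
        return Branch(name, parent=self, base_version=self.version)

    def write(self, key: str, value: object) -> None:
        # Writes land only in this branch; the parent is never modified.
        self.version += 1
        self._log.setdefault(key, []).append((self.version, value))

    def read(self, key: str) -> object:
        return self._read_at(key, self.version)

    def _read_at(self, key: str, at_version: int) -> object:
        # The newest local write visible at `at_version` wins...
        for version, value in reversed(self._log.get(key, [])):
            if version <= at_version:
                return value
        # ...otherwise fall through to the parent as of the branch point.
        if self.parent is None:
            raise KeyError(key)
        return self.parent._read_at(key, self.base_version)
```

Branch creation cost is independent of how much data the parent holds, and a developer's writes never reach the parent; those two properties are what let a per-developer database become cheap enough to be the default.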

What changes when the constraint lifts

Once Practice #4 is affordable:

Seen in
