CONCEPT

H2 vs real database testing¶

Definition¶

A recurring testing-infrastructure choice: run JVM tests against an embedded, in-memory SQL emulator (classically H2, also HSQLDB / Derby) configured in MySQL/PostgreSQL compatibility mode, or against a real database engine (Postgres, MySQL) running in a Docker container via systems/testcontainers.

H2 is ~10× faster to start (~0.4 s vs ~4 s Postgres on Zalando's author machine — ) and needs no Docker daemon. But the compatibility-mode emulation is partial: it re-implements SQL syntax but not engine internals. Any test that exercises engine-specific behaviour can pass on H2 and fail in production.

What H2 misses for Postgres¶

Non-exhaustive:

jsonb operators — @>, ?, ?&, ?|, #>, jsonb_path_query. H2 has a JSON type but different operator semantics.
Partitioning — declarative table partitioning (PARTITION BY RANGE / LIST / HASH), partition pruning, partition-wise joins. H2 has none of this.
LATERAL joins — planner behaviour differs.
PL/pgSQL stored procedures — H2 uses Java; any CREATE FUNCTION … LANGUAGE plpgsql silently no-ops or errors.
Locking + isolation level semantics — Postgres's MVCC (row-level locks, SELECT … FOR UPDATE with NOWAIT / SKIP LOCKED, serializable snapshot isolation) is not what H2 does.
Specific error codes / constraint error shapes — tests that pattern-match on SQLSTATE or exception class produce false negatives.
Full-text search (tsvector, tsquery, GIN), PostGIS, extensions in general.
Replication semantics, logical decoding — can't be tested at all on H2.
Migration scripts — Flyway .sql files with Postgres-specific syntax (DO $$ … $$, GENERATED ALWAYS AS IDENTITY, CREATE EXTENSION) fail on H2.

The failure mode H2 produces¶

A green test suite + red production. The worst variant is passing tests for intentionally Postgres-specific code: the team writes a jsonb query, tests it against H2 (in MySQL or Postgres compat mode), the test passes because H2's JSON behaviour coincidentally agrees on the happy path — and a corner case (null-vs-missing key, nested-array operators) breaks in production.

When H2 is still OK¶

Pure unit tests of SQL-generation code that never actually executes (e.g. build-a-SQL-string logic).
Super-thin DAO tests that only SELECT * FROM t WHERE id = ? with no engine-specific features — but the gain over a pure unit test is marginal.
DB-unaware tests that happen to need a JDBC connection (wiring tests) and explicitly do not test DB behaviour.

What real-container testing costs¶

~4 s Postgres startup on a modern dev laptop. Multiplied over hundreds of tests without singleton sharing, this dominates test-suite time.
Docker requirement — developer laptops and CI workers must have Docker Engine or a compatible runtime.
CI fleet sizing — build hosts need more RAM and CPU; Localstack alone can eat GB of RAM.

The amortisation answer is the concepts/singleton-container-pattern: one container per JVM, shared across every test.

Seen in¶

— Zalando ZMS names the trade-off explicitly: "real PostgreSQL instead of the H2 database, which doesn't support the Postgres-specific functionality (e.g. partitioning or JSON operations)". 4 s vs 0.4 s startup cost disclosed as the tax.

patterns/real-docker-container-over-in-memory-fake — the pattern this concept motivates.
systems/postgresql — the real engine.
systems/testcontainers — the mechanism.
concepts/test-pyramid — the layer this decision lives in.