Skip to content

CONCEPT Cited by 1 source

H2 vs real database testing

Definition

A recurring testing-infrastructure choice: run JVM tests against an embedded, in-memory SQL emulator (classically H2, also HSQLDB / Derby) configured in MySQL/PostgreSQL compatibility mode, or against a real database engine (Postgres, MySQL) running in a Docker container via systems/testcontainers.

H2 is ~10× faster to start (~0.4 s vs ~4 s Postgres on Zalando's author machine — sources/2021-02-24-zalando-integration-tests-with-testcontainers) and needs no Docker daemon. But the compatibility-mode emulation is partial: it re-implements SQL syntax but not engine internals. Any test that exercises engine-specific behaviour can pass on H2 and fail in production.

What H2 misses for Postgres

Non-exhaustive:

  • jsonb operators@>, ?, ?&, ?|, #>, jsonb_path_query. H2 has a JSON type but different operator semantics.
  • Partitioning — declarative table partitioning (PARTITION BY RANGE / LIST / HASH), partition pruning, partition-wise joins. H2 has none of this.
  • LATERAL joins — planner behaviour differs.
  • PL/pgSQL stored procedures — H2 uses Java; any CREATE FUNCTION … LANGUAGE plpgsql silently no-ops or errors.
  • Locking + isolation level semantics — Postgres's MVCC (row-level locks, SELECT … FOR UPDATE with NOWAIT / SKIP LOCKED, serializable snapshot isolation) is not what H2 does.
  • Specific error codes / constraint error shapes — tests that pattern-match on SQLSTATE or exception class produce false negatives.
  • Full-text search (tsvector, tsquery, GIN), PostGIS, extensions in general.
  • Replication semantics, logical decoding — can't be tested at all on H2.
  • Migration scripts — Flyway .sql files with Postgres-specific syntax (DO $$ … $$, GENERATED ALWAYS AS IDENTITY, CREATE EXTENSION) fail on H2.

The failure mode H2 produces

A green test suite + red production. The worst variant is passing tests for intentionally Postgres-specific code: the team writes a jsonb query, tests it against H2 (in MySQL or Postgres compat mode), the test passes because H2's JSON behaviour coincidentally agrees on the happy path — and a corner case (null-vs-missing key, nested-array operators) breaks in production.

When H2 is still OK

  • Pure unit tests of SQL-generation code that never actually executes (e.g. build-a-SQL-string logic).
  • Super-thin DAO tests that only SELECT * FROM t WHERE id = ? with no engine-specific features — but the gain over a pure unit test is marginal.
  • DB-unaware tests that happen to need a JDBC connection (wiring tests) and explicitly do not test DB behaviour.

What real-container testing costs

  • ~4 s Postgres startup on a modern dev laptop. Multiplied over hundreds of tests without singleton sharing, this dominates test-suite time.
  • Docker requirement — developer laptops and CI workers must have Docker Engine or a compatible runtime.
  • CI fleet sizing — build hosts need more RAM and CPU; Localstack alone can eat GB of RAM.

The amortisation answer is the concepts/singleton-container-pattern: one container per JVM, shared across every test.

Seen in

Last updated · 476 distilled / 1,218 read