ZALANDO 2021-02-24

Zalando — Integration tests with Testcontainers

Summary

A Zalando Marketing Services (ZMS) Java-backend engineer documents the team's integration-testing discipline: run real external components as Docker containers from inside JUnit 5 tests via the Testcontainers library, wired into a Spring Boot test context with @DynamicPropertySource. The post canonicalises the ZMS definition of integration test (test of the boundary between their code and one external component — DB, AWS service over Localstack, HTTP peer over MockServer), the test-pyramid proportions they target (integration tests ≈ 25% of unit tests, varying per application), Maven's Surefire + Failsafe split for running unit vs integration tests in different build phases, and the singleton-container pattern for amortising Docker startup cost across a test class hierarchy. Concrete numbers quoted: a Postgres Docker container takes ~4 s to start on the author's machine vs 0.4 s for H2, and Localstack up to ~20 s. The post explicitly flags that Testcontainers is not sufficient — mock servers don't catch real-API drift, so teams should pair Testcontainers with contract testing (Spring Cloud Contract is mentioned). This is a pragmatic Tier-2 Java testing-discipline post with enough disclosed numbers, architectural framing, and honest caveats to justify full ingest.

Key takeaways

  1. Define integration test as "our code ↔ one external component". ZMS's working definition: a test that exercises communication with a real external component — database, AWS service (S3, Kinesis, DynamoDB, SQS), HTTP peer — both happy-path and corner cases (unexpected HTTP codes, timeouts, 5xx). This narrows the generic Wikipedia "modules tested as a group" definition to a specific cross-system boundary, which is the boundary Testcontainers is designed to cover (Source: sources/2021-02-24-zalando-integration-tests-with-testcontainers).

  2. Integration tests ≈ 25% of unit tests (ZMS empirical ratio). Anchored in Fowler's test pyramid: unit + component tests are the foundation; integration tests complement but never dominate. System tests and manual tests sit above, ideally the rarest. The 25% number is presented as a ZMS-team heuristic, not a blanket prescription — "it varies from application to application" (Source: sources/2021-02-24-zalando-integration-tests-with-testcontainers).

  3. Testcontainers runs any Docker image from Java code. GenericContainer is the universal entry point; popular images (Postgres, Kafka, Localstack, MockServer, Redis) have pre-built wrapper classes with idiomatic APIs. Container lifecycle is tied to the JVM process via JVM shutdown hooks, and a companion container, Ryuk, reaps orphaned containers if the JVM crashes before the shutdown hooks fire. Ryuk is on by default but can be disabled (Source: sources/2021-02-24-zalando-integration-tests-with-testcontainers).

  4. Maven: split Surefire (unit) and Failsafe (integration) by naming convention. Surefire runs *Test / *Tests / *TestCase classes in the test phase. Failsafe runs *IntegrationTest classes in the integration-test phase. Note that *IntegrationTest also matches Surefire's *Test pattern, so Surefire needs an explicit exclude for that suffix to keep the two sets disjoint. A profile (with-integration-tests) gates whether the slower IT suite runs: mvn clean verify -P with-integration-tests on CI, plain mvn test locally. This keeps the developer feedback loop short while CI still enforces the fuller suite (Source: sources/2021-02-24-zalando-integration-tests-with-testcontainers).

  5. Singleton-container pattern amortises Docker startup across a class hierarchy. The ZMS shape: an AbstractIntegrationTest base class holds static PostgreSQLContainer, Localstack, etc. fields; a static initialiser calls .start() once; Spring's @DynamicPropertySource (introduced in Spring Framework 5.2.5, more compact than ApplicationContextInitializer) wires the container's dynamic port/URL into the app context. Concrete subclasses inherit the started containers. The tests never call .stop() on the containers; JVM shutdown hooks do it. The alternative JUnit Jupiter @Testcontainers / @Container annotations work per class (including sharing between methods), but they cannot reuse a container between test classes and have only been tested with sequential execution (Source: sources/2021-02-24-zalando-integration-tests-with-testcontainers).

  6. Real Postgres over H2. Real Localstack over mocks. Core Testcontainers pitch: H2 doesn't support Postgres-specific functionality (partitioning, JSON operators), so code whose tests pass on H2 can still break on Postgres. Localstack lets you emulate AWS services offline, cut dev costs, and test corner cases like 5xx from S3 or delayed responses, which you cannot trigger on demand against real AWS. Tradeoff: ~4 s Postgres startup vs 0.4 s H2 on the author's machine; Localstack up to ~20 s. Build times grow, and CI machines have to be sized to absorb the extra load (Source: sources/2021-02-24-zalando-integration-tests-with-testcontainers).

  7. Tests on a shared container must still be FIRST. The FIRST checklist (Fast, Isolated, Repeatable, Self-Validating, Thorough; a variant of the Clean Code acronym) still applies. Two strategies keep tests isolated on a shared Postgres: (a) unique IDs / names per test, so constraint collisions don't happen and no cleanup is needed (but aggregate queries like COUNT(*) now see other tests' rows); (b) explicit cleanup after each test (more developer effort, easy to forget). Concurrent execution requires even more discipline. ZMS uses (a) by preference (Source: sources/2021-02-24-zalando-integration-tests-with-testcontainers).

  8. Even one IT verifies the Spring context and Flyway migrations. Zalando ZMS calls out that a single contextLoads()-style integration test is load-bearing: it confirms the application starts against a real database and that Flyway migration scripts run cleanly. This is a cheap-to-write, high-signal smoke test that unit tests cannot replicate (Source: sources/2021-02-24-zalando-integration-tests-with-testcontainers).

  9. Testcontainers is not sufficient — pair with contract testing. Explicit caveat from Zalando: an IT against a MockServer container does not catch real-API drift in the external service. The mock is as correct as the engineer who wrote it; production can still break. The post recommends pairing with contract testing (Spring Cloud Contract is named) as the missing defence (Source: sources/2021-02-24-zalando-integration-tests-with-testcontainers).
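Isolation strategy (a) from takeaway 7 can be illustrated without Docker. The sketch below is a plain-Java stand-in, not code from the post: SharedUserTable is a hypothetical in-memory substitute for a Postgres table with a unique column, and uniqueEmail() is an assumed helper name. It shows both halves of the trade-off: unique keys avoid constraint collisions without cleanup, but a COUNT(*)-style assertion sees every test's rows.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in for a table with a unique "email" column, shared by all tests. */
final class SharedUserTable {
    private final Map<String, String> rowsByEmail = new ConcurrentHashMap<>();

    /** Inserts a row; throws if the key already exists, mimicking a unique-constraint violation. */
    void insert(String email, String name) {
        if (rowsByEmail.putIfAbsent(email, name) != null) {
            throw new IllegalStateException("unique constraint violated: " + email);
        }
    }

    boolean exists(String email) {
        return rowsByEmail.containsKey(email);
    }

    /** Rough equivalent of SELECT COUNT(*): it sees rows written by every test. */
    int count() {
        return rowsByEmail.size();
    }
}

final class UniqueIdIsolationSketch {
    /** Strategy (a): each test derives its own key, so no cleanup step is needed. */
    static String uniqueEmail() {
        return "user-" + UUID.randomUUID() + "@example.test";
    }

    public static void main(String[] args) {
        SharedUserTable table = new SharedUserTable(); // shared across both "tests"

        // Two independent tests insert a user each without colliding.
        String first = uniqueEmail();
        table.insert(first, "Alice");
        String second = uniqueEmail();
        table.insert(second, "Bob");

        // Per-row assertions stay isolated from the other test's data.
        System.out.println(table.exists(first) && table.exists(second));
        // An aggregate assertion observes the other test's row as well.
        System.out.println(table.count());
    }
}
```

A test that asserts on a specific row stays safe under this strategy; a test that asserts on a total count has to scope the query to its own keys or fall back to strategy (b).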

Systems / concepts / patterns extracted

Systems (extended / new)

  • systems/testcontainers — extended with the Zalando ZMS Java / JUnit 5 / Spring Boot altitude Seen-in (previously the page's only Seen-in was Canva's Bazel + CI hermeticity retrospective, which is a build-system altitude). The Zalando ingest is the canonical application-developer altitude: how to set up Testcontainers inside a running Java project with Spring Boot and Maven, not how a CI framework sandboxes them.
  • systems/junit5 — new minimal page. JUnit Jupiter test runner; @Testcontainers / @Container annotations come from the junit-jupiter Testcontainers module.
  • systems/maven-failsafe-plugin — new minimal page. Runs *IntegrationTest classes in Maven's integration-test phase, letting Surefire own the unit-test test phase.
  • systems/maven-surefire-plugin — new minimal page. Default Maven unit-test runner; includes *Test / *Tests / *TestCase.
  • systems/spring-boot — existing page; extended with @DynamicPropertySource (Spring 5.2.5) usage pattern as the idiomatic wiring for Testcontainers-provided dynamic ports and credentials.
  • systems/mockserver — new minimal page. HTTP-server-as-container for mocking external HTTP peers; canonical Testcontainers companion.
  • systems/wiremock — new minimal page. Same role as MockServer, alternative implementation.
  • systems/localstack — new minimal page. Emulates AWS services (S3, Kinesis, DynamoDB, SQS, etc.) in a Docker container for offline / low-cost integration testing.
  • systems/docker — existing page; extended with new Seen-in: Testcontainers is a Docker-runtime consumer running inside a JVM test process.
  • systems/ryuk-testcontainers-reaper: new minimal page. Testcontainers-bundled reaper container that ensures orphaned containers are cleaned up when the parent JVM dies before shutdown hooks can run.

Concepts (new)

  • concepts/test-pyramid — Mike Cohn's shape (popularised by Martin Fowler): unit-test foundation, narrower service/component/integration layer above, thin UI/E2E cap. Zalando's canonical ZMS ratio: ITs ≈ 25% of unit tests, varies per app.
  • concepts/first-test-principles: Fast, Isolated, Repeatable, Self-Validating, Thorough (a variant of the FIRST acronym in Clean Code, Robert C. Martin, which spells it Fast, Independent, Repeatable, Self-Validating, Timely). Zalando names it as the property contract ITs must still satisfy when running against shared containers.
  • concepts/h2-vs-real-database-testing: the specific anti-pattern of using an in-memory SQL emulator (H2) to test code that targets a real database (Postgres). Fast, but tests can pass on H2 while the same code fails on Postgres, because H2 does not faithfully reproduce engine-specific features: partitioning, jsonb operators, LATERAL, PL/pgSQL, specific lock / isolation semantics.
  • concepts/singleton-container-pattern — container declared as static field on an abstract test base, started once per JVM, shared across every subclass test. Amortises Docker startup over the entire IT suite. Named as such in Testcontainers docs.
  • concepts/contract-testing — consumer- and provider-side contracts (schemas + example requests/responses) verified in CI on both sides to detect real-API drift that mock-based ITs can't. Zalando names Spring Cloud Contract as the tooling; sibling to Pact (not named here).
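The start-once behaviour behind the singleton-container concept comes from ordinary JVM class initialisation, so it can be sketched without Testcontainers on the classpath. In the sketch below, FakeContainer is a hypothetical stand-in for a class such as PostgreSQLContainer, the two test classes are invented names, and the explicit shutdown hook is an assumption standing in for the cleanup that Testcontainers and Ryuk perform themselves.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a Testcontainers container; start() is the expensive step. */
final class FakeContainer {
    static final AtomicInteger STARTS = new AtomicInteger();
    private volatile boolean running;

    void start() {
        STARTS.incrementAndGet();
        running = true;
    }

    void stop() {
        running = false;
    }

    boolean isRunning() {
        return running;
    }
}

/** Base class: its static initialiser runs once per JVM, when the class is first initialised. */
abstract class AbstractIntegrationTestSketch {
    static final FakeContainer POSTGRES = new FakeContainer();

    static {
        POSTGRES.start();
        // Tests never call stop(); cleanup is left to JVM shutdown (assumed hook for this sketch).
        Runtime.getRuntime().addShutdownHook(new Thread(POSTGRES::stop));
    }
}

/** Two hypothetical concrete test classes that inherit the already-started container. */
final class OrderRepositoryIntegrationTestSketch extends AbstractIntegrationTestSketch {
    boolean run() {
        return POSTGRES.isRunning();
    }
}

final class UserRepositoryIntegrationTestSketch extends AbstractIntegrationTestSketch {
    boolean run() {
        return POSTGRES.isRunning();
    }
}

final class SingletonContainerSketch {
    public static void main(String[] args) {
        boolean bothSawRunningContainer = new OrderRepositoryIntegrationTestSketch().run()
                && new UserRepositoryIntegrationTestSketch().run();
        // The container was started once even though two test classes used it.
        System.out.println(bothSawRunningContainer + ", starts=" + FakeContainer.STARTS.get());
    }
}
```

In a real suite the same shape holds with the container field typed as the Testcontainers class and the connection details passed to Spring through @DynamicPropertySource, as the source describes.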

Patterns (new)

  • patterns/real-docker-container-over-in-memory-fake — prefer a real Postgres (Kafka, Redis, etc.) container over H2 / embedded-Kafka / in-memory-Redis for integration tests. Pay the startup cost to gain real-engine parity for corner-case features.
  • patterns/failsafe-integration-test-separation — Maven Surefire for unit tests in test phase, Failsafe for integration tests in integration-test phase, gated by a with-integration-tests profile. Preserves fast local feedback while enforcing fuller suite on CI.
  • patterns/shared-static-container-across-tests: static container field on an abstract base test class, started in a static initialiser, inherited by every concrete IT subclass. Relies on JVM shutdown hooks (+ Ryuk) for cleanup. Amortises Docker startup cost across the suite.
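The naming split in the failsafe-integration-test-separation pattern can be modelled as a small classifier. This is only a sketch of the convention, not Maven's actual matching logic: the class and method names are hypothetical, and it assumes Surefire is configured to exclude the IntegrationTest suffix, since that suffix would otherwise also match the plain Test pattern.

```java
import java.util.List;

/** Models which Maven plugin would pick up a test class under the naming split described above. */
final class TestNamingSplitSketch {
    enum Runner { SUREFIRE, FAILSAFE, NONE }

    /** Suffix the source gives for Failsafe-run integration tests. */
    private static final String INTEGRATION_SUFFIX = "IntegrationTest";
    /** Suffixes the source lists for Surefire-run unit tests. */
    private static final List<String> UNIT_SUFFIXES = List.of("Test", "Tests", "TestCase");

    static Runner runnerFor(String simpleClassName) {
        // Checked first: "FooIntegrationTest" also ends in "Test", so this sketch
        // ASSUMES Surefire has an exclude for the integration suffix.
        if (simpleClassName.endsWith(INTEGRATION_SUFFIX)) {
            return Runner.FAILSAFE;
        }
        for (String suffix : UNIT_SUFFIXES) {
            if (simpleClassName.endsWith(suffix)) {
                return Runner.SUREFIRE;
            }
        }
        return Runner.NONE; // production class or unconventional name: neither plugin runs it
    }

    public static void main(String[] args) {
        // Hypothetical class names for illustration.
        for (String name : List.of("PriceCalculatorTest", "OrderRepositoryIntegrationTest", "PriceCalculator")) {
            System.out.println(name + " -> " + runnerFor(name));
        }
    }
}
```

The order of the two checks is the design point: without it (or without the equivalent Surefire exclude) integration tests would run in the fast unit-test phase too.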

Operational numbers

  • Postgres-as-Docker startup time: ~4 seconds (author's local machine); vs H2 in-memory: ~0.4 seconds.
  • Localstack startup time: up to ~20 seconds (author's local machine).
  • Target IT-to-unit ratio: ≈ 25% at ZMS, varies by application.
  • Maven phase separation: unit tests (Surefire) run in the test phase; integration tests (Failsafe) run in integration-test — the goal is to keep the default mvn test cycle short and opt into the slower suite via -P with-integration-tests.
  • Spring version floor for @DynamicPropertySource: 5.2.5 (vs older ApplicationContextInitializer).
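To see why the startup numbers above motivate the singleton pattern, the arithmetic can be worked through in a few lines. The sketch assumes the quoted ~4 s and ~20 s figures, sequential container startup, and a hypothetical suite of 30 integration-test classes; none of these suite-level figures come from the post.

```java
/** Back-of-the-envelope comparison of container startup cost, using the post's local-machine timings. */
final class StartupCostSketch {
    static final double POSTGRES_START_SECONDS = 4.0;    // ~4 s, quoted in the post
    static final double LOCALSTACK_START_SECONDS = 20.0; // up to ~20 s, quoted in the post

    /** Cost if every integration-test class starts its own containers (assumed sequential). */
    static double restartPerClassSeconds(int integrationTestClasses) {
        return integrationTestClasses * (POSTGRES_START_SECONDS + LOCALSTACK_START_SECONDS);
    }

    /** Cost with singleton containers: paid once per JVM, independent of the class count. */
    static double singletonSeconds() {
        return POSTGRES_START_SECONDS + LOCALSTACK_START_SECONDS;
    }

    public static void main(String[] args) {
        int classes = 30; // hypothetical suite size, not a ZMS figure
        System.out.printf("per-class restart: %.0f s%n", restartPerClassSeconds(classes));
        System.out.printf("singleton:         %.0f s%n", singletonSeconds());
    }
}
```

Under these assumptions the per-class approach spends 720 s on startup against 24 s for the singleton approach, which is the amortisation the pattern is named for.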

Caveats

  • Tier-2 borderline include. The post is a Java integration-testing how-to, not a distributed-systems retrospective. It's included on the wiki on three grounds: (a) it canonicalises a specific set of testing-discipline primitives (singleton container, Surefire/Failsafe split, @DynamicPropertySource wiring, and the trade-off between FIRST and shared-container isolation) that recur in JVM-backend shops industry-wide but had no wiki page; (b) it extends systems/testcontainers with an application-developer altitude Seen-in complementing Canva's CI-framework altitude; (c) it explicitly names the missing piece, real-API drift, and points at concepts/contract-testing, which was absent from the wiki.
  • No production numbers. Startup times were measured on the author's local machine; the post discloses no CI-fleet throughput, flake rate, or test count at ZMS. The 25% IT-to-unit ratio is a rule of thumb, not a measurement.
  • Single-company, single-team scope. ZMS is one Zalando sub-org. Other Zalando teams may use different testing stacks; this is not a Zalando-wide standard.
  • No depth on shrinking, seeding, or parallel execution. The parallel-execution caveat on @Testcontainers is taken from the Testcontainers docs rather than explored. The post does not discuss Testcontainers Desktop, container reuse across JVM runs (the .withReuse(true) option), or rootless Docker implications.
  • The techniques were industry standard before 2021. Singleton containers and the Surefire/Failsafe split have been JVM-community practice since the mid-2010s; this post canonicalises rather than invents. The wiki value is in named-primitive capture plus the explicit Testcontainers-is-not-sufficient framing.
  • Sibling post awaiting ingest. Earlier Zalando ingest sources/2021-02-01-zalando-stop-using-constants-feed-randomized-input-to-test-cases covers iOS/Swift randomised-input testing; together these two Zalando posts form a small testing-discipline axis at complementary altitudes (Java backend integration vs Swift mobile unit).
