Skip to content

PATTERN Cited by 1 source

Parallel Docker image (prod vs test) for patched library rollout

Intent

When rolling out an experimental patch to a transitive library (driver / framework / SDK), build two Docker images from the same source tree: one with the unmodified library, one with the patched library. Deploy the patched image to the test environment only; keep the unpatched image in production. Verify behaviour in test under production-representative load; if clean, promote the patched image to production.

The pattern is a lightweight rollout-risk discipline for patched transitive libraries whose full behaviour impact isn't yet known.

Canonical instance: Zalando, pgjdbc 42.6.1-patched (2023)

Zalando's 2023-11-08 post (sources/2023-11-08-zalando-patching-the-postgresql-jdbc-driver) canonicalises this shape:

  • Zalando's Postgres-sourced event-streaming apps transitively depended on an old pgjdbc version (via a pinned Debezium that preserves backwards compatibility).
  • Zalando locally backported their own KeepAlive-LSN- advancement fix into a 42.6.1-patched jar.
  • Build scripts were modified to emit two Docker images per service:
  • Unchanged image — original transitive pgjdbc.
  • Patched image — the 42.6.1-patched jar wired in via a transitive- dependency override.
  • Rollout: patched image → test environment; unchanged image → production.

Zalando's framing verbatim:

"we modified our build scripts to optionally override the latest version of the transitive PgJDBC dependency and we took advantage of this option to build not one, but two Docker images for our applications: one unchanged and another with a locally built version, 42.6.1-patched, of PgJDBC that contained our fix. We rolled this modified Docker image out to our test environment while still using the unchanged image in our production environment. This way we could safely verify that our event-streaming applications continued to behave as intended and monitor the behaviour in order to verify the issue of WAL growth had been addressed."

The verification was a multi-day WAL-size graph comparison on a low-activity database; the before/after shape was the disambiguation signal. After the graph confirmed flat WAL growth under the patch, Zalando promoted the patched image ("we could confidently roll out the newer Docker images to our production clusters").

Why it works

The pattern solves a specific trust problem: a locally-built patched library has:

  • No upstream signing / maintainer verification.
  • No widespread community testing.
  • Subtle potential behaviour drift vs the upstream behaviour it emulates.

Parallel prod/test images give the patched library a staged soak in a realistic environment without risking the live production fleet. The two images come from the same source tree, so the only delta under test is the library itself — ruling out confounding variables.

Structural properties

  • Source tree is shared. Both images build from the same commit; only a build-script toggle differs.
  • Rollout is not feature-flagged at runtime. The decision is made at build time via the transitive-override choice, and at deploy time via which image each environment pulls. No runtime kill switch.
  • Promotion is a deploy, not a flag flip. When the patched image is promoted to production, it replaces the unchanged image — no partial rollout within a single service.

Contrast with adjacent patterns

When to choose it

  • Patched transitive library. The fix lives in a dependency one or more layers deep; you control the build but not the release cadence of the library.
  • Soak time matters. The patched library's behaviour under production-representative load is the signal you need before committing.
  • The fix is straightforward to build into a Docker image. If the patched library requires non-trivial binary toolchain work, the pattern breaks down; use a shared artifact registry instead.

When not to choose it

  • Stateful library behaviour is under test. If the patched library's effect only surfaces under write traffic that production generates but test doesn't, the test soak is insufficient.
  • Library is load-bearing for data correctness. Prefer patterns/pre-flight-flight-post-flight-upgrade-stages with shadow / validation discipline rather than a single staged promotion.

Seen in

Last updated · 501 distilled / 1,218 read