
PATTERN Cited by 2 sources

Opt-in driver-level LSN flush

Intent

When a driver-layer feature (a client driver advancing server state based on wire-protocol cues the application layer never sees) is safe for some deployments but breaks the operator contract for others, the right design is not to enable or disable it globally — it's to make it opt-in per deployment, exposed as an explicit framework configuration with a conservative default.

The canonical instance on this wiki: Debezium's lsn.flush.mode config property (Zalando 2025-12, DBZ-9641) exposes the pgjdbc keep-alive LSN advancement feature as one of three modes, with connector (Debezium-only flushes) as the safe default and connector_and_driver as the opt-in.

Context

The pattern applies when three properties all hold:

  1. A client driver (pgjdbc, a Kafka client, an S3 SDK) can advance server-side state (replication slot LSN, consumer group offset, multipart upload state) in response to wire-protocol cues the overlying framework doesn't route.
  2. The driver-level advancement is legitimately safe for some deployments — typically those where the server-side state is treated as authoritative, or where the deployment owns operational invariants the framework doesn't know.
  3. The driver-level advancement breaks the operator contract for other deployments — typically those where the framework's tracked state (Kafka Connect offset topic, application-level checkpoint) is treated as authoritative, and server-side advancement is a protocol-level surprise.

This is the shape of the 2023→2025 Debezium-pgjdbc story:

  • 2023 — Zalando upstreams KeepAlive-LSN advancement into pgjdbc, released in 42.7.0 (a fix at the driver layer). Works for Zalando; causes downstream Debezium users to see "Saved offset is before replication slot's confirmed lsn" errors.
  • 2024 — Debezium hard-disables the feature globally with withAutomaticFlush(false) in PR #6472. Unblocks most users; breaks Zalando's upgrade path.
  • 2025-12 — Zalando contributes lsn.flush.mode (DBZ-9641) to Debezium 3.4 to make the feature opt-in per deployment with connector (disabled) as the safe default.

Problem

A global ON or OFF choice creates a policy conflict:

  • Global OFF — safe default but breaks deployments where the feature is the stable production shape (Zalando).
  • Global ON — enables the feature for deployments that can't tolerate it (Kafka-Connect-offset users).
  • Framework-level heuristic — attempts to detect which shape the deployment is in, but the decision depends on operator-side invariants (slot-survives-failover discipline, trust in offset-store durability) that the framework fundamentally cannot know from the config surface.
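
The Global ON failure can be illustrated with a toy model. This is a sketch, not Debezium's code: it assumes a framework that, on startup, compares its own saved offset with the slot's confirmed position and refuses to start when the slot is ahead. The names `parse_lsn`, `check_startup` and `OffsetMismatch`, and the sample LSN values, are hypothetical. Only the textual LSN layout (two hexadecimal halves of a 64-bit position separated by a slash) follows PostgreSQL's `pg_lsn` type.

```python
def parse_lsn(text: str) -> int:
    """Convert a PostgreSQL-style LSN such as '0/16B3748' into a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


class OffsetMismatch(Exception):
    """Raised when the slot has moved past the framework's saved offset."""


def check_startup(saved_offset: str, slot_confirmed: str) -> None:
    # Hypothetical startup check: the framework treats its own offset store
    # as authoritative, so a slot that is already ahead of it is an error.
    if parse_lsn(saved_offset) < parse_lsn(slot_confirmed):
        raise OffsetMismatch(
            f"saved offset {saved_offset} is before "
            f"slot's confirmed LSN {slot_confirmed}"
        )


# The framework last recorded this position in its offset store.
saved = "0/16B3748"

# Driver flush disabled: the slot only moves when the framework flushes,
# so it never runs ahead of the saved offset and startup succeeds.
check_startup(saved, slot_confirmed="0/16B3748")

# Driver flush enabled: a keep-alive advanced the slot while the
# framework's offset store stayed where it was.
try:
    check_startup(saved, slot_confirmed="0/16B3A00")
    outcome = "started"
except OffsetMismatch as exc:
    outcome = f"refused: {exc}"

print(outcome)
```

For a deployment that treats the slot as authoritative, the second case is harmless; for one that treats the offset store as authoritative, it is the protocol-level surprise described above.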

Shape

  1. Expose the driver-layer feature as a framework configuration, not a driver configuration. The driver can still ship the feature unchanged; the framework opts in or out based on operator intent.
  2. Default to the conservative side — the framework's historical behaviour, the one that matches the larger user population's operational invariants. In Debezium's case, connector mode (Debezium flushes, driver does not).
  3. Provide an opt-in path that lets deployments with the required invariants explicitly enable the feature. connector_and_driver in Debezium's case.
  4. Optionally expose a third escape-hatch for operators who manage LSN flushing entirely externally — manual in Debezium's case.
  5. Couple with operator-side invariant declarations elsewhere in the config surface — the driver-flush opt-in only makes sense when combined with offset.mismatch.strategy that tells the framework how to reconcile two independent position-tracking sources on startup.
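
The first four steps can be sketched as a small configuration resolver. This is a minimal illustration, not Debezium's implementation: the property name `lsn.flush.mode` and its three values come from the text above, while `LsnFlushMode`, `resolve_flush_mode` and `driver_flush_enabled` are hypothetical names, and rejecting unknown values with an error is an assumption of this sketch.

```python
from enum import Enum
from typing import Mapping


class LsnFlushMode(Enum):
    CONNECTOR = "connector"                        # framework flushes, driver does not
    CONNECTOR_AND_DRIVER = "connector_and_driver"  # opt-in: driver may flush as well
    MANUAL = "manual"                              # operator manages flushing externally


# Step 2: the conservative default is the framework's historical behaviour.
DEFAULT_MODE = LsnFlushMode.CONNECTOR


def resolve_flush_mode(config: Mapping[str, str]) -> LsnFlushMode:
    """Read lsn.flush.mode from framework config, defaulting conservatively."""
    raw = config.get("lsn.flush.mode")
    if raw is None:
        return DEFAULT_MODE
    try:
        return LsnFlushMode(raw.strip().lower())
    except ValueError:
        allowed = ", ".join(mode.value for mode in LsnFlushMode)
        raise ValueError(
            f"lsn.flush.mode must be one of: {allowed}; got {raw!r}"
        ) from None


def driver_flush_enabled(mode: LsnFlushMode) -> bool:
    """Only the explicit opt-in lets the driver advance the slot on its own."""
    return mode is LsnFlushMode.CONNECTOR_AND_DRIVER


print(resolve_flush_mode({}))
print(driver_flush_enabled(resolve_flush_mode({"lsn.flush.mode": "connector_and_driver"})))
```

Keeping the decision in the framework's config means the driver can ship the feature unchanged; the framework only has to translate the resolved mode into whatever switch the driver exposes.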

Examples on wiki

  • Debezium 3.4 lsn.flush.mode (canonical, 2025-12) — makes the pgjdbc keep-alive flush opt-in after the feature was globally disabled due to user-reported issues. Zalando contributed DBZ-9641 / PR #6881.
  • (Expansion candidate — Kafka consumer auto.offset.reset plays an analogous role in the consumer-side offset-reconciliation space.)

Consequences

Upside:

  • Users with the required invariants keep their proven-safe deployment shape.
  • Users without them inherit a safe default.
  • The framework-vs-driver coupling is made explicit in the config surface rather than hidden in the driver's behavioural choices.

Downside:

  • Configuration surface grows. lsn.flush.mode is one more property for every operator to learn.
  • Documentation burden — the interaction between multiple driver-layer and framework-layer flags (lsn.flush.mode × offset.mismatch.strategy × backing-store choice) becomes combinatorially complex.
  • Operators may misconfigure (enable connector_and_driver without the slot-survives-failover discipline that makes it safe), so documentation must include the preconditions, not just the knob.
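
One way an operator might guard against the misconfiguration risk is a preflight check in their own deployment tooling. The sketch below is purely illustrative and not something Debezium is known to provide: the `preflight` function, the `declared_invariants` set, and the invariant name `slot_survives_failover` are all hypothetical stand-ins for whatever preconditions the documentation spells out.

```python
from typing import List, Set

# Hypothetical: preconditions the operator must have confirmed before
# letting the driver advance the slot on its own.
REQUIRED_FOR_DRIVER_FLUSH = frozenset({"slot_survives_failover"})


def preflight(flush_mode: str, declared_invariants: Set[str]) -> List[str]:
    """Return a list of problems; an empty list means the config looks consistent."""
    problems: List[str] = []
    if flush_mode == "connector_and_driver":
        missing = REQUIRED_FOR_DRIVER_FLUSH - declared_invariants
        for name in sorted(missing):
            problems.append(
                f"lsn.flush.mode=connector_and_driver requires invariant '{name}'"
            )
    return problems


# The safe default needs no declarations.
print(preflight("connector", set()))

# Opting in without confirming the precondition is flagged.
print(preflight("connector_and_driver", set()))

# Opting in with the precondition confirmed passes.
print(preflight("connector_and_driver", {"slot_survives_failover"}))
```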

When not to apply

Don't use this pattern when the driver-layer feature is unambiguously safe or unambiguously unsafe for all deployment shapes. Opt-in configurations add cognitive load; they are only worth it when deployments with different invariants legitimately need different behaviour.

Seen in

  • sources/2025-12-18-zalando-contributing-to-debezium-fixing-logical-replication-at-scale: canonical wiki introduction. Zalando contributes lsn.flush.mode to Debezium 3.4 to re-enable the pgjdbc keep-alive LSN advancement feature (their 2023 contribution) as opt-in rather than default-on, resolving the conflict between their fleet's proven shape and the larger Debezium user population's operator contract.

  • sources/2023-11-08-zalando-patching-the-postgresql-jdbc-driver: the prior chapter. Zalando upstreamed the pgjdbc KeepAlive-LSN advancement feature itself via PR #2941, using a client-driver fix at the driver layer as the architectural lever. The opt-in pattern this page canonicalises only became necessary after that driver fix was deployed at scale and showed both its win at Zalando and its friction for other deployments.
