Fleet patching¶
Definition¶
Fleet patching is the operational capability of a managed-service vendor to deploy a software change (security patch, bug fix, config update) across the whole fleet of customer instances within a bounded time window, with known velocity, known failure semantics, and a known customer-courtesy contract for scheduling.
Distinct from in-place upgrade (one cluster, customer-driven), staged rollout (vendor-side but not scoped to a single change), and disaster recovery (fleet-wide but event-driven, not software-driven). Fleet patching is the specific capability of "here is a binary / config / rule; deploy it everywhere; keep customers running; tell me when it's done."
Why it matters¶
Fleet-patching velocity is the primary defence for most users of a managed service when a vulnerability ships in the vendor-operated substrate. Where the shared-responsibility line sits determines who owns patch velocity — the vendor for managed-tier users, the customer for self-hosted-tier users — and the vendor's fleet-patching capability sets the upper bound on how quickly the managed tier can be made safe.
Also the enabling capability for vendor-first-patch coordinated disclosure: if the vendor can patch the fleet faster than attackers can develop exploits after CVE publication, the disclosure clock can be collapsed inside the vendor boundary.
The velocity-vs-safety trade-off¶
Fleet patching has four axes the vendor tunes:
- Velocity — how fast the change reaches the last instance. Bounded by the number of parallel patch workers, rollback capability, and customer-courtesy contracts (maintenance windows).
- Safety — how much damage a bad patch can do. Mitigated by canary-first rollout, region-by-region phasing, staged rollout, and fast-rollback tooling.
- Customer courtesy — how much scheduling control customers keep. Bounded by the maintenance-window contract: customer controls routine cadence, vendor reserves an emergency channel.
- Observability — how the vendor knows the patch landed cleanly. Needs per-instance state ("patched" / "failed" / "deferred"), aggregate SLO tracking, and incident-response triggers if the patch itself causes errors.
Pushing velocity without the other three produces self-inflicted outages of the kind the Cloudflare 2025-12-05 outage documents — a fast, fleet-wide, insufficiently staged rollout.
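The interaction of the four axes can be sketched as a staged rollout loop: patch in expanding waves, check observed error rates after each wave, and roll back everything if the safety threshold trips. A minimal sketch under stated assumptions — the `Instance` class, wave fractions, and error threshold are illustrative, not any vendor's real tooling.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """Hypothetical stand-in for one managed database instance."""
    name: str
    version: str = "1.0"

    def apply_patch(self, version: str) -> None:
        self.version = version          # patching always succeeds in this sketch

    def error_rate(self) -> float:
        return 0.001                    # stand-in for real post-patch telemetry


def staged_patch(fleet, new_version, waves=(0.01, 0.10, 1.0),
                 max_error_rate=0.05, old_version="1.0"):
    """Patch in expanding waves (canary -> region -> fleet); roll back every
    patched instance if the error rate after any wave exceeds the threshold."""
    patched, done = [], 0
    for fraction in waves:
        target = max(1, int(len(fleet) * fraction))
        for inst in fleet[done:target]:
            inst.apply_patch(new_version)
            patched.append(inst)
        done = target
        # Observability gate: check error-rate delta before widening the wave.
        if max(i.error_rate() for i in patched) > max_error_rate:
            for inst in patched:        # fast-rollback path
                inst.apply_patch(old_version)
            return "rolled_back"
    return "complete"


fleet = [Instance(f"db-{i}") for i in range(200)]
print(staged_patch(fleet, "1.1"))       # -> complete
```

The wave fractions encode the velocity-vs-safety dial: larger early waves mean faster completion but a bigger blast radius if the patch is bad.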
Canonical datapoint¶
MongoDB CVE-2025-14847 / "Mongobleed" (2025-12-12 → 2025-12-18):
- Scale: "tens of thousands of Atlas customers and hundreds of thousands of Atlas instances" (order-of-magnitude, undecomposed).
- Velocity: ~4.7 days detection → majority of fleet; ~6 days detection → full fleet including maintenance-window customers.
- Courtesy contract preserved: maintenance-window customers got ~15 hours of pre-notification (2025-12-17 21:00 → 2025-12-18) before their forced patch landed. "Maintenance window" as a concept is honoured; emergency override is pre-announced rather than silent.
- Tiered rollout: Atlas (managed, vendor-driven) vs Enterprise Advanced (customer-driven, patched-version distribution) vs Community Edition (customer-driven, community-forum notification). Three customer populations, three rollout velocities, three shared-responsibility line positions.
Not reproducible as a benchmark — no per-region / per-tier / per-cluster-size breakdown, no success-rate / rollback statistics — but it sets an industry reference point for managed-database fleet-patching velocity.
Structural requirements for fleet patching¶
From the MongoDB post and the broader wiki:
- Vendor owns the deployment substrate. Atlas's managed provisioning + scaling + backup is the precondition; vendors who can't touch customer runtime can't fleet-patch.
- Per-instance state tracking. Which instance is on which binary version; which patches landed; which are deferred.
- Rollback path. If the patch breaks, one-click (or one-script) reversion to the previous known-good version. Cloudflare's incident retrospectives repeatedly emphasise rollback as the load-bearing failure-response primitive.
- Customer-courtesy contract — documented maintenance windows with an explicit emergency-override policy. Courtesy is honoured by pre-notification, not silence.
- Observability across the rollout. Aggregate status board, per-region velocity metrics, error-rate deltas correlated with patch progress.
- Communication plan for customer-side coordination: status-page updates, email to affected customers, post-hoc retrospective publication.
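The per-instance state tracking and observability requirements above can be sketched as a small status board. The state names follow the "patched" / "failed" / "deferred" vocabulary from the list; the class shape itself is an assumption, not MongoDB's or Cloudflare's actual tooling.

```python
from collections import Counter
from enum import Enum


class PatchState(Enum):
    PENDING = "pending"
    PATCHED = "patched"
    FAILED = "failed"
    DEFERRED = "deferred"   # e.g. waiting on a customer maintenance window


class RolloutBoard:
    """Aggregate view over per-instance patch state for one rollout."""

    def __init__(self, instance_ids):
        self.state = {i: PatchState.PENDING for i in instance_ids}

    def record(self, instance_id, state: PatchState) -> None:
        self.state[instance_id] = state

    def summary(self) -> dict:
        # The aggregate status board: counts per state across the fleet.
        return dict(Counter(s.value for s in self.state.values()))

    def complete(self) -> bool:
        # Deferred instances are tracked, not forgotten: the rollout is
        # only complete when every instance is patched.
        return all(s is PatchState.PATCHED for s in self.state.values())


board = RolloutBoard(["db-1", "db-2", "db-3"])
board.record("db-1", PatchState.PATCHED)
board.record("db-2", PatchState.DEFERRED)
print(board.summary())    # counts per state
print(board.complete())   # -> False
```

In a real system the `summary()` view would feed SLO tracking and the incident-response triggers the observability requirement describes.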
Relationship to other fleet-wide operational capabilities¶
- Fleet-patching ≠ fleet-rollout. Rollout (feature release) has different risk characteristics — generally longer time windows, heavier staging, often opt-in. Patching is often mandatory, fast, opt-out-restricted.
- Fleet-patching ≠ configuration push. Config push is the highest-velocity variant (seconds to minutes fleet-wide); binary patching is typically the slower variant (minutes to hours to days). See killswitch + global configuration system pairing in Cloudflare's architecture.
- Fleet-patching does not substitute for defense-in-depth. Even fast fleet-patching leaves a detection-to-fix window open; the vulnerability may have been exploitable before detection. DiD layers below the patched defect — auth / network / encryption / audit — catch what the patch didn't.
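The velocity gap between config push and binary patching comes down to write amplification: a config push is one write to a shared store that every instance reads on its next poll, while a binary patch must touch each instance individually. A hypothetical sketch — the store, killswitch key, and fleet shape are invented for illustration, not Cloudflare's actual system.

```python
# Replicated config store every instance polls; a single write here
# takes effect fleet-wide within one poll interval (seconds to minutes).
config_store = {"vulnerable_feature": "on"}


def killswitch(key: str) -> None:
    """Config push: one write, fleet-wide effect."""
    config_store[key] = "off"


def binary_patch(fleet: list, new_version: str) -> None:
    """Binary patch: touches every instance individually (download,
    drain, restart), so fleet-wide effect takes minutes to days."""
    for instance in fleet:
        instance["version"] = new_version


fleet = [{"name": f"db-{i}", "version": "8.0.0"} for i in range(3)]
killswitch("vulnerable_feature")
print(config_store["vulnerable_feature"])   # -> off
```

The asymmetry is why a killswitch pairing is worth maintaining even with fast fleet-patching: the killswitch buys time while the slower binary rollout completes.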
Seen in¶
- sources/2025-12-30-mongodb-server-security-update-december-2025 — Atlas fleet of hundreds of thousands of instances patched in ~6 days for CVE-2025-14847; maintenance-window customers honoured with pre-notification; tiered rollout to EA + Community via different channels. Canonical wiki instance.
Related¶
- concepts/maintenance-window — the customer-facing scheduling contract fleet-patching interacts with.
- concepts/coordinated-disclosure — the industry norm whose vendor-first-patch variant depends on fleet-patching velocity.
- concepts/shared-responsibility-model — the line that determines who owns patch velocity.
- concepts/blast-radius — size of damage from a bad patch; the risk fleet-patching must mitigate via staged rollout.
- patterns/rapid-fleet-patching-via-managed-service — the pattern realising fleet-patching for a managed tier.
- patterns/pre-disclosure-patch-rollout — the disclosure posture fleet-patching velocity enables.
- patterns/staged-rollout — generic rollout-phasing pattern fleet-patching specialises.
- patterns/progressive-configuration-rollout — config-push sibling of fleet-patching.