CONCEPT

Maintenance window

Definition

A maintenance window is a customer-configurable contract with a managed-service vendor specifying when the vendor is allowed to perform disruptive operations — routine patches, version upgrades, restarts, hardware migrations — on the customer's resources. The window is typically a weekly recurring time range (e.g. "Sundays 02:00 – 04:00 customer local time") agreed at provisioning time, editable by the customer at any time.

The contract cuts both ways:

  • Customer side: predictable disruption; ability to align maintenance with lowest-traffic periods; ability to defer maintenance to when operators are on-call; protection from spontaneous vendor-side outages during business hours.
  • Vendor side: legitimate access window for operations that would otherwise require per-customer coordination at scale; documented license to restart / migrate / patch; contractual cover for routine-operation side-effects.

Load-bearing properties

  • Contract, not request. The customer sets the window; the vendor honours it. Outside the window, routine maintenance does not happen.
  • Customer-local time. Windows are expressed in the customer's timezone, not UTC, to align with local business hours.
  • Revisable. Customer can change the window at any time via the management console / API.
  • Bounded disruption semantics. The vendor specifies what a "maintenance event" can do — reboot, brief read-only period, rolling restart, version upgrade — so the customer can size retries / health checks / failover accordingly.
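These properties can be sketched as a small data model. This is a hypothetical illustration, not any vendor's API: a weekly window held in the customer's IANA timezone, with the next occurrence resolved to UTC instants so a scheduler can compare it against fleet time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

@dataclass
class MaintenanceWindow:
    """Hypothetical model of a weekly recurring window in customer-local time."""
    weekday: int        # 0 = Monday ... 6 = Sunday, in the customer's timezone
    start_hour: int     # local wall-clock hour the window opens
    duration: timedelta
    tz: str             # IANA timezone name, e.g. "America/New_York"

    def next_occurrence(self, now_utc: datetime) -> tuple[datetime, datetime]:
        """Return (start, end) of the next (or currently open) window, as UTC instants."""
        local_now = now_utc.astimezone(ZoneInfo(self.tz))
        days_ahead = (self.weekday - local_now.weekday()) % 7
        candidate = (local_now + timedelta(days=days_ahead)).replace(
            hour=self.start_hour, minute=0, second=0, microsecond=0)
        if candidate + self.duration <= local_now:  # this week's window already closed
            candidate += timedelta(days=7)
        return (candidate.astimezone(timezone.utc),
                (candidate + self.duration).astimezone(timezone.utc))
```

Keeping the timezone as a name rather than a fixed offset is what makes the window track local business hours across DST transitions; a production implementation would also handle the DST edge case where the start hour does not exist on a given day.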

Emergency override

The contract breaks down for urgent security patches — if the vendor must patch within hours to days of detection for a serious vulnerability, waiting up to a week for the customer's window to come around is infeasible.

Three override shapes exist:

  1. Silent override. Vendor patches when it needs to, regardless of window. Customer learns after the fact. Breaks trust in the window contract.
  2. Pre-notification override. Vendor notifies customer in advance (hours to a day) that the window will be overridden for an urgent patch, gives the customer time to adjust workload, then patches on the vendor's schedule. Preserves the courtesy contract; adds an emergency escape hatch.
  3. Customer-opt-in-to-emergency. Customer declares up-front that urgent patches may override the window (typical default).

Canonical pre-notification-override realisation: MongoDB CVE-2025-14847 (2025-12-17 21:00 → 2025-12-18): MongoDB proactively notified Atlas customers with configured maintenance windows that an urgent patch would land the next day. "We proactively notified Atlas customers with maintenance windows configured that we would perform an urgent patch the following day, as part of our established policy." Maintenance-window customers were patched in the same ~6-day window as the rest of the Atlas fleet, without silent override.

Relationship to fleet-patching

Maintenance windows are a brake on fleet patching velocity — without an override path, a fleet-wide patch cannot land faster than the slowest customer's window. Vendors resolve this at architecture time by declaring a policy:

  • "Routine maintenance lands in your configured window."
  • "Emergency security patches may override your window with ≥N hours of pre-notification."

The policy is part of the managed-service contract; MongoDB's 2025-12-30 post names theirs as "our established policy."
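The brake is quantifiable: without an override path, fleet patch latency is bounded below by the wait until the last customer's window opens. A sketch under simplifying assumptions (windows given as UTC weekday/hour pairs; names hypothetical):

```python
from datetime import datetime, timedelta

def fleet_patch_horizon(now: datetime,
                        windows: list[tuple[int, int]]) -> timedelta:
    """Worst-case wait until every customer's weekly window (weekday, hour, UTC)
    has opened at least once -- the routine-path upper bound on fleet patch latency."""
    def wait(weekday: int, hour: int) -> timedelta:
        days = (weekday - now.weekday()) % 7
        start = (now + timedelta(days=days)).replace(
            hour=hour, minute=0, second=0, microsecond=0)
        if start < now:  # this week's opening already passed
            start += timedelta(days=7)
        return start - now
    return max(wait(d, h) for d, h in windows)
```

With weekly windows the horizon can approach seven days, which is exactly why the emergency-override clause exists alongside the routine clause.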

Non-Atlas realisations

  • AWS RDS / Aurora — weekly 30-minute window; vendor-controlled patch application; override for "mandatory" security patches with customer notification.
  • AWS EKS — cluster-version upgrades + node-group maintenance windows; EKS Auto Mode shifts node lifecycle entirely to AWS and surfaces customer-side disruption controls (PDBs + Node Disruption Budgets + maintenance windows) as the retained responsibility.
  • Cloudflare — customer-facing cache / config rolls out through the rulesets engine and global config system; edge-binary rollout uses staged progression rather than customer windows because the edge is shared.

The Atlas case is notable for being per-instance (each cluster its own window) rather than per-region or per-tier, making the customer-side control granular.
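For a concrete surface, RDS accepts its weekly window as a `ddd:hh24:mi-ddd:hh24:mi` string in UTC (e.g. via `aws rds modify-db-instance --preferred-maintenance-window sun:02:00-sun:02:30`). A sketch of an illustrative client-side validator for that format — the regex is mine, not AWS's, and skips service-side rules such as the minimum 30-minute duration:

```python
import re

# RDS window spec: "ddd:hh24:mi-ddd:hh24:mi" in UTC, e.g. "sun:02:00-sun:02:30".
# Illustrative validator only; AWS performs its own server-side validation.
RDS_WINDOW = re.compile(
    r"^(mon|tue|wed|thu|fri|sat|sun):([01]\d|2[0-3]):[0-5]\d"
    r"-(mon|tue|wed|thu|fri|sat|sun):([01]\d|2[0-3]):[0-5]\d$")

def is_valid_rds_window(spec: str) -> bool:
    """Check that spec matches the RDS preferred-maintenance-window shape."""
    return RDS_WINDOW.fullmatch(spec) is not None
```

The UTC-string format is a reminder that the "customer-local time" property above is a design choice, not a given: RDS leaves the local-time conversion to the customer, where Atlas does it for them.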
