CONCEPT

Maintenance window

Definition

A maintenance window is a customer-configurable contract with a managed-service vendor specifying when the vendor is allowed to perform disruptive operations — routine patches, version upgrades, restarts, hardware migrations — on the customer's resources. The window is typically a weekly recurring time range (e.g. "Sundays 02:00 – 04:00 customer local time") agreed at provisioning time, editable by the customer at any time.

The contract cuts both ways:

  • Customer side: predictable disruption; ability to align maintenance with lowest-traffic periods; ability to defer maintenance to when operators are on-call; protection from spontaneous vendor-side outages during business hours.
  • Vendor side: legitimate access window for operations that would otherwise require per-customer coordination at scale; documented license to restart / migrate / patch; contractual cover for routine-operation side-effects.

Load-bearing properties

  • Contract, not request. The customer sets the window; the vendor honours it. Outside the window, routine maintenance does not happen.
  • Customer-local time. Windows are expressed in the customer's timezone, not UTC, to align with local business hours.
  • Revisable. Customer can change the window at any time via the management console / API.
  • Bounded disruption semantics. The vendor specifies what a "maintenance event" can do — reboot, brief read-only period, rolling restart, version upgrade — so the customer can size retries / health checks / failover accordingly.
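These properties can be sketched as a small data model. This is a hypothetical illustration, not any vendor's API: a weekly window held in the customer's IANA timezone, with the next occurrence resolved to UTC instants so a scheduler can compare it against fleet time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

@dataclass
class MaintenanceWindow:
    """Hypothetical model of a weekly recurring window in customer-local time."""
    weekday: int        # 0 = Monday ... 6 = Sunday, in the customer's timezone
    start_hour: int     # local wall-clock hour the window opens
    duration: timedelta
    tz: str             # IANA timezone name, e.g. "America/New_York"

    def next_occurrence(self, now_utc: datetime) -> tuple[datetime, datetime]:
        """Return (start, end) of the next (or currently open) window, as UTC instants."""
        local_now = now_utc.astimezone(ZoneInfo(self.tz))
        days_ahead = (self.weekday - local_now.weekday()) % 7
        candidate = (local_now + timedelta(days=days_ahead)).replace(
            hour=self.start_hour, minute=0, second=0, microsecond=0)
        if candidate + self.duration <= local_now:  # this week's window already closed
            candidate += timedelta(days=7)
        return (candidate.astimezone(timezone.utc),
                (candidate + self.duration).astimezone(timezone.utc))
```

Keeping the timezone as a name rather than a fixed offset is what makes the window track local business hours across DST transitions; a production implementation would also handle the DST edge case where the start hour does not exist on a given day.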

Emergency override

The contract breaks down for urgent security patches — if the vendor must patch within hours to days of detection for a serious vulnerability, waiting up to a week for the customer's window to come around is infeasible.

Three override shapes exist:

  1. Silent override. Vendor patches when it needs to, regardless of window. Customer learns after the fact. Breaks trust in the window contract.
  2. Pre-notification override. Vendor notifies customer in advance (hours to a day) that the window will be overridden for an urgent patch, gives the customer time to adjust workload, then patches on the vendor's schedule. Preserves the courtesy contract; adds an emergency escape hatch.
  3. Customer-opt-in-to-emergency. Customer declares up-front that urgent patches may override the window (typical default).

Canonical pre-notification-override realisation: MongoDB CVE-2025-14847 (2025-12-17 21:00 → 2025-12-18): MongoDB proactively notified Atlas customers with configured maintenance windows that an urgent patch would land the next day. "We proactively notified Atlas customers with maintenance windows configured that we would perform an urgent patch the following day, as part of our established policy." Maintenance-window customers were patched in the same ~6-day window as the rest of the Atlas fleet, without silent override.

Relationship to fleet-patching

Maintenance windows are a brake on fleet patching velocity — without an override path, a fleet-wide patch cannot land faster than the slowest customer's window. Vendors resolve this at architecture time by declaring a policy:

  • "Routine maintenance lands in your configured window."
  • "Emergency security patches may override your window with ≥N hours of pre-notification."

The policy is part of the managed-service contract; MongoDB's 2025-12-30 post names theirs as "our established policy."
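The brake is quantifiable: without an override path, fleet patch latency is bounded below by the wait until the last customer's window opens. A sketch under simplifying assumptions (windows given as UTC weekday/hour pairs; names hypothetical):

```python
from datetime import datetime, timedelta

def fleet_patch_horizon(now: datetime,
                        windows: list[tuple[int, int]]) -> timedelta:
    """Worst-case wait until every customer's weekly window (weekday, hour, UTC)
    has opened at least once -- the routine-path upper bound on fleet patch latency."""
    def wait(weekday: int, hour: int) -> timedelta:
        days = (weekday - now.weekday()) % 7
        start = (now + timedelta(days=days)).replace(
            hour=hour, minute=0, second=0, microsecond=0)
        if start < now:  # this week's opening already passed
            start += timedelta(days=7)
        return start - now
    return max(wait(d, h) for d, h in windows)
```

With weekly windows the horizon can approach seven days, which is exactly why the emergency-override clause exists alongside the routine clause.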

Non-Atlas realisations

  • AWS RDS / Aurora — weekly 30-minute window; vendor-controlled patch application; override for "mandatory" security patches with customer notification.
  • AWS EKS — cluster-version upgrades + node-group maintenance windows; EKS Auto Mode shifts node lifecycle entirely to AWS and surfaces customer-side disruption controls (PDBs + Node Disruption Budgets + maintenance windows) as the retained responsibility.
  • Cloudflare — customer-facing cache / config rolls out through the rulesets engine and global config system; edge-binary rollout uses staged progression rather than customer windows because the edge is shared.

The Atlas case is notable for being per-instance (each cluster its own window) rather than per-region or per-tier, making the customer-side control granular.
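For a concrete surface, RDS accepts its weekly window as a `ddd:hh24:mi-ddd:hh24:mi` string in UTC (e.g. via `aws rds modify-db-instance --preferred-maintenance-window sun:02:00-sun:02:30`). A sketch of an illustrative client-side validator for that format — the regex is mine, not AWS's, and skips service-side rules such as the minimum 30-minute duration:

```python
import re

# RDS window spec: "ddd:hh24:mi-ddd:hh24:mi" in UTC, e.g. "sun:02:00-sun:02:30".
# Illustrative validator only; AWS performs its own server-side validation.
RDS_WINDOW = re.compile(
    r"^(mon|tue|wed|thu|fri|sat|sun):([01]\d|2[0-3]):[0-5]\d"
    r"-(mon|tue|wed|thu|fri|sat|sun):([01]\d|2[0-3]):[0-5]\d$")

def is_valid_rds_window(spec: str) -> bool:
    """Check that spec matches the RDS preferred-maintenance-window shape."""
    return RDS_WINDOW.fullmatch(spec) is not None
```

The UTC-string format is a reminder that the "customer-local time" property above is a design choice, not a given: RDS leaves the local-time conversion to the customer, where Atlas does it for them.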
