SYSTEM Cited by 2 sources

Meta Twine

Twine is Meta's core container orchestrator: the system that schedules services, allocates resources for them, and manages their lifecycle across Meta's data center regions. It is the successor to, and an evolution of, Tupperware (Meta's earlier container and cluster management system).

Architecture

Twine's control plane comprises several critical services:

Scheduler — determines where and when containers run
Allocator — manages resource allocation across the fleet
Broker — message broker for inter-service communication (built on Delos)
Zelos — coordinator service

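These control-plane services are themselves deployed by Twine, which creates a circular bootstrapping dependency during region-level cold starts: Twine cannot start any service until its own control plane is running, yet that control plane is normally started by Twine.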
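
The circular dependency described above can be illustrated with a small cycle check over a startup dependency graph. This is a minimal sketch, not Twine's or Belljar's actual mechanism: the function `find_cycle` and the specific edges between the services are hypothetical, chosen only to show how a "needs X to start" graph that loops back on itself can be detected before a cold start.

```python
from typing import Dict, List, Optional


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of nodes, or None if acyclic.

    `deps[a]` lists the services that `a` needs before it can start.
    Uses a depth-first search with three node states.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {node: WHITE for node in deps}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        path.append(node)
        for dep in deps.get(node, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the current path: we closed a loop.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(deps):
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical edges, for illustration only: the real startup
# dependencies between Twine's control-plane services are not
# described in this page.
startup_deps = {
    "scheduler": ["allocator", "broker"],
    "allocator": ["broker"],
    "broker": ["zelos"],
    "zelos": ["scheduler"],  # assumed: needs the scheduler to be placed
}

cycle = find_cycle(startup_deps)
if cycle:
    print("circular bootstrap dependency:", " -> ".join(cycle))
```

A check of this shape is the kind of thing a pre-deployment test could run to reject a change that introduces a new loop, while a jumpstart path is still needed to break loops that cannot be removed.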

Region-Wide Coordination

Twine uses unavailability events (UEs) as a region-wide asynchronous signaling mechanism to coordinate service shutdown and recovery during power events and other region-scale disruptions.

Bootstrapping and Recovery

After an instantaneous power loss across an entire region, millions of services must restart at once and rediscover each other autonomously. Two critical problems arise, each with its own solution:

Circular dependencies: the control-plane services that power Twine themselves need Twine in order to start. Solved by Belljar CI/CD testing (prevention) and the Twine Recovery Kit (Twrko) jumpstart capability (recovery). (Source: sources/2026-06-03-meta-lights-out-systems-on-validating-instant-power-loss-readiness)
Boomerang effect: UE shutdown signals shut down the Twine control plane itself, orphaning the services it manages. Solved by allowing control-plane services to ignore power-related UE shutdown signals. (Source: sources/2026-06-03-meta-lights-out-systems-on-validating-instant-power-loss-readiness)
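
The boomerang fix can be sketched as a per-service policy applied when a UE arrives. This is an illustrative model under stated assumptions, not Twine's API: the names `UECause`, `UnavailabilityEvent`, `Service`, and `broadcast` are hypothetical, and the choice that control-plane services still honor non-power UEs is an assumption, since the source only says that power-related shutdown signals are ignored.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class UECause(Enum):
    """Hypothetical UE categories; only POWER is named in the source."""
    POWER = "power"
    MAINTENANCE = "maintenance"


@dataclass(frozen=True)
class UnavailabilityEvent:
    region: str
    cause: UECause


@dataclass
class Service:
    name: str
    is_control_plane: bool = False
    running: bool = True

    def on_unavailability_event(self, event: UnavailabilityEvent) -> None:
        # Control-plane services ignore power-related shutdown signals so
        # that the orchestrator stays up to manage everything else.
        if self.is_control_plane and event.cause is UECause.POWER:
            return
        # Assumption: every other combination results in a shutdown.
        self.running = False


def broadcast(event: UnavailabilityEvent, services: Iterable[Service]) -> None:
    """Deliver one region-wide UE to every service (synchronously here)."""
    for service in services:
        service.on_unavailability_event(event)


fleet = [
    Service("scheduler", is_control_plane=True),
    Service("allocator", is_control_plane=True),
    Service("web-frontend"),
]
broadcast(UnavailabilityEvent(region="region-a", cause=UECause.POWER), fleet)
for service in fleet:
    print(service.name, "running" if service.running else "stopped")
```

Without the early return in `on_unavailability_event`, the same broadcast would stop the scheduler and allocator too, which is the boomerang effect: the signal meant to protect workloads takes down the system responsible for restarting them.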

Seen in

sources/2026-06-03-meta-lights-out-systems-on-validating-instant-power-loss-readiness — central to region-bootstrap and PowerLoss Storm validation
sources/2024-06-16-meta-maintaining-large-scale-ai-capacity-at-meta — referenced as the orchestrator managing Meta's GPU fleet maintenance