Meta Twine
Twine is Meta's core container orchestrator: the system that schedules all services, allocates their resources, and manages their lifecycle across Meta's data center regions. It evolved from and succeeds Tupperware, Meta's earlier container and cluster management system.
Architecture
Twine's control plane comprises several critical services:
- Scheduler — determines where and when containers run
- Allocator — manages resource allocation across the fleet
- Broker — message broker for inter-service communication (built on Delos)
- Zelos — coordinator service
These control-plane services are themselves deployed by Twine, which creates a circular bootstrapping dependency during region-level cold starts: Twine must be running to start the very services it needs in order to run.
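The circular dependency described above can be illustrated with a small cycle check over a service dependency graph. This is a minimal sketch, not Twine's implementation: the `DEPENDS_ON` map, its edges, and the `find_cycle` helper are all hypothetical and chosen only to show how a control plane that deploys itself ends up in a loop that a cold start cannot resolve without outside help.

```python
from typing import Dict, List, Optional

# Hypothetical dependency map: each service lists what must already be
# running before it can start. The edges are illustrative assumptions and
# are not taken from Twine's real dependency graph.
DEPENDS_ON: Dict[str, List[str]] = {
    "scheduler": ["allocator", "broker"],
    "allocator": ["zelos"],
    "broker": ["zelos"],
    "zelos": ["scheduler"],  # assumed: placed by the scheduler it supports
}


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of names, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color: Dict[str, int] = {node: WHITE for node in graph}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for dep in graph.get(node, []):
            state = color.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path, so we closed a loop.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    print(find_cycle(DEPENDS_ON))
```

A check of this kind only detects the loop. Breaking it during a real cold start still requires some component that can start without the rest of the control plane.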
Region-Wide Coordination
Twine uses unavailability events (UEs) as a region-wide asynchronous signaling mechanism to coordinate service shutdown and recovery during power events and other region-scale disruptions.
Bootstrapping and Recovery
After an instantaneous power loss across an entire region, millions of services must restart at once and discover each other autonomously. Two problems are central to this, each with its own solution:
- Circular dependencies — control-plane services that power Twine itself need Twine to start. Solved via Belljar CI/CD testing (prevention) + Twine Recovery Kit (Twrko) jumpstart capability (recovery). (Source: sources/2026-06-03-meta-lights-out-systems-on-validating-instant-power-loss-readiness)
- Boomerang effect — UE shutdown signals shut down the Twine control plane itself, orphaning services. Solved by allowing control-plane services to ignore power-related UE shutdown signals. (Source: sources/2026-06-03-meta-lights-out-systems-on-validating-instant-power-loss-readiness)
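The boomerang fix can be sketched as a filter applied when a UE shutdown signal arrives. This is a hedged illustration only: the `UEReason`, `UnavailabilityEvent`, and `Service` types, the `MAINTENANCE` reason, and the `should_shut_down` function are hypothetical names invented for this example and do not reflect Twine's actual API or signal format.

```python
from dataclasses import dataclass
from enum import Enum


class UEReason(Enum):
    """Hypothetical UE categories; only POWER is mentioned in the source."""
    POWER = "power"
    MAINTENANCE = "maintenance"  # assumed extra category for contrast


@dataclass(frozen=True)
class UnavailabilityEvent:
    region: str
    reason: UEReason


@dataclass(frozen=True)
class Service:
    name: str
    is_control_plane: bool


def should_shut_down(service: Service, event: UnavailabilityEvent) -> bool:
    """Decide whether a service obeys a UE shutdown signal.

    Control-plane services ignore power-related shutdown signals so the
    orchestrator stays up to manage everything else. All other
    combinations obey the signal in this sketch.
    """
    if service.is_control_plane and event.reason is UEReason.POWER:
        return False
    return True


if __name__ == "__main__":
    power_ue = UnavailabilityEvent(region="region-a", reason=UEReason.POWER)
    for svc in (Service("scheduler", True), Service("web-frontend", False)):
        print(svc.name, should_shut_down(svc, power_ue))
```

The exemption is keyed on both the service role and the event reason, so in this sketch a control-plane service would still honor a non-power UE.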
Seen in
- sources/2026-06-03-meta-lights-out-systems-on-validating-instant-power-loss-readiness — central to region-bootstrap and PowerLoss Storm validation
- sources/2024-06-16-meta-maintaining-large-scale-ai-capacity-at-meta — referenced as the orchestrator managing Meta's GPU fleet maintenance