
SYSTEM Cited by 2 sources

Meta Twine

Twine is Meta's core container orchestrator: the system that schedules, allocates, and manages the lifecycle of all services across Meta's data center regions. It evolved from, and succeeded, Tupperware, Meta's earlier container and cluster management system.

Architecture

Twine's control plane comprises several critical services:

  • Scheduler — determines where and when containers run
  • Allocator — manages resource allocation across the fleet
  • Broker — message broker for inter-service communication (built on Delos)
  • Zelos — coordinator service

These control-plane services are themselves deployed by Twine. This creates an inherent circular dependency during a region-level cold start: Twine must already be running to start the services that Twine itself needs in order to run.
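The circular dependency can be pictured as a cycle in a startup-dependency graph, which is the kind of condition a CI check such as Belljar would aim to catch before it reaches production. The sketch below assumes a hypothetical graph in which each control-plane service must be deployed by Twine, and Twine needs all of them to run. It finds such a cycle with a depth-first search. `DEPENDS_ON` and `find_cycle` are illustrative names, not Belljar's or Twine's actual code.

```python
from typing import Dict, List, Optional

# Hypothetical startup-dependency graph: each service maps to the services
# that must already be running before it can start. Edges are illustrative.
DEPENDS_ON: Dict[str, List[str]] = {
    "twine": ["scheduler", "allocator", "broker", "zelos"],
    "scheduler": ["twine"],  # deployed by Twine, so Twine must run first
    "allocator": ["twine"],
    "broker": ["twine"],
    "zelos": ["twine"],
}


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of services, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current DFS path, finished
    color = {node: WHITE for node in graph}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for dep in graph.get(node, []):
            state = color.get(dep, WHITE)
            if state == GREY:
                # Back edge: the cycle runs from dep's position on the path to here.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


if __name__ == "__main__":
    print(find_cycle(DEPENDS_ON))  # prints ['twine', 'scheduler', 'twine']
```

A non-empty result flags the bootstrapping problem. In this model, a jumpstart path like Twrko corresponds to a way of starting the control-plane services that does not go through Twine, which removes the back edges and breaks the cycle.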

Region-Wide Coordination

Twine uses unavailability events (UEs) as a region-wide asynchronous signaling mechanism to coordinate service shutdown and recovery during power events and other region-scale disruptions.
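As a rough illustration of asynchronous, region-wide signaling, the sketch below fans a single UE out to every subscribed service through per-service queues. The publisher does not wait for services to react, and each service handles the event on its own schedule. `UnavailabilityEvent`, `UEBus`, and the service names are hypothetical. This is a minimal model of the signaling pattern, not Twine's actual UE mechanism or API.

```python
import asyncio
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class UnavailabilityEvent:
    """Hypothetical UE payload: which region is affected and why."""
    region: str
    cause: str  # e.g. "power"


class UEBus:
    """Hypothetical region-wide bus that fans one UE out to every subscriber."""

    def __init__(self) -> None:
        self._queues: List[asyncio.Queue] = []

    def subscribe(self) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._queues.append(queue)
        return queue

    def publish(self, event: UnavailabilityEvent) -> None:
        # Non-blocking: the publisher enqueues and moves on without waiting.
        for queue in self._queues:
            queue.put_nowait(event)


async def service(name: str, inbox: asyncio.Queue, log: List[str]) -> None:
    # Each service reacts to the UE independently, whenever it is scheduled.
    event = await inbox.get()
    log.append(f"{name}: shutting down for {event.cause} UE in {event.region}")


async def main() -> List[str]:
    bus = UEBus()
    log: List[str] = []
    tasks = [
        asyncio.create_task(service(name, bus.subscribe(), log))
        for name in ("web", "cache", "db")
    ]
    bus.publish(UnavailabilityEvent(region="region-a", cause="power"))
    await asyncio.gather(*tasks)
    return log


if __name__ == "__main__":
    for line in asyncio.run(main()):
        print(line)
```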

Bootstrapping and Recovery

When an entire region loses power instantaneously, millions of services must start at once on recovery and discover each other autonomously. Two key problems arise, each with its own solution:

  1. Circular dependencies — control-plane services that power Twine itself need Twine to start. Solved via Belljar CI/CD testing (prevention) + Twine Recovery Kit (Twrko) jumpstart capability (recovery). (Source: sources/2026-06-03-meta-lights-out-systems-on-validating-instant-power-loss-readiness)

  2. Boomerang effect — UE shutdown signals shut down the Twine control plane itself, orphaning services. Solved by allowing control-plane services to ignore power-related UE shutdown signals. (Source: sources/2026-06-03-meta-lights-out-systems-on-validating-instant-power-loss-readiness)
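The boomerang fix amounts to a filtering rule applied when a service receives a UE. The sketch below assumes a hypothetical `should_shut_down` check and a hypothetical set of control-plane service names. Per the source, control-plane services ignore only power-related UEs. Treating every other cause as still binding is an assumption made for this illustration.

```python
from dataclasses import dataclass

# Hypothetical set of Twine control-plane service names, for illustration only.
CONTROL_PLANE = {"scheduler", "allocator", "broker", "zelos"}


@dataclass(frozen=True)
class UnavailabilityEvent:
    """Hypothetical UE payload; only the cause matters for this sketch."""
    cause: str  # e.g. "power", "maintenance"


def should_shut_down(service: str, event: UnavailabilityEvent) -> bool:
    """Decide whether `service` obeys a UE shutdown signal.

    Control-plane services ignore power-related UEs so they stay up to
    restart everything else. All other UEs are assumed to still apply.
    """
    if service in CONTROL_PLANE and event.cause == "power":
        return False
    return True


if __name__ == "__main__":
    power = UnavailabilityEvent(cause="power")
    print(should_shut_down("scheduler", power))  # False: control plane stays up
    print(should_shut_down("web", power))        # True: ordinary service obeys
```

Keeping the exemption narrow means the control plane still honors non-power UEs, while a power event can no longer shut down the very services needed to bring the region back.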
