Nomad (HashiCorp)
Nomad is HashiCorp's workload orchestrator, an alternative to Kubernetes for scheduling containers, VMs, and raw binaries across a cluster of workers. Fly.io's first-generation orchestrator was Nomad-based; flyd was "carved out of" that Nomad system, as described in the 2022 post Carving the scheduler out of our orchestrator.
Role in Fly.io's history
The 2024-07-30 Making Machines Move post quotes Fly's drain runbook from the pre-storage Nomad era: "de-bird that edge server, tell Nomad to drain that worker, and go back to sleep." At 2020 scale, a stateless drain on Nomad took "just a handful of minutes."
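To make "drain that worker" concrete, here is a minimal sketch of a stateless drain: mark the worker ineligible for new placements, then restart each of its apps on another worker that has room. This is a toy model for illustration only; it is not Nomad's API or Fly's implementation, and `Worker`, `drain`, and the capacity-based placement rule are all hypothetical names and assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical worker record; not a Nomad data structure."""
    name: str
    capacity: int                      # max number of app instances
    eligible: bool = True              # set to False once a drain starts
    apps: list[str] = field(default_factory=list)


def drain(workers: dict[str, Worker], target: str) -> dict[str, str]:
    """Move every app off `target` and return {app: destination worker}."""
    source = workers[target]
    source.eligible = False            # stop new placements before moving anything
    moves: dict[str, str] = {}
    for app in list(source.apps):      # copy: the list is mutated in the loop
        candidates = [
            w for w in workers.values()
            if w.eligible and len(w.apps) < w.capacity
        ]
        if not candidates:
            raise RuntimeError(f"no capacity left to place {app}")
        # Assumed placement rule: pick the worker with the most free slots.
        dest = max(candidates, key=lambda w: w.capacity - len(w.apps))
        # Stateless app: start a fresh copy elsewhere, then drop the old one.
        dest.apps.append(app)
        source.apps.remove(app)
        moves[app] = dest.name
    return moves


workers = {
    "a": Worker("a", capacity=2, apps=["web", "api"]),
    "b": Worker("b", capacity=2),
    "c": Worker("c", capacity=1),
}
moves = drain(workers, "a")
```

Nothing in this model has to follow an app to its new worker, which is why a stateless drain can be quick. An app with an attached volume breaks that assumption, because its data would have to move with it.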
The post frames the migration-capability rebuild as the second-biggest engineering lift Fly's team has taken on: "This is the biggest thing our team has done since we replaced Nomad with flyd." Replacing Nomad is thus the milestone against which the post measures the current work.
Seen in
- sources/2024-07-30-flyio-making-machines-move: referenced as the prior orchestrator whose drain operation for stateless apps is the baseline that Fly's clone-based stateful migration tries to recover.
- sources/2022-12-02-highscalability-stuff-the-internet-says-on-scalability-for-december-2nd-2022: Roblox's on-prem infrastructure is Nomad + Consul + Vault (the "HashiStack"): 18,000 servers, 170,000 containers, orchestrated by Nomad. The Oct-2021 73-hour outage was a Consul streaming-feature regression, not a Nomad bug, but the roundup is the first public confirmation of Nomad running at ~170K-container scale outside HashiCorp's own case studies. See systems/roblox-hashistack.
Related
- systems/flyd: the in-house replacement for Nomad, which Fly carved out of its Nomad-based orchestrator.
- systems/roblox-hashistack: the largest publicly disclosed Nomad deployment (Roblox on-prem).
- systems/consul
- companies/highscalability