CONCEPT Cited by 1 source
Hot-reloadable configuration¶
Definition¶
Hot-reloadable configuration is a process's ability to pick up a new configuration value without a restart. The test: you change a setting (retry count, connection pool size, feature flag, routing rule) and in-flight work is unaffected; new work sees the new value.
At scale, hot reload directly limits customer impact: a reconfiguration that requires a restart fails every in-flight request or query on the restarted process.
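The test in the definition can be made concrete with a small sketch. It assumes one common way of getting the "in-flight unaffected, new work sees the new value" behavior: each unit of work captures an immutable snapshot when it starts. The names `Config`, `ConfigHolder`, `Request`, and `max_retries` are hypothetical and are not taken from Presto or from the source.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Immutable snapshot; a reload replaces the whole object."""
    max_retries: int


class ConfigHolder:
    """Holds the active snapshot (hypothetical helper, not a Presto API)."""

    def __init__(self, initial: Config) -> None:
        self._lock = threading.Lock()
        self._active = initial

    def get(self) -> Config:
        with self._lock:
            return self._active

    def swap(self, new: Config) -> None:
        with self._lock:
            self._active = new


class Request:
    """Captures one snapshot at start and keeps it until the work finishes."""

    def __init__(self, holder: ConfigHolder) -> None:
        self.config = holder.get()


holder = ConfigHolder(Config(max_retries=3))
in_flight = Request(holder)           # started before the change
holder.swap(Config(max_retries=5))    # hot reload, no restart
new_work = Request(holder)            # started after the change

print(in_flight.config.max_retries, new_work.config.max_retries)  # prints: 3 5
```

Because the snapshot is frozen and replaced wholesale, the in-flight request never observes a half-updated configuration.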
Why it matters at Meta-Presto scale¶
sources/2023-07-16-highscalability-lessons-learned-running-presto-at-meta-scale closes with this advice:
"Where possible, configurations should be made hot reloadable so that Presto instances do not have to be restarted or updated in a disruptive manner which will result in query failures and customer dissatisfaction."
A Presto coordinator or worker that must restart to accept a new config discards every query running on that node, and many of those queries may have been running for minutes or hours. On a fleet of "tens of thousands of machines", restart-required reconfig is both an operational tax and an SLA-breaking event every time it happens.
Design shape¶
Hot-reload typically needs:
- A versioned config store (file, KV, or distribution service) that the process watches.
- A reloader that, on notification, validates and atomically swaps the active config.
- A reload-safe API surface inside the process: configuration values must be read at the point of use rather than cached once at startup, or any cached copy must be invalidated and re-read on swap.
- Backward-compat tests for every config value: applying a new value must not break operations that are already in flight.
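The reloader and read-at-point-of-use items above can be sketched as follows. This is a minimal illustration under stated assumptions, not Presto's mechanism: the config store is a JSON file, the change notification is left out (the caller invokes `reload()` when it learns of a change), and `Reloader`, `validate`, `write_config`, and the `max_retries` key are hypothetical names.

```python
import json
import os
import tempfile
import threading


def validate(raw: dict) -> dict:
    """Reject bad values before they can become active (example rule only)."""
    retries = raw.get("max_retries")
    if isinstance(retries, bool) or not isinstance(retries, int) or retries < 0:
        raise ValueError("max_retries must be a non-negative integer")
    return {"max_retries": retries}


class Reloader:
    """Validates a candidate config and swaps it in as one reference update."""

    def __init__(self, path: str) -> None:
        self._path = path
        self._lock = threading.Lock()
        self._active = self._load()  # fail fast if the initial config is bad

    def _load(self) -> dict:
        with open(self._path, encoding="utf-8") as f:
            return validate(json.load(f))

    def current(self) -> dict:
        """Call at the point of use instead of copying values at startup."""
        with self._lock:
            return self._active

    def reload(self) -> bool:
        """Run on a change notification; returns True if the candidate was applied."""
        try:
            candidate = self._load()
        except (OSError, ValueError):  # unreadable, malformed, or invalid
            return False               # keep serving the last good config
        with self._lock:
            self._active = candidate
        return True


def write_config(path: str, values: dict) -> None:
    """Demo helper that stands in for an operator or distribution service."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(values, f)


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "config.json")
        write_config(path, {"max_retries": 3})
        reloader = Reloader(path)
        write_config(path, {"max_retries": -1})   # invalid candidate
        bad_applied = reloader.reload()
        after_bad = reloader.current()["max_retries"]
        write_config(path, {"max_retries": 5})    # valid candidate
        good_applied = reloader.reload()
        after_good = reloader.current()["max_retries"]
        return bad_applied, after_bad, good_applied, after_good


print(demo())
```

The design choice worth noting is that validation happens before the swap, so a rejected candidate leaves the last good configuration active rather than leaving the process with no usable config.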
Not all config is safely hot-reloadable — e.g. the shape of a thread pool may require careful drain/re-init. The Meta advice is explicit: "where possible."
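One way to honor the "where possible" caveat is to classify keys explicitly and apply only the safe ones at reload time. The sketch below assumes a hypothetical allow-list; the key names (`max_retries`, `feature_x_enabled`, `worker_threads`) and the `plan_reload` helper are illustrative and do not come from Presto or the source.

```python
from typing import Dict, Set, Tuple

# Hypothetical allow-list: keys known to be safe to change while running.
HOT_RELOADABLE = {"max_retries", "feature_x_enabled"}


def plan_reload(active: Dict[str, object],
                candidate: Dict[str, object]) -> Tuple[Dict[str, object], Set[str]]:
    """Split changed keys into those applied now and those deferred to a restart."""
    apply_now: Dict[str, object] = {}
    deferred: Set[str] = set()
    for key, value in candidate.items():
        if active.get(key) == value:
            continue                    # unchanged, nothing to do
        if key in HOT_RELOADABLE:
            apply_now[key] = value
        else:
            deferred.add(key)           # unknown keys are treated conservatively
    return apply_now, deferred


active = {"max_retries": 3, "worker_threads": 16}
candidate = {"max_retries": 5, "worker_threads": 32}
apply_now, deferred = plan_reload(active, candidate)

print(apply_now)   # {'max_retries': 5}
print(deferred)    # {'worker_threads'}
```

Treating any key that is not on the allow-list as restart-required keeps a newly added setting from being hot-swapped before anyone has checked that doing so is safe.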
Seen in¶
- sources/2023-07-16-highscalability-lessons-learned-running-presto-at-meta-scale — Meta names configuration management as a distinct scaling challenge and prescribes hot-reloadability as the mitigation.
Related¶
- concepts/blast-radius — restart-required reconfig raises blast radius per config change.
- concepts/customer-facing-sla — restart-required reconfig breaks the SLA, one cluster at a time.
- patterns/hot-reloadable-configuration — the implementation pattern.
- patterns/staged-rollout — complementary: even hot-reloadable config should be staged.