PATTERN Cited by 1 source

FSM RPC for config metadata transfer

Intent

When a cross-host operation (migration, replication, failover) depends on per-resource configuration that may differ between hosts — because of version skew, default drift, or local customisation — carry the source host's config as RPC metadata inside the orchestration state machine, so the target host reconstructs the resource with the source's parameters rather than the target's defaults.

When to use

  • You operate a heterogeneous fleet (see concepts/heterogeneous-fleet-config-skew).
  • A cross-host operation creates a derived resource whose shape must match the source's (encrypted volume, replicated table, migrated container).
  • The target host would otherwise use its own local defaults, which don't match the source's, causing either silent breakage or an immediate error.
  • You have an orchestrator with a durable state machine (FSM, workflow engine) that the operation already flows through.

Structure

  1. Define the set of config parameters that must travel with the resource — "what does the target need to know about how the source created this?"
  2. Extend the FSM's migration RPC with a metadata field carrying those parameters.
  3. On the source side, read the authoritative parameters from the resource itself (the on-disk LUKS2 header, the table schema, the container-spec) and populate the RPC.
  4. On the target side, use the received parameters when reconstructing the resource — not the local defaults.
  5. Log which parameters were used, so the decision can be debugged after the migration completes and looked up when a future cross-host operation needs the same parameters.
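
The five steps can be sketched roughly as follows. This is a minimal illustration in Python, not flyd's actual code: the names `ResourceConfig`, `MigrateRequest`, `build_request`, `reconstruct`, the JSON wire format, and the `TARGET_DEFAULT_HEADER_MIB` constant are all assumptions made for the example.

```python
import json
import logging
from dataclasses import asdict, dataclass

log = logging.getLogger("migration")

# Hypothetical local default: stands in for whatever the target's tooling would pick.
TARGET_DEFAULT_HEADER_MIB = 16


@dataclass(frozen=True)
class ResourceConfig:
    """Step 1: the parameters that must travel with the resource."""
    header_size_mib: int
    cipher: str


@dataclass(frozen=True)
class MigrateRequest:
    """Step 2: the migration RPC, extended with a metadata field."""
    volume_id: str
    metadata: ResourceConfig

    def to_wire(self) -> str:
        # asdict() converts the nested dataclass as well.
        return json.dumps(asdict(self))

    @staticmethod
    def from_wire(raw: str) -> "MigrateRequest":
        doc = json.loads(raw)
        return MigrateRequest(doc["volume_id"], ResourceConfig(**doc["metadata"]))


def build_request(volume_id: str, on_disk: dict) -> MigrateRequest:
    """Step 3: populate from the resource itself, not from source-host defaults."""
    cfg = ResourceConfig(on_disk["header_size_mib"], on_disk["cipher"])
    return MigrateRequest(volume_id, cfg)


def reconstruct(raw: str) -> dict:
    """Steps 4 and 5: use the received parameters and log the decision."""
    req = MigrateRequest.from_wire(raw)
    cfg = req.metadata
    log.info(
        "volume=%s using source header_size_mib=%d (local default is %d)",
        req.volume_id, cfg.header_size_mib, TARGET_DEFAULT_HEADER_MIB,
    )
    return {"volume_id": req.volume_id, **asdict(cfg)}
```

In this sketch the target never consults `TARGET_DEFAULT_HEADER_MIB` when rebuilding the resource; the constant appears only in the log line, so the audit trail shows where the source and the local default disagreed.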

Canonical example

From Fly.io 2024-07-30:

Two different workers, for cursed reasons, might be running different versions of cryptsetup ... default to different LUKS2 header sizes — 4 MiB and 16 MiB. Implying two different plaintext volume sizes.

So now part of the migration FSM is an RPC call that carries metadata about the designed LUKS2 configuration for the target VM. Not something we expected to have to build, but, whatever.

flyd's migration FSM — one of its "durable finite state machines, each representing some operation on a Fly Machine (creation, start, stop, &c)" — gained a step that transfers the source's LUKS2 header config so the target creates the dm-clone device with the correct plaintext size.
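
To see why the header size matters, here is a small worked calculation. It assumes, as a simplification, that the plaintext size is the size of the encrypted device minus the header size; the 1 GiB device and the function name `plaintext_bytes` are hypothetical.

```python
MIB = 1024 * 1024


def plaintext_bytes(device_bytes: int, header_mib: int) -> int:
    """Usable plaintext size when the header occupies the start of the device."""
    header = header_mib * MIB
    if device_bytes <= header:
        raise ValueError("device is smaller than its header")
    return device_bytes - header


device = 1024 * MIB  # hypothetical 1 GiB encrypted block device
with_4 = plaintext_bytes(device, 4)
with_16 = plaintext_bytes(device, 16)
print((with_4 - with_16) // MIB)  # prints 12
```

Under that assumption, a target that applied a 16 MiB default to a volume created with a 4 MiB header would expose a plaintext device 12 MiB smaller than the source's, which is the mismatch the metadata RPC avoids.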

Consequences

Upsides:

  • The operation succeeds despite fleet skew.
  • The orchestrator's audit trail records the config at migration time, useful for post-incident analysis.
  • Future config drift can be handled by extending the RPC without changing the data plane.

Downsides:

  • One more coupling between the orchestrator and the resource's implementation details. The FSM now has to know about LUKS2.
  • You only notice the need for this when a migration breaks. Expect to add these incrementally as skew surfaces.
  • Additive RPC growth. Every new skewed-config dimension adds a field.
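
One way to keep additive growth manageable is to make every metadata field optional, have receivers ignore fields they do not know, and report which expected fields a sender left out. The sketch below is an assumption about how that could look, not something the source describes; `VolumeMetadata`, `sector_size`, `decode`, and `missing` are hypothetical names.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass(frozen=True)
class VolumeMetadata:
    # First skew dimension that surfaced.
    header_size_mib: Optional[int] = None
    # Hypothetical second dimension, added later as a new optional field.
    sector_size: Optional[int] = None


def decode(payload: dict) -> VolumeMetadata:
    """Keep known fields and drop unknown ones, so newer senders don't break older receivers."""
    known = {f.name for f in fields(VolumeMetadata)}
    return VolumeMetadata(**{k: v for k, v in payload.items() if k in known})


def missing(meta: VolumeMetadata) -> List[str]:
    """Names of fields the sender did not supply, in declaration order."""
    return [f.name for f in fields(meta) if getattr(meta, f.name) is None]
```

A receiver can then decide per field whether a missing value is safe to default or should fail the migration step, instead of silently falling back to local defaults.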

Generalisations

The same shape surfaces outside migration:

  • Container orchestrators carry the image content-hash, runtime class, security-context defaults — all essentially RPC metadata that would otherwise differ per host.
  • Database replication carries table schema and per-column defaults.
  • Configuration management systems (Consul, etcd, flagd) ship resource-specific config as structured key-value data.

The Fly.io instance is distinctive because the skew was unintended — cryptsetup's default-drift was not a design decision, just something that happened over time.

Seen in
