CONCEPT Cited by 1 source
Ghost-node ejection¶
Definition¶
Ghost-node ejection is the automatic removal of stale node-membership references from a distributed cluster's internal state. It applies in the window after a node has left the cluster but before the cluster has garbage-collected its metadata references to that node. The "ghost" is a node whose presence has ended in reality but persists as a dangling entry in the cluster's view of membership, quorum, or replica placement.
Without ghost-node ejection, the cluster's internal state drifts from ground truth — members that aren't there still count toward quorums, still appear in replica sets, still consume a slot in ops-console membership lists — and subsequent cluster operations (leader elections, replica placement decisions, rebalances) consult a model of the cluster that has stale nodes in it.
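The drift described above can be made concrete as a set difference between what the metadata lists and what is actually present. The following is a minimal sketch, not Redpanda code: the function name `audit_stale_references`, the partition names, and the data shapes are all hypothetical and chosen only for illustration.

```python
def audit_stale_references(members, replica_sets, live):
    """Return (ghost_ids, affected_partitions).

    members:      node IDs the cluster metadata lists as members
    replica_sets: partition name -> list of node IDs holding a replica
    live:         node IDs actually present (ground truth)
    All three inputs are hypothetical stand-ins for real cluster state.
    """
    # A ghost is any node the metadata still lists but that is no longer present.
    ghosts = set(members) - set(live)
    # Partitions whose replica set still references at least one ghost.
    affected = {
        partition: sorted(set(nodes) & ghosts)
        for partition, nodes in replica_sets.items()
        if set(nodes) & ghosts
    }
    return ghosts, affected


members = [0, 1, 2, 3, 4]
live = [0, 1, 3]
replica_sets = {
    "orders/0": [0, 1, 2],
    "orders/1": [1, 3, 0],
    "orders/2": [2, 4, 3],
}

ghosts, affected = audit_stale_references(members, replica_sets, live)
print(sorted(ghosts))  # [2, 4]
print(affected)        # {'orders/0': [2], 'orders/2': [2, 4]}
```

Every entry in `affected` is a place where a later placement or election decision would consult a node that no longer exists.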
Canonical Redpanda framing¶
"Automatic ghost node ejection: Redpanda now automatically cleans up after 'ghost' nodes that have left the cluster, keeping your cluster state pristine."
This single sentence is the entire product disclosure, and this page is the first wiki canonicalisation of the phenomenon.
The failure mode¶
Typical ghost-node creation paths:
- Ungraceful shutdown — a broker process crashes or is terminated without a clean decommission. Cluster metadata still lists the node as a member.
- Long network partition — a node is unreachable long enough to be considered "departed" by some observers but cluster metadata still references it.
- Hardware replacement — a failed node is replaced with a new hostname / identity; the old identity persists in state until manually reaped.
- Decommission step failure — a multi-step decommission workflow aborts mid-flight, leaving the node marked partially-removed.
Consequences of a lingering ghost node:
- Raft quorum math is off: a 5-node cluster with 2 ghost nodes still needs a majority of 3 votes for quorum but has only 3 live participants, so a single additional failure loses quorum.
- Replica placement decisions consult a model that includes the ghost; partitions may be "assigned" to the ghost and never replicated.
- Operator ergonomics — dashboards, monitoring alerts, and `rpk cluster status` show noise that doesn't correspond to real nodes; real incidents are masked.
- Cleanup operational burden — operators run manual "remove-node" commands periodically to maintain hygiene.
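The quorum arithmetic in the first consequence can be checked directly. This is a minimal sketch using the standard majority rule for Raft-style voting; the helper names are hypothetical, and the last line assumes that ejection also shrinks the configured voter set, which the source does not confirm.

```python
def majority(voters: int) -> int:
    """Votes needed for a strict majority of the configured voter set."""
    return voters // 2 + 1


def failures_tolerated(configured: int, live: int) -> int:
    """Additional failures survivable; a negative value means quorum is already lost."""
    return live - majority(configured)


configured, ghosts = 5, 2
live = configured - ghosts

print(majority(configured))                  # 3
print(failures_tolerated(configured, live))  # 0: one more failure loses quorum

# Assumption: after ejection the voter set is the 3 live nodes only.
print(failures_tolerated(live, live))        # 1
```

A healthy 5-node cluster tolerates 2 failures; with 2 ghosts counted as voters, the same configuration tolerates none.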
Automatic vs manual cleanup¶
Pre-26.1, Redpanda operators had tools to manually eject ghost nodes (`rpk cluster decommission-node`, `rpk cluster health` inspection). The 26.1 change is automation of the detection + cleanup step: the cluster itself recognises a node has left and prunes references without operator intervention.
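To illustrate what an automated detect-and-prune step might look like, here is a minimal sketch that assumes a heartbeat-timeout detector. Redpanda's actual detection mechanism is undisclosed, so the `ClusterView` class, its fields, and the `timeout_s` parameter are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterView:
    """Hypothetical in-memory model of cluster metadata (not a Redpanda type)."""
    members: set = field(default_factory=set)
    replica_sets: dict = field(default_factory=dict)  # partition -> set of node IDs
    last_seen: dict = field(default_factory=dict)     # node ID -> last heartbeat time (s)

    def heartbeat(self, node_id, now):
        """Record that a node was observed alive at time `now`."""
        self.members.add(node_id)
        self.last_seen[node_id] = now

    def eject_ghosts(self, now, timeout_s):
        """Detect members silent for longer than timeout_s and prune every reference."""
        ghosts = {
            n for n in self.members
            if now - self.last_seen.get(n, float("-inf")) > timeout_s
        }
        self.members -= ghosts
        for replicas in self.replica_sets.values():
            replicas -= ghosts  # in-place removal from each replica set
        for n in ghosts:
            self.last_seen.pop(n, None)
        return sorted(ghosts)


view = ClusterView()
for n in range(5):
    view.heartbeat(n, now=0.0)
view.replica_sets = {"orders/0": {0, 1, 2}, "orders/1": {2, 3, 4}}

# Nodes 2 and 4 go silent; the others keep heartbeating.
for n in (0, 1, 3):
    view.heartbeat(n, now=100.0)

print(view.eject_ghosts(now=100.0, timeout_s=30.0))            # [2, 4]
print(sorted(view.members))                                    # [0, 1, 3]
print({p: sorted(r) for p, r in view.replica_sets.items()})    # {'orders/0': [0, 1], 'orders/1': [3]}
```

The sketch only removes references. A real cluster would also need to re-replicate the partitions left under-replicated, which is outside what this example models.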
This is the cluster-membership altitude analogue of concepts/explicit-teardown-on-completion at the process altitude and patterns/bad-host-auto-drain at the fleet altitude — a reliability primitive that observes end-of-life events and cleans up after them instead of relying on an external reaper.
Distinguishing from node decommission¶
| Axis | Decommission (planned) | Ghost-node ejection (unplanned) |
|---|---|---|
| Trigger | Operator runs decommission-node | Node departure observed by cluster |
| Graceful | Yes — node finishes in-flight work | No — node already gone |
| Rebalance semantics | Partitions moved off before remove | Partitions re-assigned after detection |
| Operator action required | Yes — initiated by operator | No — fully automatic |
Decommission is the graceful happy-path; ghost-node ejection is the automatic catch-all for the unhappy paths.
Mechanism gaps (from the source)¶
The 26.1 launch post gives this feature one sentence of PR framing. Undisclosed:
- Detection mechanism — heartbeat-timeout-based? Gossip-convergence-based? Raft-membership-change-based?
- Timeout thresholds — how long must a node be unreachable before it's declared a ghost? Tunable?
- Interaction with Raft membership changes — does ghost-node ejection go through the Raft config-change protocol, or is it a metadata-only update?
- False-positive protection — what prevents a flapping network from marking a healthy node as a ghost?
- Interaction with partitioned minorities — in a split-brain, which side considers the other a ghost?
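The last two questions have conventional answers in distributed systems generally, sketched below purely as an illustration of what a safe design could require. Nothing here is known to be Redpanda's behaviour: the function `may_eject`, the `required_misses` threshold, and the majority-side rule are all assumptions.

```python
def may_eject(consecutive_misses: int, reachable_voters: int,
              configured_voters: int, required_misses: int = 3) -> bool:
    """Hypothetical guard: allow ejection only when both conditions hold.

    - The observer can reach a strict majority of configured voters, so a
      partitioned minority never ejects the majority side.
    - The suspect has missed several checks in a row, so a single dropped
      heartbeat on a flapping link does not trigger ejection.
    """
    has_majority = reachable_voters >= configured_voters // 2 + 1
    return has_majority and consecutive_misses >= required_misses


print(may_eject(consecutive_misses=1, reachable_voters=4, configured_voters=5))  # False
print(may_eject(consecutive_misses=3, reachable_voters=4, configured_voters=5))  # True
print(may_eject(consecutive_misses=5, reachable_voters=2, configured_voters=5))  # False
```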
Seen in¶
- sources/2026-03-31-redpanda-261-delivers-the-industrys-first-adaptable-streaming-engine — Redpanda 26.1 launch post. First wiki disclosure of the ghost-node ejection primitive as a Redpanda feature. "Keeping your cluster state pristine."
Source¶
- Original: https://www.redpanda.com/blog/26-1-r1-cloud-topics
- Raw markdown: raw/redpanda/2026-03-31-redpanda-261-delivers-the-industrys-first-adaptable-streamin-09255e05.md
Related¶
- systems/redpanda — the broker shipping automatic ghost-node ejection in 26.1.
- concepts/gossip-protocol — a typical detection substrate for node-liveness claims.
- concepts/explicit-teardown-on-completion — the process-altitude analogue of cluster-membership cleanup.
- patterns/automated-cluster-standup-decommission — the broader lifecycle pattern ghost-node ejection slots into.
- patterns/bad-host-auto-drain — fleet-altitude sibling pattern for automatic cleanup of failing members.