Pattern (cited by 1 source)

Manifest via Raft for fast failover

Pattern

On each LSM memtable flush, write the updated manifest to durable remote storage (object storage) and also replicate it to every follower in the Raft group. A newly elected leader then already holds the complete manifest locally and can serve reads immediately, without first fetching it from remote storage.
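
A minimal sketch of the dual write follows, assuming in-memory stand-ins for object storage and for a Raft group. The names `SSTableMeta`, `Manifest`, `InMemoryObjectStore`, `RaftGroup`, `on_memtable_flush` and the `manifest/<zero-padded version>` key layout are hypothetical illustrations, not Redpanda's API. On each flush it records the new SSTable, bumps the manifest version, persists the serialized manifest to object storage, and then proposes the same bytes through Raft so every follower ends up with an identical local copy.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class SSTableMeta:
    # Hypothetical manifest entry: object-storage path plus the key range it covers.
    path: str
    min_key: str
    max_key: str


@dataclass
class Manifest:
    # Hypothetical manifest layout: a version counter and the live SSTables.
    version: int = 0
    sstables: List[SSTableMeta] = field(default_factory=list)

    def to_bytes(self) -> bytes:
        # asdict() recurses into the nested SSTableMeta dataclasses.
        return json.dumps(asdict(self), sort_keys=True).encode()


class InMemoryObjectStore:
    """Stand-in for S3-style object storage (hypothetical, not a real client)."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data


class RaftGroup:
    """Stand-in for a Raft group; real Raft commits once a majority acks."""

    def __init__(self, followers: List[dict]) -> None:
        self.log: List[bytes] = []
        self.followers = followers

    def propose(self, entry: bytes) -> int:
        self.log.append(entry)
        # Simplification: every follower applies the committed entry at once.
        for follower in self.followers:
            follower["manifest"] = entry
        return len(self.log) - 1


def on_memtable_flush(manifest: Manifest, new_sst: SSTableMeta,
                      store: InMemoryObjectStore, raft: RaftGroup) -> int:
    # Assumes the SSTable bytes were already uploaded to new_sst.path.
    manifest.sstables.append(new_sst)
    manifest.version += 1
    payload = manifest.to_bytes()
    # Durable copy in object storage, readable by external systems.
    store.put(f"manifest/{manifest.version:020d}", payload)
    # Same bytes through Raft, so followers hold the manifest locally.
    return raft.propose(payload)


store = InMemoryObjectStore()
followers = [{}, {}]
raft = RaftGroup(followers)
manifest = Manifest()
on_memtable_flush(manifest, SSTableMeta("sst/0001", "a", "m"), store, raft)
on_memtable_flush(manifest, SSTableMeta("sst/0002", "n", "z"), store, raft)
```

In this sketch the object-storage write happens before the Raft proposal, matching the order of the mechanism steps, so a durable copy exists by the time followers treat the manifest as current.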

When to use

An LSM-based store runs with Raft-replicated state for high availability (HA).
Failover latency matters: leader election should not block on fetching the manifest from remote storage.
The manifest must also be readable by external systems (for example, to bootstrap from object storage).
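
To illustrate the external-bootstrap requirement, here is a minimal sketch of how an outside reader, such as a DR process or a read replica, might locate the newest manifest in object storage. It assumes, hypothetically, that manifests are stored under `manifest/` keys with a fixed-width, zero-padded version number, so lexical key order matches version order. A plain dict stands in for a bucket listing, and `latest_manifest` is an illustrative name, not a real API.

```python
import json
from typing import Dict


def latest_manifest(objects: Dict[str, str], prefix: str = "manifest/") -> dict:
    """Return the newest manifest found under `prefix` (hypothetical layout).

    Assumes version numbers are zero-padded to a fixed width, so sorting the
    keys lexically also sorts them by version.
    """
    keys = sorted(k for k in objects if k.startswith(prefix))
    if not keys:
        raise LookupError("no manifest found in object storage")
    return json.loads(objects[keys[-1]])


# A dict standing in for an object-storage bucket listing (hypothetical contents).
bucket = {
    f"manifest/{1:020d}": json.dumps({"version": 1, "sstables": ["sst/0001"]}),
    f"manifest/{2:020d}": json.dumps({"version": 2, "sstables": ["sst/0001", "sst/0002"]}),
    "sst/0001": "<sstable bytes>",
}
snapshot = latest_manifest(bucket)
print(snapshot["version"])  # 2
```

A real S3 listing is paginated (ListObjectsV2 returns at most 1,000 keys per page), which this sketch ignores.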

Mechanism

A memtable flush writes a new SSTable to object storage.
The updated manifest, listing all live SSTables and their key ranges, is written to object storage.
A Raft entry carrying the manifest is committed; followers apply it locally.
The same Raft entry marks a log truncation point: WAL entries preceding this manifest are trimmed.
On leader failure, the new leader already has the manifest and resumes without a remote fetch.
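
The follower side of these steps can be sketched as a small state machine. Following the text, a committed manifest entry replaces the local manifest and acts as the truncation point, so every WAL entry that precedes it in the log is dropped; a follower promoted to leader then answers key-range lookups from its local copy. `LogEntry`, `Replica`, `sstables_for_key`, and the tuple encoding of manifest rows are hypothetical names chosen for illustration, and the in-order, single-process apply loop stands in for real Raft replication.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical manifest row: (sstable_path, min_key, max_key).
ManifestRow = Tuple[str, str, str]


@dataclass(frozen=True)
class LogEntry:
    index: int
    kind: str        # "wal" for a buffered write, "manifest" for a flush checkpoint
    payload: object  # (key, value) for "wal"; list of ManifestRow for "manifest"


@dataclass
class Replica:
    """Hypothetical Raft state machine holding a WAL tail and a local manifest."""
    wal: List[LogEntry] = field(default_factory=list)
    manifest: Optional[List[ManifestRow]] = None

    def apply(self, entry: LogEntry) -> None:
        if entry.kind == "wal":
            self.wal.append(entry)
        elif entry.kind == "manifest":
            self.manifest = list(entry.payload)
            # Truncation point: drop WAL entries that precede this manifest.
            self.wal = [e for e in self.wal if e.index > entry.index]
        else:
            raise ValueError(f"unknown entry kind: {entry.kind}")

    def sstables_for_key(self, key: str) -> List[str]:
        # Served from the local manifest; no object-storage round trip.
        if self.manifest is None:
            raise RuntimeError("no manifest applied yet")
        return [path for path, lo, hi in self.manifest if lo <= key <= hi]


# Committed log, applied in order by every follower.
log = [
    LogEntry(1, "wal", ("c", "v1")),
    LogEntry(2, "wal", ("k", "v2")),
    LogEntry(3, "manifest", [("sst/0001", "a", "m")]),
    LogEntry(4, "wal", ("q", "v3")),
]
followers = [Replica(), Replica()]
for follower in followers:
    for entry in log:
        follower.apply(entry)

# The leader fails; followers[0] wins the election and serves reads at once.
new_leader = followers[0]
print(new_leader.sstables_for_key("c"))  # ['sst/0001']
print([e.index for e in new_leader.wal])  # [4]
```

The write at index 4 arrives after the manifest, so it survives truncation and remains in the new leader's WAL.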

Trade-offs

Pro: Sub-second failover for metadata reads.
Pro: Followers are always ready to serve; no cold-start manifest download.
Pro: Object-storage copy enables external bootstrapping (DR, read replicas).
Con: Manifest replication adds per-flush Raft overhead.
Con: Every replica needs enough local storage to hold the manifest (though the manifest is typically small).

(Source: sources/2026-06-09-redpanda-cloud-topics-the-metastore)

Seen in

systems/redpanda-cloud-topics-metastore: manifest flushed to S3 and replicated via Raft to all followers