PATTERN Cited by 1 source
Manifest via Raft for fast failover
Pattern
On each LSM memtable flush, write the updated manifest to durable remote storage (object storage) AND replicate it to all Raft-group followers. This ensures a newly elected leader already has the complete manifest locally and can serve reads immediately without fetching from remote storage.
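The sketch below shows one possible shape for the manifest. It assumes each entry records an SSTable's object-storage key and its inclusive key range. The class names, field names and JSON encoding are hypothetical and are not Redpanda's format. It shows why a local copy is enough to start serving: deciding which SSTables to consult for a key needs only the manifest, not a round trip to object storage.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class SSTableRef:
    """One live SSTable as recorded in the manifest (hypothetical fields)."""
    object_key: str  # location of the SSTable in object storage
    min_key: str     # smallest key in the table (inclusive)
    max_key: str     # largest key in the table (inclusive)


@dataclass
class Manifest:
    version: int = 0
    sstables: list[SSTableRef] = field(default_factory=list)

    def with_flush(self, table: SSTableRef) -> Manifest:
        # Each flush produces a new manifest version with one more live SSTable.
        return Manifest(self.version + 1, [*self.sstables, table])

    def tables_for(self, key: str) -> list[SSTableRef]:
        # Routing a read needs only the manifest: return the tables whose key
        # range covers `key`, newest first so the most recent value wins.
        return [t for t in reversed(self.sstables) if t.min_key <= key <= t.max_key]

    def to_json(self) -> str:
        # The same bytes could be written to object storage and put in a Raft entry.
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def from_json(data: str) -> Manifest:
        raw = json.loads(data)
        return Manifest(raw["version"], [SSTableRef(**t) for t in raw["sstables"]])


m = Manifest().with_flush(SSTableRef("sst/000001.sst", "a", "m"))
m = m.with_flush(SSTableRef("sst/000002.sst", "h", "z"))
print([t.object_key for t in m.tables_for("k")])  # ['sst/000002.sst', 'sst/000001.sst']
assert Manifest.from_json(m.to_json()) == m
```

A single serialized form is convenient here because the pattern writes the same manifest to two places, object storage and the Raft log.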
When to use
- An LSM-based store runs with Raft-replicated state for HA.
- Failover latency matters: a newly elected leader should not have to block on a remote manifest fetch before it can serve.
- The manifest also needs to be readable by external systems (bootstrap from object storage).
Mechanism
- Memtable flush triggers SSTable write to object storage.
- Updated manifest (listing all live SSTables + key ranges) is written to object storage.
- A Raft entry carrying the manifest is committed; followers apply it locally.
- The same Raft entry marks a log truncation point: WAL entries preceding this manifest are already captured in the flushed SSTables, so they are trimmed.
- On leader failure, the new leader already has the manifest locally and resumes without a remote fetch.
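The steps above can be traced in a single-process toy. It makes several simplifying assumptions. "Commit" means every replica applies the entry, with no quorum, elections, or network. A dict stands in for object storage. Writes arriving during a flush are ignored. All names (`ToyGroup`, `manifest/latest`, and so on) are hypothetical. It is a sketch of the pattern, not Redpanda's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RaftEntry:
    index: int
    kind: str       # "write" (WAL record) or "manifest"
    payload: object


@dataclass
class Replica:
    name: str
    manifest: list[str] = field(default_factory=list)  # live SSTable keys, held locally
    log: list[RaftEntry] = field(default_factory=list)

    def apply(self, entry: RaftEntry) -> None:
        self.log.append(entry)
        if entry.kind == "manifest":
            # Install the manifest locally. The entry is also the truncation point,
            # so earlier WAL records (now inside SSTables) are dropped.
            self.manifest = list(entry.payload)
            self.log = [e for e in self.log if e.index >= entry.index]


class ToyGroup:
    """Single-process stand-in: 'commit' means every replica applies the entry."""

    def __init__(self, names: list[str]) -> None:
        self.replicas = {n: Replica(n) for n in names}
        self.leader = names[0]
        self.object_store: dict[str, object] = {}  # stand-in for S3-like storage
        self.memtable: dict[str, str] = {}
        self.next_index = 1
        self.flushes = 0

    def _commit(self, kind: str, payload: object) -> None:
        entry = RaftEntry(self.next_index, kind, payload)
        self.next_index += 1
        for replica in self.replicas.values():
            replica.apply(entry)

    def put(self, key: str, value: str) -> None:
        self._commit("write", (key, value))
        self.memtable[key] = value

    def flush(self) -> None:
        self.flushes += 1
        sst_key = f"sst/{self.flushes:06d}.sst"
        # 1. The memtable flush writes a new SSTable to object storage.
        self.object_store[sst_key] = dict(sorted(self.memtable.items()))
        self.memtable.clear()
        # 2. The updated manifest is written to object storage.
        live = [*self.replicas[self.leader].manifest, sst_key]
        self.object_store["manifest/latest"] = list(live)
        # 3. A Raft entry carrying the manifest is committed; followers apply it.
        self._commit("manifest", tuple(live))

    def fail_leader(self) -> list[str]:
        # 4. Promote a surviving replica: its manifest is already local, and the
        #    untrimmed log tail rebuilds the unflushed memtable.
        del self.replicas[self.leader]
        self.leader = next(iter(self.replicas))
        new_leader = self.replicas[self.leader]
        self.memtable = {}
        for e in new_leader.log:
            if e.kind == "write":
                key, value = e.payload
                self.memtable[key] = value
        return new_leader.manifest


g = ToyGroup(["n1", "n2", "n3"])
g.put("a", "1")
g.put("b", "2")
g.flush()
g.put("c", "3")
print(g.fail_leader())  # ['sst/000001.sst'], no object-storage read
print(g.memtable)       # {'c': '3'}
```

After the failover, the new leader (`n2`) knows every live SSTable from its own state. It only needs its local log tail for writes that were not yet flushed. The `manifest/latest` object is what an external system would read to bootstrap.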
Trade-offs
- Pro: Sub-second failover for metadata reads.
- Pro: Followers are always ready to serve; no cold-start manifest download.
- Pro: Object-storage copy enables external bootstrapping (DR, read replicas).
- Con: Manifest replication adds per-flush Raft overhead.
- Con: All replicas must have enough local storage for the manifest (typically small).
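The two costs can be estimated roughly. The manifest's size is both the local storage each replica needs and the payload of every per-flush Raft entry. The sketch below assumes three things, none of which the source states: each Raft entry carries the whole manifest rather than a delta, entries use the hypothetical JSON shape shown, and the leader ships one copy to each follower.

```python
import json


def manifest_bytes(num_sstables: int) -> int:
    # Hypothetical entry shape: SSTable object key plus an inclusive key range.
    entries = [
        {
            "object_key": f"sst/{i:06d}.sst",
            "min_key": f"k{2 * i:06d}",
            "max_key": f"k{2 * i + 1:06d}",
        }
        for i in range(num_sstables)
    ]
    return len(json.dumps({"version": num_sstables, "sstables": entries}).encode("utf-8"))


def per_flush_replication_bytes(num_sstables: int, replicas: int) -> int:
    # Assumes a full-manifest Raft entry sent from the leader to each follower
    # (replicas - 1 copies); the leader's own copy is already local.
    return manifest_bytes(num_sstables) * (replicas - 1)


for n in (10, 1_000, 100_000):
    print(n, manifest_bytes(n), per_flush_replication_bytes(n, replicas=3))
```

Under these assumptions, both costs grow linearly with the number of live SSTables, and the replication cost also scales with the replication factor. Real figures depend on the actual manifest encoding.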
(Source: sources/2026-06-09-redpanda-cloud-topics-the-metastore)
Seen in
- systems/redpanda-cloud-topics-metastore — manifest flushed to S3 + replicated via Raft to all followers