

Manifest via Raft for fast failover

Pattern

On each LSM memtable flush, write the updated manifest to durable remote storage (object storage) and also replicate it through the Raft log to every follower in the group. A newly elected leader then already holds the complete manifest locally and can serve reads immediately, without first fetching it from remote storage.

When to use

  • An LSM-based store runs with Raft-replicated state for HA.
  • Failover latency matters: a newly elected leader should not have to block on fetching the manifest from remote storage.
  • The manifest must also be readable by external systems that bootstrap from object storage.

Mechanism

  1. Memtable flush triggers SSTable write to object storage.
  2. Updated manifest (listing all live SSTables + key ranges) is written to object storage.
  3. A Raft entry carrying the manifest is committed; followers apply it locally.
  4. The same Raft entry marks a log truncation point: WAL entries that precede this manifest are trimmed.
  5. On leader failure, the new leader already holds the manifest locally and resumes without a remote fetch.
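
The five steps above can be sketched as a small in-memory simulation. This is a minimal illustration, not the source system's implementation: the names `Group`, `Replica`, `Manifest`, `SSTableMeta`, and `apply_manifest_entry` are hypothetical, object storage is modeled as a plain dict, and a Raft commit is simplified to a synchronous loop over all replicas (no quorum, terms, or elections).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTableMeta:
    name: str
    min_key: str
    max_key: str


@dataclass(frozen=True)
class Manifest:
    version: int = 0
    sstables: tuple = ()  # one SSTableMeta per live SSTable


class Replica:
    """One Raft-group member with a local manifest copy and a WAL."""

    def __init__(self, name):
        self.name = name
        self.manifest = Manifest()
        self.wal = []  # (log_index, key, value) not yet covered by a manifest

    def apply_manifest_entry(self, manifest, truncate_before):
        # Steps 3 and 4: adopt the manifest and trim the WAL in one apply.
        self.manifest = manifest
        self.wal = [e for e in self.wal if e[0] >= truncate_before]


class Group:
    """Hypothetical Raft group; replication is a synchronous loop here."""

    def __init__(self, names):
        self.object_store = {}  # stand-in for remote object storage
        self.replicas = [Replica(n) for n in names]
        self.leader = self.replicas[0]
        self.next_index = 1

    def write(self, key, value):
        entry = (self.next_index, key, value)
        self.next_index += 1
        for replica in self.replicas:
            replica.wal.append(entry)

    def flush(self):
        entries = list(self.leader.wal)
        if not entries:
            return
        old = self.leader.manifest
        version = old.version + 1
        name = f"sst-{version:06d}"
        keys = sorted(k for _, k, _ in entries)
        # Step 1: the flushed memtable becomes an SSTable in object storage.
        self.object_store[name] = {k: v for _, k, v in entries}
        # Step 2: the updated manifest is written to object storage.
        meta = SSTableMeta(name, keys[0], keys[-1])
        manifest = Manifest(version, old.sstables + (meta,))
        self.object_store["manifest"] = manifest
        # Steps 3 and 4: one entry carries the manifest and truncation point.
        truncate_before = entries[-1][0] + 1
        for replica in self.replicas:
            replica.apply_manifest_entry(manifest, truncate_before)

    def fail_over(self):
        # Step 5: the old leader is dropped; a follower takes over using
        # the manifest it already holds, with no read from object_store.
        self.replicas = [r for r in self.replicas if r is not self.leader]
        self.leader = self.replicas[0]
        return self.leader
```

After a `flush()` followed by `fail_over()`, the new leader's `manifest` equals the copy in `object_store["manifest"]`, and its WAL holds only the entries written after the flush. A real implementation would commit the manifest entry through the Raft log with quorum acknowledgement rather than looping over replicas.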

Trade-offs

  • Pro: Sub-second failover for metadata reads.
  • Pro: Followers are always ready to serve; no cold-start manifest download.
  • Pro: Object-storage copy enables external bootstrapping (DR, read replicas).
  • Con: Manifest replication adds per-flush Raft overhead.
  • Con: All replicas must have enough local storage for the manifest (typically small).

(Source: sources/2026-06-09-redpanda-cloud-topics-the-metastore)

