
PATTERN

Incremental S3 backup of immutable files

Intent

Back up every state-durability checkpoint (commit, flush, segment-finalize) to S3 by uploading only the files that aren't already in the S3 prefix — rather than re-uploading the whole index or building a tar/archive. The combination of checkpoint-triggered backup + immutable-file storage + list-diff-then-upload produces a low-overhead, high-durability replication path where S3 becomes the source of truth for committed data.

When to use

  • The system has an on-disk representation in immutable files (Lucene segments, Parquet, LSM-tree sstables, log segments, ...).
  • The system has an explicit commit / flush boundary on the write path.
  • You want a cheap durability substrate without giving up commit cadence or adding archive-build overhead.

Shape

  on commit:
    local_files  = listdir(local_index_dir)
    remote_files = s3.list_objects(index_prefix)
    new_files    = local_files \ remote_files
    for each new_file in parallel:
      s3.put_object(new_file)
    ack commit

Per-commit cost scales with the size of new data, not total index size. For a search index with a steady ingestion rate and occasional merges, steady-state uploads finish in milliseconds; merges can produce larger segments that push per-commit upload time into the tens of seconds. The canonical Nrtsearch measurement puts the range at "a few ms to 20 seconds depending on the size of the data."
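The shape above can be sketched concretely with boto3. This is a minimal illustration, not Nrtsearch's implementation; the bucket/prefix arguments and the `backup_on_commit` / `files_to_upload` names are made up for the example, and `s3` is assumed to be a boto3 S3 client (or anything exposing the same `get_paginator` / `upload_file` interface).

```python
import os
from concurrent.futures import ThreadPoolExecutor

def files_to_upload(local_files, remote_keys):
    """Immutable files: anything not already under the prefix is new."""
    return sorted(set(local_files) - set(remote_keys))

def backup_on_commit(s3, bucket, prefix, index_dir, max_workers=8):
    """Upload only the files missing from s3://bucket/<prefix>."""
    local_files = os.listdir(index_dir)

    # One paginated LIST per commit; cost scales with prefix size.
    remote_keys = set()
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            remote_keys.add(obj["Key"].removeprefix(prefix))

    new_files = files_to_upload(local_files, remote_keys)

    # Upload the diff in parallel; the commit is acked only after
    # every PUT has landed, so S3 always holds the committed state.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(s3.upload_file,
                        os.path.join(index_dir, f), bucket, prefix + f)
            for f in new_files
        ]
        for fut in futures:
            fut.result()  # surface any upload failure before acking
    return new_files
```

The set difference is the whole trick: because segment files are never modified in place, name equality is sufficient to decide "already backed up" with no checksums or mtimes.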

Canonical instance: Yelp Nrtsearch 1.0.0

Yelp Nrtsearch 1.0.0 (2025-05-08) replaces periodic full backup to S3 with per-commit incremental upload:

"Lucene segments are immutable, so when we perform a backup, we only need to upload the new files since the last backup. On every commit, Nrtsearch checks the files in S3, determines the missing files, and uploads them. This makes a commit slightly slower, as we are now uploading files to S3 while we were only flushing them to EBS before. The additional time is generally a few ms to 20 seconds depending on the size of the data, which is trivial enough to not cause any issues." (Source: sources/2025-05-08-yelp-nrtsearch-100-incremental-backups-lucene-10)

Downstream consequences:

  • Primary disk becomes ephemeral. S3 holds every committed segment, so the primary can run on local SSD instead of EBS (concepts/ephemeral-local-disk-vs-ebs).
  • Replica bootstrap is from S3. Combined with parallel S3 download, replicas bootstrap 5× faster than the previous EBS path.
  • Full consistent snapshots can be infrequent, direct S3-to-S3 copies — they're no longer on the replica-bootstrap critical path.

Why not log shipping or archive snapshot?

  • vs. log shipping (WAL replication, change-data-capture, Kafka relay): log shipping works for mutable-file stores where on-disk state changes in place (row updates, page rewrites). For immutable-file stores, log shipping is more work, not less — the recipient still has to construct on-disk segments from the log. Uploading the already-built segments is strictly cheaper.
  • vs. archive-based periodic backup: the archive must be rebuilt from scratch each cycle (O(N²) cumulative work over N segments for a linear-scan archive format) or requires dedicated diff tooling. Listing the remote prefix once per commit is simpler.

Preconditions

  • Immutable on-disk files. InnoDB pages that are rewritten in place cannot use this pattern directly.
  • A cheap list-diff operation. S3 LIST is the canonical cheap option (paginated, O(prefix size) per listing, cacheable). Local filesystem listing is free.
  • An explicit commit boundary. Without a commit, there's no natural trigger.
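The "cacheable" property of the list-diff deserves a note: since files are immutable, a key observed under the prefix once exists forever, so a local cache of known-remote keys never needs invalidation, only additions. One way to exploit that is to seed the cache with a single full LIST at startup and then skip LIST entirely on subsequent commits. A hypothetical sketch (the `RemoteKeyCache` name and API are invented for illustration, not from the source):

```python
class RemoteKeyCache:
    """In-memory set of keys known to exist under the S3 prefix.

    Immutability means the cache is append-only: seed it with one
    full LIST, then update it with each commit's own uploads instead
    of re-listing the prefix every time.
    """

    def __init__(self, seed_keys=()):
        self._known = set(seed_keys)

    def missing(self, local_files):
        """Local files not yet known to be in S3 (the upload diff)."""
        return sorted(set(local_files) - self._known)

    def record_uploaded(self, files):
        """After a successful upload, remember the keys as remote."""
        self._known.update(files)
```

Files deleted locally after merges need no special handling here: they simply stop appearing in the local listing, so they never enter the diff.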

Seen in

  • sources/2025-05-08-yelp-nrtsearch-100-incremental-backups-lucene-10 — canonical wiki instance. Every Lucene commit in Nrtsearch 1.0.0 incrementally uploads new segment files to S3, replacing the prior periodic-archive backup model. Primary disk moves from EBS to ephemeral local SSD as a consequence; replica bootstrap becomes 5× faster via parallel S3 download.