PATTERN Cited by 2 sources
LTX compaction (time-window merge of SQLite page runs)¶
Pattern¶
Represent a database's changes as sorted per-transaction page-range files (LTX), then periodically k-way-merge adjacent time windows into larger files that keep only the latest version of each page. The output file is the point-in-time state of the database at the end of the window, encoded as a sorted page run.
Operationally, this is the LSM-tree compaction idea specialized for SQLite:
Fine-grained LTX files (recent transactions)
┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐
│ tx1..│ │ tx3..│ │ tx7..│ │ tx9..│ (minutes of writes)
└──────┘ └──────┘ └──────┘ └──────┘
│ │ │ │
└───────┴────────┴────────┘
│
▼
┌─────────┐
│ merged │ (one file per hour; keeps only
│ tx1..9 │ the latest page version)
└─────────┘
│
▼
(daily window, etc.)
Why it works¶
Two properties of LTX make compaction cheap:
- Sorted by page number. Two LTX files can be merged with a straight k-way-merge streaming algorithm (see patterns/streaming-k-way-merge). No sort step required.
- Transaction-scoped. Merging two windows is safe because each input LTX file respects transaction boundaries — the merged file is equivalent to the database state at the end of the later window.
From Fly.io:
"This process of combining smaller time ranges into larger ones is called compaction. With it, we can replay a SQLite database to a specific point in time, with a minimal [number of] duplicate pages. This is similar to how an LSM tree works." (Source: sources/2025-05-20-flyio-litestream-revamped)
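The sorted-run merge that makes this cheap can be sketched in a few lines. This is a minimal sketch of the idea, not Litestream's actual code; it assumes each run is a list of `(page_number, payload)` pairs sorted by page number, with later runs representing newer transactions:

```python
import heapq

def merge_ltx_runs(runs):
    """Merge sorted (page_number, payload) runs, oldest to newest,
    keeping only the newest version of each page.

    Each run must be sorted by page number; runs later in the list
    are newer transactions, so they win ties.
    """
    # Tag each entry with its run index so newer runs sort after
    # older ones for the same page.
    tagged = (
        ((page, idx, payload) for page, payload in run)
        for idx, run in enumerate(runs)
    )
    merged = []
    for page, idx, payload in heapq.merge(*tagged):
        if merged and merged[-1][0] == page:
            # Same page from a newer run: replace the stale version.
            merged[-1] = (page, payload)
        else:
            merged.append((page, payload))
    return merged
```

Because every input is already sorted, the merge is a single streaming pass: no sort step, memory proportional to the number of open runs, output sorted and ready to be an input to the next compaction level.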
PITR cost structure¶
| Restore target | Pre-LTX (shadow WAL) | Post-LTX (compaction) |
|---|---|---|
| Restore to "now" | snapshot + replay all WAL frames since the snapshot | download latest compacted LTX |
| Restore to time t in the past | snapshot + replay all WAL frames to t | download snapshot LTX at t + few deltas |
| Effort scales with | WAL volume (frame count) | distinct pages touched (compacted size) |
The second row is where the pattern pays off: if a single page was rewritten a thousand times in the window, the shadow-WAL approach replays a thousand records; the LTX-compaction approach carries one page in the compacted output.
Canonical instance: Litestream revamp (2025-05-20)¶
Fly.io's 2025-05-20 Litestream revamp is the canonical wiki instance. Pre-revamp, Litestream shipped raw SQLite WAL frames to S3; restore cost scaled with WAL volume. Post-revamp, Litestream ships LTX files to object storage and runs a compaction process against them, converging on LiteFS's internal format. See systems/litestream for the system-level narrative and patterns/sqlite-plus-litefs-plus-litestream for the full stack.
A second consequence: compaction is cheap enough that wildcard/directory replication becomes viable. Replicating /data/*.db with hundreds or thousands of SQLite databases from a single Litestream process was "infeasible" on the shadow-WAL design because WAL polling didn't amortise across databases; LTX streams amortise trivially.
Shipping-level compaction ladder (v0.5.0, 2025-10-02)¶
The 2025-10-02 Litestream v0.5.0 shipping post (sources/2025-10-02-flyio-litestream-v050-is-here) gives the pattern its first concrete production instantiation — a three-level time-window compaction hierarchy:
| Level | Window | Input |
|---|---|---|
| L1 | 30 s | raw LTX files from live replication |
| L2 | 5 min | 10 × L1 files |
| L3 | 1 hour | 12 × L2 files |
From the post:
"at Level 1, we compact all the changes in a 30-second time window; at Level 2, all the Level 1 files in a 5-minute window; at Level 3, all the Level 2's over an hour. Net result: we can restore a SQLite database to any point in time, using only a dozen or so files on average. Litestream performs this compaction itself. It doesn't rely on SQLite to process the WAL file. Performance is limited only by I/O throughput."
Two invariants this ladder exposes:
- Restore cost is bounded at O(dozen) files ("a dozen or so files on average"), regardless of retention depth. Compare against shadow-WAL's O(WAL frames since last snapshot).
- Compaction is a Litestream-side background job, not a SQLite-triggered checkpoint. Performance is IO-bound on object-storage bandwidth, not CPU-bound.
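Under a fixed ladder like this, assigning a transaction to its window at each level is just integer truncation of the timestamp. A minimal sketch, assuming the v0.5.0 window sizes (the helper name is hypothetical, not Litestream's API):

```python
def window_start(timestamp_s, level):
    """Map a transaction timestamp (Unix seconds) to the start of its
    compaction window at the given level.

    Windows follow the v0.5.0 ladder: L1 = 30 s, L2 = 5 min, L3 = 1 h.
    """
    window = {1: 30, 2: 300, 3: 3600}[level]
    return timestamp_s - (timestamp_s % window)
```

The nesting falls out of the arithmetic: every L2 window covers exactly ten L1 windows (300 s / 30 s), and every L3 window covers twelve L2 windows, which is why each level's input column in the table above is a fixed fan-in.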
This makes the compaction ladder a specialisation of time-window compaction (LSM-style leveled compaction where level boundaries are time intervals rather than size thresholds) — the shape is reusable beyond SQLite/LTX for any append workload over a merge-compactable sorted-run format.
Trade-offs¶
- Storage cost during retention. A long PITR retention window keeps intermediate LTX files around; storage cost is roughly proportional to retention × write-rate × dedupability. Compaction reduces restore cost, not archive cost at full retention.
- Compaction trigger policy. The post doesn't disclose Litestream's compaction trigger (size thresholds, time thresholds, or both) — a design knob familiar from the LSM-compaction literature.
- Read amplification during live restore. If a restore target falls between compaction boundaries, the restore process merges the most recent compacted state with un-compacted deltas — more bandwidth than the "exact-point" case.
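The read-amplification point can be made concrete with a toy file-selection routine. Everything here is an illustrative assumption (the function name, the `(start, end)` window tuples), not Litestream's restore logic:

```python
def files_for_restore(compacted, deltas, target_t):
    """Pick the file set needed to restore to time `target_t`:
    the newest compacted file whose window ends at or before target_t,
    plus every finer-grained delta covering the gap up to target_t.

    `compacted` and `deltas` are lists of (start, end) windows,
    sorted by start time.
    """
    base = None
    for start, end in compacted:
        if end <= target_t:
            base = (start, end)  # keep the latest qualifying base
    chosen = [base] if base else []
    base_end = base[1] if base else 0
    for start, end in deltas:
        if start >= base_end and end <= target_t:
            chosen.append((start, end))
    return chosen
```

A target that lands exactly on a compaction boundary needs only the base file; a target mid-window additionally pulls every delta between the base's end and the target, which is the extra bandwidth the bullet above describes.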
When it's the wrong shape¶
- Streaming-read workloads over WAL CDC. Downstream consumers that need the full stream of writes (e.g. CDC into Kafka) want frame-level detail — compaction discards intermediate states. Use a separate CDC path (at the WAL layer) rather than consuming from compacted LTX.
- Tiny, append-mostly databases. If the database never overwrites pages, compaction saves nothing — the compacted form equals the concatenation. LTX still helps with transaction awareness, but not via compaction.
Seen in¶
- sources/2025-05-20-flyio-litestream-revamped — canonical wiki instance; "This is similar to how an LSM tree works."
- sources/2025-10-02-flyio-litestream-v050-is-here — shipping-level disclosure of the three-level 30s / 5m / 1h compaction ladder; restore-cost invariant "a dozen or so files on average".
Related¶
- concepts/ltx-file-format — the file format this pattern operates on.
- concepts/lsm-compaction — the parent algorithm class.
- systems/litefs — the original LTX consumer.
- systems/litestream — the post-revamp LTX compaction consumer.
- systems/sqlite — the substrate whose pages LTX encodes.
- patterns/sqlite-plus-litefs-plus-litestream — the three-layer stack.
- patterns/streaming-k-way-merge — the algorithmic substrate compaction runs on.