Litestream¶
Litestream (litestream.io) is a
streaming point-in-time replication tool for SQLite.
Litestream watches a SQLite database's WAL and ships WAL frames
to object storage (S3, GCS, Azure Blob, etc.). Recovery is a
`litestream restore` that reconstructs the database at any
timestamp within the retention window. Like
LiteFS, it works with unmodified SQLite
libraries.
2025-05-20 revamp: LTX, CASAAS, VFS replicas¶
On 2025-05-20 Ben Johnson published the largest redesign of Litestream since its 2020 launch, folding three key ideas from LiteFS into Litestream proper. Source: sources/2025-05-20-flyio-litestream-revamped.
Original design being replaced¶
The 2020 design was shadow-WAL-based (see concepts/shadow-wal): Litestream opens a long-lived read transaction against the SQLite database, arresting WAL checkpointing; copies raw WAL frames to a staging "shadow WAL"; and uploads them to object storage. Simple and application-transparent, but restore cost scales with raw WAL volume:
"When you want to restore a database, you have to pull down and replay every change since the last snapshot. If you changed a single database page a thousand times, you replay a thousand changes."
LTX replaces raw WAL shipping¶
The revamp adopts the LTX file format — sorted, transaction-aware page-range changesets — from LiteFS. Because LTX files are sortable, adjacent time windows can be k-way-merged into a single file retaining only the latest version of each page (see patterns/ltx-compaction):
"This process of combining smaller time ranges into larger ones is called compaction. With it, we can replay a SQLite database to a specific point in time, with minimal duplicate pages."
Restore to any PITR target now costs the size of the compacted state at the target, not the cumulative WAL volume since the last snapshot. Structurally, Litestream converges with LiteFS — LTX is the wire format on both sides of the pipeline.
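The merge-keeps-latest property can be sketched in a few lines. This is a toy model, not the LTX implementation (real compaction streams k sorted files rather than materializing dicts):

```python
# Toy model of LTX compaction: each changeset maps page number -> payload for
# one time window. Merging adjacent windows keeps only the newest version of
# each page; output stays sorted by page, as LTX files are.
def compact(changesets):
    """changesets: list of {page_no: payload}, oldest window first."""
    merged = {}
    for window in changesets:      # later windows overwrite earlier pages
        merged.update(window)
    return sorted(merged.items())

# Page 3 was rewritten in the later window, so only its latest version survives.
older = {3: b"v1", 7: b"v1"}
newer = {3: b"v2", 9: b"v1"}
```

The compacted output for the two windows above is `[(3, b"v2"), (7, b"v1"), (9, b"v1")]`: three pages, even though four page-writes happened.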
CASAAS — Compare-and-Swap as a Service¶
The pre-revamp design used the concept of "generations" to recover from replication desync (new server starts, Litestream restart, etc.). Each generation is a snapshot + WAL stream; any break creates a new one. Managing multiple generations made read-replica and failover features hard.
The fix: constrain the destination to one active writer via a time-based lease on the object store. S3 and Tigris both ship conditional-write support as of 2024-11; Litestream uses conditional writes to implement the lease — no Consul, no etcd, no external coordination service:
"Modern object stores like S3 and Tigris solve this problem for us: they now offer conditional write support. With conditional writes, we can implement a time-based lease. We get essentially the same constraint Consul gave us, but without having to think about it or set up a dependency."
Operational consequence: "you can run Litestream with ephemeral nodes, with overlapping run times, and even if they're storing to the same destination, they won't confuse each other." Rolling deploys, Fly-Machine restarts, and blue/green cutovers become trivially safe. This is the load-bearing architectural move that retires the generations abstraction.
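A minimal model of the time-based lease (the store class, key shapes, and TTL handling here are invented for illustration; the real mechanism is S3/Tigris conditional PUTs):

```python
import time

# Toy object store exposing one conditional-write primitive: the PUT succeeds
# only if the object's version hasn't changed since we read it (compare-and-swap).
class Store:
    def __init__(self):
        self.lease, self.version = None, 0

    def put_if(self, value, expected_version):
        if self.version != expected_version:
            return False               # someone else wrote first: lose the race
        self.lease, self.version = value, self.version + 1
        return True

def acquire_lease(store, owner, ttl=30.0, now=None):
    """Take the writer lease only if no live lease is held by another owner."""
    now = time.time() if now is None else now
    lease, version = store.lease, store.version
    if lease and lease["expires"] > now and lease["owner"] != owner:
        return False                   # live lease belongs to someone else
    return store.put_if({"owner": owner, "expires": now + ttl}, version)
```

Two processes pointed at the same destination then serialize through the store itself: the loser of the compare-and-swap simply backs off, which is the "won't confuse each other" property above.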
VFS-based lightweight read replicas (FUSE-free)¶
Litestream was originally a write-side tool. The revamp adds a read-replica layer — a SQLite Virtual Filesystem extension the application links in, which fetches and caches pages directly from object storage on read:
"We're building a VFS-based read-replica layer. It will be able to fetch and cache pages directly from S3-compatible object storage."
The VFS surface avoids FUSE entirely — a key usability advantage over LiteFS ("installing and running a whole filesystem (even a fake one) is a lot to ask of users"). It works in environments where FUSE isn't available (in-browser WASM, many restricted FaaS). The explicit trade-off named in the post: "this approach isn't as efficient as a local SQLite database"; caching and prefetching are the performance knobs the revamp relies on.
Wildcard / directory replication¶
LTX's per-database cheapness plus CASAAS's coordination-free posture unlock a previously-infeasible feature:
"In the old Litestream design, WAL-change polling and slow restores made it infeasible to replicate large numbers of databases from a single process. … Now that we've switched to LTX, this isn't a problem any more. It should thus be possible to replicate `/data/*.db`, even if there's hundreds or thousands of databases in that directory."
One Litestream process can now replicate a full directory tree of SQLite databases (per-tenant DBs, per-project DBs, etc.).
Agent-storage framing¶
Closing positioning:
"We have a sneaking suspicion that the robots that write LLM code are going to like SQLite too. We think what coding agents like Phoenix.new want is a way to try out code on live data, screw it up, and then rollback both the code and the state. These Litestream updates put us in a position to give agents PITR as a primitive. On top of that, you can build both rollbacks and forks."
Ties to Fly.io's RX framing and stateful incremental VM build story — the revamp makes Litestream a plausible PITR + fork primitive for agentic coding platforms.
2025-10-02 shipping post: v0.5.0¶
On 2025-10-02 Ben Johnson published the shipping-announcement post for Litestream v0.5.0 — "the first batch of those changes are now 'shipping'" (Source: sources/2025-10-02-flyio-litestream-v050-is-here). The 2025-05-20 design post was forward-looking ("we're building"); this post enumerates what actually landed.
Three-level hierarchical compaction ladder¶
The LTX compaction pattern now has a concrete production instantiation:
"at Level 1, we compact all the changes in a 30-second time window; at Level 2, all the Level 1 files in a 5-minute window; at Level 3, all the Level 2's over an hour. Net result: we can restore a SQLite database to any point in time, using only a dozen or so files on average."
Compaction runs inside Litestream (not SQLite) — "Performance is limited only by I/O throughput." Restore cost is bounded to "a dozen or so files on average" regardless of retention depth.
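The "dozen or so files" bound falls out of the ladder arithmetic. A sketch (window sizes are from the post; the counting model, one snapshot plus the whole files at each level, is our assumption):

```python
# Back-of-envelope for the v0.5.0 ladder (L1 = 30s, L2 = 5m, L3 = 1h):
# restoring to an arbitrary instant needs the last snapshot plus the files
# at each level covering the gap, so the count is bounded by ladder depth,
# not retention depth.
def files_to_restore(seconds_past_snapshot):
    l3, rem = divmod(seconds_past_snapshot, 3600)   # whole 1-hour files
    l2, rem = divmod(rem, 300)                      # whole 5-minute files
    l1, rem = divmod(rem, 30)                       # whole 30-second files
    tail = 1 if rem else 0                          # sub-30-second tail
    return 1 + l3 + l2 + l1 + tail                  # +1 for the snapshot
```

The worst case within an hour of a snapshot is 1 + 11 + 9 + 1 = 22 files; averaged over random target times it is roughly half that, consistent with "a dozen or so files on average".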
Generations retired; monotonic TXID replaces them¶
"LTX-backed Litestream does away with the concept entirely. Instead, when we detect a break in WAL file continuity, we re-snapshot with the next LTX file. Now we have a monotonically incrementing transaction ID. We can use it to look up database state at any point in time, without searching across generations."
User-visible CLI changes: references to "transaction IDs" (TXID) replace the old generation/index/offset tuple, and `litestream wal` is renamed `litestream ltx`.
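The "look up database state at any point" step is just a binary search over one monotone sequence (file names and TXID values below are invented examples):

```python
import bisect

# Each LTX file covers a contiguous TXID range; `max_txids` holds each file's
# upper bound, in order. Finding the file covering a target TXID is a single
# bisect, with no generation directories to enumerate.
ltx_files = ["0000000000000064.ltx", "00000000000000fa.ltx", "0000000000000190.ltx"]
max_txids = [0x64, 0xFA, 0x190]

def file_covering(txid):
    i = bisect.bisect_left(max_txids, txid)
    return ltx_files[i] if i < len(ltx_files) else None
```

Compare the old model, where the same lookup first had to identify the right generation before it could search within it.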
LTX library upgrade: per-page compression + EOF index¶
"It used to be an LTX file was just a sorted list of pages, all compressed together. Now we compress per-page, and keep an index at the end of the LTX file to pluck individual pages out. … we can build features that query from any point in time, without downloading the whole database."
The structural precondition for the still-unreleased VFS-based read-replica layer: fetch specific pages from a large LTX file without downloading the whole file.
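The layout can be sketched as follows (field widths and encoding invented; real LTX has its own header and trailer format). The point is that a reader needs only the trailing index, then one small read per page:

```python
import io, struct, zlib

# Toy "per-page compression + EOF index" layout: compressed page blobs,
# followed by fixed-width (page_no, offset, size) index entries, followed by
# a 4-byte pointer to the start of that index.
def write_ltx(pages):                    # pages: {page_no: bytes}
    buf, index = io.BytesIO(), []
    for no, data in sorted(pages.items()):
        blob = zlib.compress(data)
        index.append((no, buf.tell(), len(blob)))
        buf.write(blob)
    trailer_off = buf.tell()
    for no, off, size in index:
        buf.write(struct.pack(">III", no, off, size))
    buf.write(struct.pack(">I", trailer_off))
    return buf.getvalue()

def read_page(ltx, page_no):
    (trailer_off,) = struct.unpack(">I", ltx[-4:])
    for i in range(trailer_off, len(ltx) - 4, 12):
        no, off, size = struct.unpack(">III", ltx[i:i+12])
        if no == page_no:
            return zlib.decompress(ltx[off:off+size])
    return None
```

Under the old whole-file compression, `read_page` would have had to decompress everything before the target page; here it touches only the index and one blob.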
One replica destination per database (now enforced)¶
"You only get a single replica destination per database. … Multiple replicas can diverge and are sensitive to network availability. Conflict resolution is brain surgery."
Follows directly from CASAAS: multiple active destinations don't compose with object-store coordination.
File-format break from v0.3.x; rollback preserved¶
"The new version of Litestream can't restore from old v0.3.x WAL segment files. That's OK though! The upgrade process is simple: just start using the new version. It'll leave your old WAL files intact, in case you ever need to revert to the older version. The new LTX files are stored cleanly in an `ltx` directory on your replica. The configuration file is fully backwards compatible."
Upgrade is a cutover, not a migration.
CGO eliminated; modernc.org/sqlite wins¶
"CGO is now gone. We've settled the age-old contest between `mattn/go-sqlite3` and `modernc.org/sqlite` in favor of `modernc.org`. … it lets the cross-compiler work."
`GOOS=linux GOARCH=amd64 go build` from a Mac now Just Works.
NATS JetStream added as a replica type¶
"We've also added a replica type for NATS JetStream. Users that already have JetStream running can get Litestream going without adding an object storage dependency."
JetStream's persistence + at-least-once guarantees cover the same semantic surface as object-store conditional writes. First wiki instance of a NATS-JetStream-as-Litestream-replica configuration; contrasts with the core-NATS retirement datapoints on systems/nats.
Cloud-SDK client bumps¶
"We've upgraded all our clients (S3, Google Storage, & Azure Blob Storage) to their latest versions. We've also moved our code to support newer S3 APIs."
Implicit reference to the 2024-11 S3 conditional-writes feature CASAAS depends on.
Still not shipped: VFS-based read replicas¶
"We already have a proof of concept working and we're excited to show it off when it's ready!"
The read-replica layer teased in 2025-05-20 did not ship in v0.5.0 — v0.5.0 ships the write/archive side of the revamp plus the format changes that make read-replicas feasible.
2025-12-11 shipping: Litestream VFS¶
On 2025-12-11 Ben Johnson published the ship announcement for Litestream VFS — the SQLite VFS extension teased in 2025-05-20 and explicitly flagged as "still proof-of-concept, not shipped" in the 2025-10-02 v0.5.0 post. Source: sources/2025-12-11-flyio-litestream-vfs.
Activation¶
Activation uses SQLite's standard loadable-extension mechanism.
No modification to the SQLite library the application already links — "It's just a plugin for the SQLite you're already using."
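Presumably (going by the identifiers the ship announcement uses, `.load litestream.so` and `?vfs=litestream`), activation from the `sqlite3` shell looks something like:

```
.load ./litestream
.open file:///my.db?vfs=litestream
```

The library path is environment-specific; any host that can load SQLite extensions (CLI, language bindings with extension loading enabled) should work the same way.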
What the VFS overrides¶
Only the read side:
"We override only the few methods we care about. Litestream VFS handles only the read side of SQLite. Litestream itself, running as a normal Unix program, still handles the 'write' side. So our VFS subclasses just enough to find LTX backups and issue queries."
Writes continue to flow through the regular Litestream primary.
Page lookup via LTX index trailer¶
The VFS discards SQLite's "local file" byte offset and uses the page number to look up the page's location in a database-wide index built from LTX index trailers:
"LTX trailers include a small index tracking the offset of each page in the file. By fetching only these index trailers from the LTX files we're working with (each occupies about 1% of its LTX file), we can build a lookup table of every page in the database."
~1% of each LTX file is the retrieval-cost datum for the page index.
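Folding several trailers into the database-wide lookup table is then a last-writer-wins merge (a sketch; names and structure invented):

```python
# trailers: list of (ltx_filename, {page_no: (offset, size)}), oldest first.
# Newer LTX files supersede older ones for any page they contain, so the
# merged table always points at each page's latest location.
def build_page_table(trailers):
    table = {}
    for fname, entries in trailers:
        for page_no, (offset, size) in entries.items():
            table[page_no] = (fname, offset, size)
    return table
```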
Range GET against object storage¶
Once (filename, byte_offset, size) is known, the VFS issues an HTTP Range GET against S3 / Tigris / GCS / Azure Blob:
"That's enough for us to use the S3 API's `Range` header handling to download exactly the block we want."
Canonical instance of patterns/vfs-range-get-from-object-store.
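The read itself is ordinary HTTP. A sketch (the header format is the standard HTTP byte-range syntax; the store dict is a local stand-in for an S3 GET):

```python
# Given (filename, offset, size) from the page table, fetch exactly those
# bytes. A real client would send `Range: <header>` on a GET against the
# object URL; here a dict of blobs stands in for the bucket.
def range_header(offset, size):
    return f"bytes={offset}-{offset + size - 1}"   # HTTP ranges are inclusive

def range_get(store, filename, offset, size):
    return store[filename][offset:offset + size]
```

Note the inclusive end offset: a 4096-byte page at offset 4096 is `bytes=4096-8191`, an easy off-by-one to get wrong.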
LRU cache of hot pages¶
"To save lots of S3 calls, Litestream VFS implements an LRU cache. Most databases have a small set of 'hot' pages — inner branch pages or the leftmost leaf pages for tables with an auto-incrementing ID field. So only a small percentage of the database is updated and queried regularly."
SQLite's B-tree hot-set shape (inner branches + leftmost leaves for AUTOINCREMENT tables) has a high LRU-value ratio; a modest cache absorbs most reads.
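A minimal page LRU of the kind described (capacity and API are invented; the real cache lives inside the VFS):

```python
from collections import OrderedDict

# Page-level LRU: hot B-tree pages (inner branches, the leftmost leaves of
# auto-increment tables) stay resident; cold pages fall out and are refetched
# from object storage on demand.
class PageLRU:
    def __init__(self, capacity):
        self.capacity, self.pages = capacity, OrderedDict()

    def get(self, page_no):
        if page_no not in self.pages:
            return None                           # miss: caller does a Range GET
        self.pages.move_to_end(page_no)           # mark as recently used
        return self.pages[page_no]

    def put(self, page_no, data):
        self.pages[page_no] = data
        self.pages.move_to_end(page_no)
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)        # evict least-recently-used
```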
Near-realtime replica via L0 polling¶
"Because Litestream backs up (into the L0 layer) once per second, the VFS code can simply poll the S3 path, and then incrementally update its index. The result is a near-realtime replica. Better still, you don't need to stream the whole database back to your machine before you use it."
Canonical instance of patterns/near-realtime-replica-via-l0-polling. The L0 level of the compaction ladder (1 file / second, retained until L1) is the polling target.
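The polling loop reduces to: list the L0 prefix, parse the trailer of anything new, fold it into the page table. A sketch with stand-in functions (real code lists the S3 path and parses LTX trailers):

```python
# list_l0() returns the current L0 object names; read_trailer(name) returns
# that file's {page_no: (offset, size)} index. Both are stand-ins here.
def poll_l0(list_l0, read_trailer, seen, page_table):
    for name in sorted(list_l0()):
        if name in seen:
            continue                              # already folded in
        for page_no, loc in read_trailer(name).items():
            page_table[page_no] = (name, *loc)    # incremental index update
        seen.add(name)
    return page_table
```

Run once a second against the L0 path, this keeps the replica's page table within about a second of the primary without ever re-downloading the database.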
L0 compaction-ladder disclosure¶
The 2025-12-11 post also refines the compaction-ladder disclosure with an explicit L0 entry on top of the 2025-10-02 v0.5.0 L1/L2/L3 = 30s/5m/1h ladder:
"By default, Litestream uses time intervals of 1 hour at the highest level, down to 30 seconds at level 1. L0 is a special level where files are uploaded every second, but are only retained until being compacted to L1."
Above L3, daily full snapshots. The ladder is therefore:
| Level | Cadence | Retention |
|---|---|---|
| Snapshots | daily full | full retention |
| L3 | 1-hour windows | full retention |
| L2 | 5-minute windows | until compacted to L3 |
| L1 | 30-second windows | until compacted to L2 |
| L0 | 1-second uploads | until compacted to L1 (seconds) |
PITR as a PRAGMA¶
"sqlite> PRAGMA litestream_time = '5 minutes ago';
sqlite> select * from sandwich_ratings ORDER BY RANDOM() LIMIT 3;
30|Meatball|Los Angeles|5
33|Ham & Swiss|Los Angeles|2
163|Chicken Shawarma Wrap|Detroit|5
We're now querying that database from a specific point in time in our backups. We can do arbitrary relative timestamps, or absolute ones, like `2000-01-01T00:00:00Z`."
Canonical instance of concepts/pragma-based-pitr. PITR is now a two-line SQL operation on a live connection (no restore job, no CLI); the VFS redirects reads to the LTX state at the chosen timestamp.
Worked disaster-recovery example from the post: a missing WHERE on `UPDATE sandwich_ratings SET stars = 1` in prod; on dev, `PRAGMA litestream_time = '5 minutes ago'` restores the view to the pre-disaster state.
Fast startup for ephemeral servers¶
"It starts up really fast! We're living an age of increasingly ephemeral servers, what with the AIs and the agents and the clouds and the hoyvin-glavins. Wherever you find yourself, if your database is backed up to object storage with Litestream, you're always in a place where you can quickly issue a query."
Cold-open path: open connection → fetch ~1% index trailers for relevant LTX files → build page index → serve. No full-database download; agentic / per-session consumers can query the database the moment their VM boots.
Read-side primitive; opt-in¶
"You don't have to use our VFS library to use Litestream, or to get the other benefits of the new LTX code."
Litestream-without-VFS is still the 2025-10-02 v0.5.0 system (LTX + compaction + CASAAS + NATS-JetStream-replica). The VFS is an additive read-side capability; not required, not replacing anything.
Fly.io's tkdb use¶
"`tkdb` is about 5000 lines of Go code that manages a SQLite database that is in turn managed by LiteFS and Litestream. … A full PITR recovery of the database takes just seconds." (Source: sources/2025-03-27-flyio-operationalizing-macaroons.)
The tkdb deployment uses Litestream for durability +
disaster recovery, complementing LiteFS's availability +
replica reads:
- LiteFS → node-level replication (US→EU→AU, subsecond).
- Litestream → WAL shipping to object storage, PITR on demand.
- SQLite → file format + query surface.
Database size is "a couple dozen megs", so PITR restore from object storage completes in seconds — the closing Fly.io quote: "a total victory for LiteFS, Litestream, and infrastructure SQLite."
Design shape¶
- WAL-based. SQLite's write-ahead log is the primary replication source; Litestream ships WAL frames at configurable cadence.
- Streaming. New frames are uploaded continuously (not snapshotted at a daily cadence), so RPO is seconds-to- minutes.
- Any-point-restore. Any timestamp within retention is a valid restore target.
- Single-writer assumption. Litestream doesn't coordinate writers; it assumes SQLite's own single-writer semantics (extended across nodes by LiteFS in Fly's case).
Why it pairs with LiteFS, not replaces it¶
LiteFS gives you low-lag, live, read-serving replicas for
availability and read-scaling. Litestream gives you a
durable, timestamped archive you can rewind to when
something bad happens (corruption, accidental delete, bad
schema migration, rogue insert). The two solve different
problems; Fly.io runs both simultaneously on tkdb because
it's the token authority and neither availability-loss nor
durability-loss is acceptable.
Canonical pairing on the wiki: see patterns/sqlite-plus-litefs-plus-litestream.
Seen in¶
- sources/2025-03-27-flyio-operationalizing-macaroons — canonical wiki instance; Litestream as tkdb's PITR substrate. "A full PITR recovery of the database takes just seconds."
- sources/2025-05-20-flyio-litestream-revamped — architectural-redesign entry. Ben Johnson's 2025-05-20 retrospective on the biggest Litestream redesign since 2020: (1) LTX file format replaces raw-WAL shipping; (2) LTX compaction gives cheap PITR (restore cost proportional to distinct pages touched, not WAL volume); (3) CASAAS — Compare-and-Swap as a Service — uses object-store conditional writes for the single-writer lease (no Consul, no etcd), retiring the "generations" abstraction; (4) SQLite-VFS-based read replicas fetch pages directly from Tigris / S3 without FUSE; (5) wildcard / directory replication (`/data/*.db`) of hundreds or thousands of databases now viable. Closing thesis positions Litestream as a PITR + rollback + fork primitive for agentic coding platforms.
- sources/2025-10-02-flyio-litestream-v050-is-here — shipping-announcement entry (v0.5.0). The design post shipped substantially as announced, with four concrete implementation-level disclosures: (1) hierarchical compaction ladder 30-second / 5-minute / 1-hour (Levels 1–3); restore bounded to "a dozen or so files on average"; (2) monotonic TXID replaces the generation/index/offset tuple (`litestream wal` → `litestream ltx`); (3) per-page compression + end-of-file index in the LTX library (the precondition for page-granular random access from S3 that makes VFS read replicas feasible); (4) NATS JetStream replica type added alongside S3 / GCS / Azure. Plus CGO removal via `modernc.org/sqlite` (cross-compile-from-Mac now works), one-replica-per-database enforced as a new hard constraint, file-format break from v0.3.x (cutover — old WAL files preserved for rollback), and confirmation that VFS read replicas are still proof-of-concept, not shipped.
- sources/2025-12-11-flyio-litestream-vfs — VFS ship announcement. The proof-of-concept flagged in the 2025-10-02 v0.5.0 post is now shipping as Litestream VFS — a SQLite loadable extension (`.load litestream.so` + `file:///my.db?vfs=litestream`) that overrides only the read side of SQLite's I/O interface. Page lookup via LTX index trailers (~1% of each LTX file); page reads via HTTP Range GET against S3-compatible storage; LRU cache of hot B-tree pages; near-realtime replica behaviour via L0 polling (L0 = 1-file-per-second upload cadence, retained until L1 compaction); SQL-level PITR via `PRAGMA litestream_time = '<timestamp>';` (relative or absolute). Canonical wiki instances of patterns/vfs-range-get-from-object-store, patterns/near-realtime-replica-via-l0-polling, and concepts/pragma-based-pitr. Opt-in, additive, doesn't replace the rest of Litestream.
Related¶
- systems/sqlite — the database it backs up.
- systems/litestream-vfs — the read-side VFS extension shipped 2025-12-11; realises the read-replica layer teased since 2025-05-20.
- systems/litefs — the companion replication system; post- revamp they share the LTX on-wire format.
- systems/tkdb — the canonical Fly.io consumer.
- systems/tigris / systems/aws-s3 — conditional-write- supporting object-store backends CASAAS uses.
- systems/nats — NATS JetStream added as a replica type in v0.5.0 alongside S3 / GCS / Azure.
- concepts/ltx-file-format — the new wire / on-disk format.
- concepts/ltx-index-trailer — the ~1%-of-file EOF index the VFS reads first on cold-open.
- concepts/pragma-based-pitr — the SQL-level PITR surface (`PRAGMA litestream_time = '<timestamp>';`).
- concepts/sqlite-vfs — the read-replica integration surface.
- concepts/shadow-wal — the legacy replication mechanism being retired.
- patterns/ltx-compaction — the compaction pattern LTX enables.
- patterns/vfs-range-get-from-object-store — the composite page-level-read pattern shipped 2025-12-11.
- patterns/near-realtime-replica-via-l0-polling — the freshness pattern the VFS layers on top.
- patterns/conditional-write-lease — CASAAS.
- patterns/sqlite-plus-litefs-plus-litestream — the three-layer pattern (now architecturally convergent on LTX).
- companies/flyio.