

Litestream

Litestream (litestream.io) is a streaming replication tool for SQLite with point-in-time recovery. Litestream watches a SQLite database's WAL and ships WAL frames to object storage (S3, GCS, Azure Blob, etc.). Recovery is a litestream restore that reconstructs the database at any timestamp within the retention window. Like LiteFS, it works with unmodified SQLite libraries.

2025-05-20 revamp: LTX, CASAAS, VFS replicas

On 2025-05-20 Ben Johnson published the largest redesign of Litestream since its 2020 launch, folding three key ideas from LiteFS into Litestream proper. Source: sources/2025-05-20-flyio-litestream-revamped.

Original design being replaced

The 2020 design was shadow-WAL-based (see concepts/shadow-wal): Litestream opens a long-lived read transaction against the SQLite database, arresting WAL checkpointing; copies raw WAL frames to a staging "shadow WAL"; and uploads them to object storage. Simple and application-transparent — but restore cost scales with raw WAL volume:

"When you want to restore a database, you have to pull down and replay every change since the last snapshot. If you changed a single database page a thousand times, you replay a thousand changes."

LTX replaces raw WAL shipping

The revamp adopts the LTX file format — sorted, transaction-aware page-range changesets — from LiteFS. Because LTX files are sortable, adjacent time windows can be k-way-merged into a single file retaining only the latest version of each page (see patterns/ltx-compaction):

"This process of combining smaller time ranges into larger ones is called compaction. With it, we can replay a SQLite database to a specific point in time, with a minimal number of duplicate pages."

Restore to any PITR target now costs the size of the compacted state at the target, not the cumulative WAL volume since the last snapshot. Structurally, Litestream converges with LiteFS — LTX is the wire format on both sides of the pipeline.
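The keep-only-the-latest-version merge can be sketched in a few lines — a toy model, assuming each LTX time window is just a map from page number to page contents, with later windows winning:

```python
# Toy sketch of LTX-style compaction. Each "window" stands in for one LTX
# file's page changeset; merging adjacent windows keeps only the newest
# version of each page. (Real LTX files are sorted page-range changesets
# merged k-way; only the latest-version rule is modeled here.)
def compact(windows):
    """windows is ordered oldest -> newest; later writes win per page."""
    merged = {}
    for w in windows:
        merged.update(w)  # newer window overwrites older page versions
    return merged

l1_a = {1: "p1@t1", 2: "p2@t1"}   # 30-second window A
l1_b = {2: "p2@t2", 3: "p3@t2"}   # 30-second window B (page 2 rewritten)
l2 = compact([l1_a, l1_b])
# Restoring from the compacted window replays 3 pages, not 4:
assert l2 == {1: "p1@t1", 2: "p2@t2", 3: "p3@t2"}
```

This is why restore cost tracks distinct pages touched rather than cumulative WAL volume.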

CASAAS — Compare-and-Swap as a Service

The pre-revamp design used the concept of "generations" to recover from replication desync (new server starts, Litestream restart, etc.). Each generation is a snapshot + WAL stream; any break creates a new one. Managing multiple generations made read-replica and failover features hard.

The fix: constrain the destination to one active writer via a time-based lease on the object store. S3 and Tigris both ship conditional-write support as of 2024-11; Litestream uses conditional writes to implement the lease — no Consul, no etcd, no external coordination service:

"Modern object stores like S3 and Tigris solve this problem for us: they now offer conditional write support. With conditional writes, we can implement a time-based lease. We get essentially the same constraint Consul gave us, but without having to think about it or set up a dependency."

Operational consequence: "you can run Litestream with ephemeral nodes, with overlapping run times, and even if they're storing to the same destination, they won't confuse each other." Rolling deploys, Fly-Machine restarts, and blue/green cutovers become trivially safe. This is the load-bearing architectural move that retires the generations abstraction.
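The lease idea can be sketched with an in-memory stand-in for S3's conditional-write (If-None-Match / If-Match) semantics — all names and structures here are illustrative, not Litestream's actual API:

```python
# Hedged sketch of a time-based lease built on object-store compare-and-swap.
# ConditionalStore mimics conditional PUTs: a write succeeds only if the
# caller's expected etag matches the object's current etag (None = "must
# not exist yet").
import time

class ConditionalStore:
    def __init__(self):
        self.objects = {}   # key -> (etag, value)
        self._etag = 0

    def put_if(self, key, value, expected_etag):
        current = self.objects.get(key)
        cur_etag = current[0] if current else None
        if cur_etag != expected_etag:
            return None     # precondition failed: lost the race
        self._etag += 1
        self.objects[key] = (self._etag, value)
        return self._etag

def try_acquire_lease(store, owner, ttl=30.0):
    """Acquire the writer lease if absent or expired; else fail."""
    now = time.monotonic()
    current = store.objects.get("lease")
    if current is None:
        return store.put_if("lease", (owner, now + ttl), None)
    etag, (holder, expires) = current
    if expires < now:       # expired lease: take over via CAS on its etag
        return store.put_if("lease", (owner, now + ttl), etag)
    return None

store = ConditionalStore()
assert try_acquire_lease(store, "node-a") is not None  # first writer wins
assert try_acquire_lease(store, "node-b") is None      # second is fenced out
```

Two overlapping Litestream processes map onto node-a and node-b here: the conditional write guarantees exactly one holds the lease at a time, with no external coordination service.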

VFS-based lightweight read replicas (FUSE-free)

Litestream was originally a write-side tool. The revamp adds a read-replica layer — a SQLite Virtual Filesystem extension the application links in, which fetches and caches pages directly from object storage on read:

"We're building a VFS-based read-replica layer. It will be able to fetch and cache pages directly from S3-compatible object storage."

The VFS surface avoids FUSE entirely — a key usability advantage over LiteFS ("installing and running a whole filesystem (even a fake one) is a lot to ask of users"). It works in environments where FUSE isn't available (in-browser WASM, many restricted FaaS runtimes). The explicit trade-off named in the post: "this approach isn't as efficient as a local SQLite database"; caching and prefetching are the performance knobs the revamp relies on.

Wildcard / directory replication

LTX's per-database cheapness plus CASAAS's coordination-free posture unlock a previously-infeasible feature:

"In the old Litestream design, WAL-change polling and slow restores made it infeasible to replicate large numbers of databases from a single process. … Now that we've switched to LTX, this isn't a problem any more. It should thus be possible to replicate /data/*.db, even if there's hundreds or thousands of databases in that directory."

One Litestream process can now replicate a full directory tree of SQLite databases (per-tenant DBs, per-project DBs, etc.).

Agent-storage framing

Closing positioning:

"We have a sneaking suspicion that the robots that write LLM code are going to like SQLite too. We think what coding agents like Phoenix.new want is a way to try out code on live data, screw it up, and then rollback both the code and the state. These Litestream updates put us in a position to give agents PITR as a primitive. On top of that, you can build both rollbacks and forks."

Ties to Fly.io's RX framing and stateful incremental VM build story — the revamp makes Litestream a plausible PITR + fork primitive for agentic coding platforms.

2025-10-02 shipping post: v0.5.0

On 2025-10-02 Ben Johnson published the shipping-announcement post for Litestream v0.5.0 — "the first batch of those changes are now 'shipping'" (Source: sources/2025-10-02-flyio-litestream-v050-is-here). The 2025-05-20 design post was forward-looking ("we're building"); this post enumerates what actually landed.

Three-level hierarchical compaction ladder

The LTX compaction pattern now has a concrete production instantiation:

"at Level 1, we compact all the changes in a 30-second time window; at Level 2, all the Level 1 files in a 5-minute window; at Level 3, all the Level 2's over an hour. Net result: we can restore a SQLite database to any point in time, using only a dozen or so files on average."

Compaction runs inside Litestream (not SQLite) — "Performance is limited only by I/O throughput." Restore cost is bounded to "a dozen or so files on average" regardless of retention depth.

Generations retired; monotonic TXID replaces them

"LTX-backed Litestream does away with the concept entirely. Instead, when we detect a break in WAL file continuity, we re-snapshot with the next LTX file. Now we have a monotonically incrementing transaction ID. We can use it to look up database state at any point in time, without searching across generations."

User-visible CLI: references to "transaction IDs" (TXID) replace the old generation/index/offset tuple. litestream wal is renamed to litestream ltx.

LTX library upgrade: per-page compression + EOF index

"It used to be an LTX file was just a sorted list of pages, all compressed together. Now we compress per-page, and keep an index at the end of the LTX file to pluck individual pages out. … we can build features that query from any point in time, without downloading the whole database."

The structural precondition for the still-unreleased VFS-based read-replica layer: fetch specific pages from a large LTX file without downloading the whole file.
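The structural idea can be sketched with a toy encoding (this is not the real LTX format): compressed page blobs, then an index, then a fixed-size trailer recording the index length — so a reader can pluck one page after reading only the tail of the file:

```python
# Toy sketch of "per-page compression + EOF index". Layout assumed here:
#   [zlib blob per page ...][JSON index][4-byte little-endian index length]
# The real LTX encoding differs; the point is that the trailing index makes
# single-page random access possible without reading the whole file.
import io, json, struct, zlib

def write_file(pages):                      # pages: {page_no: bytes}
    buf = io.BytesIO()
    index = {}
    for pgno, data in sorted(pages.items()):
        blob = zlib.compress(data)          # compress each page separately
        index[pgno] = (buf.tell(), len(blob))
        buf.write(blob)
    idx = json.dumps(index).encode()
    buf.write(idx)
    buf.write(struct.pack("<I", len(idx)))  # trailer: index length
    return buf.getvalue()

def read_page(raw, pgno):
    (idx_len,) = struct.unpack("<I", raw[-4:])
    index = json.loads(raw[-4 - idx_len:-4])     # read only the tail...
    off, size = index[str(pgno)]                 # (JSON stringifies keys)
    return zlib.decompress(raw[off:off + size])  # ...then one page's bytes

raw = write_file({1: b"alpha", 2: b"beta"})
assert read_page(raw, 2) == b"beta"
```

Against object storage, "read only the tail" and "one page's bytes" become two Range GETs instead of slices.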

One replica destination per database (now enforced)

"You only get a single replica destination per database. … Multiple replicas can diverge and are sensitive to network availability. Conflict resolution is brain surgery."

Follows directly from CASAAS — multiple active destinations don't compose with object-store coordination.

File-format break from v0.3.x; rollback preserved

"The new version of Litestream can't restore from old v0.3.x WAL segment files. That's OK though! The upgrade process is simple: just start using the new version. It'll leave your old WAL files intact, in case you ever need to revert to the older version. The new LTX files are stored cleanly in an ltx directory on your replica. The configuration file is fully backwards compatible."

Upgrade is a cutover, not a migration.

CGO eliminated; modernc.org/sqlite wins

"CGO is now gone. We've settled the age-old contest between mattn/go-sqlite3 and modernc.org/sqlite in favor of modernc.org. … it lets the cross-compiler work."

GOOS=linux GOARCH=amd64 go build from a Mac now Just Works.

NATS JetStream added as a replica type

"We've also added a replica type for NATS JetStream. Users that already have JetStream running can get Litestream going without adding an object storage dependency."

JetStream's persistence + at-least-once guarantees cover the same semantic surface as object-store conditional writes. First wiki instance of a NATS-JetStream-as-Litestream-replica configuration; contrasts with the core-NATS retirement datapoints on systems/nats.

Cloud-SDK client bumps

"We've upgraded all our clients (S3, Google Storage, & Azure Blob Storage) to their latest versions. We've also moved our code to support newer S3 APIs."

Implicit reference to the 2024-11 S3 conditional-writes feature CASAAS depends on.

Still not shipped: VFS-based read replicas

"We already have a proof of concept working and we're excited to show it off when it's ready!"

The read-replica layer teased in 2025-05-20 did not ship in v0.5.0 — v0.5.0 ships the write/archive side of the revamp plus the format changes that make read-replicas feasible.

2025-12-11 shipping: Litestream VFS

On 2025-12-11 Ben Johnson published the ship announcement for Litestream VFS — the SQLite VFS extension teased in 2025-05-20 and explicitly flagged as "still proof-of-concept, not shipped" in the 2025-10-02 v0.5.0 post. Source: sources/2025-12-11-flyio-litestream-vfs.

Activation

Standard SQLite loadable-extension mechanism:

sqlite> .load litestream.so
sqlite> .open file:///my.db?vfs=litestream

No modification to the SQLite library the application already links — "It's just a plugin for the SQLite you're already using."

What the VFS overrides

Only the read side:

"We override only the few methods we care about. Litestream VFS handles only the read side of SQLite. Litestream itself, running as a normal Unix program, still handles the 'write' side. So our VFS subclasses just enough to find LTX backups and issue queries."

Writes continue to flow through the regular Litestream primary.

Page lookup via LTX index trailer

The VFS discards SQLite's "local file" byte offset and uses the page number to look up the page's location in a database-wide index built from LTX index trailers:

"LTX trailers include a small index tracking the offset of each page in the file. By fetching only these index trailers from the LTX files we're working with (each occupies about 1% of its LTX file), we can build a lookup table of every page in the database."

~1% of each LTX file is the retrieval-cost datum for the page index.
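A minimal sketch of that database-wide lookup table, under assumed shapes (hypothetical per-file indexes, processed oldest to newest so the newest LTX file containing a page wins):

```python
# Sketch of merging per-LTX-file page indexes into one database-wide
# lookup table: page number -> (filename, offset, size). File and index
# shapes here are illustrative, not Litestream's internal types.
def build_lookup(ltx_indexes):
    """ltx_indexes: ordered list of (filename, {page_no: (offset, size)}),
    oldest first; later files shadow earlier versions of a page."""
    table = {}
    for fname, index in ltx_indexes:
        for pgno, (off, size) in index.items():
            table[pgno] = (fname, off, size)
    return table

lookup = build_lookup([
    ("L3/0001.ltx", {1: (0, 100), 2: (100, 80)}),
    ("L0/0042.ltx", {2: (0, 90)}),   # newer version of page 2
])
assert lookup[2] == ("L0/0042.ltx", 0, 90)   # newest file wins
assert lookup[1] == ("L3/0001.ltx", 0, 100)  # untouched page stays put
```

Each entry is exactly the (filename, byte_offset, size) triple the next step needs for a Range GET.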

Range GET against object storage

Once (filename, byte_offset, size) is known, the VFS issues an HTTP Range GET against S3 / Tigris / GCS / Azure Blob:

"That's enough for us to use the S3 API's Range header handling to download exactly the block we want."

Canonical instance of patterns/vfs-range-get-from-object-store.
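A sketch of that fetch path using Python's stdlib urllib — the URL is hypothetical, and note that HTTP byte ranges are inclusive on both ends:

```python
# Sketch of fetching exactly one page via an HTTP Range request, given a
# (filename, byte_offset, size) triple from the page lookup table. The
# bucket URL is a placeholder; any S3-compatible endpoint honors Range.
import urllib.request

def range_header(offset, size):
    # Byte ranges are inclusive (RFC 9110): bytes=100-179 is 80 bytes.
    return f"bytes={offset}-{offset + size - 1}"

def fetch_page(url, offset, size):
    req = urllib.request.Request(url, headers={"Range": range_header(offset, size)})
    with urllib.request.urlopen(req) as resp:  # server replies 206 Partial Content
        return resp.read()

# e.g. fetch_page("https://bucket.example.com/ltx/L0/0042.ltx", 0, 90)
```

The off-by-one in the header is the classic mistake here; `range_header(100, 80)` must produce `bytes=100-179`, not `bytes=100-180`.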

LRU cache of hot pages

"To save lots of S3 calls, Litestream VFS implements an LRU cache. Most databases have a small set of 'hot' pages — inner branch pages or the leftmost leaf pages for tables with an auto-incrementing ID field. So only a small percentage of the database is updated and queried regularly."

SQLite's B-tree hot-set shape (inner branches + leftmost leaves for AUTOINCREMENT tables) has a high LRU-value ratio; a modest cache absorbs most reads.
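A minimal LRU page cache in the spirit of the quote, using collections.OrderedDict; the capacity and fetch callback are illustrative stand-ins for the Range GET path:

```python
# Hedged sketch of an LRU page cache in front of object-storage reads.
# OrderedDict insertion order doubles as recency order: move_to_end marks
# a page most-recently-used, popitem(last=False) evicts the oldest.
from collections import OrderedDict

class PageCache:
    def __init__(self, capacity, fetch):
        self.capacity, self.fetch = capacity, fetch
        self.pages = OrderedDict()
        self.hits = self.misses = 0

    def get(self, pgno):
        if pgno in self.pages:
            self.pages.move_to_end(pgno)   # hot page stays resident
            self.hits += 1
            return self.pages[pgno]
        self.misses += 1
        data = self.fetch(pgno)            # stand-in for a Range GET
        self.pages[pgno] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False) # evict least recently used
        return data

cache = PageCache(2, fetch=lambda p: f"page-{p}".encode())
cache.get(1); cache.get(2); cache.get(1); cache.get(3)  # 3 evicts cold page 2
assert cache.hits == 1 and cache.misses == 3
assert 2 not in cache.pages
```

If the hot set (inner branch pages, leftmost leaves) fits in `capacity`, the hit rate climbs and most reads never touch S3.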

Near-realtime replica via L0 polling

"Because Litestream backs up (into the L0 layer) once per second, the VFS code can simply poll the S3 path, and then incrementally update its index. The result is a near-realtime replica. Better still, you don't need to stream the whole database back to your machine before you use it."

Canonical instance of patterns/near-realtime-replica-via-l0-polling. The L0 level of the compaction ladder (1 file / second, retained until L1) is the polling target.
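One polling step can be sketched as an incremental index update — list_l0 and the file/index shapes here are hypothetical, not Litestream's API:

```python
# Sketch of near-realtime replication via L0 polling: each step lists L0
# files newer than the last-seen TXID and folds their page indexes into
# the replica's lookup table, newest entries shadowing older ones.
def poll_once(list_l0, table, last_txid):
    """list_l0(since) -> new L0 file dicts, each {"txid": int, "index": {...}}."""
    for f in list_l0(last_txid):
        table.update(f["index"])              # newer pages shadow older
        last_txid = max(last_txid, f["txid"])
    return last_txid

files = [{"txid": 7, "index": {2: ("L0/7.ltx", 0, 90)}}]
table = {1: ("L3/1.ltx", 0, 100), 2: ("L3/1.ltx", 100, 80)}
txid = poll_once(lambda since: [f for f in files if f["txid"] > since], table, 5)
assert txid == 7 and table[2] == ("L0/7.ltx", 0, 90)
```

Run once per second (matching the L0 upload cadence) and the replica's view lags the primary by roughly one polling interval, with no full-database download ever required.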

L0 compaction-ladder disclosure

The 2025-12-11 post also refines the compaction-ladder disclosure with an explicit L0 entry on top of the 2025-10-02 v0.5.0 L1/L2/L3 = 30s/5m/1h ladder:

"By default, Litestream uses time intervals of 1 hour at the highest level, down to 30 seconds at level 1. L0 is a special level where files are uploaded every second, but are only retained until being compacted to L1."

Above L3, daily full snapshots. The ladder is therefore:

Level     | Cadence           | Retention
----------|-------------------|--------------------------------
Snapshots | daily full        | full retention
L3        | 1-hour windows    | full retention
L2        | 5-minute windows  | until compacted to L3
L1        | 30-second windows | until compacted to L2
L0        | 1-second uploads  | until compacted to L1 (seconds)

PITR as a PRAGMA

sqlite> PRAGMA litestream_time = '5 minutes ago';
sqlite> select * from sandwich_ratings ORDER BY RANDOM() LIMIT 3;
30|Meatball|Los Angeles|5
33|Ham & Swiss|Los Angeles|2
163|Chicken Shawarma Wrap|Detroit|5

"We're now querying that database from a specific point in time in our backups. We can do arbitrary relative timestamps, or absolute ones, like 2000-01-01T00:00:00Z."

Canonical instance of concepts/pragma-based-pitr. PITR is now a two-line SQL operation on a live connection (no restore job, no CLI); the VFS redirects reads to the LTX state at the chosen timestamp.

Worked disaster-recovery example the post shows: missing WHERE on UPDATE sandwich_ratings SET stars = 1 in prod; on dev, PRAGMA litestream_time = '5 minutes ago' restores the view to the pre-disaster state.
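Both timestamp forms the post names (relative and absolute) reduce to the same lookup: resolve a cutoff instant, then pick the newest TXID committed at or before it. A sketch under assumed shapes — the parsing and the commit-time map are illustrative, not Litestream's internals:

```python
# Sketch of PRAGMA-style timestamp resolution. Supports "<n> <unit> ago"
# relative targets and ISO-8601 absolutes like 2000-01-01T00:00:00Z; the
# commit-time map stands in for LTX metadata keyed by monotonic TXID.
from datetime import datetime, timedelta, timezone

def resolve(target, now):
    if target.endswith(" ago"):
        qty, unit, _ = target.split()          # e.g. "5 minutes ago"
        unit = unit if unit.endswith("s") else unit + "s"
        return now - timedelta(**{unit: int(qty)})
    return datetime.fromisoformat(target.replace("Z", "+00:00"))

def txid_at(commits, cutoff):
    """commits: {txid: commit_time}; newest txid at or before cutoff."""
    eligible = [t for t, ts in commits.items() if ts <= cutoff]
    return max(eligible) if eligible else None

now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
commits = {
    10: datetime(2025, 1, 1, 11, 50, tzinfo=timezone.utc),
    11: datetime(2025, 1, 1, 11, 58, tzinfo=timezone.utc),  # the bad UPDATE
}
assert txid_at(commits, resolve("5 minutes ago", now)) == 10  # pre-disaster
```

Once the TXID is chosen, the VFS simply restricts its page lookup table to LTX files at or below that TXID.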

Fast startup for ephemeral servers

"It starts up really fast! We're living an age of increasingly ephemeral servers, what with the AIs and the agents and the clouds and the hoyvin-glavins. Wherever you find yourself, if your database is backed up to object storage with Litestream, you're always in a place where you can quickly issue a query."

Cold-open path: open connection → fetch ~1% index trailers for relevant LTX files → build page index → serve. No full-database download; agentic / per-session consumers can query the database the moment their VM boots.

Read-side primitive; opt-in

"You don't have to use our VFS library to use Litestream, or to get the other benefits of the new LTX code."

Litestream-without-VFS is still the 2025-10-02 v0.5.0 system (LTX + compaction + CASAAS + NATS-JetStream-replica). The VFS is an additive read-side capability; not required, not replacing anything.

Fly.io's tkdb use

"tkdb is about 5000 lines of Go code that manages a SQLite database that is in turn managed by LiteFS and Litestream. … A full PITR recovery of the database takes just seconds." (Source: sources/2025-03-27-flyio-operationalizing-macaroons.)

The tkdb deployment uses Litestream for durability + disaster recovery, complementing LiteFS's availability + replica reads:

  • LiteFS → node-level replication (US→EU→AU, subsecond).
  • Litestream → WAL shipping to object storage, PITR on demand.
  • SQLite → file format + query surface.

Database size is "a couple dozen megs", so PITR restore from object storage completes in seconds — the closing Fly.io quote: "a total victory for LiteFS, Litestream, and infrastructure SQLite."

Design shape

  • WAL-based. SQLite's write-ahead log is the primary replication source; Litestream ships WAL frames at configurable cadence.
  • Streaming. New frames are uploaded continuously (not snapshotted at a daily cadence), so RPO is seconds-to-minutes.
  • Any-point-restore. Any timestamp within retention is a valid restore target.
  • Single-writer assumption. Litestream doesn't coordinate writers; it assumes SQLite's own single-writer semantics (extended across nodes by LiteFS in Fly's case).

Why it pairs with LiteFS, not replaces it

LiteFS gives you low-lag, live, read-serving replicas for availability and read-scaling. Litestream gives you a durable, timestamped archive you can rewind to when something bad happens (corruption, accidental delete, bad schema migration, rogue insert). The two solve different problems; Fly.io runs both simultaneously on tkdb because it's the token authority and neither availability-loss nor durability-loss is acceptable.

Canonical pairing on the wiki: see patterns/sqlite-plus-litefs-plus-litestream.

Seen in

  • sources/2025-03-27-flyio-operationalizing-macaroons — canonical wiki instance; Litestream as tkdb's PITR substrate. "A full PITR recovery of the database takes just seconds."
  • sources/2025-05-20-flyio-litestream-revamped — architectural-redesign entry. Ben Johnson's 2025-05-20 retrospective on the biggest Litestream redesign since 2020: (1) LTX file format replaces raw-WAL shipping; (2) LTX compaction gives cheap PITR (restore cost proportional to distinct pages touched, not WAL volume); (3) CASAAS — Compare-and-Swap as a Service — uses object-store conditional writes for the single-writer lease (no Consul, no etcd), retiring the "generations" abstraction; (4) SQLite-VFS-based read replicas fetch pages directly from Tigris / S3 without FUSE; (5) wildcard / directory replication (/data/*.db) of hundreds or thousands of databases now viable. Closing thesis positions Litestream as a PITR + rollback + fork primitive for agentic coding platforms.
  • sources/2025-10-02-flyio-litestream-v050-is-here — shipping-announcement entry (v0.5.0). The design post shipped substantially as announced, with four concrete implementation-level disclosures: (1) hierarchical compaction ladder of 30-second / 5-minute / 1-hour windows (Levels 1–3); restore bounded to "a dozen or so files on average"; (2) monotonic TXID replaces the generation/index/offset tuple (litestream wal → litestream ltx); (3) per-page compression + end-of-file index in the LTX library (the precondition for page-granular random access from S3 that makes VFS read replicas feasible); (4) NATS JetStream replica type added alongside S3 / GCS / Azure. Plus CGO removal via modernc.org/sqlite (cross-compile-from-Mac now works), one-replica-per-database enforced as a new hard constraint, a file-format break from v0.3.x (cutover — old WAL files preserved for rollback), and confirmation that VFS read replicas were still proof-of-concept, not shipped.
  • sources/2025-12-11-flyio-litestream-vfs — VFS ship announcement. The proof-of-concept flagged in the 2025-10-02 v0.5.0 post is now shipping as Litestream VFS — a SQLite loadable extension (.load litestream.so + file:///my.db?vfs=litestream) that overrides only the read side of SQLite's I/O interface. Page lookup via LTX index trailers (~1% of each LTX file); page reads via HTTP Range GET against S3-compatible storage; LRU cache of hot B-tree pages; near-realtime replica behaviour via L0 polling (L0 = one-file-per-second upload cadence, retained until L1 compaction); SQL-level PITR via PRAGMA litestream_time (relative or absolute timestamps). Canonical wiki instances of patterns/vfs-range-get-from-object-store, patterns/near-realtime-replica-via-l0-polling, and concepts/pragma-based-pitr. Opt-in, additive, doesn't replace the rest of Litestream.