

Litestream

Litestream (litestream.io) is a streaming replication tool for SQLite with point-in-time recovery. Litestream watches a SQLite database's WAL and ships WAL frames to object storage (S3, GCS, Azure Blob, etc.). Recovery is a litestream restore that reconstructs the database at any timestamp within the retention window. Like LiteFS, it works with unmodified SQLite libraries.

2025-05-20 revamp: LTX, CASAAS, VFS replicas

On 2025-05-20 Ben Johnson published the largest redesign of Litestream since its 2020 launch, folding three key ideas from LiteFS into Litestream proper. Source: sources/2025-05-20-flyio-litestream-revamped.

Original design being replaced

The 2020 design was shadow-WAL-based (see concepts/shadow-wal): Litestream opens a long-lived read transaction against the SQLite database, arresting WAL checkpointing; copies raw WAL frames to a staging "shadow WAL"; and uploads them to object storage. Simple and application-transparent — but restore cost scales with raw WAL volume:

"When you want to restore a database, you have to pull down and replay every change since the last snapshot. If you changed a single database page a thousand times, you replay a thousand changes."

LTX replaces raw WAL shipping

The revamp adopts the LTX file format — sorted, transaction-aware page-range changesets — from LiteFS. Because LTX files are sortable, adjacent time windows can be k-way-merged into a single file retaining only the latest version of each page (see patterns/ltx-compaction):

"This process of combining smaller time ranges into larger ones is called compaction. With it, we can replay a SQLite database to a specific point in time, with a minimal number of duplicate pages."

Restore to any PITR target now costs the size of the compacted state at the target, not the cumulative WAL volume since the last snapshot. Structurally, Litestream converges with LiteFS — LTX is the wire format on both sides of the pipeline.
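The keep-only-the-latest-version merge can be sketched in a few lines — a toy model, assuming each LTX time window is just a map from page number to page contents, with later windows winning:

```python
# Toy sketch of LTX-style compaction. Each "window" stands in for one LTX
# file's page changeset; merging adjacent windows keeps only the newest
# version of each page. (Real LTX files are sorted page-range changesets
# merged k-way; only the latest-version rule is modeled here.)
def compact(windows):
    """windows is ordered oldest -> newest; later writes win per page."""
    merged = {}
    for w in windows:
        merged.update(w)  # newer window overwrites older page versions
    return merged

l1_a = {1: "p1@t1", 2: "p2@t1"}   # 30-second window A
l1_b = {2: "p2@t2", 3: "p3@t2"}   # 30-second window B (page 2 rewritten)
l2 = compact([l1_a, l1_b])
# Restoring from the compacted window replays 3 pages, not 4:
assert l2 == {1: "p1@t1", 2: "p2@t2", 3: "p3@t2"}
```

This is why restore cost tracks distinct pages touched rather than cumulative WAL volume.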

CASAAS — Compare-and-Swap as a Service

The pre-revamp design used the concept of "generations" to recover from replication desync (new server starts, Litestream restart, etc.). Each generation is a snapshot + WAL stream; any break creates a new one. Managing multiple generations made read-replica and failover features hard.

The fix: constrain the destination to one active writer via a time-based lease on the object store. S3 and Tigris both ship conditional-write support as of 2024-11; Litestream uses conditional writes to implement the lease — no Consul, no etcd, no external coordination service:

"Modern object stores like S3 and Tigris solve this problem for us: they now offer conditional write support. With conditional writes, we can implement a time-based lease. We get essentially the same constraint Consul gave us, but without having to think about it or set up a dependency."

Operational consequence: "you can run Litestream with ephemeral nodes, with overlapping run times, and even if they're storing to the same destination, they won't confuse each other." Rolling deploys, Fly-Machine restarts, and blue/green cutovers become trivially safe. This is the load-bearing architectural move that retires the generations abstraction.
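The lease idea can be sketched with an in-memory stand-in for S3's conditional-write (If-None-Match / If-Match) semantics — all names and structures here are illustrative, not Litestream's actual API:

```python
# Hedged sketch of a time-based lease built on object-store compare-and-swap.
# ConditionalStore mimics conditional PUTs: a write succeeds only if the
# caller's expected etag matches the object's current etag (None = "must
# not exist yet").
import time

class ConditionalStore:
    def __init__(self):
        self.objects = {}   # key -> (etag, value)
        self._etag = 0

    def put_if(self, key, value, expected_etag):
        current = self.objects.get(key)
        cur_etag = current[0] if current else None
        if cur_etag != expected_etag:
            return None     # precondition failed: lost the race
        self._etag += 1
        self.objects[key] = (self._etag, value)
        return self._etag

def try_acquire_lease(store, owner, ttl=30.0):
    """Acquire the writer lease if absent or expired; else fail."""
    now = time.monotonic()
    current = store.objects.get("lease")
    if current is None:
        return store.put_if("lease", (owner, now + ttl), None)
    etag, (holder, expires) = current
    if expires < now:       # expired lease: take over via CAS on its etag
        return store.put_if("lease", (owner, now + ttl), etag)
    return None

store = ConditionalStore()
assert try_acquire_lease(store, "node-a") is not None  # first writer wins
assert try_acquire_lease(store, "node-b") is None      # second is fenced out
```

Two overlapping Litestream processes map onto node-a and node-b here: the conditional write guarantees exactly one holds the lease at a time, with no external coordination service.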

VFS-based lightweight read replicas (FUSE-free)

Litestream was originally a write-side tool. The revamp adds a read-replica layer — a SQLite Virtual Filesystem extension the application links in, which fetches and caches pages directly from object storage on read:

"We're building a VFS-based read-replica layer. It will be able to fetch and cache pages directly from S3-compatible object storage."

The VFS surface avoids FUSE entirely — a key usability advantage over LiteFS ("installing and running a whole filesystem (even a fake one) is a lot to ask of users"). It works in environments where FUSE isn't available (in-browser WASM, many restricted FaaS runtimes). The explicit trade-off named in the post: "this approach isn't as efficient as a local SQLite database"; caching and prefetching are the performance knobs the revamp relies on.

Wildcard / directory replication

LTX's per-database cheapness plus CASAAS's coordination-free posture unlock a previously-infeasible feature:

"In the old Litestream design, WAL-change polling and slow restores made it infeasible to replicate large numbers of databases from a single process. … Now that we've switched to LTX, this isn't a problem any more. It should thus be possible to replicate /data/*.db, even if there's hundreds or thousands of databases in that directory."

One Litestream process can now replicate a full directory tree of SQLite databases (per-tenant DBs, per-project DBs, etc.).

Agent-storage framing

Closing positioning:

"We have a sneaking suspicion that the robots that write LLM code are going to like SQLite too. We think what coding agents like Phoenix.new want is a way to try out code on live data, screw it up, and then rollback both the code and the state. These Litestream updates put us in a position to give agents PITR as a primitive. On top of that, you can build both rollbacks and forks."

Ties to Fly.io's RX framing and stateful incremental VM build story — the revamp makes Litestream a plausible PITR + fork primitive for agentic coding platforms.

2025-10-02 shipping post: v0.5.0

On 2025-10-02 Ben Johnson published the shipping-announcement post for Litestream v0.5.0 — "the first batch of those changes are now 'shipping'" (Source: sources/2025-10-02-flyio-litestream-v050-is-here). The 2025-05-20 design post was forward-looking ("we're building"); this post enumerates what actually landed.

Three-level hierarchical compaction ladder

The LTX compaction pattern now has a concrete production instantiation:

"at Level 1, we compact all the changes in a 30-second time window; at Level 2, all the Level 1 files in a 5-minute window; at Level 3, all the Level 2's over an hour. Net result: we can restore a SQLite database to any point in time, using only a dozen or so files on average."

Compaction runs inside Litestream (not SQLite) — "Performance is limited only by I/O throughput." Restore cost is bounded to "a dozen or so files on average" regardless of retention depth.

Generations retired; monotonic TXID replaces them

"LTX-backed Litestream does away with the concept entirely. Instead, when we detect a break in WAL file continuity, we re-snapshot with the next LTX file. Now we have a monotonically incrementing transaction ID. We can use it to look up database state at any point in time, without searching across generations."

User-visible CLI: references to "transaction IDs" (TXID) replace the old generation/index/offset tuple. litestream wal is renamed to litestream ltx.

LTX library upgrade: per-page compression + EOF index

"It used to be an LTX file was just a sorted list of pages, all compressed together. Now we compress per-page, and keep an index at the end of the LTX file to pluck individual pages out. … we can build features that query from any point in time, without downloading the whole database."

The structural precondition for the still-unreleased VFS-based read-replica layer: fetch specific pages from a large LTX file without downloading the whole file.
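The structural idea can be sketched with a toy encoding (this is not the real LTX format): compressed page blobs, then an index, then a fixed-size trailer recording the index length — so a reader can pluck one page after reading only the tail of the file:

```python
# Toy sketch of "per-page compression + EOF index". Layout assumed here:
#   [zlib blob per page ...][JSON index][4-byte little-endian index length]
# The real LTX encoding differs; the point is that the trailing index makes
# single-page random access possible without reading the whole file.
import io, json, struct, zlib

def write_file(pages):                      # pages: {page_no: bytes}
    buf = io.BytesIO()
    index = {}
    for pgno, data in sorted(pages.items()):
        blob = zlib.compress(data)          # compress each page separately
        index[pgno] = (buf.tell(), len(blob))
        buf.write(blob)
    idx = json.dumps(index).encode()
    buf.write(idx)
    buf.write(struct.pack("<I", len(idx)))  # trailer: index length
    return buf.getvalue()

def read_page(raw, pgno):
    (idx_len,) = struct.unpack("<I", raw[-4:])
    index = json.loads(raw[-4 - idx_len:-4])     # read only the tail...
    off, size = index[str(pgno)]                 # (JSON stringifies keys)
    return zlib.decompress(raw[off:off + size])  # ...then one page's bytes

raw = write_file({1: b"alpha", 2: b"beta"})
assert read_page(raw, 2) == b"beta"
```

Against object storage, "read only the tail" and "one page's bytes" become two Range GETs instead of slices.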

One replica destination per database (now enforced)

"You only get a single replica destination per database. … Multiple replicas can diverge and are sensitive to network availability. Conflict resolution is brain surgery."

Follows directly from CASAAS — multiple active destinations don't compose with object-store coordination.

File-format break from v0.3.x; rollback preserved

"The new version of Litestream can't restore from old v0.3.x WAL segment files. That's OK though! The upgrade process is simple: just start using the new version. It'll leave your old WAL files intact, in case you ever need to revert to the older version. The new LTX files are stored cleanly in an ltx directory on your replica. The configuration file is fully backwards compatible."

Upgrade is a cutover, not a migration.

CGO eliminated; modernc.org/sqlite wins

"CGO is now gone. We've settled the age-old contest between mattn/go-sqlite3 and modernc.org/sqlite in favor of modernc.org. … it lets the cross-compiler work."

GOOS=linux GOARCH=amd64 go build from a Mac now Just Works.

NATS JetStream added as a replica type

"We've also added a replica type for NATS JetStream. Users that already have JetStream running can get Litestream going without adding an object storage dependency."

JetStream's persistence + at-least-once guarantees cover the same semantic surface as object-store conditional writes. First wiki instance of a NATS-JetStream-as-Litestream-replica configuration; contrasts with the core-NATS retirement datapoints on systems/nats.

Cloud-SDK client bumps

"We've upgraded all our clients (S3, Google Storage, & Azure Blob Storage) to their latest versions. We've also moved our code to support newer S3 APIs."

Implicit reference to the 2024-11 S3 conditional-writes feature CASAAS depends on.

Still not shipped: VFS-based read replicas

"We already have a proof of concept working and we're excited to show it off when it's ready!"

The read-replica layer teased in 2025-05-20 did not ship in v0.5.0 — v0.5.0 ships the write/archive side of the revamp plus the format changes that make read-replicas feasible.

2025-12-11 shipping: Litestream VFS

On 2025-12-11 Ben Johnson published the ship announcement for Litestream VFS — the SQLite VFS extension teased in 2025-05-20 and explicitly flagged as "still proof-of-concept, not shipped" in the 2025-10-02 v0.5.0 post. Source: sources/2025-12-11-flyio-litestream-vfs.

Activation

Standard SQLite loadable-extension mechanism:

sqlite> .load litestream.so
sqlite> .open file:///my.db?vfs=litestream

No modification to the SQLite library the application already links — "It's just a plugin for the SQLite you're already using."

What the VFS overrides

Only the read side:

"We override only the few methods we care about. Litestream VFS handles only the read side of SQLite. Litestream itself, running as a normal Unix program, still handles the 'write' side. So our VFS subclasses just enough to find LTX backups and issue queries."

Writes continue to flow through the regular Litestream primary.

Page lookup via LTX index trailer

The VFS discards SQLite's "local file" byte offset and uses the page number to look up the page's location in a database-wide index built from LTX index trailers:

"LTX trailers include a small index tracking the offset of each page in the file. By fetching only these index trailers from the LTX files we're working with (each occupies about 1% of its LTX file), we can build a lookup table of every page in the database."

~1% of each LTX file is the retrieval-cost datum for the page index.
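A minimal sketch of that database-wide lookup table, under assumed shapes (hypothetical per-file indexes, processed oldest to newest so the newest LTX file containing a page wins):

```python
# Sketch of merging per-LTX-file page indexes into one database-wide
# lookup table: page number -> (filename, offset, size). File and index
# shapes here are illustrative, not Litestream's internal types.
def build_lookup(ltx_indexes):
    """ltx_indexes: ordered list of (filename, {page_no: (offset, size)}),
    oldest first; later files shadow earlier versions of a page."""
    table = {}
    for fname, index in ltx_indexes:
        for pgno, (off, size) in index.items():
            table[pgno] = (fname, off, size)
    return table

lookup = build_lookup([
    ("L3/0001.ltx", {1: (0, 100), 2: (100, 80)}),
    ("L0/0042.ltx", {2: (0, 90)}),   # newer version of page 2
])
assert lookup[2] == ("L0/0042.ltx", 0, 90)   # newest file wins
assert lookup[1] == ("L3/0001.ltx", 0, 100)  # untouched page stays put
```

Each entry is exactly the (filename, byte_offset, size) triple the next step needs for a Range GET.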

Range GET against object storage

Once (filename, byte_offset, size) is known, the VFS issues an HTTP Range GET against S3 / Tigris / GCS / Azure Blob:

"That's enough for us to use the S3 API's Range header handling to download exactly the block we want."

Canonical instance of patterns/vfs-range-get-from-object-store.
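A sketch of that fetch path using Python's stdlib urllib — the URL is hypothetical, and note that HTTP byte ranges are inclusive on both ends:

```python
# Sketch of fetching exactly one page via an HTTP Range request, given a
# (filename, byte_offset, size) triple from the page lookup table. The
# bucket URL is a placeholder; any S3-compatible endpoint honors Range.
import urllib.request

def range_header(offset, size):
    # Byte ranges are inclusive (RFC 9110): bytes=100-179 is 80 bytes.
    return f"bytes={offset}-{offset + size - 1}"

def fetch_page(url, offset, size):
    req = urllib.request.Request(url, headers={"Range": range_header(offset, size)})
    with urllib.request.urlopen(req) as resp:  # server replies 206 Partial Content
        return resp.read()

# e.g. fetch_page("https://bucket.example.com/ltx/L0/0042.ltx", 0, 90)
```

The off-by-one in the header is the classic mistake here; `range_header(100, 80)` must produce `bytes=100-179`, not `bytes=100-180`.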

LRU cache of hot pages

"To save lots of S3 calls, Litestream VFS implements an LRU cache. Most databases have a small set of 'hot' pages — inner branch pages or the leftmost leaf pages for tables with an auto-incrementing ID field. So only a small percentage of the database is updated and queried regularly."

SQLite's B-tree hot-set shape (inner branches + leftmost leaves for AUTOINCREMENT tables) has a high LRU-value ratio; a modest cache absorbs most reads.
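A minimal LRU page cache in the spirit of the quote, using collections.OrderedDict; the capacity and fetch callback are illustrative stand-ins for the Range GET path:

```python
# Hedged sketch of an LRU page cache in front of object-storage reads.
# OrderedDict insertion order doubles as recency order: move_to_end marks
# a page most-recently-used, popitem(last=False) evicts the oldest.
from collections import OrderedDict

class PageCache:
    def __init__(self, capacity, fetch):
        self.capacity, self.fetch = capacity, fetch
        self.pages = OrderedDict()
        self.hits = self.misses = 0

    def get(self, pgno):
        if pgno in self.pages:
            self.pages.move_to_end(pgno)   # hot page stays resident
            self.hits += 1
            return self.pages[pgno]
        self.misses += 1
        data = self.fetch(pgno)            # stand-in for a Range GET
        self.pages[pgno] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False) # evict least recently used
        return data

cache = PageCache(2, fetch=lambda p: f"page-{p}".encode())
cache.get(1); cache.get(2); cache.get(1); cache.get(3)  # 3 evicts cold page 2
assert cache.hits == 1 and cache.misses == 3
assert 2 not in cache.pages
```

If the hot set (inner branch pages, leftmost leaves) fits in `capacity`, the hit rate climbs and most reads never touch S3.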

Near-realtime replica via L0 polling

"Because Litestream backs up (into the L0 layer) once per second, the VFS code can simply poll the S3 path, and then incrementally update its index. The result is a near-realtime replica. Better still, you don't need to stream the whole database back to your machine before you use it."

Canonical instance of patterns/near-realtime-replica-via-l0-polling. The L0 level of the compaction ladder (1 file / second, retained until L1) is the polling target.
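One polling step can be sketched as an incremental index update — list_l0 and the file/index shapes here are hypothetical, not Litestream's API:

```python
# Sketch of near-realtime replication via L0 polling: each step lists L0
# files newer than the last-seen TXID and folds their page indexes into
# the replica's lookup table, newest entries shadowing older ones.
def poll_once(list_l0, table, last_txid):
    """list_l0(since) -> new L0 file dicts, each {"txid": int, "index": {...}}."""
    for f in list_l0(last_txid):
        table.update(f["index"])              # newer pages shadow older
        last_txid = max(last_txid, f["txid"])
    return last_txid

files = [{"txid": 7, "index": {2: ("L0/7.ltx", 0, 90)}}]
table = {1: ("L3/1.ltx", 0, 100), 2: ("L3/1.ltx", 100, 80)}
txid = poll_once(lambda since: [f for f in files if f["txid"] > since], table, 5)
assert txid == 7 and table[2] == ("L0/7.ltx", 0, 90)
```

Run once per second (matching the L0 upload cadence) and the replica's view lags the primary by roughly one polling interval, with no full-database download ever required.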

L0 compaction-ladder disclosure

The 2025-12-11 post also refines the compaction-ladder disclosure with an explicit L0 entry on top of the 2025-10-02 v0.5.0 L1/L2/L3 = 30s/5m/1h ladder:

"By default, Litestream uses time intervals of 1 hour at the highest level, down to 30 seconds at level 1. L0 is a special level where files are uploaded every second, but are only retained until being compacted to L1."

Above L3, daily full snapshots. The ladder is therefore:

Level     | Cadence           | Retention
----------|-------------------|--------------------------------
Snapshots | daily full        | full retention
L3        | 1-hour windows    | full retention
L2        | 5-minute windows  | until compacted to L3
L1        | 30-second windows | until compacted to L2
L0        | 1-second uploads  | until compacted to L1 (seconds)

PITR as a PRAGMA

sqlite> PRAGMA litestream_time = '5 minutes ago';
sqlite> select * from sandwich_ratings ORDER BY RANDOM() LIMIT 3;
30|Meatball|Los Angeles|5
33|Ham & Swiss|Los Angeles|2
163|Chicken Shawarma Wrap|Detroit|5

"We're now querying that database from a specific point in time in our backups. We can do arbitrary relative timestamps, or absolute ones, like 2000-01-01T00:00:00Z."

Canonical instance of concepts/pragma-based-pitr. PITR is now a two-line SQL operation on a live connection (no restore job, no CLI); the VFS redirects reads to the LTX state at the chosen timestamp.

Worked disaster-recovery example the post shows: missing WHERE on UPDATE sandwich_ratings SET stars = 1 in prod; on dev, PRAGMA litestream_time = '5 minutes ago' restores the view to the pre-disaster state.
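Both timestamp forms the post names (relative and absolute) reduce to the same lookup: resolve a cutoff instant, then pick the newest TXID committed at or before it. A sketch under assumed shapes — the parsing and the commit-time map are illustrative, not Litestream's internals:

```python
# Sketch of PRAGMA-style timestamp resolution. Supports "<n> <unit> ago"
# relative targets and ISO-8601 absolutes like 2000-01-01T00:00:00Z; the
# commit-time map stands in for LTX metadata keyed by monotonic TXID.
from datetime import datetime, timedelta, timezone

def resolve(target, now):
    if target.endswith(" ago"):
        qty, unit, _ = target.split()          # e.g. "5 minutes ago"
        unit = unit if unit.endswith("s") else unit + "s"
        return now - timedelta(**{unit: int(qty)})
    return datetime.fromisoformat(target.replace("Z", "+00:00"))

def txid_at(commits, cutoff):
    """commits: {txid: commit_time}; newest txid at or before cutoff."""
    eligible = [t for t, ts in commits.items() if ts <= cutoff]
    return max(eligible) if eligible else None

now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
commits = {
    10: datetime(2025, 1, 1, 11, 50, tzinfo=timezone.utc),
    11: datetime(2025, 1, 1, 11, 58, tzinfo=timezone.utc),  # the bad UPDATE
}
assert txid_at(commits, resolve("5 minutes ago", now)) == 10  # pre-disaster
```

Once the TXID is chosen, the VFS simply restricts its page lookup table to LTX files at or below that TXID.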

Fast startup for ephemeral servers

"It starts up really fast! We're living an age of increasingly ephemeral servers, what with the AIs and the agents and the clouds and the hoyvin-glavins. Wherever you find yourself, if your database is backed up to object storage with Litestream, you're always in a place where you can quickly issue a query."

Cold-open path: open connection → fetch ~1% index trailers for relevant LTX files → build page index → serve. No full-database download; agentic / per-session consumers can query the database the moment their VM boots.

Read-side primitive; opt-in

"You don't have to use our VFS library to use Litestream, or to get the other benefits of the new LTX code."

Litestream-without-VFS is still the 2025-10-02 v0.5.0 system (LTX + compaction + CASAAS + NATS-JetStream-replica). The VFS is an additive read-side capability; not required, not replacing anything.

Fly.io's tkdb use

"tkdb is about 5000 lines of Go code that manages a SQLite database that is in turn managed by LiteFS and Litestream. … A full PITR recovery of the database takes just seconds." (Source: sources/2025-03-27-flyio-operationalizing-macaroons.)

The tkdb deployment uses Litestream for durability + disaster recovery, complementing LiteFS's availability + replica reads:

  • LiteFS → node-level replication (US→EU→AU, subsecond).
  • Litestream → WAL shipping to object storage, PITR on demand.
  • SQLite → file format + query surface.

Database size is "a couple dozen megs", so PITR restore from object storage completes in seconds — the closing Fly.io quote: "a total victory for LiteFS, Litestream, and infrastructure SQLite."

Design shape

  • WAL-based. SQLite's write-ahead log is the primary replication source; Litestream ships WAL frames at configurable cadence.
  • Streaming. New frames are uploaded continuously (not snapshotted at a daily cadence), so RPO is seconds-to-minutes.
  • Any-point-restore. Any timestamp within retention is a valid restore target.
  • Single-writer assumption. Litestream doesn't coordinate writers; it assumes SQLite's own single-writer semantics (extended across nodes by LiteFS in Fly's case).

Why it pairs with LiteFS, not replaces it

LiteFS gives you low-lag, live, read-serving replicas for availability and read-scaling. Litestream gives you a durable, timestamped archive you can rewind to when something bad happens (corruption, accidental delete, bad schema migration, rogue insert). The two solve different problems; Fly.io runs both simultaneously on tkdb because it's the token authority and neither availability-loss nor durability-loss is acceptable.

Canonical pairing on the wiki: see patterns/sqlite-plus-litefs-plus-litestream.

Seen in

  • sources/2025-03-27-flyio-operationalizing-macaroons — canonical wiki instance; Litestream as tkdb's PITR substrate. "A full PITR recovery of the database takes just seconds."
  • sources/2025-05-20-flyio-litestream-revamped — architectural-redesign entry. Ben Johnson's 2025-05-20 retrospective on the biggest Litestream redesign since 2020: (1) LTX file format replaces raw-WAL shipping; (2) LTX compaction gives cheap PITR (restore cost proportional to distinct pages touched, not WAL volume); (3) CASAAS — Compare-and-Swap as a Service — uses object-store conditional writes for the single-writer lease (no Consul, no etcd), retiring the "generations" abstraction; (4) SQLite-VFS-based read replicas fetch pages directly from Tigris / S3 without FUSE; (5) wildcard / directory replication (/data/*.db) of hundreds or thousands of databases now viable. Closing thesis positions Litestream as a PITR + rollback + fork primitive for agentic coding platforms.
  • sources/2025-10-02-flyio-litestream-v050-is-here — shipping-announcement entry (v0.5.0). The design post shipped substantially as announced, with four concrete implementation-level disclosures: (1) hierarchical compaction ladder of 30-second / 5-minute / 1-hour windows (Levels 1–3); restore bounded to "a dozen or so files on average"; (2) monotonic TXID replaces the generation/index/offset tuple (litestream wal → litestream ltx); (3) per-page compression + end-of-file index in the LTX library (the precondition for page-granular random access from S3 that makes VFS read replicas feasible); (4) NATS JetStream replica type added alongside S3 / GCS / Azure. Plus CGO removal via modernc.org/sqlite (cross-compile-from-Mac now works), one-replica-per-database enforced as a new hard constraint, a file-format break from v0.3.x (cutover — old WAL files preserved for rollback), and confirmation that VFS read replicas were still proof-of-concept, not shipped.
  • sources/2025-12-11-flyio-litestream-vfs — VFS ship announcement. The proof-of-concept flagged in the 2025-10-02 v0.5.0 post is now shipping as Litestream VFS — a SQLite loadable extension (.load litestream.so + file:///my.db?vfs=litestream) that overrides only the read side of SQLite's I/O interface. Page lookup via LTX index trailers (~1% of each LTX file); page reads via HTTP Range GET against S3-compatible storage; LRU cache of hot B-tree pages; near-realtime replica behaviour via L0 polling (L0 = one-file-per-second upload cadence, retained until L1 compaction); SQL-level PITR via PRAGMA litestream_time (relative or absolute timestamps). Canonical wiki instances of patterns/vfs-range-get-from-object-store, patterns/near-realtime-replica-via-l0-polling, and concepts/pragma-based-pitr. Opt-in, additive, doesn't replace the rest of Litestream.