Fly.io — Litestream: Revamped

Summary

Ben Johnson's retrospective on the biggest redesign of Litestream since its 2020 launch. Litestream's original design — opening a long-lived read transaction on SQLite, arresting WAL checkpointing, building a "shadow WAL," and shipping raw WAL frames to S3 — worked but was slow on restore (every historical change replayed), and the "generation" abstraction for desync recovery blocked failover and read-replica features. The rewrite imports three ideas from LiteFS: (1) the LTX file format — sorted, transaction-aware page ranges that compact LSM-tree style into point-in-time restores with minimal duplicate pages; (2) CASAAS ("Compare-and-Swap as a Service") — a time-based replication lease implemented via object-store conditional writes (S3, Tigris), replacing LiteFS's Consul dependency and retiring the generations concept; (3) lightweight VFS-based read replicas that fetch and cache pages directly from object storage, without FUSE. Two secondary features fall out of LTX: wildcard / directory replication (/data/*.db across hundreds or thousands of databases, which the pre-LTX WAL-polling design made infeasible), and a positioning thesis for agentic coding platforms — PITR as a primitive on which code-and-state rollbacks and forks can be built.

Key takeaways

  1. The original Litestream design was simple but slow on restore. "When you want to restore a database, you have to pull down and replay every change since the last snapshot. If you changed a single database page a thousand times, you replay a thousand changes." The "shadow WAL" trick (a long-lived read transaction that arrests SQLite WAL checkpointing so Litestream can copy WAL pages out before SQLite consolidates them back into the main file) bought Litestream its application-transparent property, but left restore cost proportional to raw WAL volume, not to the number of logical changes. (Source: sources/2025-05-20-flyio-litestream-revamped)
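The checkpoint-arrest mechanism can be demonstrated with plain `sqlite3` (a minimal sketch, not Litestream's implementation): a long-lived read transaction pins its snapshot, so a passive checkpoint cannot copy frames written after that snapshot back into the main database file.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
w = sqlite3.connect(path, isolation_level=None)  # autocommit writer
w.execute("PRAGMA journal_mode=WAL")
w.execute("CREATE TABLE t(x)")
w.execute("INSERT INTO t VALUES (1)")

# The "shadow WAL" trick: a long-lived read transaction pins its snapshot,
# which stops checkpointing from consolidating later frames into the main file.
r = sqlite3.connect(path, isolation_level=None)
r.execute("BEGIN")
r.execute("SELECT count(*) FROM t").fetchone()  # snapshot is taken here

# The writer keeps appending frames to the WAL...
for i in range(100):
    w.execute("INSERT INTO t VALUES (?)", (i,))

# ...but a passive checkpoint can only copy frames up to the reader's snapshot.
busy, wal_frames, checkpointed = w.execute(
    "PRAGMA wal_checkpoint(PASSIVE)"
).fetchone()
print(wal_frames > checkpointed)  # True: newer frames stay pinned in the WAL
```

This is why restore cost in the old design tracked raw WAL volume: every one of those pinned frames is shipped and later replayed, even when they rewrite the same page.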

  2. LTX (LiteFS Transactions) files are sorted, transaction-aware page ranges — sortable and merge-compactable. "LiteFS is transaction-aware. It doesn't simply record raw WAL pages, but rather ordered ranges of pages associated with transactions, using a file format we call LTX. Each LTX file represents a sorted changeset of pages for a given period of time. Because they are sorted, we can easily merge multiple LTX files together and create a new LTX file with only the latest version of each page. This is similar to how an LSM tree works." Canonical concepts/ltx-file-format disclosure in the wiki corpus.

  3. Compaction of LTX files gives cheap PITR. "This process of combining smaller time ranges into larger ones is called compaction. With it, we can replay a SQLite database to a specific point in time, with minimal duplicate pages." Same idea as LSM compaction but applied to SQLite-page changesets rather than key-value records; see patterns/ltx-compaction.
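The merge step can be sketched in a few lines (a toy in-memory model, not the real LTX encoding — actual LTX files are sorted on disk and merged streamingly, with headers and checksums this sketch omits): each "file" maps page numbers to page contents for a transaction range, and compaction keeps only the newest version of each page.

```python
# Hypothetical in-memory model of LTX-style compaction. Each changeset is a
# dict of {page_number: page_bytes} covering a transaction-ID range; all
# names here are illustrative, not from the superfly/ltx implementation.
def compact(changesets):
    """Merge changesets ordered oldest-to-newest, keeping only the latest
    version of each page -- the LSM-style step the post describes."""
    merged = {}
    for changeset in changesets:          # oldest first
        merged.update(changeset)          # newer pages overwrite older ones
    return dict(sorted(merged.items()))   # output stays page-sorted

old = {1: b"p1v1", 2: b"p2v1", 7: b"p7v1"}
mid = {2: b"p2v2", 3: b"p3v1"}
new = {1: b"p1v2"}
print(compact([old, mid, new]))
# {1: b'p1v2', 2: b'p2v2', 3: b'p3v1', 7: b'p7v1'}
```

Restoring to a point in time then means applying a handful of compacted files plus the small recent ones, instead of replaying every historical frame — which is where the "minimal duplicate pages" claim comes from.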

  4. CASAAS — "Compare-and-Swap as a Service" — uses object-store conditional writes for a time-based replication lease. "Modern object stores like S3 and Tigris solve this problem for us: they now offer conditional write support. With conditional writes, we can implement a time-based lease. We get essentially the same constraint Consul gave us, but without having to think about it or set up a dependency." The lease guarantees single-writer to a given destination, which in turn collapses multiple "generations" into a single latest generation — simplifying read-replica and failover semantics. Canonical patterns/conditional-write-lease instance.
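The lease shape can be modeled in a few lines (a toy sketch: the object store's compare-and-swap is simulated with a dict plus a version counter — on real S3 or Tigris this would be a conditional `PutObject` guarded by an ETag precondition; none of these names are Litestream's actual API):

```python
import time

class FakeStore:
    """Stand-in for an object store with conditional writes (CAS)."""
    def __init__(self):
        self.value, self.etag = None, 0

    def put_if_match(self, value, etag):
        # Write succeeds only if the caller saw the current version,
        # mimicking an ETag-guarded conditional PUT.
        if etag != self.etag:
            return None
        self.value, self.etag = value, self.etag + 1
        return self.etag

def acquire_lease(store, owner, ttl, now=time.time):
    """Time-based lease: take over only if no unexpired lease exists,
    using CAS so two candidates cannot both win the same slot."""
    lease, etag = store.value, store.etag
    if lease is not None and lease["expires"] > now():
        return False                      # another writer holds a live lease
    candidate = {"owner": owner, "expires": now() + ttl}
    return store.put_if_match(candidate, etag) is not None

s = FakeStore()
print(acquire_lease(s, "primary-a", ttl=30))  # True: lease acquired
print(acquire_lease(s, "primary-b", ttl=30))  # False: single-writer enforced
```

The CAS is what makes this safe: two candidates racing to take an expired lease both read the same ETag, and only one conditional write can succeed against it. TTL choice and clock-skew handling are exactly the details the post leaves undisclosed.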

  5. LiteFS's FUSE dependency was the usability wall; VFS is the replacement for Litestream. "installing and running a whole filesystem (even a fake one) is a lot to ask of users." LiteFS already had an escape hatch — LiteVFS, a SQLite Virtual Filesystem extension loaded into the application instead of a FUSE mount (works in WASM, restricted FaaS, etc.). Revamped Litestream uses the same trick: "We're building a VFS-based read-replica layer. It will be able to fetch and cache pages directly from S3-compatible object storage."

  6. VFS replicas trade local-SQLite efficiency for cacheable object-storage reads. "this approach isn't as efficient as a local SQLite database. That kind of efficiency, where you don't even need to think about N+1 queries because there's no network round-trip to make the duplicative queries pile up costs, is part of the point of using SQLite. But we're optimistic that with caching and prefetching, this approach will yield, for the right use cases, strong performance — all while serving SQLite reads hot off of Tigris or S3." The post is explicit that the embedded-SQLite efficiency story weakens in the VFS path; the pitch is cacheable object storage, not local NVMe.
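The read path the post describes can be sketched as a page-granular cache in front of remote fetches (a hypothetical model — `fetch_page` stands in for an S3 range GET, and none of these names come from Litestream's VFS):

```python
PAGE_SIZE = 4096  # typical SQLite page size; illustrative, not fixed

class PageCacheVFS:
    """Toy model of a VFS read replica: byte-range reads are served by
    assembling pages, each fetched from object storage at most once."""
    def __init__(self, fetch_page):
        self.fetch_page = fetch_page   # page_no -> bytes (remote read)
        self.cache = {}                # local page cache
        self.remote_reads = 0

    def read(self, offset, length):
        out = bytearray()
        first = offset // PAGE_SIZE
        last = (offset + length - 1) // PAGE_SIZE
        for page_no in range(first, last + 1):
            if page_no not in self.cache:
                self.cache[page_no] = self.fetch_page(page_no)
                self.remote_reads += 1  # the cost caching must amortize
            out += self.cache[page_no]
        start = offset % PAGE_SIZE
        return bytes(out[start:start + length])

# Fake remote store: page n is filled with byte value n.
vfs = PageCacheVFS(lambda n: bytes([n % 256]) * PAGE_SIZE)
vfs.read(0, 100)
vfs.read(0, 100)          # second read is a pure cache hit
print(vfs.remote_reads)   # 1
```

The trade-off in the takeaway is visible in `remote_reads`: a cold cache turns every query into network round-trips, which is why the post hedges the VFS path on caching and prefetching rather than promising local-NVMe behavior.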

  7. LTX makes wildcard / directory replication viable at fleet scale. "In the old Litestream design, WAL-change polling and slow restores made it infeasible to replicate large numbers of databases from a single process. That has been our answer when users ask us for a 'wildcard' or 'directory' replication argument for the tool. Now that we've switched to LTX, this isn't a problem any more. It should thus be possible to replicate /data/*.db, even if there's hundreds or thousands of databases in that directory." Many-database workloads (e.g. one SQLite DB per tenant) move from "infeasible with Litestream" to first-class.
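The discovery half of wildcard replication is straightforward to sketch (an assumed shape, not Litestream's actual CLI behavior): one process periodically matches the pattern and starts replicating any database it has not seen yet, so freshly created tenant DBs are picked up automatically.

```python
import fnmatch

# Illustrative sketch of /data/*.db discovery for a single replicating
# process; `listing` stands in for a directory scan.
def discover(pattern, listing, already_replicating):
    """Return matching paths that are not yet being replicated."""
    return sorted(p for p in fnmatch.filter(listing, pattern)
                  if p not in already_replicating)

listing = ["/data/a.db", "/data/b.db", "/data/notes.txt"]
print(discover("/data/*.db", listing, set()))           # both DBs are new
print(discover("/data/*.db", listing, {"/data/a.db"}))  # only b.db is new
```

The hard part was never the globbing: per the post, it was that the old WAL-polling design made each replicated database expensive, which is what LTX removes.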

  8. Agentic coding platforms get PITR as a primitive for code+state rollback and forking. "We have a sneaking suspicion that the robots that write LLM code are going to like SQLite too. We think what coding agents like Phoenix.new want is a way to try out code on live data, screw it up, and then rollback both the code and the state. These Litestream updates put us in a position to give agents PITR as a primitive. On top of that, you can build both rollbacks and forks." Ties into Fly.io's RX framing and stateful incremental VM build story — both 2025-04-08 companion posts.

Systems extracted

  • Litestream — the redesigned system. Ship date is not disclosed (the post is forward-looking: "we're building", "it should thus be possible"). [extends]
  • LiteFS — the companion system from which LTX, the VFS escape hatch (LiteVFS), and the single-leader approach (originally via Consul) are imported. [extends]
  • SQLite — the substrate. New wiki role disclosed: VFS as the Litestream-integration surface, not just the FUSE (LiteFS) or WAL (original Litestream) surface. [extends]
  • Tigris — explicitly named alongside S3 as a conditional-write-supporting object store backend for CASAAS. [extends]
  • S3 — the 2024-11 conditional-write launch is the direct enabling event for CASAAS. Strong consistency (2020) + conditional writes (2024) together unlock client-side distributed-lock patterns; this is a load-bearing consumer instance. [extends]

Concepts extracted

  • concepts/ltx-file-format — LiteFS Transactions file format. Sorted page-range changesets with per-transaction boundaries. Directly merge-compactable. NEW
  • concepts/sqlite-vfs — SQLite's Virtual Filesystem extension surface. Load an extension into the application instead of mounting a FUSE filesystem. NEW
  • concepts/shadow-wal — Litestream's original replication mechanism: a long-lived read transaction that arrests WAL checkpointing so Litestream can stage a copy of the WAL before SQLite consolidates it. Named as the legacy design being retired. NEW
  • concepts/lsm-compaction — LTX compaction is "similar to how an LSM tree works". New instance outside the traditional DB context — applied to SQLite-page-range files rather than KV records. [extends]
  • concepts/wal-write-ahead-logging — SQLite's WAL is the replication source for the original Litestream design (arrested via a long-lived reader); LTX is the transaction-aware replacement for raw WAL shipping. [extends]

Patterns extracted

  • patterns/ltx-compaction — merge sorted transaction-scoped page-range files into larger time-window files retaining only the latest page version per range. Same shape as LSM leveled compaction at the SQLite-page-range layer. NEW
  • patterns/conditional-write-lease — implement a time-based replication lease using object-storage conditional writes, eliminating external coordination services (Consul, etcd, Zookeeper). NEW
  • patterns/sqlite-plus-litefs-plus-litestream — pattern page extended: post-revamp, the LTX format is shared between LiteFS and Litestream, making the two tools architecturally convergent — LTX is the wire format on both sides. [extends]
  • patterns/conditional-write — second consumer instance of conditional writes (first = 2024 S3 announcement framed through Iceberg/Delta/Hudi snapshot pointers). [extends]

Operational numbers

  • The post is forward-looking ("we're building", "should be possible") — no production numbers disclosed.
  • Directory-replication target: "hundreds or thousands of databases" (/data/*.db).
  • No benchmarks vs. the pre-revamp design; no latency/throughput numbers; no adoption/ship-date.
  • 452 HN points, item 44045292.

Caveats

  • Forward-looking product post, not a shipping post-mortem. The VFS read-replica layer is "we're building", CASAAS is "we're solving the problem [this] way", and wildcard replication is "it should thus be possible". Treat as design disclosure, not operational retrospective.
  • No performance numbers. Neither the pre- vs. post-revamp restore time, nor VFS-replica read latency vs. local SQLite, nor compaction cost under load are quoted.
  • Conditional-write semantics details undisclosed. The post names the S3 feature by link but does not specify the lease TTL, clock-skew assumptions, or how the CASAAS primitive handles clock drift or partial failures.
  • LTX format details deferred to the LiteFS repo. Full page-range encoding, header layout, compaction trigger thresholds, and tombstone handling are not in this post — see superfly/ltx for the reference implementation.
  • VFS trade-off honestly named but not quantified. "this approach isn't as efficient as a local SQLite database" — the post does not give the efficiency ratio or qualify which workloads cross the threshold.
  • Agentic-coding framing is aspirational. "We have a sneaking suspicion that the robots that write LLM code are going to like SQLite too" — no deployment data or partner disclosures (beyond a phoenix.new link).
  • Single-writer assumption persists. CASAAS collapses to single-generation, but SQLite's fundamental single-writer ceiling is untouched; nothing in the post changes the SQLite + LiteFS + Litestream workload-precondition list.

Source

sources/2025-05-20-flyio-litestream-revamped