
PATTERN

VFS Range GET from object storage

Pattern

Run a database engine "hot off an object storage URL" by intercepting its I/O layer with a VFS / filesystem extension the application links, and resolving each page read via an HTTP Range GET against a remote file in S3-compatible object storage. A small in-process LRU cache fronts the range reads so the hot set doesn't round-trip on every query.

Three layers compose:

SQLite library (unmodified)
    │  Read(page N, offset, size)
VFS / extension (this pattern's layer)
    │  1. Look up page N in local page-index (pre-built from
    │     remote file trailers)
    │  2. If page in LRU cache → return
    │  3. Else HTTP Range GET (remote_file, byte_offset, size)
    │     → populate LRU → return
Object storage (S3 / Tigris / GCS / Azure Blob)
    │  Range: bytes=<offset>-<offset+size-1>
    Returns the bytes of exactly one SQLite page
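
The read path in the diagram can be sketched as follows. This is a minimal sketch with illustrative names (`RangeVFS`, `page_index`, `fetch_range`); it is not Litestream's actual API, only the shape of the pattern:

```python
from collections import OrderedDict

class RangeVFS:
    """Sketch of the pattern's read path: page index -> LRU -> Range GET.

    page_index maps page_no -> (remote_file, byte_offset, size);
    fetch_range is any callable that performs the HTTP Range GET.
    """

    def __init__(self, page_index, fetch_range, cache_pages=256):
        self.page_index = page_index
        self.fetch_range = fetch_range
        self.cache = OrderedDict()   # page_no -> bytes, in LRU order
        self.cache_pages = cache_pages

    def read_page(self, page_no):
        # 1. Serve from the in-process LRU cache when possible.
        if page_no in self.cache:
            self.cache.move_to_end(page_no)
            return self.cache[page_no]
        # 2. Resolve the page via the pre-built index ...
        remote_file, offset, size = self.page_index[page_no]
        # 3. ... and fetch exactly that byte range from object storage.
        data = self.fetch_range(remote_file, offset, size)
        self.cache[page_no] = data
        if len(self.cache) > self.cache_pages:
            self.cache.popitem(last=False)  # evict least-recently-used
        return data
```

A repeated read of the same page hits the LRU and never reaches object storage; only the first read of a cold page pays the HTTP round-trip.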

Canonical instance: Litestream VFS

From the 2025-12-11 shipping post:

"With our VFS loaded, whenever SQLite needs to read a page into memory, it issues a Read() call through our library. The read call includes the byte offset at which SQLite expected to find the page. But with Litestream VFS, that byte offset is an illusion. Instead, we use our knowledge of the page size along with the requested page number to do a lookup on the page index we've built. From it, we get the remote filename, the 'real' byte offset into that file, and the size of the page. That's enough for us to use the S3 API's Range header handling to download exactly the block we want." (Source: sources/2025-12-11-flyio-litestream-vfs)

Litestream VFS realises the pattern against LTX files, using the LTX index trailer (~1% of each LTX file) to pre-build the page-lookup table at cold-open.
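
A cold-open index build might look like the sketch below. The record layout here is a stand-in, not the real LTX trailer format; only the shape of the result (page-number → remote file, offset, size) matches the pattern:

```python
import struct

# Hypothetical trailer layout (NOT the real LTX spec): a flat sequence of
# (page_no: u32, byte_offset: u64, size: u32) records, big-endian.
REC = struct.Struct(">IQI")

def build_page_index(trailer_bytes, remote_file):
    """Pre-build the page-lookup table from an index trailer fetched at
    cold-open (the trailer itself is a small Range GET, ~1% of the file)."""
    index = {}
    for i in range(0, len(trailer_bytes), REC.size):
        page_no, offset, size = REC.unpack_from(trailer_bytes, i)
        index[page_no] = (remote_file, offset, size)
    return index
```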

Why it works

Three underlying primitives make this cheap:

  1. SQLite VFS — the application's unmodified SQLite library routes all OS-level I/O through a pluggable interface. Intercepting reads is a supported extension point, not a hack.
  2. Random-accessible file format — the backing file format (LTX in Litestream's case) ships with per-page compression + an end-of-file index trailer that maps page-number → byte-offset. Any single page can be located without scanning the file.
  3. Object-storage Range GET — modern S3-compatible stores (including S3, Tigris, GCS, Azure Blob) honour the HTTP Range request header for byte-granular reads. Cost scales with bytes actually read, not file size.
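
Primitive 3 comes down to formatting one header correctly. HTTP ranges are inclusive on both ends, which is a common off-by-one trap:

```python
def range_header(offset, size):
    # HTTP byte ranges are inclusive on both ends (RFC 9110),
    # hence the size - 1: bytes=<offset>-<offset+size-1>.
    return f"bytes={offset}-{offset + size - 1}"

# Attach as e.g. headers={"Range": range_header(offset, size)} on a GET;
# an S3-compatible store replies 206 Partial Content with exactly those bytes.
```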

LRU cache exploits the hot-set shape

Page-level reads are only cheap if the cache-miss rate stays low enough to keep S3 request costs bounded. The post names the SQLite-specific hot-set shape this pattern leverages:

"Most databases have a small set of 'hot' pages — inner branch pages or the leftmost leaf pages for tables with an auto-incrementing ID field. So only a small percentage of the database is updated and queried regularly."

For B-tree databases with bounded working sets (canonical SQLite workload), a modest LRU cache absorbs the traffic that would otherwise become Range GETs.
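
The effect can be simulated: under a skewed access pattern, a small LRU absorbs nearly all reads, leaving only the cold fraction as Range GETs. The workload parameters below are illustrative, not taken from the post:

```python
from collections import OrderedDict
import random

def miss_rate(accesses, cache_size):
    """Fraction of page reads that escape an LRU cache of the given size
    and would therefore become HTTP Range GETs."""
    cache, misses = OrderedDict(), 0
    for page in accesses:
        if page in cache:
            cache.move_to_end(page)
        else:
            misses += 1
            cache[page] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)
    return misses / len(accesses)

# Skewed workload: 90% of reads touch 10 "hot" pages out of 10,000.
rng = random.Random(0)
accesses = [rng.randrange(10) if rng.random() < 0.9 else rng.randrange(10_000)
            for _ in range(100_000)]
```

With a 100-page cache, the hot pages are constantly re-touched and never evicted, so the miss rate settles near the cold fraction of the workload (roughly 10% here) rather than near 100%.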

Pairs with: L0 polling for near-realtime freshness

This pattern gives a read-side primitive — the application can query a remote database without downloading it. To keep reads fresh, pair with patterns/near-realtime-replica-via-l0-polling: the VFS polls the writer's finest-grained upload level (L0, uploaded every second in Litestream's case) and incrementally updates its page index. Together: pay for reads (Range GET on miss) + keep freshness via poll-and-update.
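
A poll-and-update loop might be shaped like this sketch. All function names here are hypothetical stand-ins; the real VFS polls the writer's L0 uploads and merges newer pages into its index:

```python
import time

def poll_l0(list_new_files, fetch_trailer, parse_trailer, index,
            interval=1.0, stop=None):
    """Freshness side of the pattern: poll for new L0 files and merge
    their page mappings into the live page index.

    list_new_files -> remote filenames uploaded since the last poll;
    parse_trailer  -> {page_no: (remote_file, offset, size)}.
    With stop=None this loops forever, matching a background goroutine/thread.
    """
    while stop is None or not stop():
        for remote_file in list_new_files():
            trailer = fetch_trailer(remote_file)
            # Newer L0 files shadow older pages: later writes win.
            index.update(parse_trailer(trailer, remote_file))
        time.sleep(interval)
```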

Trade-offs

  • Remote reads are not free. The post is upfront about this: "this approach isn't as efficient as a local SQLite database. That kind of efficiency, where you don't even need to think about N+1 queries because there's no network round-trip, is part of the point of using SQLite." Workloads that assume zero network cost (N+1-heavy query patterns, PRAGMA journal_mode=WAL-style write bursts expected to land locally) can degrade pathologically.
  • Cache miss = HTTP round-trip. p99 per-page read latency is bound by the object-storage round-trip time, far worse than an in-process page cache.
  • LRU tuning matters. Cache size too small → thrashing; too large → memory overhead defeats the "ephemeral server" premise.
  • File-format precondition is specific. Random access over remote files requires a format designed for it. A raw SQLite database file is page-addressable at fixed offsets, but the replicated artifacts here are LTX files; their per-page compression makes page offsets uncomputable, which is exactly why the EOF index trailer is required.

When it's the wrong shape

  • Write-heavy workloads. The pattern is read-side-only; writes still need the primary (local SQLite + Litestream or LiteFS on the writer side). Applications dominated by writes get no benefit.
  • Chatty query patterns. ORM-style SELECT * FROM a; for each a: SELECT * FROM b WHERE a_id = … walks hundreds of small pages; each cache miss is an HTTP round-trip. Batch-shape queries fare much better.
  • Strong read-after-write on the same session. If the reader is also the writer (unusual but possible), local SQLite is strictly better; the VFS only adds latency.
  • When FUSE is available and fine. If the deployment surface allows FUSE (bare-metal Linux, K8s with privileged: true), a FUSE-mounted LiteFS replica can give lower-latency reads for a small per-node cost — worth evaluating vs a VFS-range-GET reader.

Seen in

  • sources/2025-12-11-flyio-litestream-vfs — canonical wiki instance. Litestream VFS ships the composite pattern: .load litestream.so + file:///my.db?vfs=litestream + LTX-trailer-based page index + LRU cache of hot pages + HTTP Range GET on miss.