PATTERN Cited by 1 source
VFS Range GET from object storage¶
Pattern¶
Run a database engine "hot off an object storage URL" by intercepting its I/O layer with a VFS / filesystem extension the application links, and resolving each page read via an HTTP Range GET against a remote file in S3-compatible object storage. A small in-process LRU cache fronts the range reads so the hot set doesn't round-trip on every query.
Three layers compose:
SQLite library (unmodified)
│ Read(page N, offset, size)
▼
VFS / extension (this pattern's layer)
│ 1. Look up page N in local page-index (pre-built from
│ remote file trailers)
│ 2. If page in LRU cache → return
│ 3. Else HTTP Range GET (remote_file, byte_offset, size)
│ → populate LRU → return
▼
Object storage (S3 / Tigris / GCS / Azure Blob)
│ Range: bytes=<offset>-<offset+size-1>
▼
Returns the bytes of exactly one SQLite page
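The middle layer's read path can be sketched in a few lines of Python. Everything here is illustrative: `RangeReader`, the `page_index` mapping, and the injected `fetch` callable are stand-ins for the real VFS internals (which are a C extension), not Litestream's API.

```python
class RangeReader:
    """Resolve SQLite page reads against a remote file (sketch).

    page_index maps page number -> (remote_file, byte_offset, size),
    as pre-built from the remote file's index trailer.
    fetch(remote_file, offset, size) performs the HTTP Range GET.
    """

    def __init__(self, page_index, fetch):
        self.page_index = page_index
        self.fetch = fetch
        self.cache = {}  # page number -> page bytes (unbounded here for brevity)

    def read_page(self, page_no):
        if page_no in self.cache:                 # step 2: hot page, no network
            return self.cache[page_no]
        remote_file, offset, size = self.page_index[page_no]  # step 1: index lookup
        data = self.fetch(remote_file, offset, size)          # step 3: Range GET
        self.cache[page_no] = data                # populate cache, then return
        return data
```

Note that the cache is keyed by page number, not byte offset: the offset SQLite asks for is resolved through the index, so the same logical page always hits the same cache entry.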
Canonical instance: Litestream VFS¶
From the 2025-12-11 shipping post:
"With our VFS loaded, whenever SQLite needs to read a page into memory, it issues a Read() call through our library. The read call includes the byte offset at which SQLite expected to find the page. But with Litestream VFS, that byte offset is an illusion. Instead, we use our knowledge of the page size along with the requested page number to do a lookup on the page index we've built. From it, we get the remote filename, the 'real' byte offset into that file, and the size of the page. That's enough for us to use the S3 API's Range header handling to download exactly the block we want." (Source: sources/2025-12-11-flyio-litestream-vfs)
Litestream VFS realises the pattern against LTX files, using the LTX index trailer (~1% of each LTX file) to pre-build the page-lookup table at cold-open.
Why it works¶
Three underlying primitives make this cheap:
- SQLite VFS — the application's unmodified SQLite library routes all OS-level I/O through a pluggable interface. Intercepting reads is a supported extension point, not a hack.
- Random-accessible file format — the backing file format (LTX in Litestream's case) ships with per-page compression + an end-of-file index trailer that maps page-number → byte-offset. Any single page can be located without scanning the file.
- Object-storage Range GET — modern S3-compatible stores (including S3, Tigris, GCS, Azure Blob) honour the HTTP Range request header for byte-granular reads. Cost scales with bytes actually read, not file size.
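The header format is `bytes=<offset>-<offset+size-1>`, with an inclusive end bound. A one-line helper (the function name is my own, not from any library):

```python
def range_header(offset: int, size: int) -> dict:
    """Build an HTTP Range header selecting exactly `size` bytes at `offset`.

    The end bound is inclusive per the HTTP spec, hence the -1.
    """
    return {"Range": f"bytes={offset}-{offset + size - 1}"}
```

For example, a 4096-byte page at byte offset 8192 yields `Range: bytes=8192-12287`.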
LRU cache exploits the hot-set shape¶
Page-level reads are only cheap if the cache-miss rate stays low enough to keep S3 request costs bounded. The post names the SQLite-specific hot-set shape this pattern leverages:
"Most databases have a small set of 'hot' pages — inner branch pages or the leftmost leaf pages for tables with an auto-incrementing ID field. So only a small percentage of the database is updated and queried regularly."
For B-tree databases with bounded working sets (canonical SQLite workload), a modest LRU cache absorbs the traffic that would otherwise become Range GETs.
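A bounded LRU of the kind described can be sketched with `collections.OrderedDict`. The capacity and eviction policy here are illustrative, not Litestream's actual cache implementation:

```python
from collections import OrderedDict

class PageLRU:
    """Fixed-capacity LRU cache of page bytes, keyed by page number."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.pages = OrderedDict()  # insertion order doubles as recency order

    def get(self, page_no):
        if page_no not in self.pages:
            return None                       # miss -> caller does a Range GET
        self.pages.move_to_end(page_no)       # mark as most recently used
        return self.pages[page_no]

    def put(self, page_no, data):
        self.pages[page_no] = data
        self.pages.move_to_end(page_no)
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)    # evict least recently used
```

With a hot set of branch pages and rightmost leaves much smaller than the capacity, steady-state traffic stays in this structure and never reaches object storage.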
Pairs with: L0 polling for near-realtime freshness¶
This pattern gives a read-side primitive — the application can query a remote database without downloading it. To keep reads fresh, pair with patterns/near-realtime-replica-via-l0-polling: the VFS polls the writer's finest-grained upload level (L0, uploaded every second in Litestream's case) and incrementally updates its page index. Together: pay for reads (Range GET on miss) + keep freshness via poll-and-update.
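The pairing can be sketched as a poll loop that merges index deltas into the local page index. `poll_l0` and its return shape are assumptions standing in for the linked L0-polling pattern:

```python
import time

def follow_l0(page_index, poll_l0, interval=1.0, iterations=None):
    """Keep a local page index fresh by polling the writer's L0 level.

    poll_l0() is a stand-in returning {page_no: (remote_file, offset, size)}
    for pages changed since the last poll; merging makes newer L0 entries
    shadow stale ones, so subsequent reads resolve to fresh bytes.
    """
    n = 0
    while iterations is None or n < iterations:
        page_index.update(poll_l0())
        time.sleep(interval)
        n += 1
    return page_index
```

Because only the index is updated eagerly, freshness costs one small poll per interval; page bytes are still fetched lazily, on read.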
Trade-offs¶
- Remote reads are not free. "this approach isn't as efficient as a local SQLite database. That kind of efficiency, where you don't even need to think about N+1 queries because there's no network round-trip, is part of the point of using SQLite." SQLite workloads that assume zero network cost (N+1-heavy query patterns, PRAGMA journal_mode=WAL-style write bursts expected to land locally) can degrade pathologically.
- Cache miss = HTTP round-trip. p99 per-page read latency is object-storage RTT-bound; unfavourable against an in-process page cache.
- LRU tuning matters. Cache size too small → thrashing; too large → memory overhead defeats the "ephemeral server" premise.
- File-format precondition is specific. Random-access over remote files requires a file format designed for it. Raw SQLite files aren't random-access at page granularity over HTTP; LTX is because of its per-page compression + EOF index trailer.
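To make the precondition concrete, here is what parsing an end-of-file index trailer could look like. The byte layout below is invented for illustration (fixed-size big-endian entries with a trailing entry count); the real LTX trailer format differs:

```python
import struct

def parse_trailer(blob: bytes):
    """Parse a hypothetical EOF index trailer.

    Assumed layout: N entries of (page_no: u32, offset: u64, size: u32)
    packed big-endian, followed by a u32 entry count at the very end.
    Only this trailer needs fetching to build the page-number -> location
    map; no scan of the file body is required.
    """
    (count,) = struct.unpack(">I", blob[-4:])
    entry_fmt, entry_size = ">IQI", 16
    table = blob[-4 - count * entry_size : -4]
    index = {}
    for i in range(count):
        page_no, offset, size = struct.unpack_from(entry_fmt, table, i * entry_size)
        index[page_no] = (offset, size)
    return index
```

A cold open under this scheme is one Range GET for the trailer region, one parse, and then page-granular reads from there on.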
When it's the wrong shape¶
- Write-heavy workloads. The pattern is read-side-only; writes still need the primary (local SQLite + Litestream or LiteFS on the writer side). Applications dominated by writes get no benefit.
- Chatty query patterns. ORM-style SELECT * FROM a; for each a: SELECT * FROM b WHERE a_id = … walks hundreds of small pages; each cache miss is an HTTP round-trip. Batch-shape queries fare much better.
- Strong read-after-write on the same session. If the reader is also the writer (unusual but possible), local SQLite is strictly better; the VFS introduces unnecessary latency.
- When FUSE is available and fine. If the deployment surface allows FUSE (bare-metal Linux, K8s with privileged: true), a FUSE-mounted LiteFS replica can give lower-latency reads for a small per-node cost — worth evaluating vs a VFS-range-GET reader.
Seen in¶
- sources/2025-12-11-flyio-litestream-vfs — canonical wiki instance. Litestream VFS ships the composite pattern: .load litestream.so + file:///my.db?vfs=litestream + LTX-trailer-based page index + LRU cache of hot pages + HTTP Range GET on miss.
Related¶
- concepts/sqlite-vfs — the SQLite integration surface.
- concepts/ltx-file-format — the file format the pattern requires (random-access-capable).
- concepts/ltx-index-trailer — the pre-built index the VFS consults.
- systems/litestream — the writer that emits the LTX files this pattern reads from.
- systems/litestream-vfs — the canonical reader/consumer.
- systems/aws-s3 — the Range header primitive.
- systems/sqlite — the substrate.
- patterns/ltx-compaction — the compaction ladder the writer runs.
- patterns/near-realtime-replica-via-l0-polling — the freshness pattern that pairs with this one.
- patterns/sqlite-plus-litefs-plus-litestream — the three-layer stack the pattern extends to the read side.