PATTERN Cited by 1 source

Block-device container image for lazy loading

Pattern

At image build time, convert the standard gzip-tarball OCI image into a block-device-based image format with fixed-size sectors (e.g. 4 MB). At pull time, the customised container runtime fetches only filesystem metadata (directory structure, file names, permissions), constructs a virtual block device, and mounts it into the container so the application can start running immediately. Actual block contents are fetched on first read through a callback to an image fetcher process that retrieves the block from the remote container registry; retrieved blocks are cached locally to avoid repeat round-trips.

Net effect: container start time drops from several minutes (full gzip pull + unpack) to a few seconds. The cost is extra latency the first time each block is read, which fades as the local cache warms under realistic workloads.

(Source: sources/2026-05-08-databricks-how-superhuman-and-databricks-built-a-200k-qps-inference-platform-together)

Components

  • Build-time format converter. Re-emits the standard gzip image as a seekable block device organised in fixed-size sectors.
  • Customised container runtime. Recognises the block-device format and mounts it as a virtual device populated from a remote source.
  • Image fetcher process. Sits between the virtual block device and the container registry; handles per-block fetches on first read and writes them to the local cache.
  • Local block cache. Persists fetched block content to avoid repeated registry round-trips on subsequent reads.
  • Remote container registry. Stores the block-device-formatted image; serves block-range fetches from the image fetcher.
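
To make the interaction between these components concrete, here is a minimal in-memory sketch. It is not the Databricks implementation: `LazyBlockDevice`, `make_fake_registry`, and the 64-byte toy sector size are hypothetical names and values chosen for illustration, and a real runtime would expose the device through the kernel rather than a Python class.

```python
from typing import Callable, Dict

SECTOR_SIZE = 4 * 1024 * 1024  # 4 MB sectors, the figure quoted in the source


class LazyBlockDevice:
    """Toy virtual block device that fetches sectors on first read."""

    def __init__(self, fetch_sector: Callable[[int], bytes],
                 sector_size: int = SECTOR_SIZE) -> None:
        self._fetch_sector = fetch_sector   # stands in for the image fetcher process
        self._sector_size = sector_size
        self._cache: Dict[int, bytes] = {}  # stands in for the local block cache
        self.remote_fetches = 0             # counts registry round-trips

    def _sector(self, index: int) -> bytes:
        # Cold sector: call back to the fetcher, then keep the result locally.
        if index not in self._cache:
            self._cache[index] = self._fetch_sector(index)
            self.remote_fetches += 1
        return self._cache[index]

    def read(self, offset: int, length: int) -> bytes:
        if length <= 0:
            return b""
        first = offset // self._sector_size
        last = (offset + length - 1) // self._sector_size
        data = b"".join(self._sector(i) for i in range(first, last + 1))
        start = offset - first * self._sector_size
        return data[start:start + length]


def make_fake_registry(image: bytes, sector_size: int) -> Callable[[int], bytes]:
    """Hypothetical stand-in for a registry that serves sector-sized ranges."""
    def fetch(index: int) -> bytes:
        return image[index * sector_size:(index + 1) * sector_size]
    return fetch


image = bytes(range(256)) * 4  # 1 KiB toy "image"
device = LazyBlockDevice(make_fake_registry(image, 64), sector_size=64)
device.read(100, 50)           # cold read spanning two sectors
device.read(110, 10)           # warm read, served from the cache
print(device.remote_fetches)
```

Nothing is fetched when the device is constructed; round-trips happen only for sectors that a read actually touches, and a repeated read of the same range adds none.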

Why it works

Two empirical properties of containerised serving workloads:

  1. Containers do not touch most of their image at startup. An image that bundles CUDA, cuDNN, PyTorch, a serving framework, and a model weights file runs to multiple GB; the application reads a small fraction of it to boot.
  2. Steady-state working set is bounded. Once warm, the container reads roughly the same set of files repeatedly; local caching covers most reads after a brief warm-up.

The pattern exploits both: skip the full pull cost (property 1); amortise the per-block fetch cost across the container's lifetime (property 2).
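
A back-of-the-envelope model shows why skipping the full pull dominates. Every number below (image size, bandwidth, unpack rate, bytes read at boot, round-trip time) is an illustrative assumption, not a figure from the source, and the lazy estimate is deliberately pessimistic in fetching sectors one at a time.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3


def full_pull_start_seconds(image_bytes: float, download_bps: float,
                            unpack_bps: float) -> float:
    """Download the whole image, then decompress and unpack all of it."""
    return image_bytes / download_bps + image_bytes / unpack_bps


def lazy_start_seconds(metadata_bytes: float, boot_read_bytes: float,
                       download_bps: float, sector_bytes: int = 4 * MIB,
                       fetch_rtt_s: float = 0.02) -> float:
    """Fetch metadata, then only the sectors touched during boot.

    Assumes sequential fetches, each paying one registry round-trip.
    """
    sectors = math.ceil(boot_read_bytes / sector_bytes)
    transfer = (metadata_bytes + sectors * sector_bytes) / download_bps
    return transfer + sectors * fetch_rtt_s


# Assumed inputs: 10 GiB image, 100 MiB/s download and unpack,
# 10 MiB of metadata, 200 MiB read during boot, 20 ms per fetch.
full = full_pull_start_seconds(10 * GIB, 100 * MIB, 100 * MIB)
lazy = lazy_start_seconds(10 * MIB, 200 * MIB, 100 * MIB)
print(f"full pull: {full:.1f} s, lazy: {lazy:.1f} s")
```

Under these assumed inputs the full pull takes a few minutes and the lazy start a few seconds, the same order of magnitude as the source's claim. The gap narrows as the fraction of the image read at boot grows.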

When to use

  • Container image is multi-GB and gzip pull is the dominant pod-start cost.
  • Autoscaling depends on fast pod start to absorb traffic ramps without latency spikes (see patterns/asymmetric-aggressive-up-conservative-down-autoscaling).
  • Application can start before reading the full filesystem — true for most serving frameworks; false for workloads that walk the entire image (compilers, full-image scanners).
  • Network to registry is reliable during operation; the local cache absorbs steady-state reads, but first-read tail latency still depends on registry RTT.

When not to use

  • Workloads that touch most of the image early still pay the same total bytes — pulling is now spread over time, not eliminated.
  • Air-gapped or registry-unreliable environments where the per-block fetch can fail mid-operation.
  • Multi-hundred-GB foundation models where the model weight load (not the container image) dominates startup.
  • Image bytes change frequently in ways that defeat the local cache (CI-built per-PR images, etc.).

Canonical instance: Databricks Model Serving

The 2026-05-08 Superhuman post canonicalises the pattern:

"When building a container image, we add an extra step to convert the standard, gzip-based image format to the block-device-based format that is suitable for lazy loading. This allows the container image to be represented as a seekable block device with 4MB sectors in production."

"When pulling container images, our customized container runtime retrieves only the metadata required to set up the container's root directory, including directory structure, file names, and permissions, and creates a virtual block device accordingly. It then mounts the virtual block device into the container so that the application can start running right away."

"When the application reads a file for the first time, the I/O request against the virtual block device will issue a callback to the image fetcher process, which retrieves the actual block content from the remote container registry. The retrieved block content is also cached locally to prevent repeated network round trips to the container registry, reducing the impact of variable network latency on future reads."

"This lazy-loading container filesystem eliminates the need to download the entire container image before starting the application, reducing time to start container from several minutes to just a few seconds."

The pattern was originally built for Databricks serverless compute ("Booting Databricks VMs 7× faster") and adopted by the Model Serving team for the Superhuman 200K QPS platform.

Industry siblings

The same problem space is solved by other lazy-loading container filesystem approaches:

  • stargz / estargz — overlay-of-individual-files; per-file lazy loading rather than per-block.
  • nydus — Alibaba's block-based image accelerator; conceptually closest to the Databricks formulation.
  • SOCI (Seekable OCI) — AWS Fargate's lazy-loading approach; per-file with seekable indexes.

The Databricks block-device approach pages data in at a fixed 4 MB sector granularity rather than per file. For small files this is coarser than per-file loading, and it is intended to match how model weights and shared libraries are actually read.
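
The granularity trade-off can be made concrete with a little arithmetic. The sketch below maps a byte range onto 4 MB sectors and computes how many bytes are fetched per byte requested on a cold read; `sectors_for_read` and `read_amplification` are hypothetical helper names, not part of any of the tools listed above.

```python
SECTOR_SIZE = 4 * 1024 * 1024  # 4 MB sectors, the figure quoted in the source


def sectors_for_read(offset: int, length: int,
                     sector_size: int = SECTOR_SIZE) -> range:
    """Indices of the sectors that a read of `length` bytes at `offset` touches."""
    if length <= 0:
        return range(0)
    first = offset // sector_size
    last = (offset + length - 1) // sector_size
    return range(first, last + 1)


def read_amplification(offset: int, length: int,
                       sector_size: int = SECTOR_SIZE) -> float:
    """Bytes fetched from the registry per byte requested, for a cold read."""
    touched = len(sectors_for_read(offset, length, sector_size))
    return touched * sector_size / length


# A 4 KiB read that straddles the boundary between sectors 0 and 1
# pulls two full sectors; a full-sector aligned read pulls exactly one.
small = read_amplification(SECTOR_SIZE - 1024, 4096)
aligned = read_amplification(0, SECTOR_SIZE)
print(small, aligned)
```

Small scattered reads pay a high cold-read amplification, while large contiguous reads approach 1. That is the sense in which a coarse sector size favours bulk access patterns.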

Operational shape

Build time
==========
  Application image (gzip OCI, multi-GB)
        ▼  format converter
  Block-device image (4MB sectors, seekable)
        ▼
  Remote container registry

Pull time
=========
  Pod scheduled
        ▼
  Customised container runtime
        ▼  fetch metadata only
  Virtual block device mounted
        ▼
  Application starts running       ← seconds, not minutes

First-read time
===============
  Application reads file F
        ▼
  I/O on virtual block device, sector S
        ▼  callback
  Image fetcher process
        ▼  block fetch from remote registry
  Block content
        ▼  stored
  Local block cache
        ▼
  Filesystem read returns to app

Failure modes and mitigations

  • Registry slow / unavailable during first read → tail latency on cold blocks. Mitigation: pre-warm critical blocks, larger cache, regional registry mirrors.
  • Local cache eviction → cold-block penalty re-paid. Mitigation: tune cache size to fit steady-state working set.
  • Sector layout mismatched to access pattern → many small cross-sector reads → high fetch overhead. Mitigation: tune sector size against measured access patterns.
  • Image-fetcher crash → in-flight reads stall. Mitigation: supervisor / health-check on the fetcher process.
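
The cache-eviction failure mode is easy to reproduce in a toy model. The sketch below assumes an LRU eviction policy, which the source does not specify; `LRUBlockCache` and `replay` are hypothetical names used only for illustration.

```python
from collections import OrderedDict
from typing import Callable, Iterable, Tuple


class LRUBlockCache:
    """Toy local block cache with least-recently-used eviction (an assumption)."""

    def __init__(self, capacity_sectors: int,
                 fetch_sector: Callable[[int], bytes]) -> None:
        self._capacity = capacity_sectors
        self._fetch_sector = fetch_sector
        self._blocks: "OrderedDict[int, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, index: int) -> bytes:
        if index in self._blocks:
            self._blocks.move_to_end(index)   # mark as most recently used
            self.hits += 1
            return self._blocks[index]
        self.misses += 1                      # cold block: pay the registry fetch
        data = self._fetch_sector(index)
        self._blocks[index] = data
        if len(self._blocks) > self._capacity:
            self._blocks.popitem(last=False)  # evict the least recently used block
        return data


def replay(capacity_sectors: int, trace: Iterable[int]) -> Tuple[int, int]:
    """Replay a sector access trace and return (hits, misses)."""
    cache = LRUBlockCache(capacity_sectors, lambda i: b"\0")
    for index in trace:
        cache.get(index)
    return cache.hits, cache.misses


# Steady state: the application loops over the same four sectors.
trace = [0, 1, 2, 3] * 10
print(replay(4, trace))  # cache fits the working set
print(replay(3, trace))  # cache is one sector too small
```

With a cache that holds the whole working set, only the first pass misses. With a cache one sector too small, this cyclic trace misses on every access under LRU, which is why sizing the cache to the steady-state working set matters more than sizing it to the image.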

Caveats

  • The pattern is not standardised across the OCI ecosystem; the Databricks formulation is in-house infrastructure.
  • The post does not quantify first-read tail latency or local cache size.
  • The block-device-vs-per-file granularity choice is not justified empirically against alternatives (stargz, nydus).