Linux overlayfs

overlayfs is the Linux kernel union filesystem that composes one or more read-only directory trees (lowerdirs) and one read-write directory (upperdir) into a single merged filesystem view. Changes land in the upperdir; a separate workdir, on the same filesystem as the upperdir, serves as scratch space that overlayfs uses to make its operations atomic. It is the standard way the major container runtimes (containerd, Docker, CRI-O) assemble a container's root filesystem from an OCI image's stack of layers.

Conceptually:

overlayfs rootfs =  lowerdir1 : lowerdir2 : ... : lowerdirN
                    (image layers, read-only; in the mount option the leftmost entry is the top-most layer)
                  + upperdir
                    (container-writable layer)
                  + workdir
                    (staging area for atomic overlayfs ops)
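
As a rough illustration of how these three pieces map onto a real mount, the sketch below builds the option string that would be passed to `mount -t overlay`. The function name and all paths are hypothetical, and it assumes the caller supplies layers base-first; because the kernel treats the leftmost lowerdir entry as the top-most layer, the list is reversed before joining.

```python
def overlay_mount_options(layers_bottom_up, upperdir, workdir):
    """Build the -o option string for an overlayfs mount.

    layers_bottom_up: image layer directories, base layer first.
    The kernel reads lowerdir entries top-most first, so the list is reversed.
    """
    if not layers_bottom_up:
        raise ValueError("at least one lower layer is required")
    for path in [*layers_bottom_up, upperdir, workdir]:
        # ':' and ',' are separators in the option string; escaping is not handled here.
        if ":" in path or "," in path:
            raise ValueError(f"path needs escaping, not handled here: {path!r}")
    lowerdir = ":".join(reversed(layers_bottom_up))
    return f"lowerdir={lowerdir},upperdir={upperdir},workdir={workdir}"


# Hypothetical paths, for illustration only.
opts = overlay_mount_options(
    ["/layers/base", "/layers/app"],
    "/ctr/upper",
    "/ctr/work",
)
print(opts)
```

The resulting string would be used roughly as `mount -t overlay overlay -o <opts> /ctr/merged`, with the upperdir and workdir living on the same filesystem.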

Role in container startup

For each container, the runtime:

  1. Prepares each image layer as a directory on the host filesystem (extracted from the OCI image tarball, typically under /var/lib/containerd/... or equivalent).
  2. Bind-mounts / exposes each layer appropriately for the container's user and security context — in Titus's new runtime, idmap-mounts each layer with the container's unique host user range.
  3. Mounts an overlayfs with the resulting directories as lowerdirs.
  4. Once the overlayfs is constructed, the per-layer bind mounts can be unmounted — overlayfs holds its own references to the underlying directories.
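
The sequence in steps 2 to 4 can be modelled as an ordered list of mount operations. This is only a sketch of the ordering, not containerd's implementation: nothing is actually mounted, and the function name, the staging directory layout, and the paths are all made up for illustration.

```python
def plan_rootfs_mounts(layer_dirs, staging_dir, merged_dir):
    """Return the ordered operations for one rootfs as (op, source, target) tuples.

    layer_dirs is base-first. Nothing is mounted; this only models the sequence.
    """
    ops = []
    exposed = []
    for i, layer in enumerate(layer_dirs):
        target = f"{staging_dir}/{i}"
        ops.append(("bind_mount", layer, target))  # step 2: one bind mount per layer
        exposed.append(target)
    # step 3: a single overlayfs mount over the exposed layers (top-most first)
    ops.append(("overlay_mount", ":".join(reversed(exposed)), merged_dir))
    for target in exposed:
        ops.append(("unmount", target, None))  # step 4: per-layer mounts are dropped
    return ops


# Hypothetical two-layer image.
ops = plan_rootfs_mounts(["/l/base", "/l/app"], "/stage", "/merged")
for op in ops:
    print(op)
```

For an image with N layers the plan contains N bind mounts, one overlayfs mount, and N unmounts, which is the per-layer cost the next section is concerned with.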

Seen in

Netflix Mount Mayhem — per-layer bind mounts × per container (2026-02-28 Netflix post)

On Titus with the new kubelet + containerd runtime, for a container image with N layers the overlayfs construction path issues N bind-mount + N unmount operations per container, twice (once for image-user-info inspection, once for the real rootfs). Every one of those mount operations takes the global VFS mount lock. At 100 concurrent container starts with 50-layer images, each of the 2 traversals performs 50 bind mounts, 50 unmounts, and 1 overlayfs mount, so that's 100 × 2 × 101 = 20,200 mount operations all contending on one kernel lock.
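
The arithmetic behind that figure can be checked with a few lines. The sketch below assumes each traversal costs N bind mounts plus N unmounts plus one overlayfs mount; the extra mount per traversal is an assumption made here because it reproduces the quoted total, and the function and parameter names are illustrative.

```python
def mount_ops(containers, layers, traversals=2, count_overlay_mount=True):
    """Estimate total mount-related operations for concurrent container starts."""
    per_traversal = layers + layers  # N bind mounts + N unmounts
    if count_overlay_mount:
        per_traversal += 1  # assumption: the overlayfs mount itself is counted too
    return containers * traversals * per_traversal


print(mount_ops(100, 50))                             # 20200
print(mount_ops(100, 50, count_overlay_mount=False))  # 20000
```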

Netflix's upstream fix (containerd PR #12092) is to bind-mount the common parent directory of all the layers once instead of each layer individually — the overlayfs still sees the same lowerdir paths via relative names under the parent, but the mount count goes from O(N) to O(1). See patterns/common-parent-bind-mount for the generalised pattern.
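
A minimal sketch of the path computation behind this idea follows. It is not the code from the containerd PR: the function name and the snapshot paths are hypothetical, and it only shows how a set of layer directories can be reduced to one parent to bind-mount plus a relative name per layer.

```python
import posixpath


def common_parent_plan(layer_dirs):
    """Return (parent, relative_names) so only `parent` needs a bind mount."""
    parent = posixpath.commonpath(layer_dirs)
    relative = [posixpath.relpath(d, parent) for d in layer_dirs]
    return parent, relative


# Hypothetical layer directories sharing one parent.
parent, names = common_parent_plan([
    "/var/lib/snapshots/12/fs",
    "/var/lib/snapshots/13/fs",
    "/var/lib/snapshots/27/fs",
])
print(parent)  # /var/lib/snapshots
print(names)   # ['12/fs', '13/fs', '27/fs']
```

With this shape, the number of bind mounts per traversal no longer depends on the layer count: one mount of the parent replaces one mount per layer, and the lowerdir entries are formed from the relative names under it.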

Source: sources/2026-02-28-netflix-mount-mayhem-at-netflix-scaling-containers-on-modern-cpus.

Why not just untar everything into one tree?

Historically container runtimes did exactly that, giving up copy-on-write sharing: two containers built on the same base image each kept their own copy of its files on disk. overlayfs's union model lets N containers share one on-disk copy of every image layer while each keeps its own writable upperdir; the price is the mount-table overhead documented in Mount Mayhem.
