
Linux overlayfs¶

overlayfs is the Linux kernel union filesystem that composes one or more read-only directory trees (lowerdirs) and one read-write directory (upperdir) into a single merged filesystem view. Changes land in the upperdir; a separate workdir serves as the staging area overlayfs uses to make those changes atomic. It is the standard way the major container runtimes (containerd, Docker, CRI-O) assemble a container's root filesystem from an OCI image's stack of layers.

Conceptually:

overlayfs rootfs =  lowerdir1 : lowerdir2 : ... : lowerdirN
                    (image layers, bottom-up, read-only)
                  + upperdir
                    (container-writable layer)
                  + workdir
                    (staging area for atomic overlayfs ops)
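The layout above maps onto the option string passed to an overlay mount. The following is a minimal sketch, assuming the image layers arrive as a bottom-up list as in the diagram; the function name and the example paths are hypothetical. The kernel reads `lowerdir` left to right as topmost to bottommost, so the sketch reverses the list.

```python
from typing import Sequence


def overlay_mount_options(layers_bottom_up: Sequence[str],
                          upperdir: str,
                          workdir: str) -> str:
    """Build the option string for an overlay mount (illustrative helper).

    Paths containing ':' or ',' would need escaping, which this sketch
    does not handle.
    """
    if not layers_bottom_up:
        raise ValueError("at least one lower layer is required")
    # The kernel stacks lowerdir entries with the leftmost on top,
    # so a bottom-up layer list has to be reversed.
    lower = ":".join(reversed(layers_bottom_up))
    # workdir must live on the same filesystem as upperdir.
    return f"lowerdir={lower},upperdir={upperdir},workdir={workdir}"


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(overlay_mount_options(["/layers/base", "/layers/app"],
                                "/ctr/upper", "/ctr/work"))
```

The resulting string is what would follow `-o` in `mount -t overlay overlay -o <options> <mountpoint>`.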

Role in container startup¶

For each container, the runtime:

Prepares each image layer as a directory on the host filesystem (extracted from the OCI image tarball, typically under /var/lib/containerd/... or equivalent).
Bind-mounts each layer so that it is exposed correctly for the container's user and security context; in Titus's new runtime, each layer is idmap-mounted with the container's unique host user range.
Mounts an overlayfs with the resulting directories as lowerdirs.
Unmounts the per-layer bind mounts once the overlayfs is constructed, since overlayfs holds its own references to the underlying directories.
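The sequence above can be sketched as a pure planning function that lists the mount operations for one container without performing any of them. This is an illustration under stated assumptions, not containerd's actual code: the operation names, the `staging` directory, and the path layout are all hypothetical.

```python
from typing import List, Tuple

Op = Tuple[str, str]  # (operation name, human-readable detail)


def plan_rootfs_ops(layer_dirs: List[str], staging: str, rootfs: str) -> List[Op]:
    """Return the ordered mount operations for one rootfs construction."""
    ops: List[Op] = []
    staged: List[str] = []
    # One bind mount per prepared layer directory (idmapped in Titus's runtime).
    for index, layer in enumerate(layer_dirs):
        target = f"{staging}/{index}"
        ops.append(("bind-mount", f"{layer} -> {target}"))
        staged.append(target)
    # A single overlayfs mount that uses the staged directories as lowerdirs.
    ops.append(("overlay-mount", f"{len(staged)} lowerdirs -> {rootfs}"))
    # The per-layer bind mounts are no longer needed once overlayfs is up.
    for target in staged:
        ops.append(("umount", target))
    return ops


if __name__ == "__main__":
    for op in plan_rootfs_ops(["/layers/a", "/layers/b"], "/tmp/stage", "/ctr/rootfs"):
        print(op)
```

For N layers the plan contains N bind mounts, one overlay mount, and N unmounts, so its length grows linearly with the layer count.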

Seen in¶

Netflix Mount Mayhem — per-layer bind mounts × per container (2026-02-28 Netflix post)¶

On Titus with the new kubelet + containerd runtime, constructing the overlayfs for a container image with N layers issues N bind-mount and N unmount operations per container, and it does so twice (once for image-user-info inspection, once for the real rootfs). Every one of those mount operations takes the global VFS mount lock. At 100 concurrent container starts with 50-layer images, each of the 2 traversals costs 50 bind mounts + 50 unmounts + 1 overlayfs mount, so 100 × 2 × (50 + 50 + 1) = 20,200 mount operations all contend on one kernel lock.
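The arithmetic can be checked with a short sketch. It assumes each traversal costs one bind mount and one unmount per layer plus a single overlayfs mount, which is the reading that reproduces the figure quoted above; the function name is hypothetical.

```python
def mount_ops(containers: int, layers: int, traversals: int = 2) -> int:
    """Count mount-related operations for concurrent container starts.

    Assumption: one traversal = `layers` bind mounts + `layers` unmounts
    + 1 overlayfs mount.
    """
    per_traversal = layers + layers + 1
    return containers * traversals * per_traversal


if __name__ == "__main__":
    # 100 concurrent starts, 50-layer images, 2 traversals per container.
    print(mount_ops(containers=100, layers=50))
```

Because the count is linear in both the number of containers and the number of layers, halving the layer count roughly halves the pressure on the mount lock.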

Netflix's upstream fix (containerd PR #12092) is to bind-mount the common parent directory of all the layers once instead of each layer individually — the overlayfs still sees the same lowerdir paths via relative names under the parent, but the mount count goes from O(N) to O(1). See patterns/common-parent-bind-mount for the generalised pattern.
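The idea of the fix can be sketched as follows: find the directory that contains every layer, bind-mount only that directory, and refer to each layer by its path relative to it. This is an illustrative sketch, not the code from the containerd PR; the function name and the example paths are hypothetical.

```python
import posixpath
from typing import List, Tuple


def common_parent_plan(layer_dirs: List[str]) -> Tuple[str, List[str]]:
    """Return (parent to bind-mount once, lowerdir names relative to it)."""
    if not layer_dirs:
        raise ValueError("at least one layer directory is required")
    # Deepest directory that is an ancestor of every layer path.
    parent = posixpath.commonpath(layer_dirs)
    # Each layer is then addressed by a name relative to the single mount.
    relative = [posixpath.relpath(d, parent) for d in layer_dirs]
    return parent, relative


if __name__ == "__main__":
    # Hypothetical absolute layer paths, for illustration only.
    layers = ["/layers/store/1/fs", "/layers/store/2/fs", "/layers/store/10/fs"]
    parent, names = common_parent_plan(layers)
    print(parent)
    print(names)
```

With this plan the number of bind mounts per traversal is one regardless of how many layers the image has, which is where the O(N) to O(1) reduction comes from.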

Source: sources/2026-02-28-netflix-mount-mayhem-at-netflix-scaling-containers-on-modern-cpus.

Why not just untar everything into one tree?¶

Historically, container runtimes did exactly that and gave up copy-on-write sharing as a result: two containers using the same base image each kept a duplicate of its files on disk. overlayfs's union model lets N containers share one on-disk copy of every image layer while each keeps its own writable upperdir; the price is the mount-table overhead documented in Mount Mayhem.

Related¶

systems/containerd: primary consumer in modern Kubernetes
systems/netflix-titus: where Mount Mayhem surfaced
concepts/kernel-idmap-mount: the feature that makes per-layer bind mounts necessary under per-container user namespaces
concepts/linux-vfs-mount-lock: the global kernel lock that bottlenecks overlayfs construction
concepts/container-layer-count: why image-layer count became a first-class performance variable
patterns/common-parent-bind-mount: the structural fix
