Linux overlayfs¶
overlayfs is the Linux kernel union filesystem that composes one or more read-only directory trees (lowerdirs) and one read-write directory (upperdir) into a single merged filesystem view. Writes to the merged view land in the upperdir; a separate workdir, on the same filesystem as the upperdir, serves as scratch space for the kernel's atomic operations. It is the standard way the major container runtimes (containerd, Docker, CRI-O) assemble a container's root filesystem from an OCI image's stack of layers.
Conceptually:
overlayfs rootfs = lowerdir1 : lowerdir2 : ... : lowerdirN
(image layers, bottom-up, read-only)
+ upperdir
(container-writable layer)
+ workdir
(staging area for atomic overlayfs ops)
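As a rough illustration of how these three pieces map onto the options an overlayfs mount accepts, the sketch below builds the lowerdir/upperdir/workdir option string. The helper name and the example paths are hypothetical. The one kernel behaviour it relies on is that `lowerdir` entries are stacked from the rightmost (bottom) to the leftmost (top), so a bottom-up layer list has to be reversed.

```python
def overlay_mount_options(layers_bottom_up, upperdir, workdir):
    """Build an overlayfs option string (hypothetical helper, not a runtime API).

    layers_bottom_up: image layer directories, base layer first.
    """
    if not layers_bottom_up:
        raise ValueError("at least one lower layer is required")
    for path in [*layers_bottom_up, upperdir, workdir]:
        # ':' separates lowerdir entries and ',' separates mount options; this
        # sketch rejects such paths instead of attempting to escape them.
        if ":" in path or "," in path:
            raise ValueError(f"unsupported character in path: {path!r}")
    # The kernel stacks lowerdir entries from right (bottom) to left (top),
    # so the bottom-up list is reversed before joining.
    lower = ":".join(reversed(layers_bottom_up))
    return f"lowerdir={lower},upperdir={upperdir},workdir={workdir}"


options = overlay_mount_options(
    ["/layers/base", "/layers/app"],  # hypothetical paths, base layer first
    "/ctr/upper",
    "/ctr/work",
)
print(options)
# lowerdir=/layers/app:/layers/base,upperdir=/ctr/upper,workdir=/ctr/work
```

The resulting string is what would be passed as the `-o` argument of `mount -t overlay overlay -o <options> <mountpoint>`. The kernel additionally expects workdir to be an empty directory on the same filesystem as upperdir, which this sketch does not check.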
Role in container startup¶
For each container, the runtime:
- Prepares each image layer as a directory on the host filesystem (extracted from the OCI image tarball, typically under /var/lib/containerd/... or equivalent).
- Bind-mounts or otherwise exposes each layer appropriately for the container's user and security context; in Titus's new runtime, this means idmap-mounting each layer with the container's unique host user range.
- Mounts an overlayfs with the resulting directories as lowerdirs.
- Unmounts the per-layer bind mounts once the overlayfs is constructed, since overlayfs holds its own references to the underlying directories.
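The sequence above can be modelled as an ordered list of steps. The sketch below is only a planning model: it performs no mounts, the function name, step labels, and paths are all hypothetical, and it is not containerd's actual code. It is meant to show why the number of mount-table operations grows with the layer count.

```python
def plan_rootfs_mounts(layer_dirs, staging_root, merged):
    """Return the ordered mount-related steps for one rootfs (illustrative only)."""
    # One staging mount point per layer; the idmap or bind-mount details are
    # abstracted into a single "bind-mount" step here.
    staged = [f"{staging_root}/{i}" for i in range(len(layer_dirs))]
    steps = [("bind-mount", src, dst) for src, dst in zip(layer_dirs, staged)]
    # A single overlayfs mount uses the staged directories as lowerdirs.
    steps.append(("mount-overlay", tuple(staged), merged))
    # Once the overlay exists, the per-layer mounts are no longer needed.
    steps.extend(("unmount", dst, None) for dst in staged)
    return steps


steps = plan_rootfs_mounts(
    ["/layers/a", "/layers/b", "/layers/c"],  # hypothetical layer directories
    "/run/stage",
    "/run/rootfs",
)
per_layer_ops = sum(1 for op, _, _ in steps if op in ("bind-mount", "unmount"))
print(len(steps), per_layer_ops)  # 7 6
```

For three layers the plan contains six per-layer operations plus the overlay mount itself, so the per-layer part scales as 2N.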
Seen in¶
Netflix Mount Mayhem — per-layer bind mounts × per container (2026-02-28 Netflix post)¶
On Titus with the new kubelet + containerd runtime, for a container image with N layers the overlayfs construction path issues N bind-mount and N unmount operations per traversal, and it runs twice per container (once for image-user-info inspection, once for the real rootfs). Every one of those mount operations takes the global VFS mount lock. At 100 concurrent container starts with 50 layers each, the per-layer work alone is 100 × 50 × 2 (bind mount + unmount) × 2 traversals = 20,000 operations, which accounts for nearly all of the 20,200 mount operations cited, all contending on one kernel lock.
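The per-layer part of that arithmetic is easy to reproduce. The sketch below counts only the per-layer bind mounts and unmounts described above, so it will not include any other mounts a runtime performs per container; the function name and its default of two traversals are assumptions made for illustration.

```python
def per_layer_mount_ops(containers, layers, traversals=2):
    """Count per-layer bind-mount and unmount operations (illustrative model)."""
    ops_per_layer = 2  # one bind mount plus one unmount
    return containers * layers * ops_per_layer * traversals


print(per_layer_mount_ops(100, 50))  # 20000
print(per_layer_mount_ops(100, 10))  # 4000
```

Because the count is linear in the layer count, trimming an image from 50 layers to 10 cuts the per-layer operations by the same factor of five.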
Netflix's upstream fix (containerd PR #12092) is to bind-mount the common parent directory of all the layers once instead of each layer individually — the overlayfs still sees the same lowerdir paths via relative names under the parent, but the mount count goes from O(N) to O(1). See patterns/common-parent-bind-mount for the generalised pattern.
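A minimal sketch of the path arithmetic behind this idea follows. It is not the code from the containerd PR: the function name and all paths are hypothetical, and it only shows how one staged parent directory can yield every lowerdir path through the layers' names relative to that parent.

```python
import posixpath


def common_parent_lowerdirs(layer_dirs, staged_parent):
    """Map layer directories onto paths under one staged parent (illustrative)."""
    # The deepest directory that contains every layer; this is the only
    # directory that would need to be bind-mounted.
    parent = posixpath.commonpath(layer_dirs)
    # Each lowerdir is the layer's path relative to that parent, re-rooted
    # under the location where the parent is staged.
    lowerdirs = [
        posixpath.normpath(
            posixpath.join(staged_parent, posixpath.relpath(d, parent))
        )
        for d in layer_dirs
    ]
    return parent, lowerdirs


layers = [
    "/var/lib/snap/1/fs",  # hypothetical layer directories
    "/var/lib/snap/2/fs",
    "/var/lib/snap/3/fs",
]
parent, lowerdirs = common_parent_lowerdirs(layers, "/run/stage")
print(parent)     # /var/lib/snap
print(lowerdirs)  # ['/run/stage/1/fs', '/run/stage/2/fs', '/run/stage/3/fs']
```

With this layout the number of bind mounts is one regardless of how many layers the image has, while the overlayfs lowerdir list still names each layer separately.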
Source: sources/2026-02-28-netflix-mount-mayhem-at-netflix-scaling-containers-on-modern-cpus.
Why not just untar everything into one tree?¶
Historically container runtimes did exactly that, but flattening gives up copy-on-write sharing: two containers built on the same base image each carried their own copy of its files on disk. overlayfs's union model lets N containers share one on-disk copy of every image layer while each keeps its own writable upperdir; the price is the mount-table overhead documented in Mount Mayhem.
Related¶
- systems/containerd — primary consumer in modern Kubernetes
- systems/netflix-titus — where Mount Mayhem surfaced
- concepts/kernel-idmap-mount — the feature that makes per-layer bind mounts necessary under per-container user namespaces
- concepts/linux-vfs-mount-lock — the global kernel lock that bottlenecks overlayfs construction
- concepts/container-layer-count — why image-layer count became a first-class performance variable
- patterns/common-parent-bind-mount — the structural fix