
File-watcher atomic-swap consolidated map

Problem

An in-process server module needs to maintain a consolidated lookup structure — e.g., model_name → version → feature_allowlist — assembled from multiple independently-updated source artefacts on disk (one file per model bundle). Requirements:

  • Reads must be fast and lock-free enough for the hot path (millions of lookups per second at Pinterest's scale).
  • Writes arrive asynchronously (bundle deploys roll out independently) and must be visible without restart.
  • Concurrent updates must not corrupt the consolidated view — multiple bundles can refresh in overlapping windows.
  • A corrupt or partial update to one source must not poison the consolidated view for all others.

Solution

Three-layer structure with file watchers, per-source maps, a consolidated map, and atomic swap under a read-write lock.

Independent source maps[bundle]       (one per on-disk artefact)
          └── file watcher per artefact → triggers reload of that map only
                                           (other bundles' maps untouched)

Consolidated map                       (the hot-path read surface)
          └── rebuilt from ALL independent maps on any change
          └── atomically replaces the current active consolidated map

RW lock
  ├── shared lock  — reads of consolidated map + independent maps
  └── unique lock  — the atomic swap of the consolidated map
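
A minimal C++ sketch of the three layers, under stated assumptions: every name below (TrimmerConfig, OnBundleRefreshed, Snapshot, the type aliases) is illustrative rather than Pinterest's code, std::shared_mutex stands in for the read-write lock, and each bundle is collapsed to a single model for brevity.

#include <memory>
#include <shared_mutex>
#include <string>
#include <unordered_map>
#include <vector>

// Illustrative shapes: model_name -> version -> feature_allowlist.
using FeatureAllowlist = std::vector<std::string>;
using VersionMap = std::unordered_map<std::string, FeatureAllowlist>;
using ConsolidatedMap = std::unordered_map<std::string, VersionMap>;

class TrimmerConfig {
 public:
  // Called once a bundle's artefact has been parsed. Parsing, the slow part
  // of a refresh, happens before this call and outside any lock.
  void OnBundleRefreshed(const std::string& bundle, VersionMap parsed) {
    std::unique_lock lock(mu_);
    independent_[bundle] = std::move(parsed);   // other bundles' maps untouched
    // Merge ALL independent maps into a fresh consolidated map. The merge is
    // cheap (kilobyte-scale artefacts); the swap is one pointer assignment.
    auto fresh = std::make_shared<ConsolidatedMap>();
    for (const auto& [name, versions] : independent_) {
      (*fresh)[name] = versions;                // simplification: bundle == model
    }
    consolidated_ = std::move(fresh);           // the atomic swap
  }

  // Hot path: shared lock, held only long enough to copy the pointer.
  std::shared_ptr<const ConsolidatedMap> Snapshot() const {
    std::shared_lock lock(mu_);
    return consolidated_;
  }

 private:
  mutable std::shared_mutex mu_;
  std::unordered_map<std::string, VersionMap> independent_;  // one per bundle
  std::shared_ptr<const ConsolidatedMap> consolidated_ =
      std::make_shared<ConsolidatedMap>();
};

A reader that holds a Snapshot() pointer keeps the old map alive even across a swap, so a request that started before a refresh finishes on the view it started with.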

Pinterest's articulation

From the 2026-05-01 Feature Trimmer post (Source: sources/2026-05-01-pinterest-optimizing-ml-workload-network-efficiency-part-i-feature-trimmer):

"Configuration: The root cluster is configured with the active model bundles, and the file path for each corresponding module_info.json is set using GFlags. Initial Loading: The feature trimmer module loads the content of each module_info.json file into an independent in-memory map. Monitor for Content Updates: A file watcher is attached to each module_info.json. Any content refresh triggers a reload of its contents into the in-memory map for the given model bundle. Consolidation: On initial loading or when any model bundle is refreshed, the module: Scans and merges all independent maps. Creates a new consolidated map. Atomically replaces the current active consolidated map with the new one. Concurrency Management w/ Read-Write Lock: Concurrent reads of the consolidated and independent maps are managed with a shared lock. Write access during the map replacement is managed with a unique lock."

Why the two-layer design (per-bundle maps + consolidated map)

The pattern could be simplified to a single consolidated map updated in place — but that would couple all bundles' update risk together. The two-layer design gives:

Failure isolation per bundle

If bundle A's module_info.json gets corrupted on disk during an update, the trimmer's independent map for bundle A stays on the old version (the reload path hits the corruption or parse error and keeps the prior content). Bundles B, C, and D are unaffected; their refreshes continue to trigger full consolidated-map rebuilds. A "bad bundle" fails in isolation.
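
A sketch of the per-bundle watcher callback, assuming a hypothetical ParseModuleInfo() that throws on corrupt or truncated JSON, and glog-style logging:

// On parse failure nothing is written back, so the independent map for this
// bundle keeps its prior content; other bundles' refreshes are unaffected.
void OnFileChanged(TrimmerConfig& config, const std::string& bundle,
                   const std::string& path) {
  try {
    VersionMap parsed = ParseModuleInfo(path);  // hypothetical; may throw
    config.OnBundleRefreshed(bundle, std::move(parsed));
  } catch (const std::exception& e) {
    LOG(ERROR) << "module_info parse failed for " << bundle
               << ": " << e.what();  // bundle fails in isolation
  }
}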

Pinterest's explicit framing: "If a model bundle's file gets corrupted on disk during an update, the feature trimmer keeps using the old, in-memory version for that bundle. Because each bundle has its own map, the feature trimmer can still successfully update the information for all the other model bundles."

Lock-free-enough reads

The hot path (every score request does a trim lookup) takes only the shared read lock on the consolidated map. Atomic swap happens under a unique lock for microseconds — just enough to swap a pointer. Reads never wait on bundle parsing, which is the slow part of the refresh.

No partial-state reads

The consolidated map is never partially rebuilt in place. A read under shared lock sees either the old map or the new map — never a half-merged state where some models are on new allowlists and others are on stale ones.
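
Continuing the sketch above, both properties fall out of the read path: the shared lock guards only the pointer copy, and all traversal runs on an immutable snapshot, so a request sees the old map or the new one in full.

#include <algorithm>

bool IsFeatureAllowed(const TrimmerConfig& config, const std::string& model,
                      const std::string& version, const std::string& feature) {
  auto snapshot = config.Snapshot();   // shared lock held for microseconds
  auto m = snapshot->find(model);
  if (m == snapshot->end()) return false;
  auto v = m->second.find(version);
  if (v == m->second.end()) return false;
  const auto& allow = v->second;       // one coherent allowlist, never half-merged
  return std::find(allow.begin(), allow.end(), feature) != allow.end();
}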

When it fits

  • Read-heavy + infrequent writes. Score requests happen millions of times per second; bundle refreshes happen hourly to daily.
  • Multiple independent update sources that should be decoupled for failure isolation.
  • The consolidated view is small enough to fit in memory and cheap enough to rebuild on any source change (Pinterest's module_info.json is kilobytes per bundle).
  • Strong read consistency within a single request — one score request sees one coherent view.
  • Restart-free hot reload required — the config updates without taking the host offline.

When it doesn't fit

  • Consolidated view is large (gigabytes). Full rebuild on every update becomes expensive; use differential updates.
  • Writes are the common case, reads are rare. A consolidated-map design assumes the inverse.
  • Cross-source ordering matters — the pattern consolidates maps without any cross-source transactional guarantee beyond "the snapshot is internally consistent at rebuild time."
  • Updates need confirmation back to the publisher — this is one-way push-from-disk; no ack semantics.

Failure modes

  • Thundering-herd refresh if many bundles update simultaneously. Rebuilds serialise under the unique lock but the rebuild work itself may contend. Mitigated at Pinterest scale because bundle deploys are staged (canary → prod) rather than simultaneous.
  • Stale-bundle silent persistence — if a bundle's update silently corrupts and no on-call alert fires, the trimmer runs on a stale allowlist for that bundle indefinitely. Pinterest mitigates at init ("failures while parsing the required module_info artifacts are emitted to our observability dashboard and trigger an on-call alert") but not continuously at runtime.
  • Consolidation-time corruption — a bug in the merge logic could produce a broken consolidated map. The atomic swap discipline doesn't protect against this; only testing + validation at merge time does (see the sketch after this list).
  • Host-launch blocked on bundle parsing — Pinterest explicitly chose not to block host launch on parse failure, because doing so "would undermine our ability to respond to capacity-related incidents."
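
One possible mitigation for the consolidation-time failure mode, an assumption rather than anything the post describes: run invariant checks on the freshly built map and refuse the swap on failure, so the previous consolidated map stays active.

// Hypothetical invariants; the real checks depend on what a valid
// consolidated view must guarantee for the trimmer.
bool ValidateConsolidated(const ConsolidatedMap& m) {
  for (const auto& [model, versions] : m) {
    if (model.empty() || versions.empty()) return false;
    for (const auto& [version, allowlist] : versions) {
      if (allowlist.empty()) return false;  // an empty allowlist is suspicious
    }
  }
  return true;
}

In OnBundleRefreshed() above, the swap then becomes conditional: swap only when ValidateConsolidated(*fresh) passes, and alert otherwise.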

Seen in

  • Pinterest Feature Trimmer (sources/2026-05-01-pinterest-optimizing-ml-workload-network-efficiency-part-i-feature-trimmer): the consolidated trim-config map inside the root cluster, as quoted above.

Sibling patterns

  • patterns/hot-swap-retrofit — sibling pattern at the component-swap altitude; runtime replacement of live components.
  • patterns/runtime-backend-swap-on-failure — sibling at the backend-failover altitude; swap under failure, here swap under config update.
  • Copy-on-write data structures — same atomic-swap principle at data-structure altitude.
  • Immutable-collection-with-atomic-reference (Clojure atoms, Scala AtomicReference[Map]) — same principle expressed in language primitives.
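
For comparison, the same principle with a language primitive in place of the RW lock, a sketch assuming C++20's std::atomic<std::shared_ptr> and reusing the aliases above (implementations may still lock internally, so this is lock-free in shape rather than by guarantee):

#include <atomic>
#include <memory>

std::atomic<std::shared_ptr<const ConsolidatedMap>> active{
    std::make_shared<const ConsolidatedMap>()};

// Reader: one atomic load, no mutex on the read path at all.
std::shared_ptr<const ConsolidatedMap> Read() { return active.load(); }

// Writer: build the new map off to the side, publish it in one store.
void Publish(std::shared_ptr<const ConsolidatedMap> fresh) {
  active.store(std::move(fresh));
}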