Intel Topdown Microarchitecture Analysis (TMA)

Topdown Microarchitecture Analysis (TMA) is Intel's structured methodology for attributing every CPU pipeline slot, whether or not it ends up retiring useful work, to one of four top-level buckets (Front-End Bound, Back-End Bound, Bad Speculation, or Retiring) and then drilling down into sub-buckets (cache misses, memory bandwidth, branch mispredictions, contested accesses, false sharing, etc.).

TMA is exposed through Intel PMU (Performance Monitoring Unit) events and is queryable on Linux via perf: perf stat with the topdown-* events or the TMA metric groups reports the bucket percentages, and perf record / perf report locate the code responsible for them.
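As a minimal sketch of what such a query can look like, the helper below assembles a counting-mode `perf stat` command line for the level-1 breakdown. It assumes a perf build that ships a `TopdownL1` metric group for the host CPU (`perf list` shows what is actually available); `build_topdown_cmd` is a hypothetical helper name, not part of perf.

```python
import shlex
from typing import List, Sequence


def build_topdown_cmd(workload: Sequence[str],
                      metric_group: str = "TopdownL1") -> List[str]:
    """Assemble a `perf stat` invocation that reports one TMA metric group.

    Assumption: the installed perf exposes `metric_group` on this CPU.
    """
    if not workload:
        raise ValueError("workload command must not be empty")
    # -M selects a metric group; everything after `--` is the profiled command.
    return ["perf", "stat", "-M", metric_group, "--", *workload]


cmd = build_topdown_cmd(["sleep", "5"])
# Print a shell-ready string; `cmd` itself can be passed to subprocess.run.
print(shlex.join(cmd))
```

Keeping the command as an argument list rather than a single string avoids shell-quoting problems when the workload has arguments of its own.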

Four top-level buckets

Bucket          | Meaning
----------------|--------
Retiring        | Useful work: slots actually retiring instructions
Front-End Bound | Stalled waiting for instructions to decode (iTLB miss, icache miss, decoder stall)
Back-End Bound  | Stalled waiting for data or execution resources (dcache miss, memory BW, contested atomics)
Bad Speculation | Work squashed: mispredicted branches, pipeline flushes
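Because every slot lands in exactly one bucket, the four buckets can be read as fractions of total slots that sum to one. A minimal sketch of that normalization follows; the function names and the input counts are hypothetical and only illustrate the arithmetic, they are not measurements or perf output.

```python
from typing import Dict

BUCKETS = ("retiring", "bad_speculation", "frontend_bound", "backend_bound")


def level1_breakdown(slot_counts: Dict[str, float]) -> Dict[str, float]:
    """Normalize per-bucket slot counts into fractions that sum to 1."""
    missing = [b for b in BUCKETS if b not in slot_counts]
    if missing:
        raise KeyError(f"missing buckets: {missing}")
    total = sum(slot_counts[b] for b in BUCKETS)
    if total <= 0:
        raise ValueError("total slot count must be positive")
    return {b: slot_counts[b] / total for b in BUCKETS}


def dominant_bucket(fractions: Dict[str, float]) -> str:
    """Return the bucket holding the largest share of slots."""
    return max(fractions, key=fractions.get)


# Made-up slot counts for illustration only.
example = {
    "retiring": 200,
    "bad_speculation": 50,
    "frontend_bound": 150,
    "backend_bound": 600,
}
fractions = level1_breakdown(example)
print(dominant_bucket(fractions), fractions["backend_bound"])
```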

The method is hierarchical: once a top bucket is identified, sub-buckets narrow the cause further (e.g. Back-End → Memory Bound → L3 → contested accesses → false sharing). Icelake and later add dedicated hardware counters for the top-level buckets.
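The drill-down can be sketched as a walk over a metric tree that follows the largest child at each level. The tree shape, node names, threshold, and fractions below are all hypothetical illustration values, not Intel's full metric hierarchy and not measured data.

```python
from typing import Dict, List, Tuple

# Each node maps a metric name to (fraction_of_slots, children).
Tree = Dict[str, Tuple[float, "Tree"]]


def drill_down(tree: Tree, threshold: float = 0.2) -> List[Tuple[str, float]]:
    """Follow the largest child at each level while it stays above `threshold`."""
    path: List[Tuple[str, float]] = []
    level = tree
    while level:
        name, (fraction, children) = max(level.items(), key=lambda kv: kv[1][0])
        if fraction < threshold:
            break  # nothing at this level is significant enough to pursue
        path.append((name, fraction))
        level = children
    return path


# Hypothetical fractions, chosen only to show the traversal.
example_tree: Tree = {
    "Retiring": (0.10, {}),
    "Bad Speculation": (0.05, {}),
    "Front-End Bound": (0.15, {}),
    "Back-End Bound": (0.70, {
        "Core Bound": (0.10, {}),
        "Memory Bound": (0.60, {
            "L1 Bound": (0.05, {}),
            "L3 Bound": (0.50, {
                "Contested Accesses": (0.45, {}),
            }),
        }),
    }),
}

for name, fraction in drill_down(example_tree):
    print(f"{name}: {fraction:.0%}")
```

The threshold is a judgment call: it stops the walk once no child accounts for a meaningful share of slots, so the reported path ends at the deepest node still worth investigating.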

Reference: TMA Addressing Challenges in Icelake (Ahmad Yasin, 2018 Petascale Tools Workshop).

Seen in

Netflix Mount Mayhem — attributing VFS-lock spin to false sharing (2026-02-28)

Netflix used perf record + Intel TMA to diagnose a startup hang seen with containerd on r5.metal instances. The findings:

  • 95.5% of pipeline slots stalled on tma_contested_accesses (Back-End → Memory Bound → Contested).
  • 57% of slots in false sharing (multiple cores writing to different data that happens to share one cache line, so ownership of the line bounces between them).
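Whether two variables can falsely share comes down to whether their addresses fall on the same cache line. A minimal sketch of that check, assuming a 64-byte line (typical for current x86 parts, but an assumption here) and hypothetical addresses:

```python
CACHE_LINE_BYTES = 64  # assumed line size; verify for the target CPU


def cache_line_index(address: int, line_size: int = CACHE_LINE_BYTES) -> int:
    """Return the index of the cache line that contains `address`."""
    return address // line_size


def share_cache_line(addr_a: int, addr_b: int,
                     line_size: int = CACHE_LINE_BYTES) -> bool:
    """True if both addresses map to the same cache line."""
    return cache_line_index(addr_a, line_size) == cache_line_index(addr_b, line_size)


# Two hypothetical 8-byte counters, each updated by a different core.
base = 0x1000
print(share_cache_line(base, base + 8))   # packed back to back -> True
print(share_cache_line(base, base + 64))  # padded to a full line -> False
```

Padding or aligning per-core data to the line size is the usual way to move the second counter onto its own line.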

This was the signal that pointed from "containerd is slow" → "the kernel mount lock is the bottleneck" → "the r5.metal cache hierarchy amplifies the contention". Without TMA, the team would have had the pause-instruction hot spot in path_init() but no quantification of why. The 95.5% contested-access attribution is what made it a hardware-architecture conversation, not just a software one.

Source: sources/2026-02-28-netflix-mount-mayhem-at-netflix-scaling-containers-on-modern-cpus.

Why this matters for wiki patterns

TMA is the load-bearing diagnostic for contended-lock microbenchmarks — the benchmark surfaces the cliff, TMA explains the cliff. It complements concepts/cpu-utilization-vs-saturation at a deeper layer: vmstat's CPU-saturation signal tells you threads are waiting for CPU; TMA tells you what the thread that's running is waiting on.

Limitations

  • Intel-only. AMD has its own Performance Monitoring methodology with overlapping but different event semantics.
  • Requires kernel + hardware support. Cloud VMs vary in which PMU events are exposed; bare-metal (like r5.metal) is reliable, VM-hosted instances often aren't.
  • Top-down attribution can hide co-occurring causes. If two independent effects both show up as contested accesses, TMA lumps them into one sub-bucket and cannot tell them apart.