Intel Topdown Microarchitecture Analysis (TMA)¶
Topdown Microarchitecture Analysis (TMA) is Intel's structured methodology for attributing every retired (or non-retired) CPU pipeline slot to one of four top-level buckets — Front-End Bound, Back-End Bound, Bad Speculation, or Retiring — and then drilling down into sub-buckets (cache misses, memory bandwidth, branch mispredictions, contested accesses, false sharing, etc.).
TMA is exposed through Intel PMU (Performance Monitoring Unit) events. On Linux it is queryable via perf stat with the appropriate topdown-* event set (for example perf stat --topdown, or the TopdownL1 metric group), while perf record / perf report localize the hot code the counts point at.
Four top-level buckets¶
| Bucket | Meaning |
|---|---|
| Retiring | Useful work: slots actually retiring instructions |
| Front-End Bound | Stalled waiting for instructions to decode (iTLB miss, icache miss, decoder stall) |
| Back-End Bound | Stalled waiting for data or execution resources (dcache miss, memory BW, contested atomics) |
| Bad Speculation | Work squashed: mispredicted branches, pipeline flushes |
On Ice Lake and later, the method is iterative — once a top-level bucket is identified, sub-buckets narrow the cause further (e.g. Back-End Bound → Memory Bound → L3 Bound → Contested Accesses → false sharing).
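The level-1 attribution above can be sketched with the classic slot-accounting formulas from Yasin's top-down work. The sketch below assumes a 4-wide issue machine and uses hypothetical counter values — the event names are the standard Intel ones, but the numbers are made up for illustration:

```python
# Level-1 TMA attribution from raw PMU counters (Yasin's formulas, 4-wide core).
# Counter values are hypothetical, chosen only to make the arithmetic visible.
PIPELINE_WIDTH = 4  # issue slots per cycle

counters = {
    "CPU_CLK_UNHALTED.THREAD":     1_000_000,
    "IDQ_UOPS_NOT_DELIVERED.CORE":   400_000,
    "UOPS_ISSUED.ANY":             2_600_000,
    "UOPS_RETIRED.RETIRE_SLOTS":   2_200_000,
    "INT_MISC.RECOVERY_CYCLES":       50_000,
}

def tma_level1(c, width=PIPELINE_WIDTH):
    slots = width * c["CPU_CLK_UNHALTED.THREAD"]          # total pipeline slots
    frontend = c["IDQ_UOPS_NOT_DELIVERED.CORE"] / slots   # slots starved of uops
    bad_spec = (c["UOPS_ISSUED.ANY"] - c["UOPS_RETIRED.RETIRE_SLOTS"]
                + width * c["INT_MISC.RECOVERY_CYCLES"]) / slots  # squashed work
    retiring = c["UOPS_RETIRED.RETIRE_SLOTS"] / slots     # useful work
    backend = 1.0 - frontend - bad_spec - retiring        # remainder bucket
    return {"Front-End Bound": frontend, "Bad Speculation": bad_spec,
            "Retiring": retiring, "Back-End Bound": backend}

for bucket, frac in tma_level1(counters).items():
    print(f"{bucket:16s} {frac:6.1%}")
```

Note that Back-End Bound is derived as the remainder, which is why the four buckets always sum to 100% of slots.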
Reference: TMA Addressing Challenges in Icelake (Ahmad Yasin, 2018 Petascale Tools Workshop).
Seen in¶
Netflix Mount Mayhem — attributing VFS-lock spin to false sharing (2026-02-28)¶
Netflix used perf record + Intel TMA to diagnose their containerd + r5.metal startup hang. Output:
- 95.5% of pipeline slots stalled on tma_contested_accesses (Back-End → Memory Bound → Contested Accesses).
- 57% of slots in false sharing (multiple cores modifying the same cache line and bouncing ownership).
This was the signal that pointed from "containerd is slow" → "the kernel mount lock is the bottleneck" → "the r5.metal cache hierarchy amplifies the contention". Without TMA, the team would have had the pause-instruction hot spot in path_init() but no quantification of why — 95.5% contested-access attribution is what made it a hardware-architecture conversation, not just a software one.
Source: sources/2026-02-28-netflix-mount-mayhem-at-netflix-scaling-containers-on-modern-cpus.
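In practice a workflow like this comes down to reading the tma_* metric percentages out of perf's output and asking which one dominates. A minimal sketch, assuming a simplified hypothetical output format — real perf output varies by version, flags, and CPU generation:

```python
import re

# Hypothetical, simplified perf-stat-style metric output; the real format
# differs across perf versions and machines. Figures echo the write-up above.
PERF_OUTPUT = """\
 1,234,567,890      slots
        95.5 %  tma_contested_accesses
        57.0 %  tma_false_sharing
         2.1 %  tma_frontend_bound
"""

def tma_metrics(text):
    """Extract 'NN.N % tma_*' lines into a {metric_name: fraction} dict."""
    pat = re.compile(r"([\d.]+)\s*%\s+(tma_\w+)")
    return {name: float(pct) / 100 for pct, name in pat.findall(text)}

metrics = tma_metrics(PERF_OUTPUT)
worst = max(metrics, key=metrics.get)  # the dominant stall attribution
print(worst, f"{metrics[worst]:.1%}")
```

The point of the parse-then-rank step is the same as in the Netflix case: a single dominant bucket (here contested accesses) is what turns a flat profile into a causal story.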
Why this matters for wiki patterns¶
TMA is the load-bearing diagnostic for contended-lock microbenchmarks — the benchmark surfaces the cliff, TMA explains the cliff. It complements concepts/cpu-utilization-vs-saturation at a deeper layer: vmstat's CPU-saturation signal tells you threads are waiting for CPU; TMA tells you what the thread that's running is waiting on.
Limitations¶
- Intel-only. AMD has its own Performance Monitoring methodology with overlapping but different event semantics.
- Requires kernel + hardware support. Cloud VMs vary in which PMU events are exposed; bare metal (like r5.metal) is reliable, while VM-hosted instances often aren't.
- Top-down attribution can hide co-variants. If two effects both hit contested-access, TMA lumps them into one sub-bucket.
Related¶
- concepts/false-sharing — the sub-bucket most often responsible for contested-access stalls
- concepts/cpu-mesh-interconnect — mesh-topology CPUs where TOR queueing amplifies TMA contested-access numbers
- concepts/hyperthreading-contention — HT-enabled cores can worsen TMA Front-End / Back-End numbers simultaneously
- patterns/contended-lock-microbenchmark — the complementary pattern; TMA explains what the microbenchmark reveals