Intel Topdown Microarchitecture Analysis (TMA)¶
Topdown Microarchitecture Analysis (TMA) is Intel's structured methodology for attributing every retired (or non-retired) CPU pipeline slot to one of four top-level buckets — Front-End Bound, Back-End Bound, Bad Speculation, or Retiring — and then drilling down into sub-buckets (cache misses, memory bandwidth, branch mispredictions, contested accesses, false sharing, etc.).
TMA is exposed through Intel PMU (Performance Monitoring Unit) events. On Linux it is queryable via perf stat with the appropriate topdown-* event set (for example perf stat --topdown, or the TopdownL1 metric group), while perf record / perf report localize the hot code the counts point at.
Four top-level buckets¶
| Bucket | Meaning |
|---|---|
| Retiring | Useful work: slots actually retiring instructions |
| Front-End Bound | Stalled waiting for instructions to decode (iTLB miss, icache miss, decoder stall) |
| Back-End Bound | Stalled waiting for data or execution resources (dcache miss, memory BW, contested atomics) |
| Bad Speculation | Work squashed: mispredicted branches, pipeline flushes |
On Ice Lake and later, the method is iterative — once a top-level bucket is identified, sub-buckets narrow the cause further (e.g. Back-End Bound → Memory Bound → L3 Bound → Contested Accesses → false sharing).
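The level-1 attribution above can be sketched with the classic slot-accounting formulas from Yasin's top-down work. The sketch below assumes a 4-wide issue machine and uses hypothetical counter values — the event names are the standard Intel ones, but the numbers are made up for illustration:

```python
# Level-1 TMA attribution from raw PMU counters (Yasin's formulas, 4-wide core).
# Counter values are hypothetical, chosen only to make the arithmetic visible.
PIPELINE_WIDTH = 4  # issue slots per cycle

counters = {
    "CPU_CLK_UNHALTED.THREAD":     1_000_000,
    "IDQ_UOPS_NOT_DELIVERED.CORE":   400_000,
    "UOPS_ISSUED.ANY":             2_600_000,
    "UOPS_RETIRED.RETIRE_SLOTS":   2_200_000,
    "INT_MISC.RECOVERY_CYCLES":       50_000,
}

def tma_level1(c, width=PIPELINE_WIDTH):
    slots = width * c["CPU_CLK_UNHALTED.THREAD"]          # total pipeline slots
    frontend = c["IDQ_UOPS_NOT_DELIVERED.CORE"] / slots   # slots starved of uops
    bad_spec = (c["UOPS_ISSUED.ANY"] - c["UOPS_RETIRED.RETIRE_SLOTS"]
                + width * c["INT_MISC.RECOVERY_CYCLES"]) / slots  # squashed work
    retiring = c["UOPS_RETIRED.RETIRE_SLOTS"] / slots     # useful work
    backend = 1.0 - frontend - bad_spec - retiring        # remainder bucket
    return {"Front-End Bound": frontend, "Bad Speculation": bad_spec,
            "Retiring": retiring, "Back-End Bound": backend}

for bucket, frac in tma_level1(counters).items():
    print(f"{bucket:16s} {frac:6.1%}")
```

Note that Back-End Bound is derived as the remainder, which is why the four buckets always sum to 100% of slots.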
Reference: TMA Addressing Challenges in Icelake (Ahmad Yasin, 2018 Petascale Tools Workshop).
Seen in¶
Netflix Mount Mayhem — attributing VFS-lock spin to false sharing (2026-02-28)¶
Netflix used perf record + Intel TMA to diagnose their containerd + r5.metal startup hang. Output:
- 95.5% of pipeline slots stalled on tma_contested_accesses (Back-End → Memory Bound → Contested Accesses).
- 57% of slots in false sharing (multiple cores modifying the same cache line and bouncing ownership).
This was the signal that pointed from "containerd is slow" → "the kernel mount lock is the bottleneck" → "the r5.metal cache hierarchy amplifies the contention". Without TMA, the team would have had the pause-instruction hot spot in path_init() but no quantification of why — 95.5% contested-access attribution is what made it a hardware-architecture conversation, not just a software one.
Source: sources/2026-02-28-netflix-mount-mayhem-at-netflix-scaling-containers-on-modern-cpus.
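In practice a workflow like this comes down to reading the tma_* metric percentages out of perf's output and asking which one dominates. A minimal sketch, assuming a simplified hypothetical output format — real perf output varies by version, flags, and CPU generation:

```python
import re

# Hypothetical, simplified perf-stat-style metric output; the real format
# differs across perf versions and machines. Figures echo the write-up above.
PERF_OUTPUT = """\
 1,234,567,890      slots
        95.5 %  tma_contested_accesses
        57.0 %  tma_false_sharing
         2.1 %  tma_frontend_bound
"""

def tma_metrics(text):
    """Extract 'NN.N % tma_*' lines into a {metric_name: fraction} dict."""
    pat = re.compile(r"([\d.]+)\s*%\s+(tma_\w+)")
    return {name: float(pct) / 100 for pct, name in pat.findall(text)}

metrics = tma_metrics(PERF_OUTPUT)
worst = max(metrics, key=metrics.get)  # the dominant stall attribution
print(worst, f"{metrics[worst]:.1%}")
```

The point of the parse-then-rank step is the same as in the Netflix case: a single dominant bucket (here contested accesses) is what turns a flat profile into a causal story.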
Why this matters for wiki patterns¶
TMA is the load-bearing diagnostic for contended-lock microbenchmarks — the benchmark surfaces the cliff, TMA explains the cliff. It complements concepts/cpu-utilization-vs-saturation at a deeper layer: vmstat's CPU-saturation signal tells you threads are waiting for CPU; TMA tells you what the thread that's running is waiting on.
Limitations¶
- Intel-only. AMD has its own Performance Monitoring methodology with overlapping but different event semantics.
- Requires kernel + hardware support. Cloud VMs vary in which PMU events are exposed; bare metal (like r5.metal) is reliable, while VM-hosted instances often aren't.
- Top-down attribution can hide co-variants. If two effects both hit contested-access, TMA lumps them into one sub-bucket.
Related¶
- concepts/false-sharing — the sub-bucket most often responsible for contested-access stalls
- concepts/cpu-mesh-interconnect — mesh-topology CPUs where TOR queueing amplifies TMA contested-access numbers
- concepts/hyperthreading-contention — HT-enabled cores can worsen TMA Front-End / Back-End numbers simultaneously
- patterns/contended-lock-microbenchmark — the complementary pattern; TMA explains what the microbenchmark reveals