
PATTERN Cited by 1 source

Pre-silicon validation partnership

Intent

Ship a workload-representative benchmark suite to CPU / SoC / accelerator vendors and collaborate with them on pre-silicon simulation and early-silicon bring-up, so that the microarchitectural tuning + SoC-level optimizations on the vendor's roadmap products are matched to your production workload shape, not to the vendor's own synthetic benchmark suite.

Context

A hyperscaler's hardware procurement cycle spans years. By the time a vendor's chip is available and benchmarkable in production, tens of millions of dollars of design decisions have already been locked in. If those decisions were tuned against a benchmark that doesn't represent hyperscale workloads (see concepts/benchmark-methodology-bias), the chip will land with suboptimal performance on the workloads that drive capacity planning.

The fix is to get your workload-representative benchmark into the vendor's pre-silicon flow — architectural simulators, early silicon test chips, cycle-accurate models — so tuning happens against the right shape from the start.

Mechanism

Precondition — a workload-representative benchmark exists

You cannot run this pattern without a workload-representative benchmark that your vendor can share + execute. At Meta that artifact is DCPerf.

Two-phase collaboration

  1. Pre-silicon. Run the benchmark against the vendor's architectural simulators + cycle-accurate models. Iterate on microarchitecture parameters (pipeline depths, cache sizes + hierarchy, branch-prediction structures, SoC power-management policies). "There have been multiple instances where we have been able to identify performance optimizations in areas such as CPU core microarchitecture settings and SOC power management optimizations." (Source: sources/2024-08-05-meta-dcperf-open-source-benchmark-suite)

  2. Early-silicon. Run the benchmark on the first test chips. Catch performance bugs before mass production; catch system-software issues (firmware, driver, kernel, scheduler) before the chip lands in production data centers.
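The pre-silicon phase is, at its core, a sweep loop: run the workload-representative benchmark under candidate microarchitecture configurations in the vendor's simulator and keep the settings that improve the score. The sketch below is purely illustrative: `run_simulation`, the parameter names, and the scoring model are all invented stand-ins; a real flow would launch the vendor's cycle-accurate model and parse an actual benchmark metric (e.g., a DCPerf score) from its output.

```python
from itertools import product

def run_simulation(l2_kib: int, bp_entries: int) -> float:
    """Hypothetical stand-in for a cycle-accurate simulator run.

    Toy scoring model: benchmark score improves with L2 cache size and
    branch-predictor capacity. Purely illustrative numbers.
    """
    return 100.0 + 10.0 * (l2_kib / 1024) + 5.0 * (bp_entries / 8192)

def sweep(l2_options, bp_options):
    """Evaluate every candidate config; return the best (config, score)."""
    best = None
    for l2_kib, bp_entries in product(l2_options, bp_options):
        score = run_simulation(l2_kib, bp_entries)
        if best is None or score > best[1]:
            best = ((l2_kib, bp_entries), score)
    return best

config, score = sweep([512, 1024, 2048], [4096, 8192])
print(config, round(score, 1))  # → (2048, 8192) 125.0
```

The point of the pattern is what this loop is scored against: with a workload-representative suite in the inner loop, the parameters the sweep converges on reflect the hyperscaler's production shape rather than a synthetic benchmark's.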

Duration

Meta reports two years of this collaboration cadence with "leading CPU vendors" across pre-silicon and/or early-silicon setups. It's not a one-shot engagement; it's a continuous partnership through the vendor's chip-design + bring-up timeline.

Bidirectional outcomes

Meta frames this collaboration as feeding optimizations in two directions: vendor ships a better chip for Meta's workload; Meta ships cleaner benchmarks + more characterization data back to the vendor. Both parties benefit; the benchmark is the common-language artifact.

Canonical instance — Meta + CPU vendors via DCPerf

Meta explicitly states:

"Over the last two years we have collaborated with leading CPU vendors to further validate DCPerf on pre silicon and/or early silicon setups to debug performance issues and identify hardware and system software optimizations on their roadmap products."

The source post (sources/2024-08-05-meta-dcperf-open-source-benchmark-suite) frames this as one of the "areas of HW/SW design where we have seen DCPerf being representative of production usage and being beneficial for delivering relevant performance signals and help with optimizations."

Open-sourcing DCPerf expands the partnership from Meta ↔ vendors to any hyperscale-relevant organization ↔ any vendor with access to the suite.

Why it works

  • Shifts tuning left. Optimization happens while design is still malleable, not after the chip is taped out.
  • Aligns vendor incentives with hyperscaler workload. The vendor's public SPEC numbers still matter for non-hyperscaler customers, but Meta-specific optimization is earned through DCPerf, not assumed.
  • Catches bugs cheap. Performance / correctness issues found pre-silicon cost orders of magnitude less to fix than post-silicon.
  • Enables novel architectures. Chiplet, heterogeneous-core clusters, and mixed-ISA platforms are evaluated against representative workloads before mass deployment; Meta specifically names DCPerf validation on chiplet-based architectures.

Anti-patterns

  • Wait for generally-available silicon. Too late — the relevant design decisions have already been made against someone else's benchmark.
  • NDA-only benchmark. If the benchmark can't be shared in the vendor's simulator / early-silicon environment, the partnership can't run. Meta's answer is open-sourcing DCPerf outright.
  • Aggregate-score-only evaluation. A chip can score well in aggregate while diverging from the microarchitectural behavior the benchmark is supposed to proxy; validate at the right level (see concepts/benchmark-representativeness).

Seen in
