Heterogeneous AI Accelerator Fleet

Definition

A heterogeneous AI accelerator fleet is production AI infrastructure that spans multiple vendors (NVIDIA, AMD, custom proprietary silicon, CPUs) and multiple generations within each vendor. Three forces drive it: (a) hardware-vendor diversification to reduce dependency on any single supplier; (b) workload-specific fit (training-optimized vs inference-optimized, compute-bound vs memory-bound); and (c) roadmap cadence, since vendors refresh silicon every 12-24 months and in-house silicon may refresh faster.

The concept is a forcing function: once the fleet goes heterogeneous, the number of unique kernel configurations that must be written, tested, and maintained scales as the product {hardware types × generations × model architectures × operators}, and that product quickly exceeds what human kernel-expert teams can cover (Source: sources/2026-04-02-meta-kernelevolve-how-metas-ranking-engineer-agent-optimizes-ai-infrastructure).
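
A back-of-the-envelope sketch of that product, in Python. Every count and label below is an illustrative assumption, not Meta's actual inventory; the point is only how quickly the multiplication reaches "thousands":

```python
# Illustrative fleet inventory -- all counts and generation labels are
# assumptions for the arithmetic, not Meta's real numbers.
fleet = {
    "nvidia": ["H100", "Blackwell"],
    "amd":    ["MI300X"],
    "mtia":   ["gen1", "gen2", "gen3", "gen4"],  # four generations (hypothetical labels)
    "cpu":    ["generic"],
}
model_architectures = 5    # assumed: embedding-based DLRM, sequence, GEM, ...
operators_per_arch  = 60   # assumed size of each architecture's operator set

hardware_targets = sum(len(gens) for gens in fleet.values())   # 8 targets
kernel_configs = hardware_targets * model_architectures * operators_per_arch
print(f"{hardware_targets} targets -> {kernel_configs} unique kernel configurations")
# 8 * 5 * 60 = 2,400 -- already "thousands", matching the post's claim.
```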

Canonical statement (Meta KernelEvolve 2026-04-02)

"The total number of kernels scales with the product of three factors: {hardware types and generations X model architectures X number of operators}. This product results in thousands of unique kernel configurations that must be written, tested, and maintained. Hand-tuning each kernel doesn't scale, and kernel experts alone can't keep up with the pace."

Meta's fleet as of 2026-04-02:

  • NVIDIA GPUs (primary: H100; transitioning to Blackwell generation).
  • AMD GPUs (MI300X).
  • Meta MTIA — four chip generations in two years (MTIA 300 through 500).
  • CPUs.

Three dimensions of explosion

The 2026-04-02 post enumerates three axes that compound:

  1. Hardware heterogeneity — different vendors have "fundamentally different memory architectures and hierarchies, instruction sets, and execution models. A kernel that runs optimally on one platform may perform poorly or fail entirely on another." Even within a single vendor (e.g. NVIDIA H100 → Blackwell), successive generations introduce architectural changes requiring different optimization strategies.

  2. Model architecture variation — Meta Ads recommendation models alone have evolved through "early embedding-based deep learning recommendation models → sequence learning models → Generative Ads Recommendation Model (GEM) → Meta Adaptive Ranking Model." Each generation introduces operators the previous generation never needed.

  3. Kernel diversity beyond standard libraries — vendor libraries (cuBLAS, cuDNN) cover GEMM + convolution + standard activations, but production workloads are "dominated by a long tail of operators that fall outside library coverage" — feature hashing, bucketing, sequence truncation, fused feature interaction layers, custom attention variants. These "either fall back to CPU — forcing disaggregated serving architectures with significant latency overhead — or run via unoptimized code paths that underutilize hardware."
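
A minimal dispatch sketch of the coverage problem these three axes create. This is a toy illustration, not Meta's system; the names (register, dispatch, cpu_fallback) are hypothetical. Every (vendor, generation, operator) triple needs its own tuned entry, and any miss drops to the slow path the post describes:

```python
from typing import Callable, Dict, Tuple

KernelKey = Tuple[str, str, str]          # (vendor, generation, operator)
_KERNELS: Dict[KernelKey, Callable] = {}  # hand-tuned coverage of the product space

def register(vendor: str, generation: str, op: str):
    """Claim one point in the {hardware x generation x operator} space."""
    def deco(fn: Callable) -> Callable:
        _KERNELS[(vendor, generation, op)] = fn
        return fn
    return deco

def cpu_fallback(*args, **kwargs) -> str:
    # The path the post warns about: CPU fallback or an unoptimized code path.
    return "running long-tail op on slow fallback"

def dispatch(vendor: str, generation: str, op: str) -> Callable:
    """Tuned kernel if one exists for this exact triple, else the fallback."""
    return _KERNELS.get((vendor, generation, op), cpu_fallback)

@register("nvidia", "H100", "fused_feature_interaction")
def fused_feature_interaction_h100(*args, **kwargs) -> str:
    return "tuned H100 kernel"

# A kernel tuned for one platform does not transfer: the same operator on
# MI300X misses the registry and falls back.
print(dispatch("nvidia", "H100", "fused_feature_interaction")())  # tuned H100 kernel
print(dispatch("amd", "MI300X", "fused_feature_interaction")())   # slow fallback
```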

Architectural response

The heterogeneous-fleet forcing function drove Meta to build KernelEvolve — an agentic kernel-authoring system that frames kernel optimization as a search problem and uses RAG over hardware documentation to make proprietary-silicon kernel generation tractable (see concepts/hardware-proprietary-knowledge-injection).
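
A hedged sketch of the "kernel optimization as a search problem" framing, assuming a simple propose-measure-keep loop with retrieval-augmented context. All function names and stub bodies below are hypothetical stand-ins, not KernelEvolve's actual API:

```python
import random

def retrieve_docs(op_spec: str, hw_docs: list[str]) -> str:
    """RAG step (stubbed): pull hardware-doc snippets relevant to the operator."""
    return "\n".join(d for d in hw_docs if op_spec.split("_")[0] in d)

def propose_variant(op_spec: str, context: str, best: str | None) -> str:
    """Agent step (stubbed): mutate the current best candidate or start fresh."""
    seed = best or f"baseline kernel for {op_spec}"
    return f"{seed} + tweak#{random.randint(0, 999)}"

def compile_and_bench(candidate: str) -> float:
    """Fitness step (stubbed): compile the candidate and measure latency."""
    return random.uniform(1.0, 10.0)

def optimize_kernel(op_spec: str, hw_docs: list[str], budget: int = 50) -> str:
    """Search loop: propose candidates, benchmark them, keep measured winners."""
    context = retrieve_docs(op_spec, hw_docs)
    best, best_latency = None, float("inf")
    for _ in range(budget):
        candidate = propose_variant(op_spec, context, best)
        latency = compile_and_bench(candidate)
        if latency < best_latency:          # keep only measured improvements
            best, best_latency = candidate, latency
    return best

print(optimize_kernel("attention_custom", ["attention tiling notes for MTIA"]))
```

The loop shape is the point: candidate generation is cheap and parallelizable, and fitness is a measured benchmark rather than expert judgment, which is what lets coverage scale past human kernel teams.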

The alternative paths — (a) hiring more kernel experts, (b) relying on vendor-library coverage, (c) compiler autotuning alone — were explicitly rejected in the post: "neither human experts nor today's compiler-based autotuning and fusion can fully cover at scale."
