Coral NPU¶
What it is¶
Coral NPU is Google Research's reference neural-processing-unit architecture for low-power on-device ML. It's delivered as a set of RISC-V ISA-compliant architectural IP blocks — not a chip — intended for integration into downstream ML-optimised systems-on-chip (SoCs). Announced in the 2025-10-15 Google Research blog post (Source: sources/2025-10-15-google-coral-npu-a-full-stack-platform-for-edge-ai).
Design point:
- Compute target: ~512 GOPS (giga operations per second) for the base design.
- Power envelope: "a few milliwatts" — i.e., the always-on ambient-sensing bracket.
- Target device classes: edge devices, hearables, AR glasses, smartwatches.
- ISA: open, RISC-V-compliant — deliberately not proprietary.
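The "~512 GOPS at a few milliwatts" design point can be put in perspective with back-of-envelope efficiency arithmetic. The 5 mW figure below is an assumed stand-in for "a few milliwatts"; the post gives no exact power number.

```python
# Rough performance-per-watt for the Coral NPU design point.
# 512 GOPS is from the announcement; 5 mW is an ASSUMED stand-in
# for "a few milliwatts" (the post gives no exact power figure).
gops = 512                      # giga-operations per second
power_w = 5e-3                  # assumed power draw in watts

tops = gops / 1e3               # tera-operations per second
tops_per_watt = tops / power_w  # the binding efficiency metric

print(f"{tops:.3f} TOPS at {power_w * 1e3:.0f} mW "
      f"= {tops_per_watt:.0f} TOPS/W")
```

Even if "a few milliwatts" means 10 mW rather than 5, the figure stays in the tens-of-TOPS/W range, which is the kind of efficiency the always-on ambient-sensing bracket demands.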
Why Google built it¶
The load-bearing framing in the announcement post is that edge ML is stuck between two unattractive options:
- General-purpose CPUs on edge devices — flexible, broad toolchain support, but power-inefficient on ML workloads and underperforming on the matrix-heavy operations that dominate modern attention-based and convolutional models.
- Specialized ML accelerators — high efficiency on the workloads they target, but inflexible, proprietary, and awkward to combine with the scalar / control-plane code the rest of the application needs.
Coral NPU's answer is to reverse the traditional chip-design precedence: put the ML matrix engine first in the architecture, and treat scalar compute as the secondary resource the matrix engine composes with (Source: sources/2025-10-15-google-coral-npu-a-full-stack-platform-for-edge-ai). That's the ML-first architecture stance, captured in this wiki as a concept.
The software half of the problem is just as load-bearing as the hardware half: "starkly different programming models for CPUs and ML blocks… proprietary compilers and complex command buffers… the industry lacks a mature, low-power architecture that can easily and effectively support multiple ML development frameworks" — the fragmented edge-ML ecosystem that Coral NPU is trying to give a stable reference target to.
Delivery shape¶
Coral NPU is a reference architecture, not a chip. The post describes it as:
As a complete, reference neural processing unit (NPU) architecture, Coral NPU provides the building blocks for the next generation of energy-efficient, ML-optimized systems on chip (SoCs). (Source: sources/2025-10-15-google-coral-npu-a-full-stack-platform-for-edge-ai)
This shape — reference IP blocks that downstream SoC designers integrate — is an instance of the reference hardware for software ecosystem pattern one level up the stack from Home Assistant Green: the reference hardware here exists so the ML software ecosystem (LiteRT, TFLite, IREE, TVM, Triton, LLVM compiler backends) has a stable, open target to build against — rather than each SoC vendor shipping proprietary tooling.
Why RISC-V¶
The choice of RISC-V as the base ISA is directly downstream of the "proprietary compilers and complex command buffers" complaint in the framing paragraph. RISC-V is:
- Open. No licensing gate for implementers or toolchain contributors.
- Vendor-neutral. The compiler and runtime ecosystem (LLVM, GCC, LiteRT, IREE, TVM) already targets RISC-V for non-NPU purposes; Coral NPU leverages that installed base instead of rebuilding a toolchain from scratch.
- Extensible. The RISC-V custom extension mechanism is the natural hook for ML-matrix-engine instructions that aren't in the base ISA — each NPU-specific op can be a well-defined extension, not a hidden command-buffer encoding.
The raw capture doesn't state whether Coral NPU's matrix operations are implemented as RISC-V custom extensions, as a co-processor addressed via memory-mapped I/O, or via some other integration shape — that's in the unscraped body.
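The programming-model difference between those two integration shapes can be sketched abstractly. Everything below is invented for illustration: the register offsets and the single-MACC op do not reflect Coral NPU's actual interface, which the raw capture does not describe.

```python
# Hypothetical contrast between the two integration shapes named
# above. The register offsets and the MACC op are invented for
# illustration; nothing here reflects Coral NPU's actual interface.

# Shape 1: custom-instruction style. After compiler lowering, an
# ML op looks like an ordinary instruction: operands in, result out.
def macc_insn(acc: int, a: int, b: int) -> int:
    """Multiply-accumulate as a single 'instruction'."""
    return acc + a * b

# Shape 2: memory-mapped co-processor. Software writes operands and
# a command into device registers, then reads the result back.
MMIO = {}                    # stands in for a device register file
REG_A, REG_B, REG_ACC, REG_CMD, REG_OUT = 0x00, 0x04, 0x08, 0x0C, 0x10

def macc_mmio(acc: int, a: int, b: int) -> int:
    MMIO[REG_A], MMIO[REG_B], MMIO[REG_ACC] = a, b, acc
    MMIO[REG_CMD] = 1        # 'go' bit; the device runs the MACC
    MMIO[REG_OUT] = MMIO[REG_ACC] + MMIO[REG_A] * MMIO[REG_B]
    return MMIO[REG_OUT]

assert macc_insn(10, 3, 4) == macc_mmio(10, 3, 4) == 22
```

The "proprietary compilers and complex command buffers" complaint in the framing paragraph is about shape 2 taken to an extreme; the custom-extension route keeps the op visible to a standard compiler backend.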
Performance envelope: why "512 GOPS at a few milliwatts"¶
The "512 GOPS at a few milliwatts" phrasing is a performance-per-watt statement, not a peak-throughput statement. The design point is sustained inference at milliwatt-class power because the target device classes —
- Hearables (earbuds): runs continuously on a coin-cell-class battery.
- AR glasses: runs continuously on a small lens-frame battery, thermal-limited by skin contact.
- Smartwatches: runs continuously on a small wrist-worn battery.
- Edge devices (always-on sensors): runs continuously on AA / coin-cell / energy-harvesting power.
— all share the "always plugged in to nothing, always sensing" constraint. That's the always-on ambient sensing envelope: model latency has to respect real-time wake-word, user-gesture, or health-signal cadence while the chip doesn't get to spike to watts-class draw, because it has no thermal or battery budget to spike from.
512 GOPS at a few milliwatts places Coral NPU squarely in the compute class that can run small attention-based models (compact LLMs for on-device assistants, keyword spotting, speaker ID, gesture recognition) and convolutional models (MobileNet-class vision encoders) at ambient-sensing cadence.
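Rough sizing arithmetic makes the claim concrete. The ~0.57 GMACs-per-frame figure for a MobileNetV1-224-class encoder is a commonly cited ballpark, not a number from the post, and the 5 fps sensing cadence is an assumption.

```python
# Rough sizing sketch. ~0.57 GMACs/frame for a MobileNetV1-224-class
# encoder is a commonly cited ballpark, NOT a figure from the Coral
# NPU post; one MAC is counted as 2 ops (multiply + add).
peak_gops = 512
gmacs_per_frame = 0.57
gops_per_frame = gmacs_per_frame * 2          # multiply + add

peak_fps = peak_gops / gops_per_frame         # theoretical full-tilt rate
ambient_fps = 5                               # assumed sensing cadence
utilisation = ambient_fps * gops_per_frame / peak_gops

print(f"peak = {peak_fps:.0f} fps; at {ambient_fps} fps the matrix "
      f"engine is about {utilisation:.1%} busy")
```

The point of the arithmetic (not a claim from the post) is that at ambient-sensing cadence the matrix engine sits mostly idle between inference bursts, which is what makes milliwatt-class sustained draw plausible.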
What the raw post does NOT decompose¶
The 2025-10-15 post's raw capture ends shortly after the opening claims. The following are not specified in the wiki's current evidence base:
- The IP-block decomposition (scalar core + vector + matrix + DMA + on-chip memory + peripheral interfaces).
- Process-node / die-area targets.
- The ML-framework first-class support matrix (LiteRT? TFLite-Micro? IREE? TVM? Triton? All of them?).
- Quantisation support (INT8 / INT4 / binary / FP16 / MXFP-class formats).
- Named production partners / first-shipping SoCs.
- Licensing terms.
- Benchmark comparisons against existing edge accelerators (Apple Neural Engine, Qualcomm Hexagon, Arm Ethos-U, the earlier Coral Edge TPU).
These live in the unscraped body of https://research.google/blog/coral-npu-a-full-stack-platform-for-edge-ai/.
Relationship to the existing Coral product line¶
Google's Coral product line — the Edge TPU USB Accelerator, Coral Dev Board, Coral Mini PCIe / M.2 modules — has been shipping ML accelerators for edge devices since 2019, based on the Edge TPU ASIC. The 2025-10-15 announcement positions Coral NPU as a reference architecture (RISC-V IP blocks) rather than a chip, which is architecturally distinct from the Edge-TPU-based Coral boards.
The raw capture does not explicitly decompose how Coral NPU relates to the existing Edge-TPU-based Coral boards — whether it succeeds them, composes with them, or is orthogonal. Flag as an open question pending deeper ingest of the post body or later Google Research / Coral team posts.
Seen in¶
- sources/2025-10-15-google-coral-npu-a-full-stack-platform-for-edge-ai — announcement post; sole current source. Captures the problem framing (general-purpose-vs-specialized dichotomy + fragmented software ecosystem) and the top-level architectural claims (RISC-V ISA, ML-matrix-engine-first design, ~512 GOPS at a few milliwatts, target device classes).
Related¶
- concepts/ml-first-architecture — the organising architectural stance; Coral NPU is the canonical wiki instance so far.
- concepts/always-on-ambient-sensing — the serving-infra envelope; Coral NPU's power/compute target is calibrated to this envelope.
- concepts/on-device-ml-inference — the broader class of substrate targets Coral NPU serves.
- concepts/hardware-software-codesign — the methodology; Coral NPU instantiates it at the edge-ML level.
- concepts/performance-per-watt — the binding hardware metric.
- concepts/fragmented-hardware-software-ecosystem — the problem Coral NPU's RISC-V + reference-architecture shape is trying to heal.
- concepts/matrix-multiplication-accumulate — the primitive the ML matrix engine exists to accelerate.
- patterns/reference-hardware-for-software-ecosystem — the delivery-shape pattern; reference silicon IP so the ML software ecosystem has a stable target.
- systems/mobilenet — representative ML-workload class Coral NPU is sized to run at ambient-sensing cadence.
- systems/nvidia-tensor-core — datacenter-scale counterpart to Coral NPU's matrix engine; same primitive (MMA), vastly different power envelope.