CONCEPT Cited by 1 source

Always-on ambient sensing¶

Definition¶

Always-on ambient sensing is the serving-infra envelope in which an ML model runs continuously, on a small battery, inside a body-worn or environment-deployed device, and has to respond to sensor input in hard real time against tight latency budgets (for example wake-word detection, gesture recognition, or health-signal detection).

The envelope composes four simultaneously-binding constraints:

Continuous duty cycle. The model is always listening / watching / measuring. There's no sleep / wake pattern to amortise power over.
Milliwatt-class sustained power budget. Hearables run on coin-cell-class batteries. AR glasses run on small lens-frame batteries, thermal-limited by skin contact. Smartwatches run on small wrist-worn batteries. Continuous draw of more than a few milliwatts measurably shortens usable device life between charges and heats the device uncomfortably.
Hard-real-time response. The use case fails if the model responds slowly — wake-word has to trigger within tens of milliseconds of the utterance ending, gesture recognition within a frame or two of the motion, health detection within the measurement cadence.
On-device serving, not cloud. By construction — cloud round-trip is both too slow for the latency budget and too power-hungry (wireless transmission costs far more than local inference at these budgets).

Canonical device classes¶

Coral NPU's announcement names four:

Hearables (earbuds) — wake-word / speaker-ID / noise-cancel / on-ear-biometric detection. Power budget: coin-cell-class, typically 20–100 mWh usable; continuous ML inference at a few milliwatts = hours-to-days of use.
AR glasses — on-scene object recognition, gesture detection, speech-to-text, real-time translation overlays. Power budget bound by frame size + skin-contact thermals.
Smartwatches — heart-rate anomaly detection, fall detection, sleep staging, activity recognition, voice triggering. Duty cycle is 24/7 for health signals.
Edge devices (environment-deployed sensors / IoT nodes) — ambient-sound classification, anomaly detection, local occupancy, industrial anomaly monitoring. Power sometimes harvested (solar, vibration).

(Source: sources/2025-10-15-google-coral-npu-a-full-stack-platform-for-edge-ai)
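
The hours-to-days figure quoted for hearables follows from dividing usable capacity by sustained draw. A minimal sketch of that arithmetic, assuming inference is the only load (radio, audio output, and regulator losses are ignored). The 2 mW and 5 mW draws are illustrative values, not figures from the source, and `runtime_hours` is a hypothetical helper:

```python
def runtime_hours(capacity_mwh: float, draw_mw: float) -> float:
    """Hours of continuous operation for a usable capacity and a sustained draw."""
    if draw_mw <= 0:
        raise ValueError("draw_mw must be positive")
    return capacity_mwh / draw_mw


# Capacities taken from the 20-100 mWh hearables range above;
# the sustained inference draws are assumed for illustration.
for capacity_mwh in (20.0, 100.0):
    for draw_mw in (2.0, 5.0):
        hours = runtime_hours(capacity_mwh, draw_mw)
        print(f"{capacity_mwh:.0f} mWh at {draw_mw:.0f} mW: {hours:.0f} h")
```

Under these assumptions the range runs from 4 h (20 mWh at 5 mW) to 50 h (100 mWh at 2 mW). In a real device the other loads shorten this further.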

Implication for ML serving¶

The envelope forces specific architectural choices:

Perf/watt is the binding metric. Peak GOPS / TOPS claims are marketing; sustained operations per milliwatt is what the device-class can actually use. The Coral NPU design point — "~512 GOPS at a few milliwatts" — is expressed in exactly this form.
ML-first architecture is preferred, sometimes forced. Scalar-first silicon at the same power point can't run the ML workload at all — there's no flexibility budget to spend. When the device's reason to exist is the ML model, the architecture has to optimise for it first.
Model compression is mandatory, not optional. Weights have to fit in a budget of at most tens of MB; activations have to fit in on-chip SRAM; inference has to run at quantized precisions (INT8 / INT4 / binary) because floating-point multiplies cost too many picojoules per operation.
Wake / gate tiers help. A common pattern is a very cheap "is there any signal?" detector running continuously, waking a larger model only on signal — but even the always-on tier has to stay inside the mW envelope.
Cloud offload is available only for rare slow-path events. A smartwatch can send anomalous heart-rate events to a paired phone for review, but can't round-trip to the cloud on every ECG frame.
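
The wake / gate tier pattern can be sketched as a cheap per-frame check that decides whether the larger model runs at all. This is an illustration under stated assumptions, not the Coral NPU pipeline: the gate is a plain mean-square energy threshold, `big_model` stands in for any callable, frames are assumed non-empty, and every name and power figure is hypothetical.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple


def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one non-empty frame of samples."""
    return sum(x * x for x in frame) / len(frame)


def gated_inference(
    frames: Sequence[Sequence[float]],
    big_model: Callable[[Sequence[float]], Any],
    energy_threshold: float,
) -> Tuple[List[Optional[Any]], int]:
    """Run big_model only on frames whose energy reaches the threshold.

    Returns the per-frame results (None where the gate stayed closed)
    and the number of frames that woke the larger model.
    """
    results: List[Optional[Any]] = []
    woken = 0
    for frame in frames:
        if frame_energy(frame) >= energy_threshold:
            woken += 1
            results.append(big_model(frame))
        else:
            results.append(None)
    return results, woken


def average_power_mw(gate_mw: float, big_mw: float, wake_fraction: float) -> float:
    """Gate tier runs all the time; the larger tier only for wake_fraction of it."""
    return gate_mw + wake_fraction * big_mw


# Illustrative input: three quiet frames and one loud frame.
quiet = [0.01, -0.01, 0.01, -0.01]
loud = [0.5, -0.5, 0.5, -0.5]
frames = [quiet, quiet, loud, quiet]

results, woken = gated_inference(frames, lambda f: "event", energy_threshold=0.01)
wake_fraction = woken / len(frames)

# Assumed tier costs: 0.5 mW for the gate, 8 mW for the larger model.
print(results, average_power_mw(0.5, 8.0, wake_fraction))
```

With one frame in four waking the larger model, the assumed tiers average 2.5 mW. The gate's own cost is paid on every frame, which is why the always-on tier still has to fit the mW envelope.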

Contrast with other on-device inference envelopes¶

Always-on ambient sensing sits near the tightest-power end of the ML inference spectrum; among the envelopes below, only energy-harvesting IoT has a smaller power budget:

Envelope	Power	Duty cycle	Model size	Example
Server-side	kW/rack	Sustained	Unbounded	Datacenter LLM
Phone / laptop on-demand	1–5 W bursts	Event-triggered	100s of MB	Camera AI filter at capture time
Always-on ambient sensing	mW sustained	24/7	MBs	Wake-word, heart-rate anomaly
Energy-harvesting IoT	μW sustained	24/7	KBs	Vibration-anomaly sensor

Each envelope selects different architectural choices at the chip, compiler, and model levels. Always-on ambient sensing is the specific row Coral NPU targets.

Why server-side cascades don't cover this envelope¶

One could imagine "run the model in the cloud, stream audio up." At always-on cadence this fails on two axes:

Power. Continuous wireless transmission costs far more than local inference: an active radio typically draws tens to hundreds of mW depending on the link, one to two orders of magnitude above the local-inference budget.
Latency. Wake-word has to trigger in tens of milliseconds; a round-trip over the Bluetooth → phone → WAN → datacenter path and back takes roughly 100–500 ms even in good conditions, and often longer under congested radio conditions.

Privacy (continuous microphone audio leaving the device) is a separate argument in the same direction but not the primary forcing function — the physics already rule out the cloud option at ambient-sensing cadence.
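
The two-axis argument can be written as a simple budget check. A minimal sketch with placeholder numbers chosen to match the orders of magnitude in the text (a few mW and tens of ms for the local path; radio-class power and a 100 ms round-trip for the cloud path). None of the values are measurements, and `ServingPath` and `violations` are hypothetical names:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ServingPath:
    name: str
    sustained_power_mw: float  # average draw while the path is in continuous use
    latency_ms: float          # time from sensor event to response


def violations(path: ServingPath, power_budget_mw: float, latency_budget_ms: float) -> List[str]:
    """Return which budgets the path exceeds; an empty list means it fits."""
    failed = []
    if path.sustained_power_mw > power_budget_mw:
        failed.append("power")
    if path.latency_ms > latency_budget_ms:
        failed.append("latency")
    return failed


# Assumed always-on budget: 5 mW sustained, 50 ms response.
POWER_BUDGET_MW = 5.0
LATENCY_BUDGET_MS = 50.0

local = ServingPath("local inference", sustained_power_mw=3.0, latency_ms=20.0)
cloud = ServingPath("cloud streaming", sustained_power_mw=100.0, latency_ms=100.0)

for path in (local, cloud):
    print(path.name, violations(path, POWER_BUDGET_MW, LATENCY_BUDGET_MS))
```

Under these assumptions the cloud path fails both checks even at the optimistic end of its latency range, which is the sense in which the physics rule it out before privacy enters the argument.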

Seen in¶

sources/2025-10-15-google-coral-npu-a-full-stack-platform-for-edge-ai — canonical source; names the envelope ("making it ideal for always-on ambient sensing") and the four target device classes (edge devices, hearables, AR glasses, smartwatches). Coral NPU is scoped explicitly around this envelope: ~512 GOPS at a few milliwatts.

Related¶

concepts/on-device-ml-inference — parent concept; ambient sensing is the tightest-power subset.
concepts/performance-per-watt — the binding metric for this envelope.
concepts/ml-first-architecture — the chip-design stance the envelope tends to force.
concepts/hardware-software-codesign — the methodology that gets the chip, the compiler, and the model into the same milliwatt-class envelope.
systems/coral-npu — canonical wiki silicon instance.