CONCEPT Cited by 1 source

Always-on ambient sensing

Definition

Always-on ambient sensing is the serving-infrastructure envelope in which an ML model runs continuously, on a small battery, inside a body-worn or environment-deployed device, and has to respond to sensor input in hard real time within tight latency budgets (wake-word, gesture recognition, health-signal detection).

The envelope combines four constraints that all bind at the same time:

  • Continuous duty cycle. The model is always listening / watching / measuring. There's no sleep / wake pattern to amortise power over.
  • Milliwatt-class sustained power budget. Hearables run on coin-cell-class batteries. AR glasses run on small lens-frame batteries, thermal-limited by skin contact. Smartwatches run on small wrist-worn batteries. Continuous draw of more than a few milliwatts measurably shortens usable device life between charges and heats the device uncomfortably.
  • Hard-real-time response. The use case fails if the model responds slowly — wake-word has to trigger within tens of milliseconds of the utterance ending, gesture recognition within a frame or two of the motion, health detection within the measurement cadence.
  • On-device serving, not cloud. This follows from the other three constraints: a cloud round-trip is both too slow for the latency budget and too power-hungry (wireless transmission costs far more than local inference at these budgets).

Canonical device classes

Coral NPU's announcement names four:

  • Hearables (earbuds) — wake-word / speaker-ID / noise-cancel / on-ear-biometric detection. Power budget: coin-cell-class, typically 20–100 mWh usable; continuous ML inference at a few milliwatts = hours-to-days of use.
  • AR glasses — on-scene object recognition, gesture detection, speech-to-text, real-time translation overlays. Power budget bound by frame size + skin-contact thermals.
  • Smartwatches — heart-rate anomaly detection, fall detection, sleep staging, activity recognition, voice triggering. Duty cycle is 24/7 for health signals.
  • Edge devices (environment-deployed sensors / IoT nodes) — ambient-sound classification, anomaly detection, local occupancy, industrial anomaly monitoring. Power sometimes harvested (solar, vibration).

(Source: sources/2025-10-15-google-coral-npu-a-full-stack-platform-for-edge-ai)
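
The hearables arithmetic above can be checked with a short sketch. The 20–100 mWh usable range comes from the text; the 1 mW and 5 mW draws are assumed stand-ins for "a few milliwatts", and the hypothetical helper `runtime_hours` ignores every other consumer on the device (radio, audio playback, sensors), so real runtimes would be shorter.

```python
def runtime_hours(usable_mwh: float, sustained_mw: float) -> float:
    """Hours of continuous operation for a given usable energy and average draw."""
    if sustained_mw <= 0:
        raise ValueError("sustained_mw must be positive")
    return usable_mwh / sustained_mw


# Usable-energy range quoted for hearables; the draws are assumptions
# standing in for "a few milliwatts" of continuous ML inference.
for usable_mwh in (20.0, 100.0):
    for draw_mw in (1.0, 5.0):
        hours = runtime_hours(usable_mwh, draw_mw)
        print(f"{usable_mwh:5.0f} mWh at {draw_mw:.0f} mW: "
              f"{hours:5.1f} h ({hours / 24:.1f} days)")
```

Under these assumptions the runtime spans 4 hours (20 mWh at 5 mW) to 100 hours (100 mWh at 1 mW), which matches the "hours-to-days" range.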

Implication for ML serving

The envelope forces specific architectural choices:

  • Perf/watt is the binding metric. Peak GOPS / TOPS claims are marketing; sustained operations per milliwatt is what the device class can actually use. The Coral NPU design point, "~512 GOPS at a few milliwatts", is expressed in exactly this form.
  • ML-first architecture is preferred, sometimes forced. At a milliwatt power point there is no headroom to spend on general-purpose flexibility, so scalar-first silicon at the same power point cannot run the ML workload at all. When the device's reason to exist is the ML model, the architecture has to optimise for it first.
  • Model compression is mandatory, not optional. Weights have to fit in the tens-of-MB budget; activations have to fit on-chip SRAM; inference has to run at quantized precisions (INT8 / INT4 / binary) because floating-point multiplies cost too many picojoules per operation.
  • Wake / gate tiers help. A common pattern is a very cheap "is there any signal?" detector running continuously, waking a larger model only on signal — but even the always-on tier has to stay inside the mW envelope.
  • Cloud offload is available only for rare slow-path events. A smartwatch can send anomalous heart-rate events to a paired phone for review, but can't round-trip to the cloud on every ECG frame.
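
The wake / gate tier pattern can be sketched as follows. This is a minimal illustration, not Coral NPU's implementation: `frame_energy`, `run_cascade`, and `average_power_mw` are hypothetical names, a mean-square energy threshold stands in for the cheap "is there any signal?" detector, and the power figures in the usage lines are assumed.

```python
from typing import Callable, Optional, Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Mean-square energy of one frame of samples (the cheap always-on check)."""
    return sum(s * s for s in frame) / len(frame)


def run_cascade(
    frames: Sequence[Sequence[float]],
    threshold: float,
    main_model: Callable[[Sequence[float]], str],
) -> list[Optional[str]]:
    """Run the gate on every frame; wake the larger model only on signal."""
    results: list[Optional[str]] = []
    for frame in frames:
        if frame_energy(frame) >= threshold:
            results.append(main_model(frame))  # expensive tier, rarely reached
        else:
            results.append(None)               # gate says "no signal"
    return results


def average_power_mw(gate_mw: float, main_mw: float, main_duty: float) -> float:
    """Average draw: the gate runs always, the main model a fraction of the time."""
    if not 0.0 <= main_duty <= 1.0:
        raise ValueError("main_duty must be between 0 and 1")
    return gate_mw + main_duty * main_mw


# Assumed figures: 0.5 mW gate, 10 mW main model, awake 2% of the time.
print(f"{average_power_mw(0.5, 10.0, 0.02):.2f} mW average")
```

The design point this illustrates is that the average draw is dominated by the always-on tier once the main model's duty cycle is small, which is why the gate itself still has to fit inside the mW envelope.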

Contrast with other on-device inference envelopes

Always-on ambient sensing is at the tightest power end of the on-device ML inference spectrum:

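| Envelope                  | Power         | Duty cycle      | Model size | Example                          |
|---------------------------|---------------|-----------------|------------|----------------------------------|
| Server-side               | kW/rack       | Sustained       | Unbounded  | Datacenter LLM                   |
| Phone / laptop on-demand  | 1–5 W bursts  | Event-triggered | 100s MB    | Camera AI filter at capture time |
| Always-on ambient sensing | mW sustained  | 24/7            | MBs        | Wake-word, heart-rate anomaly    |
| Energy-harvesting IoT     | μW sustained  | 24/7            | KBs        | Vibration-anomaly sensor         |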

Each envelope selects different architectural choices at the chip, compiler, and model levels. Always-on ambient sensing is the specific row Coral NPU targets.

Why server-side cascades don't cover this envelope

One could imagine "run the model in the cloud, stream audio up." At always-on cadence this fails on two axes:

  • Power. Continuous wireless transmission costs far more than local inference: active radio TX typically draws tens to hundreds of mW depending on the link, well above a local-inference budget of a few milliwatts.
  • Latency. Wake-word has to trigger in tens of milliseconds; a cloud round-trip along the Bluetooth → phone → WAN → datacenter → response path typically takes 100–500 ms at best, and often longer under congested radio conditions.

Privacy (continuous microphone audio leaving the device) is a separate argument in the same direction but not the primary forcing function — the physics already rule out the cloud option at ambient-sensing cadence.
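
The two-axis argument can be expressed as a simple budget check. This is a sketch under stated assumptions: `ServingOption` and `violations` are hypothetical names, the 5 mW and 50 ms budgets stand in for "a few milliwatts" and "tens of milliseconds", the on-device figures are illustrative, the 100 mW radio draw is an assumed placeholder, and 100 ms is the low end of the round-trip range given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServingOption:
    name: str
    sustained_mw: float  # average power drawn while serving
    latency_ms: float    # time from end of input to response


def violations(option: ServingOption,
               power_budget_mw: float,
               latency_budget_ms: float) -> list[str]:
    """Return which axes of the envelope the option exceeds."""
    problems = []
    if option.sustained_mw > power_budget_mw:
        problems.append("power")
    if option.latency_ms > latency_budget_ms:
        problems.append("latency")
    return problems


# Assumed budgets for the always-on envelope.
POWER_BUDGET_MW = 5.0
LATENCY_BUDGET_MS = 50.0

options = [
    ServingOption("on-device", sustained_mw=3.0, latency_ms=20.0),         # illustrative
    ServingOption("cloud streaming", sustained_mw=100.0, latency_ms=100.0),  # best case
]

for opt in options:
    failed = violations(opt, POWER_BUDGET_MW, LATENCY_BUDGET_MS)
    print(f"{opt.name}: {'fits' if not failed else 'exceeds ' + ', '.join(failed)}")
```

Even with the most favourable cloud figures, the streaming option exceeds both budgets under these assumptions, while the on-device option fits within both.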

Seen in
