CONCEPT Cited by 1 source
Always-on ambient sensing¶
Definition¶
Always-on ambient sensing is the serving-infra envelope where an ML model runs continuously, on a small battery, inside a body-worn or environment-deployed device, and has to respond to sensor input in hard-real-time against tight latency budgets (wake-word, gesture recognition, health-signal detection).
The envelope composes four simultaneously-binding constraints:
- Continuous duty cycle. The model is always listening / watching / measuring. There's no sleep / wake pattern to amortise power over.
- Milliwatt-class sustained power budget. Hearables run on coin-cell-class batteries. AR glasses run on small lens-frame batteries, thermal-limited by skin contact. Smartwatches run on small wrist-worn batteries. Continuous draw of more than a few milliwatts measurably shortens usable device life between charges and heats the device uncomfortably.
- Hard-real-time response. The use case fails if the model responds slowly — wake-word has to trigger within tens of milliseconds of the utterance ending, gesture recognition within a frame or two of the motion, health detection within the measurement cadence.
- On-device serving, not cloud. This follows from the other three constraints: a cloud round-trip is both too slow for the latency budget and too power-hungry (wireless transmission costs far more than local inference at these budgets).
Canonical device classes¶
Coral NPU's announcement names four:
- Hearables (earbuds) — wake-word / speaker-ID / noise-cancellation / on-ear-biometric detection. Power budget: coin-cell-class, typically 20–100 mWh usable; at a few milliwatts of continuous ML inference, that is hours to days of use.
- AR glasses — on-scene object recognition, gesture detection, speech-to-text, real-time translation overlays. Power budget bound by frame size + skin-contact thermals.
- Smartwatches — heart-rate anomaly detection, fall detection, sleep staging, activity recognition, voice triggering. Duty cycle is 24/7 for health signals.
- Edge devices (environment-deployed sensors / IoT nodes) — ambient-sound classification, anomaly detection, local occupancy, industrial anomaly monitoring. Power sometimes harvested (solar, vibration).
(Source: sources/2025-10-15-google-coral-npu-a-full-stack-platform-for-edge-ai)
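The hours-to-days figure for hearables follows from dividing usable capacity by sustained draw. A minimal sketch of that arithmetic, assuming 2 mW and 5 mW as stand-ins for "a few milliwatts" (these draws and the `runtime_hours` helper are illustrative, not from the source) and ignoring every other consumer on the device, such as the radio or audio playback:

```python
def runtime_hours(usable_mwh: float, draw_mw: float) -> float:
    """Hours of continuous operation: usable energy (mWh) / sustained draw (mW)."""
    if draw_mw <= 0:
        raise ValueError("draw_mw must be positive")
    return usable_mwh / draw_mw


# Capacity range quoted above for hearables; draws are assumed values.
for usable_mwh in (20.0, 100.0):
    for draw_mw in (2.0, 5.0):
        hours = runtime_hours(usable_mwh, draw_mw)
        print(f"{usable_mwh:5.0f} mWh at {draw_mw:.0f} mW -> {hours:5.1f} h")
```

Under these assumptions the range runs from 4 h (20 mWh at 5 mW) to 50 h (100 mWh at 2 mW), which is where "hours to days" comes from.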
Implication for ML serving¶
The envelope forces specific architectural choices:
- Perf/watt is the binding metric. Peak GOPS / TOPS claims are marketing; sustained throughput per milliwatt (equivalently, operations per joule) is what the device class can actually use. The Coral NPU design point, "~512 GOPS at a few milliwatts", is expressed in exactly this form.
- ML-first architecture is preferred, sometimes forced. Scalar-first silicon at the same power point can't run the ML workload at all — there's no flexibility budget to spend. When the device's reason to exist is the ML model, the architecture has to optimise for it first.
- Model compression is mandatory, not optional. Weights have to fit in the tens-of-MB budget; activations have to fit on-chip SRAM; inference has to run at quantized precisions (INT8 / INT4 / binary) because floating-point multiplies cost too many picojoules per operation.
- Wake / gate tiers help. A common pattern is a very cheap "is there any signal?" detector running continuously, waking a larger model only on signal — but even the always-on tier has to stay inside the mW envelope.
- Cloud offload is available only for rare slow-path events. A smartwatch can send anomalous heart-rate events to a paired phone for review, but can't round-trip to the cloud on every ECG frame.
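The wake / gate tier pattern can be sketched as a two-stage cascade. This is an illustrative sketch, not the Coral NPU pipeline: the energy-threshold gate, the `run_cascade` and `average_power_mw` helpers, the stand-in main model, and the 0.5 mW / 20 mW tier powers are all assumptions chosen to show how the average draw depends on how often the gate fires.

```python
from typing import Callable, List, Sequence, Tuple


def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one sensor frame (the cheap gate feature)."""
    return sum(x * x for x in frame) / len(frame)


def run_cascade(
    frames: Sequence[Sequence[float]],
    gate_threshold: float,
    main_model: Callable[[Sequence[float]], str],
) -> Tuple[List[str], float]:
    """Run main_model only on frames whose energy exceeds gate_threshold.

    Returns the main-model outputs and the fraction of frames that woke it.
    """
    if not frames:
        raise ValueError("frames must be non-empty")
    results = [main_model(f) for f in frames if frame_energy(f) > gate_threshold]
    return results, len(results) / len(frames)


def average_power_mw(gate_mw: float, main_mw: float, main_duty: float) -> float:
    """Mean draw: the gate runs continuously, the main model only when woken."""
    return gate_mw + main_duty * main_mw


# Hypothetical input: 19 near-silent frames and one loud frame.
quiet = [0.01, -0.01, 0.01, -0.01]
loud = [0.5, -0.5, 0.5, -0.5]
frames = [quiet] * 19 + [loud]

# Stand-in for the larger model; a real one would classify the frame.
results, duty = run_cascade(frames, gate_threshold=0.01,
                            main_model=lambda f: "candidate")

# Assumed tier powers: 0.5 mW gate, 20 mW main model while active.
print(f"main-model duty cycle: {duty:.2f}")
print(f"average draw: {average_power_mw(0.5, 20.0, duty):.2f} mW")
```

With one frame in twenty waking the main model, the assumed 20 mW tier contributes only 1 mW on average, so the total stays at 1.5 mW. The same arithmetic shows why the gate itself has to be cheap: its draw is paid in full regardless of duty cycle.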
Contrast with other on-device inference envelopes¶
Always-on ambient sensing sits near the tight-power end of the ML inference spectrum; only energy-harvesting IoT nodes run on less:
| Envelope | Power | Duty cycle | Model size | Example |
|---|---|---|---|---|
| Server-side | kW/rack | Sustained | Unbounded | Datacenter LLM |
| Phone / laptop on-demand | 1–5 W bursts | Event-triggered | 100s of MB | Camera AI filter at capture time |
| Always-on ambient sensing | mW sustained | 24/7 | MBs | Wake-word, heart-rate anomaly |
| Energy-harvesting IoT | μW sustained | 24/7 | KBs | Vibration-anomaly sensor |
Each envelope calls for different architectural choices at the chip, compiler, and model levels. Always-on ambient sensing is the specific row Coral NPU targets.
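For a sense of scale, the Coral NPU design point quoted earlier (~512 GOPS at a few milliwatts) can be converted to operations per joule. The sketch below assumes 2, 5, and 10 mW as possible readings of "a few milliwatts"; those values and the `ops_per_joule` helper are illustrative, and the result describes the quoted design point only, not measured sustained throughput.

```python
def ops_per_joule(ops_per_second: float, power_watts: float) -> float:
    """Energy efficiency: (operations / second) / (joules / second)."""
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return ops_per_second / power_watts


THROUGHPUT_OPS = 512e9  # ~512 GOPS, the design point quoted in the source

for power_mw in (2.0, 5.0, 10.0):  # assumed readings of "a few milliwatts"
    efficiency = ops_per_joule(THROUGHPUT_OPS, power_mw / 1000.0)
    print(f"{power_mw:4.0f} mW -> {efficiency / 1e12:6.1f} TOPS/W")
```

At the assumed 5 mW, this works out to about 102 TOPS/W. The figure is only as good as the power assumption, which is why the envelope is stated as throughput at a power point and not as peak throughput alone.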
Why server-side cascades don't cover this envelope¶
One could imagine "run the model in the cloud, stream audio up." At always-on cadence this fails on two axes:
- Power. Continuous wireless transmission costs far more than local inference: an actively transmitting radio draws roughly tens of milliwatts (Bluetooth-class) to hundreds of milliwatts (Wi-Fi or cellular), one to two orders of magnitude above the local-inference budget.
- Latency. Wake-word has to trigger in tens of milliseconds; a cloud round-trip (Bluetooth → phone → WAN → datacenter and back) typically takes 100–500 ms, and longer under congested radio conditions.
Privacy (continuous microphone audio leaving the device) is a separate argument in the same direction but not the primary forcing function — the physics already rule out the cloud option at ambient-sensing cadence.
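Both axes can be checked with back-of-envelope arithmetic. The sketch below is illustrative only: the 3 mW local budget, the 50 ms latency budget, and the three radio draws are assumed values, while the 100–500 ms round-trip range is the one quoted above.

```python
LOCAL_INFERENCE_MW = 3.0                  # assumed reading of "a few milliwatts"
LATENCY_BUDGET_MS = 50.0                  # assumed reading of "tens of milliseconds"
CLOUD_ROUND_TRIP_MS = (100.0, 500.0)      # range quoted in the text
ASSUMED_RADIO_MW = (30.0, 100.0, 300.0)   # hypothetical streaming-radio draws


def power_ratio(radio_mw: float, local_mw: float) -> float:
    """How many times the local-inference budget the radio consumes."""
    return radio_mw / local_mw


def fits_budget(latency_ms: float, budget_ms: float) -> bool:
    """True if a response arriving after latency_ms meets the budget."""
    return latency_ms <= budget_ms


for radio_mw in ASSUMED_RADIO_MW:
    ratio = power_ratio(radio_mw, LOCAL_INFERENCE_MW)
    print(f"radio at {radio_mw:.0f} mW draws {ratio:.0f}x the local budget")

fastest_ms = min(CLOUD_ROUND_TRIP_MS)
print(f"fastest round-trip ({fastest_ms:.0f} ms) fits the "
      f"{LATENCY_BUDGET_MS:.0f} ms budget: "
      f"{fits_budget(fastest_ms, LATENCY_BUDGET_MS)}")
```

Even the most favourable assumed radio draw is ten times the local budget, and the fastest quoted round-trip is twice the assumed latency budget, so either axis alone rules out streaming at always-on cadence.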
Seen in¶
- sources/2025-10-15-google-coral-npu-a-full-stack-platform-for-edge-ai — canonical source; names the envelope ("making it ideal for always-on ambient sensing") and the four target device classes (edge devices, hearables, AR glasses, smartwatches). Coral NPU is scoped explicitly around this envelope: ~512 GOPS at a few milliwatts.
Related¶
- concepts/on-device-ml-inference — parent concept; ambient sensing is one of its tightest-power subsets.
- concepts/performance-per-watt — the binding metric for this envelope.
- concepts/ml-first-architecture — the chip-design stance the envelope tends to force.
- concepts/hardware-software-codesign — the methodology that gets the chip, the compiler, and the model into the same milliwatt-class envelope.
- systems/coral-npu — canonical wiki silicon instance.