
PATTERN

Trigger-based edge capture

Intent

Collect production data from an edge fleet only when a meaningful event is detected on-device, rather than continuously streaming or uniformly sampling all sensor traffic. This keeps cloud bandwidth, storage, and downstream annotation cost bounded, and keeps the collected corpus signal-dense.

When to use

  • Edge fleet generates gigabytes of raw sensor data per device (video, multi-modal) — continuous streaming to the cloud is infeasible or economically unacceptable.
  • The fraction of raw data that is actually useful for model training is small and event-localised — interactions, transitions, anomalies.
  • Per-device bandwidth is constrained by the deployment environment — retailer store networks, cellular links, battery.
  • Downstream labelling cost is proportional to captured volume; cutting volume 100× is worth the engineering effort of a good trigger.

Mechanism

  1. Define the trigger as a composite of on-device signals. Use cheap detectors that already run on the device. Instacart's Capsight Collector uses an activity signal (e.g. hand motion) AND a recognised barcode; combining the two gives "high confidence that a meaningful interaction is occurring".
  2. Capture a time-bounded window around the trigger. Not a single frame — a pre-event + post-event buffer to give training data the full context.
  3. Persist captured clips locally first — don't upload in-line; the uploader is a separate stage with its own policies (see patterns/resilient-edge-uploader).
  4. Plan for trigger evolution. Triggers are a moving target; early versions catch easy cases, later versions add signals for harder cases. Capsight's authors explicitly flag "more signals being developed".
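Steps 1–3 can be sketched as a ring buffer plus a composite trigger. This is a minimal illustration, not the actual Capsight Collector implementation — class and method names are invented, and the signals stand in for whatever cheap detectors the device already runs.

```python
from collections import deque

class TriggerCapture:
    """Composite trigger with a pre-/post-event capture window (sketch)."""

    def __init__(self, pre_frames=30, post_frames=30):
        self.ring = deque(maxlen=pre_frames)  # rolling pre-event buffer
        self.post_frames = post_frames
        self.post_remaining = 0
        self.clip = []
        self.clips = []  # finished clips, handed off to local persistence

    @staticmethod
    def triggered(activity, barcode):
        # Composite trigger: BOTH cheap signals must agree
        # ("activity signal AND recognised barcode").
        return activity and barcode is not None

    def push(self, frame, activity, barcode=None):
        self.ring.append(frame)  # pre-roll is maintained continuously
        if self.post_remaining > 0:
            # Inside a capture window: keep recording the post-event tail.
            self.clip.append(frame)
            self.post_remaining -= 1
            if self.post_remaining == 0:
                self.clips.append(self.clip)  # persist locally; upload is a separate stage
                self.clip = []
        elif self.triggered(activity, barcode):
            # Snapshot pre-event context (including the trigger frame),
            # then keep recording until the post window is full.
            self.clip = list(self.ring)
            self.post_remaining = self.post_frames
```

Because the ring buffer is updated on every frame, a later trigger still gets correct pre-roll even if it fires shortly after a previous capture window closes.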

Trade-off: trigger sensitivity

The post calls this out directly:

"The sensitivity of this trigger is an important trade-off. Collecting useless data is expensive and increases noise, but missing signals decreases training input."

Trigger tuning is a curve:

  • Too permissive → back to streaming-everything; bandwidth, storage, and labelling costs balloon.
  • Too restrictive → systematic holes in the training data (the exact failure modes the flywheel is supposed to fix never get captured because the trigger doesn't fire on them).

Typical practice: instrument the trigger itself (count fires / misses / false-positives) and compare to labelled ground-truth samples periodically. Treat trigger definition as a first-class model artefact, versioned and evaluated.
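That instrumentation reduces to computing the trigger's precision and recall against a periodically re-labelled audit sample. A hypothetical helper (not part of any published Capsight tooling) over (fired, relevant) pairs:

```python
def trigger_metrics(events):
    """Score a trigger against labelled ground truth.

    `events` is an iterable of (fired, relevant) booleans drawn from
    a periodically labelled audit sample.
    """
    tp = fp = fn = 0
    for fired, relevant in events:
        if fired and relevant:
            tp += 1   # useful clip captured
        elif fired and not relevant:
            fp += 1   # noise captured: wasted bandwidth and labelling
        elif not fired and relevant:
            fn += 1   # missed signal: a systematic hole in training data
    precision = tp / (tp + fp) if tp + fp else None  # too permissive → low
    recall = tp / (tp + fn) if tp + fn else None     # too restrictive → low
    return {"fires": tp + fp, "precision": precision, "recall": recall}
```

Tracking these numbers per trigger version is what makes the trigger a versioned, evaluated artefact rather than a tuned-once heuristic.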
