PATTERN
Trigger-based edge capture¶
Intent¶
Collect production data from an edge fleet only when a meaningful event is detected on-device, rather than continuously streaming or uniformly sampling all sensor traffic. This keeps cloud bandwidth, storage, and downstream annotation cost bounded, and keeps the collected corpus signal-dense.
When to use¶
- Edge fleet generates gigabytes per device of raw sensor data (video, multi-modal) — continuous streaming to the cloud is infeasible or economically unacceptable.
- The fraction of raw data that is actually useful for model training is small and event-localised — interactions, transitions, anomalies.
- Per-device bandwidth is constrained by the deployment environment — retailer store networks, cellular links, battery.
- Downstream labelling cost is proportional to captured volume; cutting volume 100× is worth the engineering effort of a good trigger.
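The 100× figure is easy to sanity-check with a back-of-envelope calculation. A minimal sketch, with every fleet number assumed for illustration rather than taken from any real deployment:

```python
# Back-of-envelope volume check for one device over one day.
# All numbers below are assumptions for illustration, not fleet data.

STREAM_MBPS = 4       # raw video bitrate in Mbit/s (assumed)
OPEN_HOURS = 16       # hours the device records per day (assumed)
FIRES_PER_DAY = 200   # trigger events per day (assumed)
CLIP_SECONDS = 10     # pre- plus post-event window length (assumed)

streamed_gb = STREAM_MBPS * OPEN_HOURS * 3600 / 8 / 1024
captured_gb = STREAM_MBPS * FIRES_PER_DAY * CLIP_SECONDS / 8 / 1024

print(f"stream everything: {streamed_gb:.1f} GB/day")
print(f"trigger-based:     {captured_gb:.2f} GB/day")
print(f"reduction:         {streamed_gb / captured_gb:.0f}x")
```

With these figures the reduction is roughly 29×; fewer trigger fires or shorter clips push it toward the 100× regime where the trigger engineering clearly pays for itself.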
Mechanism¶
- Define the trigger as a composite of on-device signals, using cheap detectors that already run on the device. Instacart's Capsight Collector uses an activity signal (e.g. hand motion) AND a recognised barcode; the two signals combined give "high confidence that a meaningful interaction is occurring".
- Capture a time-bounded window around the trigger: not a single frame, but a pre-event plus post-event buffer, so the training data carries the full context of the interaction.
- Persist captured clips locally first — don't upload in-line; the uploader is a separate stage with its own policies (see patterns/resilient-edge-uploader).
- Plan for trigger evolution. Triggers are a moving target; early versions catch easy cases, later versions add signals for harder cases. Capsight's authors explicitly flag "more signals being developed".
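The first two mechanism points can be sketched as a rolling ring buffer plus a composite-signal check. This is an illustrative sketch only; the class, signal keys, and `persist` callback are hypothetical, not Capsight's actual interface:

```python
from collections import deque

class TriggerCapture:
    """Sketch of composite-trigger capture with a pre/post-event window.

    Names and signal keys are illustrative, not Capsight's interface.
    """

    def __init__(self, persist, pre_frames=30, post_frames=60):
        self.persist = persist                # callback: write clip to local store
        self.ring = deque(maxlen=pre_frames)  # rolling pre-event context
        self.post_frames = post_frames
        self.clip = None                      # clip under construction
        self.remaining = 0                    # post-event frames still needed

    @staticmethod
    def fired(signals):
        # Composite trigger: BOTH cheap detectors must agree,
        # analogous to activity signal AND recognised barcode.
        return bool(signals.get("hand_motion")) and signals.get("barcode") is not None

    def on_frame(self, frame, signals):
        if self.clip is not None:
            # Mid-capture: keep collecting post-event frames.
            self.clip.append(frame)
            self.remaining -= 1
            if self.remaining <= 0:
                self.persist(self.clip)       # hand off; upload is a separate stage
                self.clip = None
        elif self.fired(signals):
            # Trigger fired: seed the clip with buffered pre-event frames.
            self.clip = list(self.ring) + [frame]
            self.remaining = self.post_frames
        self.ring.append(frame)

# Example: clips land in a list standing in for local storage.
clips = []
cap = TriggerCapture(persist=clips.append, pre_frames=5, post_frames=3)
for i in range(50):
    cap.on_frame(i, {"hand_motion": i == 40,
                     "barcode": "0123456789" if i == 40 else None})
# clips now holds one window: frames 35..43 (5 pre, trigger, 3 post)
```

Keeping `persist` as an injected callback mirrors the separation in the third mechanism point: the capture stage decides *what* to keep, and a separate uploader stage decides *when* to ship it.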
Trade-off: trigger sensitivity¶
The post calls this out directly:
> The sensitivity of this trigger is an important trade-off. Collecting useless data is expensive and increases noise, but missing signals decreases training input.
Trigger tuning traces a sensitivity curve between two failure modes:
- Too permissive → back to streaming-everything; bandwidth, storage, and labelling costs balloon.
- Too restrictive → systematic holes in the training data (the exact failure modes the flywheel is supposed to fix never get captured because the trigger doesn't fire on them).
Typical practice: instrument the trigger itself (count fires, misses, and false positives) and periodically compare its decisions against labelled ground-truth samples. Treat the trigger definition as a first-class model artefact: versioned and evaluated.
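Treating the trigger as an evaluable artefact can be as simple as scoring its fire decisions against periodically labelled windows. A minimal sketch, with the function name and metric keys purely illustrative:

```python
# Sketch: score one trigger version against labelled ground-truth windows,
# the same way any model artefact is evaluated. Names are illustrative.

def evaluate_trigger(trigger_fired, ground_truth):
    """Both arguments are parallel lists of bools, one per time window."""
    tp = sum(1 for f, g in zip(trigger_fired, ground_truth) if f and g)
    fp = sum(1 for f, g in zip(trigger_fired, ground_truth) if f and not g)
    fn = sum(1 for f, g in zip(trigger_fired, ground_truth) if g and not f)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,  # low -> too permissive
        "recall": tp / (tp + fn) if tp + fn else 0.0,     # low -> systematic holes
        "fires": tp + fp,
        "misses": fn,
    }

metrics = evaluate_trigger(
    trigger_fired=[True, True, False, False],
    ground_truth=[True, False, True, False],
)
# precision 0.5 (one wasted capture), recall 0.5 (one missed event)
```

Low precision maps to the "too permissive" arm of the curve (wasted bandwidth and labelling), low recall to the "too restrictive" arm (systematic holes the flywheel never sees).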
Related concepts¶
- concepts/edge-cloud-data-flywheel — trigger-based capture is the on-device entry point.
- concepts/production-data-diversity — the reason to capture at all.
- concepts/edge-filtering — the broader principle of dropping irrelevant traffic at the edge.
Relationship to other patterns¶
- patterns/data-driven-annotation-curation is the cloud-side counterpart — filter + curate what the trigger lets through. Both belong in a mature flywheel.
- patterns/resilient-edge-uploader runs after trigger capture decides a clip is worth keeping.
- patterns/distributed-fleet-as-data-pipeline is the framing pattern; trigger-based capture is the mechanism that makes it tractable.
Seen in¶
- systems/capsight Collector — activity signal + recognised barcode as the initial trigger for Caper smart carts. (Source: sources/2026-02-17-instacart-turning-data-into-velocity-capers-edge-and-cloud-data-flywheel-with-capsight)