PATTERN
Trigger-based edge capture¶
Intent¶
Collect production data from an edge fleet only when a meaningful event is detected on-device, rather than continuously streaming or uniformly sampling all sensor traffic. This keeps cloud bandwidth, storage, and downstream annotation cost bounded, and keeps the collected corpus signal-dense.
When to use¶
- Edge fleet generates gigabytes per device of raw sensor data (video, multi-modal) — continuous streaming to the cloud is infeasible or economically unacceptable.
- The fraction of raw data that is actually useful for model training is small and event-localised — interactions, transitions, anomalies.
- Per-device bandwidth is constrained by the deployment environment — retailer store networks, cellular links, battery.
- Downstream labelling cost is proportional to captured volume; cutting volume 100× is worth the engineering effort of a good trigger.
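The 100× figure is easy to sanity-check with a back-of-envelope calculation. A minimal sketch, with every fleet number assumed for illustration rather than taken from any real deployment:

```python
# Back-of-envelope volume check for one device over one day.
# All numbers below are assumptions for illustration, not fleet data.

STREAM_MBPS = 4       # raw video bitrate in Mbit/s (assumed)
OPEN_HOURS = 16       # hours the device records per day (assumed)
FIRES_PER_DAY = 200   # trigger events per day (assumed)
CLIP_SECONDS = 10     # pre- plus post-event window length (assumed)

streamed_gb = STREAM_MBPS * OPEN_HOURS * 3600 / 8 / 1024
captured_gb = STREAM_MBPS * FIRES_PER_DAY * CLIP_SECONDS / 8 / 1024

print(f"stream everything: {streamed_gb:.1f} GB/day")
print(f"trigger-based:     {captured_gb:.2f} GB/day")
print(f"reduction:         {streamed_gb / captured_gb:.0f}x")
```

With these figures the reduction is roughly 29×; fewer trigger fires or shorter clips push it toward the 100× regime where the trigger engineering clearly pays for itself.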
Mechanism¶
- Define the trigger as a composite of on-device signals, using cheap detectors that already run on the device. Instacart's Capsight Collector uses an activity signal (e.g. hand motion) AND a recognised barcode; the two signals combined give "high confidence that a meaningful interaction is occurring".
- Capture a time-bounded window around the trigger: not a single frame, but a pre-event plus post-event buffer, so the training data carries the full context of the interaction.
- Persist captured clips locally first — don't upload in-line; the uploader is a separate stage with its own policies (see patterns/resilient-edge-uploader).
- Plan for trigger evolution. Triggers are a moving target; early versions catch easy cases, later versions add signals for harder cases. Capsight's authors explicitly flag "more signals being developed".
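The first two mechanism points can be sketched as a rolling ring buffer plus a composite-signal check. This is an illustrative sketch only; the class, signal keys, and `persist` callback are hypothetical, not Capsight's actual interface:

```python
from collections import deque

class TriggerCapture:
    """Sketch of composite-trigger capture with a pre/post-event window.

    Names and signal keys are illustrative, not Capsight's interface.
    """

    def __init__(self, persist, pre_frames=30, post_frames=60):
        self.persist = persist                # callback: write clip to local store
        self.ring = deque(maxlen=pre_frames)  # rolling pre-event context
        self.post_frames = post_frames
        self.clip = None                      # clip under construction
        self.remaining = 0                    # post-event frames still needed

    @staticmethod
    def fired(signals):
        # Composite trigger: BOTH cheap detectors must agree,
        # analogous to activity signal AND recognised barcode.
        return bool(signals.get("hand_motion")) and signals.get("barcode") is not None

    def on_frame(self, frame, signals):
        if self.clip is not None:
            # Mid-capture: keep collecting post-event frames.
            self.clip.append(frame)
            self.remaining -= 1
            if self.remaining <= 0:
                self.persist(self.clip)       # hand off; upload is a separate stage
                self.clip = None
        elif self.fired(signals):
            # Trigger fired: seed the clip with buffered pre-event frames.
            self.clip = list(self.ring) + [frame]
            self.remaining = self.post_frames
        self.ring.append(frame)

# Example: clips land in a list standing in for local storage.
clips = []
cap = TriggerCapture(persist=clips.append, pre_frames=5, post_frames=3)
for i in range(50):
    cap.on_frame(i, {"hand_motion": i == 40,
                     "barcode": "0123456789" if i == 40 else None})
# clips now holds one window: frames 35..43 (5 pre, trigger, 3 post)
```

Keeping `persist` as an injected callback mirrors the separation in the third mechanism point: the capture stage decides *what* to keep, and a separate uploader stage decides *when* to ship it.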
Trade-off: trigger sensitivity¶
The post calls this out directly:
> The sensitivity of this trigger is an important trade-off. Collecting useless data is expensive and increases noise, but missing signals decreases training input.
Trigger tuning traces a sensitivity curve between two failure modes:
- Too permissive → back to streaming-everything; bandwidth, storage, and labelling costs balloon.
- Too restrictive → systematic holes in the training data (the exact failure modes the flywheel is supposed to fix never get captured because the trigger doesn't fire on them).
Typical practice: instrument the trigger itself (count fires, misses, and false positives) and periodically compare its decisions against labelled ground-truth samples. Treat the trigger definition as a first-class model artefact: versioned and evaluated.
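Treating the trigger as an evaluable artefact can be as simple as scoring its fire decisions against periodically labelled windows. A minimal sketch, with the function name and metric keys purely illustrative:

```python
# Sketch: score one trigger version against labelled ground-truth windows,
# the same way any model artefact is evaluated. Names are illustrative.

def evaluate_trigger(trigger_fired, ground_truth):
    """Both arguments are parallel lists of bools, one per time window."""
    tp = sum(1 for f, g in zip(trigger_fired, ground_truth) if f and g)
    fp = sum(1 for f, g in zip(trigger_fired, ground_truth) if f and not g)
    fn = sum(1 for f, g in zip(trigger_fired, ground_truth) if g and not f)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,  # low -> too permissive
        "recall": tp / (tp + fn) if tp + fn else 0.0,     # low -> systematic holes
        "fires": tp + fp,
        "misses": fn,
    }

metrics = evaluate_trigger(
    trigger_fired=[True, True, False, False],
    ground_truth=[True, False, True, False],
)
# precision 0.5 (one wasted capture), recall 0.5 (one missed event)
```

Low precision maps to the "too permissive" arm of the curve (wasted bandwidth and labelling), low recall to the "too restrictive" arm (systematic holes the flywheel never sees).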
Related concepts¶
- concepts/edge-cloud-data-flywheel — trigger-based capture is the on-device entry point.
- concepts/production-data-diversity — the reason to capture at all.
- concepts/edge-filtering — the broader principle of dropping irrelevant traffic at the edge.
Relationship to other patterns¶
- patterns/data-driven-annotation-curation is the cloud-side counterpart — filter + curate what the trigger lets through. Both belong in a mature flywheel.
- patterns/resilient-edge-uploader runs after trigger capture decides a clip is worth keeping.
- patterns/distributed-fleet-as-data-pipeline is the framing pattern; trigger-based capture is the mechanism that makes it tractable.
Seen in¶
- systems/capsight Collector — activity signal + recognised barcode as the initial trigger for Caper smart carts. (Source: sources/2026-02-17-instacart-turning-data-into-velocity-capers-edge-and-cloud-data-flywheel-with-capsight)