
Edge-to-cloud data flywheel

Definition

An edge-to-cloud data flywheel is an ML-platform pattern where a deployed fleet of edge devices continuously feeds a cloud data-management + training platform, which continuously deploys improved model weights back to the fleet — forming a closed loop of the canonical shape:

Collect → Manage → Label → Train → Deploy → Collect …

Each loop iteration improves the next model's training data, so the fleet's models monotonically improve as long as the loop spins. The flywheel is spinning when the end-to-end collect-to-release latency is short enough that each loop closes on the same fleet in a useful time window (Instacart's Capsight: week-scale).
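One spin of the loop can be sketched in a few lines. This is a toy illustration, not any real platform's API: `flywheel_iteration` and the event fields (`trigger_fired`, `pre_label`) are hypothetical stand-ins for the Collect → Manage → Label → Train → Deploy stages.

```python
# Hypothetical sketch of one flywheel iteration. Stage logic is collapsed
# into stand-in expressions; real platforms replace each with a subsystem.

def flywheel_iteration(fleet_events, model_version):
    # Collect + Manage: only trigger-selected events reach the cloud.
    curated = [e for e in fleet_events if e["trigger_fired"]]
    # Label: AI pre-labels stand in for the annotation step.
    labeled = [{**e, "label": e.get("pre_label", "unknown")} for e in curated]
    # Train + Deploy: a new model version ships; its outputs seed the next Collect.
    return model_version + 1, labeled

events = [
    {"id": 1, "trigger_fired": True, "pre_label": "damaged_box"},
    {"id": 2, "trigger_fired": False},
    {"id": 3, "trigger_fired": True, "pre_label": "occluded"},
]
version, dataset = flywheel_iteration(events, model_version=7)
```

Only the two trigger-selected events become training data for version 8; the uneventful frame never leaves the device.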

Why it matters

Three production-ML pain points motivate it:

  1. Training data does not reflect reality. Manually collected or purchased datasets miss the long tail of production conditions — lighting, occlusion, damaged packaging, deployment-specific SKUs, motion blur, uncommon angles. See concepts/production-data-diversity.
  2. Observability gap. Without a feedback substrate, engineers can't reproduce "what the device experienced when it misbehaved". This blocks both incident response and model improvement.
  3. Iteration cost grows linearly with fleet size — by default. More devices = more data = more human-labelling hours = slower / more expensive iteration. The flywheel's explicit design goal is to decouple iteration cost from fleet size, typically via automated filtering + AI-assisted labelling (see patterns/vlm-assisted-pre-labeling).

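The decoupling in point 3 usually comes down to a routing rule: auto-accept confident model pre-labels, and send only the hard tail to humans. A minimal sketch, assuming a hypothetical confidence threshold (0.9 is illustrative, not a quoted figure):

```python
# Sketch of low-confidence-only human review: human-labelling hours scale
# with the hard tail of the distribution, not with total fleet volume.

CONFIDENCE_THRESHOLD = 0.9  # illustrative assumption

def route(predictions):
    auto_accepted, needs_review = [], []
    for p in predictions:
        if p["confidence"] >= CONFIDENCE_THRESHOLD:
            auto_accepted.append(p)   # pre-label accepted as-is
        else:
            needs_review.append(p)    # queued for a human annotator
    return auto_accepted, needs_review

preds = [
    {"item": "a", "confidence": 0.97},
    {"item": "b", "confidence": 0.42},
    {"item": "c", "confidence": 0.95},
]
auto_accepted, needs_review = route(preds)
```

Doubling the fleet doubles `preds`, but the human queue grows only with the fraction below the threshold.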
Structural shape

All working instances share:

  • Trigger-based edge capture — the edge agent does not exfiltrate everything; it captures when a meaningful event signal fires. Keeps cloud bandwidth + storage bounded and keeps training data signal-dense. See patterns/trigger-based-edge-capture.
  • A cloud data platform with search + replay — ingested data is indexed + visualisable + filterable so that engineers can pick training-worthy slices by metadata, and so that anything a device experienced can be reproduced.
  • Automated annotation — typically VLM / LLM-based pre-labels corrected by humans, or low-confidence-only human review (see patterns/vlm-assisted-pre-labeling, patterns/low-confidence-to-human-review, patterns/human-calibrated-llm-labeling); blanket manual annotation is always the iteration-cost bottleneck.
  • A distributed training platform — Ray, Kubeflow, SageMaker, etc. — wired to consume the curated dataset and emit validated model candidates.
  • An automated evaluation gate against standardised test sets, to prevent regressions from shipping to the fleet.
  • A continuous deployment path back to the edge — OTA updates, feature-flagged rollouts, canary subsets of the fleet.
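The first component above, trigger-based edge capture, can be sketched as a small agent. Everything here is hypothetical: the `EdgeCaptureAgent` class, the confidence-drop trigger, and the 0.5 threshold are illustrative assumptions, not a real edge SDK.

```python
# Sketch of trigger-based edge capture: the agent does not exfiltrate
# everything; it queues a frame for upload only when the trigger fires
# (here, low model confidence). A bounded queue keeps bandwidth capped.

from collections import deque

class EdgeCaptureAgent:
    def __init__(self, trigger, max_queue=100):
        self.trigger = trigger                      # callable: frame -> bool
        self.upload_queue = deque(maxlen=max_queue) # bounded capture buffer

    def observe(self, frame):
        if self.trigger(frame):
            self.upload_queue.append(frame)

# Trigger fires when the on-device model is unsure (threshold is illustrative).
agent = EdgeCaptureAgent(trigger=lambda f: f["confidence"] < 0.5)
for conf in [0.9, 0.3, 0.8, 0.2]:
    agent.observe({"confidence": conf})
```

Only the two low-confidence frames reach the upload queue; the confident majority stays on-device, which is what keeps training data signal-dense.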

Compared to adjacent concepts

  • concepts/continuous-reprediction is the serving-time sibling: continuously re-score a VM's remaining lifetime as signals evolve. The edge-to-cloud flywheel is the training-time sibling: continuously re-train as the real input distribution evolves.
  • concepts/training-serving-boundary formalises the split between where models learn and where they run. The flywheel operationalises a feedback loop across that boundary — serving produces new training data.
  • patterns/prompt-optimizer-flywheel applies the same closed-loop logic to prompts rather than weights.

Operational discipline

A flywheel is spinning only if:

  • End-to-end latency is short enough to matter. If Collect → Deploy takes longer than the model's deployed lifetime, the loop is broken.
  • The labelling throughput scales with data throughput — usually requires AI-assisted annotation.
  • The data-collection cost is bounded — per-cart / per-device, not per-event-logged.
  • Training + evaluation + rollout are automated end-to-end; any human hand-off gates iteration cadence.
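The first criterion reduces to a simple inequality: the sum of stage latencies must fit inside the model's deployed lifetime. A minimal sketch with made-up stage names and day counts (not Capsight's actual pipeline):

```python
# Sketch of the "spinning" check. Stage breakdown and numbers are
# illustrative assumptions; any manual hand-off adds to the sum.

def loop_latency_days(stage_days):
    # End-to-end Collect -> Deploy latency is the sum of stage latencies.
    return sum(stage_days.values())

def is_spinning(stage_days, deployed_lifetime_days):
    # The loop closes only if a full spin completes while the
    # previous model is still the one running on the fleet.
    return loop_latency_days(stage_days) < deployed_lifetime_days

stages = {"collect": 2, "label": 2, "train": 2, "eval": 0.5, "rollout": 0.5}
spinning = is_spinning(stages, deployed_lifetime_days=14)  # week-scale loop
```

A month-long loop against a two-week model lifetime fails this check; the week-scale loop above passes.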

Capsight's stated numbers (month → week end-to-end; week → two days for training alone; >70% annotation cost reduction) are what "spinning" looks like in practice.
