PATTERN Cited by 2 sources

Data-driven annotation curation

Replace blanket per-site daily annotation with performance-driven curation that directs human labelling effort only where it would most improve the model. This is the step-change to make once blanket sampling scales past the annotation team's capacity.

Shape

Three signals composed into the curation pipeline:

  1. False-positive-rate aggregation across conditions. Query inference results + customer feedback via Amazon Athena over S3-backed logs; bucket by camera type + deployment conditions + other dimensions; prioritise retraining on image sources with elevated error rates.
  2. Low-confidence sampling. Surface inferences whose model confidence scores fell below established thresholds and flag these uncertain predictions for targeted annotation. This directs human time toward cases near the decision boundary, which teach the model the most per label.
  3. Multi-modal LLM analysis of misclassified samples. Use Claude (or a similar multi-modal LLM) on Amazon Bedrock to analyse misclassified examples + detect object classes that are underrepresented in the existing training distribution. Output: a class-imbalance map that directly informs the next data-collection / synthetic-data priorities.
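
The first two signals can be sketched in plain Python over in-memory records. This is a minimal illustration, not the pipeline described above: in practice the aggregation would run as an Athena query over the S3-backed logs. The record fields (`camera_type`, `condition`, `confidence`, `customer_marked_fp`), the 0.5 error-rate cut-off, and the 0.6 confidence threshold are all hypothetical, and "false-positive rate" is assumed here to mean the share of inferences that customers marked as false positives.

```python
from collections import defaultdict

def false_positive_rates(records, keys=("camera_type", "condition")):
    """Return the share of customer-marked false positives per bucket."""
    totals = defaultdict(int)
    false_positives = defaultdict(int)
    for record in records:
        bucket = tuple(record[key] for key in keys)
        totals[bucket] += 1
        if record["customer_marked_fp"]:
            false_positives[bucket] += 1
    return {bucket: false_positives[bucket] / totals[bucket] for bucket in totals}

def low_confidence(records, threshold=0.6):
    """Return inferences whose confidence fell below the threshold."""
    return [record for record in records if record["confidence"] < threshold]

# Hypothetical inference log joined with customer feedback.
records = [
    {"image": "s3://example-bucket/a.jpg", "camera_type": "ptz", "condition": "night",
     "confidence": 0.91, "customer_marked_fp": True},
    {"image": "s3://example-bucket/b.jpg", "camera_type": "ptz", "condition": "night",
     "confidence": 0.55, "customer_marked_fp": True},
    {"image": "s3://example-bucket/c.jpg", "camera_type": "ptz", "condition": "night",
     "confidence": 0.88, "customer_marked_fp": False},
    {"image": "s3://example-bucket/d.jpg", "camera_type": "fixed", "condition": "day",
     "confidence": 0.97, "customer_marked_fp": False},
    {"image": "s3://example-bucket/e.jpg", "camera_type": "fixed", "condition": "day",
     "confidence": 0.58, "customer_marked_fp": False},
]

rates = false_positive_rates(records)
# Buckets whose error rate exceeds the (assumed) cut-off get retraining priority.
elevated = {bucket for bucket, rate in rates.items() if rate > 0.5}
uncertain = low_confidence(records)
```

Note that the two signals select different images: a confident false positive (image `a`) only shows up through the bucket-level error rate, while an uncertain but correct prediction (image `e`) only shows up through the confidence threshold.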

The combined output feeds a SageMaker Ground Truth labelling job-generation workflow, which now creates targeted jobs rather than a blanket one job per site per day.
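
One way the hand-off to job generation might look is sketched below, assuming the curation signals have already been computed. Ground Truth input manifests are JSON Lines files in which each line references one data object, here through the `source-ref` key. Everything else is hypothetical: the record fields, the reason labels, the thresholds, and the function names. The sketch stops at producing manifest lines and does not create a labelling job.

```python
import json

def select_for_annotation(records, elevated_buckets, confidence_threshold=0.6):
    """Map each selected image URI to the set of reasons it was picked."""
    selected = {}
    for record in records:
        reasons = set()
        if (record["camera_type"], record["condition"]) in elevated_buckets:
            reasons.add("elevated_fp_bucket")
        if record["confidence"] < confidence_threshold:
            reasons.add("low_confidence")
        if reasons:
            # setdefault deduplicates images that appear in several log rows.
            selected.setdefault(record["image"], set()).update(reasons)
    return selected

def to_manifest_lines(selected):
    """Render one JSON Lines entry per selected image, in a stable order."""
    return [json.dumps({"source-ref": uri}) for uri in sorted(selected)]

# Hypothetical inputs: inference records plus buckets flagged by error rate.
records = [
    {"image": "s3://example-bucket/a.jpg", "camera_type": "ptz", "condition": "night", "confidence": 0.91},
    {"image": "s3://example-bucket/b.jpg", "camera_type": "fixed", "condition": "day", "confidence": 0.55},
    {"image": "s3://example-bucket/c.jpg", "camera_type": "fixed", "condition": "day", "confidence": 0.97},
    {"image": "s3://example-bucket/d.jpg", "camera_type": "ptz", "condition": "night", "confidence": 0.40},
]
elevated_buckets = {("ptz", "night")}

selected = select_for_annotation(records, elevated_buckets)
manifest_lines = to_manifest_lines(selected)
```

Keeping the selection reasons alongside each URI makes it possible to audit why an image was sent for labelling, and to size targeted jobs per reason instead of per site.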

Why it works

  • Sustainability: the labelling team's capacity stops being the scaling constraint; adding sites no longer grows annotation headcount linearly.
  • Training efficiency: labels at the decision boundary + on underperforming segments + on underrepresented classes carry more training signal per label than a random site sample.
  • Compounding with synthetic data: the class-imbalance map from LLM analysis directly informs synthetic-data generation priorities (patterns/synthetic-data-generation); rare classes get synthetic augmentation, annotation budget goes to real-world edge cases.
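
To make the class-imbalance map concrete, the sketch below derives one from label counts. It assumes the LLM analysis has already tagged each misclassified sample with an object class; the class names, the 5% minimum-share threshold, and the ranking rule are illustrative assumptions, not details from the source.

```python
from collections import Counter

def class_imbalance_map(training_labels, misclassified_labels, min_share=0.05):
    """Compare each class's share of training data with its misclassification count."""
    train_counts = Counter(training_labels)
    missed_counts = Counter(misclassified_labels)
    total = sum(train_counts.values())
    imbalance = {}
    for cls in set(train_counts) | set(missed_counts):
        share = train_counts[cls] / total if total else 0.0
        imbalance[cls] = {
            "train_share": share,
            "misclassified": missed_counts[cls],
            "underrepresented": share < min_share,
        }
    return imbalance

def synthetic_priorities(imbalance):
    """Rank underrepresented classes, most-misclassified first (ties by name)."""
    rare = [cls for cls, stats in imbalance.items() if stats["underrepresented"]]
    return sorted(rare, key=lambda cls: (-imbalance[cls]["misclassified"], cls))

# Hypothetical label lists.
training_labels = ["person"] * 90 + ["forklift"] * 8 + ["pallet_jack"] * 2
misclassified_labels = ["pallet_jack", "pallet_jack", "forklift", "wheelchair"]

imbalance = class_imbalance_map(training_labels, misclassified_labels)
priorities = synthetic_priorities(imbalance)
```

Classes that appear among the misclassified samples but never in the training set (here `wheelchair`) get a training share of zero, so they are always flagged as underrepresented.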

When to apply

  • Inference results + customer feedback signals exist at scale (FP marking, missed-detection reports).
  • Model is already serving traffic, so confidence scores + labels are joinable.
  • Annotation team has become the bottleneck on model-improvement velocity.

Seen in
