Synthetic data generation¶
Use a controllable generative model (diffusion / image-to-image / grounded text-to-image) to produce training data with ground-truth annotations automatically embedded, at a scale + diversity that manual collection cannot achieve. Especially valuable for rare-event classes (floor spills, specific hazard types) + diversity gaps (PPE color variation, lighting conditions, occlusion patterns).
When it pays¶
Two forcing functions in the canonical wiki instance:
- Rare-event class imbalance. "A prime example is floor spill detection: despite examining and annotating over half a million images, only a few hundred examples of liquid spills or debris on walkways were identified." No amount of real-image collection fixes this at a reasonable cost — the event is genuinely rare.
- Within-class diversity. "PPE items appear in a single dominant color. However, workplace policies often permit variations, and workers occasionally wear acceptable PPE in different colors. Without sufficient training examples across color variations, the model risks failing to detect non-standard colored items, creating potential safety blind spots."
Shape¶
- Grounded controllable generator. The model accepts structured inputs — in the canonical case, bounding-box coordinates + class labels — and generates an image where the specified objects appear at the specified positions. GLIGEN (Grounded Language-to-Image Generation) is the named realisation.
- Batch runtime. Deploy the generator as a SageMaker Batch Transform job (or equivalent batch-inference substrate); each input record is one structured specification, each output is one photorealistic image (512×512 in the canonical instance).
- Auto-embedded ground truth. Because the generator was told where each object goes, the bounding-box annotations are known by construction — converted to YOLO annotation format (or equivalent) by parallel Python workers downstream of the generator + uploaded back to S3 as training-ready datasets.
- Training configuration. YOLOv8 / similar on SageMaker AI with PyTorch 2.1, cosine learning-rate scheduling + AdamW optimisation (named as "critical for stabilizing the larger YOLOv8l model variant and preventing gradient divergence during training").
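The auto-embedded ground-truth step is mechanical: because the generator was handed each object's position, every specification record can be turned into YOLO label lines without any annotation pass. A minimal sketch of that conversion, assuming 512×512 outputs and pixel-space (x_min, y_min, x_max, y_max) boxes; the function and record field names are illustrative, not from the source:

```python
def to_yolo_line(class_id, box, img_w=512, img_h=512):
    """Convert a pixel-space (x_min, y_min, x_max, y_max) box to a
    YOLO 'class cx cy w h' line, coordinates normalized to [0, 1]."""
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2 / img_w   # box center, normalized
    cy = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w        # box size, normalized
    h = (y_max - y_min) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"

# One hypothetical generation spec: a person at a known position.
spec = {"boxes": [(64, 128, 192, 480)], "labels": [0]}  # 0 = person
label_lines = [to_yolo_line(c, b) for c, b in zip(spec["labels"], spec["boxes"])]
```

Writing one such `.txt` file per generated image yields the training-ready dataset the parallel workers upload to S3.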
Results (canonical wiki instance)¶
Without a single manually-annotated real image:
- PPE (3 classes): 99.5% mAP@50 / 100% precision / 100% recall on a 75,000-image GLIGEN-generated dataset (person + hard hat + safety vest).
- Housekeeping (7 classes): 94.3% mAP@50 / 91.4% precision / 86.9% recall on a 75,000-image GLIGEN-generated dataset (pallet jack / go-cart / step ladder / trash can / safety cone / tote / pallet).
The author notes that accuracy "can be further improved by increasing the volume of training images used to build and train the custom model."
Why it works¶
- Label cost goes to zero for the labelling step itself — the generator knows the ground truth because it placed the objects. The cost shifts to prompting + generation compute + quality review of the synthetic dataset.
- Rare events become abundant — the model can be prompted to produce a violation condition at whatever frequency the training curriculum needs.
- Diversity becomes controllable — PPE colors, lighting, occlusion, floor types all become prompt parameters rather than sampling-luck properties.
- Compounds with patterns/data-driven-annotation-curation — once LLM analysis of misclassified samples surfaces underrepresented classes, targeted synthetic generation closes the gap without sending more annotators into the field.
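The "diversity becomes controllable" point can be sketched as spec construction: each diversity axis is sampled explicitly rather than left to what cameras happen to capture. The axis values, prompt template, and placeholder box layout below are illustrative assumptions, not from the source:

```python
import itertools
import random

# Hypothetical diversity axes: each becomes a prompt parameter.
COLORS = ["orange", "yellow", "green", "blue"]            # permitted vest colors
LIGHTING = ["bright daylight", "dim warehouse lighting", "backlit"]
OCCLUSION = ["unoccluded", "partially occluded by shelving"]

def make_specs(n, seed=0):
    """Sample n generation specs uniformly over the diversity grid."""
    rng = random.Random(seed)
    combos = list(itertools.product(COLORS, LIGHTING, OCCLUSION))
    specs = []
    for _ in range(n):
        color, light, occ = rng.choice(combos)
        specs.append({
            "prompt": f"worker wearing a {color} safety vest, {light}, {occ}",
            "boxes": [(100, 80, 300, 460)],  # placeholder layout
            "labels": ["safety vest"],
        })
    return specs
```

A curriculum that needs more of a rare condition simply reweights the sampler toward that region of the grid.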
Caveats¶
- Real-image annotation remains important for site-specific conditions (camera angles, lighting, equipment layouts) that the generator wasn't conditioned on. Synthetic + targeted-real is the production mix, not synthetic-only long-term.
- The generator's own distribution biases propagate into the trained model — a bad generator makes a bad dataset.
- Domain-shift risk: when training is purely synthetic, the evaluation set must reflect real deployment conditions, or headline mAP numbers will overstate production quality.
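The "synthetic + targeted-real" production mix amounts to manifest construction: a large synthetic pool blended with a small, oversampled pool of site-specific real images. A minimal sketch under assumed names and ratios (nothing below is from the source):

```python
import random

def mix_manifest(synthetic, real, real_fraction=0.2, seed=0):
    """Return a shuffled list of (path, source) pairs where roughly
    `real_fraction` of entries come from the real pool, sampling the
    real pool with repetition when it is small."""
    rng = random.Random(seed)
    # Solve n_real / (n_synth + n_real) = real_fraction for n_real.
    n_real = int(len(synthetic) * real_fraction / (1 - real_fraction))
    real_sample = [rng.choice(real) for _ in range(n_real)]
    manifest = [(p, "synthetic") for p in synthetic]
    manifest += [(p, "real") for p in real_sample]
    rng.shuffle(manifest)
    return manifest
```

The real slice carries the site-specific conditions (camera angles, lighting, layouts) the generator was never conditioned on.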
Seen in¶
- sources/2026-04-01-aws-automate-safety-monitoring-with-computer-vision-and-generative-ai — canonical wiki instance. GLIGEN deployed on SageMaker Batch Transform; structured bounding-box inputs drive photorealistic 512×512 facility scenes with ground-truth annotations auto-embedded; converted to YOLO annotation format; trained YOLOv8 reached 99.5% mAP@50 on PPE + 94.3% on Housekeeping without any manually-annotated real images.
Related¶
- patterns/data-driven-annotation-curation — identifies where synthetic data is needed (class imbalance + error hotspots).
- systems/gligen — the diffusion-based generator in the canonical instance.
- systems/aws-sagemaker-batch-transform — the batch runtime.