
GLIGEN (Grounded Language-to-Image Generation)

GLIGEN (Grounded Language-to-Image Generation) is a diffusion-based generative model that extends conventional text-to-image generation with structured spatial grounding: alongside a natural-language prompt, callers supply bounding-box coordinates and a class label for each box, and GLIGEN produces a photorealistic image in which the specified objects appear at the specified positions.
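As a sketch of that input contract, the grounding payload pairs one phrase with one box. The container name and the normalized `[xmin, ymin, xmax, ymax]` box convention below are assumptions for illustration, not GLIGEN's exact API:

```python
from dataclasses import dataclass

@dataclass
class GroundedPrompt:
    # Hypothetical container; a real GLIGEN implementation's interface may differ.
    caption: str          # natural-language prompt
    boxes: list           # per object: [xmin, ymin, xmax, ymax], normalized to [0, 1] (assumed)
    phrases: list         # per box: class label / grounding phrase

    def validate(self):
        if len(self.boxes) != len(self.phrases):
            raise ValueError("one phrase per box is required")
        for b in self.boxes:
            if len(b) != 4 or not all(0.0 <= v <= 1.0 for v in b):
                raise ValueError(f"box must be 4 normalized coordinates: {b}")
            if b[0] >= b[2] or b[1] >= b[3]:
                raise ValueError(f"box needs xmin < xmax and ymin < ymax: {b}")
        return self

spec = GroundedPrompt(
    caption="a cat sitting next to a potted plant on a wooden table",
    boxes=[[0.05, 0.40, 0.45, 0.95], [0.55, 0.30, 0.90, 0.95]],
    phrases=["cat", "potted plant"],
).validate()
```

A pipeline would pass `spec.caption` to the usual text encoder and feed the box/phrase pairs to the grounding mechanism; the validation step simply makes the per-box pairing explicit.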

This enables auto-annotated training-data generation: because the caller specified where each object goes, the ground-truth bounding boxes are known by construction, so no human labelling step is needed to produce a training-ready dataset.
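That "known by construction" claim can be made concrete: the same boxes that steered generation can be emitted directly as detection labels. A minimal sketch, assuming normalized input boxes and the common COCO detection schema (the image size and category IDs here are illustrative):

```python
def to_coco_annotations(image_id, width, height, boxes, labels, category_ids):
    """Turn the grounding inputs themselves into COCO-style detection
    annotations: the boxes are ground truth by construction, so no
    human labelling pass is required."""
    annotations = []
    for i, (box, label) in enumerate(zip(boxes, labels)):
        xmin, ymin, xmax, ymax = box          # normalized [0, 1] coords (assumed convention)
        x, y = xmin * width, ymin * height
        w, h = (xmax - xmin) * width, (ymax - ymin) * height
        annotations.append({
            "id": i,
            "image_id": image_id,
            "category_id": category_ids[label],
            "bbox": [x, y, w, h],             # COCO uses [x, y, width, height] in pixels
            "area": w * h,
            "iscrowd": 0,
        })
    return annotations

anns = to_coco_annotations(
    image_id=1, width=512, height=512,
    boxes=[[0.05, 0.40, 0.45, 0.95], [0.55, 0.30, 0.90, 0.95]],
    labels=["cat", "potted plant"],
    category_ids={"cat": 17, "potted plant": 64},  # illustrative IDs
)
```

Pairing each generated image with these records yields a detection dataset whose labels are exact by definition, up to how faithfully the generator respects the boxes.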

Stub page — expand as more GLIGEN-internals sources land on the wiki.
