PATTERN Cited by 1 source
Nested Elasticsearch for multimodal query
Problem
Index a collection of per-record heterogeneous multi-annotation payloads (different modalities, different schemas) into Elasticsearch such that:
- Each annotation can be queried on its own fields.
- Cross-annotation constraints within a single parent record ("same bucket has both X and Y") match correctly.
- The index stays idempotent across producer re-runs.
A flat document per annotation loses cross-annotation-within-parent semantics. A parent-child model with separately indexed child documents per annotation type is flexible but query-expensive at scale.
Solution
Model each parent record as an Elasticsearch nested document whose source_annotations field (typed nested) carries an array of independently indexed child annotation documents. Nested-typed fields preserve the "within-same-child" semantics at query time via the nested query shape.
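As a minimal sketch of what such a mapping might look like, the Python dictionary below declares source_annotations as a nested field. The field names are borrowed from the canonical instance described in this pattern; the field types (keyword, long) are assumptions for illustration, not Netflix's disclosed schema, and embedding_vector is left out because its dimensionality is not stated.

```python
import json

# Hypothetical index mapping. Field names follow this pattern's canonical
# instance; the types are assumptions, not a disclosed schema.
MAPPING = {
    "mappings": {
        "properties": {
            "associated_ids": {
                "properties": {
                    "MOVIE_ID": {"type": "keyword"},
                    "ASSET_ID": {"type": "keyword"},
                }
            },
            "time_bucket_start_ns": {"type": "long"},
            "time_bucket_end_ns": {"type": "long"},
            # "nested" makes each array element its own hidden child document,
            # so field values from different annotations are not pooled.
            "source_annotations": {
                "type": "nested",
                "properties": {
                    "annotation_type": {"type": "keyword"},
                    "label": {"type": "keyword"},
                },
            },
        }
    }
}

# The body is plain JSON and could be sent as the create-index request body.
print(json.dumps(MAPPING, indent=2))
```

Without the explicit "type": "nested", Elasticsearch would treat the array as a plain object array and flatten its fields into the root document.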
Canonical instance
Netflix's multimodal video-search index (sources/2026-04-04-netflix-powering-multimodal-intelligence-for-video-search), conceptually structured as:
Root document
├── associated_ids (MOVIE_ID, ASSET_ID)
├── time_bucket_start_ns
├── time_bucket_end_ns
└── source_annotations: [nested]
    ├── CHARACTER_SEARCH child { label: "Joey", ... }
    └── SCENE_SEARCH child { label: "kitchen", embedding_vector: [...], ... }
The document _id is the composite (asset_id, time_bucket), which enables composite-key upsert on producer re-runs. The root carries the bucket identity; the children carry the per-modality payloads.
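The idempotency argument can be sketched as follows. The ID format (asset ID and bucket bounds joined with colons), the index name, and the helper names bucket_doc_id and index_action are all hypothetical; the source only states that the _id is a composite of asset and time bucket. The action dictionary uses the _op_type, _index, _id, and _source keys that the official Python client's bulk helper accepts, but nothing here talks to a cluster.

```python
def bucket_doc_id(asset_id: str, start_ns: int, end_ns: int) -> str:
    """Deterministic composite key (hypothetical format) for one time bucket."""
    return f"{asset_id}:{start_ns}:{end_ns}"


def index_action(index: str, asset_id: str, start_ns: int, end_ns: int,
                 annotations: list) -> dict:
    """Build a bulk-style action that overwrites the whole parent document."""
    return {
        "_op_type": "index",  # full replace: same _id means same document
        "_index": index,
        "_id": bucket_doc_id(asset_id, start_ns, end_ns),
        "_source": {
            "associated_ids": {"ASSET_ID": asset_id},
            "time_bucket_start_ns": start_ns,
            "time_bucket_end_ns": end_ns,
            "source_annotations": annotations,
        },
    }


# Two producer runs over the same bucket, simulated with an in-memory dict
# standing in for the index (an assumption made so the sketch is runnable).
first_run = index_action("video-search", "asset-1", 0, 1_000_000_000,
                         [{"annotation_type": "CHARACTER_SEARCH", "label": "Joey"}])
second_run = index_action("video-search", "asset-1", 0, 1_000_000_000,
                          [{"annotation_type": "CHARACTER_SEARCH", "label": "Joey"},
                           {"annotation_type": "SCENE_SEARCH", "label": "kitchen"}])

store = {}
for action in (first_run, second_run):
    store[action["_id"]] = action["_source"]

print(len(store))  # 1: the re-run replaced the document instead of duplicating it
```

Because the key is derived only from the bucket identity, a re-run rewrites the same parent document, which is also where the full-parent rewrite cost of nested documents comes from.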
Why nested over alternatives
| Shape | Cross-annotation query | Write flexibility | Cost |
|---|---|---|---|
| Flat (one annotation per doc) | ❌ cannot correlate annotations within a parent | Trivial | Lowest |
| Parent-child (children indexed as separate documents) | ✅ via join | Partial update | Highest query cost |
| Nested (this pattern) | ✅ via nested query | Full-parent reindex on child change | Middle |
| Denormalized (all annotations flattened into root fields) | ✅ but duplicates field names | Hard with heterogeneous shapes | N/A for heterogeneous |
Netflix's explicit framing: "The root level captures the overarching asset context, while associated child documents house the specific, multi-modal annotation data. This hierarchical data model is precisely what empowers users to execute highly efficient, cross-annotation queries at scale."
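The difference between flattening annotations into root fields and keeping them nested can be illustrated without a cluster. The sketch below is a pure-Python simulation, not Elasticsearch itself: flattened_match mimics how a plain object array pools each field's values across all children, and nested_match requires one child to satisfy every constraint. Both function names are hypothetical.

```python
from typing import Any

Annotation = dict[str, Any]


def flattened_match(annotations: list[Annotation], constraints: Annotation) -> bool:
    """Each constraint may be satisfied by a different child (pooled values)."""
    return all(
        any(child.get(field) == value for child in annotations)
        for field, value in constraints.items()
    )


def nested_match(annotations: list[Annotation], constraints: Annotation) -> bool:
    """All constraints must be satisfied by the same child."""
    return any(
        all(child.get(field) == value for field, value in constraints.items())
        for child in annotations
    )


bucket = [
    {"annotation_type": "CHARACTER_SEARCH", "label": "Joey"},
    {"annotation_type": "SCENE_SEARCH", "label": "kitchen"},
]

# No single annotation is a character labelled "kitchen".
mixed_up = {"annotation_type": "CHARACTER_SEARCH", "label": "kitchen"}

print(flattened_match(bucket, mixed_up))  # True: a false positive
print(nested_match(bucket, mixed_up))     # False: the correct answer
```

The false positive in the flattened case is the reason heterogeneous annotations cannot simply be merged into root-level fields.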
Query-shape example
"buckets where a character with label Joey co-occurs with a scene annotation with label kitchen":
nested {
  path: source_annotations,
  query: bool { must: [
    term { source_annotations.annotation_type: "CHARACTER_SEARCH" },
    term { source_annotations.label: "Joey" }
  ]}
}
AND
nested {
  path: source_annotations,
  query: bool { must: [
    term { source_annotations.annotation_type: "SCENE_SEARCH" },
    term { source_annotations.label: "kitchen" }
  ]}
}
Each nested clause must be satisfied by a single child, and both clauses must match the same parent document, which yields precisely the buckets where both modalities co-occurred.
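The pseudocode above might translate into Elasticsearch query DSL roughly as follows, built here as a Python dictionary. The helper name nested_label_clause is hypothetical, and the sketch assumes annotation_type and label are mapped as keyword fields so that term queries match exact values. The AND becomes a top-level bool query with two must clauses.

```python
import json


def nested_label_clause(annotation_type: str, label: str) -> dict:
    """One nested clause: a single child must have both the type and the label."""
    return {
        "nested": {
            "path": "source_annotations",
            "query": {
                "bool": {
                    "must": [
                        {"term": {"source_annotations.annotation_type": annotation_type}},
                        {"term": {"source_annotations.label": label}},
                    ]
                }
            },
        }
    }


# Both nested clauses sit under one parent-level bool.must, so they must hold
# for the same root document, though each may be satisfied by a different child.
query_body = {
    "query": {
        "bool": {
            "must": [
                nested_label_clause("CHARACTER_SEARCH", "Joey"),
                nested_label_clause("SCENE_SEARCH", "kitchen"),
            ]
        }
    }
}

print(json.dumps(query_body, indent=2))
```

Since neither clause needs a relevance score here, moving them from must to filter would be a reasonable variation.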
Sibling patterns
- patterns/three-stage-ingest-fusion-index — the larger pipeline this indexing step is the terminal stage of.
- patterns/temporal-bucketed-intersection — the algorithm in stage 2 that produces the nested documents.
Caveats
- Update cost — any change to a nested child requires reindexing the whole parent document. Model re-runs trigger full parent rewrites via composite-key upsert; this is acceptable but not free.
- Nested-query cost scales with parent-child cardinality; Netflix doesn't disclose bucket fan-out (annotations per bucket), shard-size targets, or query p99.
- Embedding vectors living on nested children complicate vector search: Elasticsearch's k-NN on nested vectors is a relatively young feature, and the Netflix post doesn't describe whether or how they query by vector similarity inside the nested shape versus separately.
- Aggregations across nested children require an explicit nested aggregation; omitting it silently yields empty or parent-level results instead of child-level aggregates.
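The aggregation caveat can be sketched as a request body in which a nested aggregation wraps the child-level terms aggregation. The aggregation names annotations and by_label are hypothetical, and label is assumed to be a keyword field.

```python
import json

# Count child annotations per label. The outer "nested" aggregation switches
# the aggregation context from parent buckets to their nested children.
agg_body = {
    "size": 0,  # return only aggregation results, no hits
    "aggs": {
        "annotations": {
            "nested": {"path": "source_annotations"},
            "aggs": {
                "by_label": {
                    "terms": {"field": "source_annotations.label"}
                }
            },
        }
    },
}

print(json.dumps(agg_body, indent=2))
```

Placing the terms aggregation directly under the top-level aggs, without the nested wrapper, would run it in the parent context rather than over the children.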