PATTERN Cited by 1 source

Nested Elasticsearch for multimodal query¶

Problem¶

Index a collection of records, each carrying several heterogeneous annotations (different modalities, different schemas), into Elasticsearch such that:

Each annotation can be queried on its own fields.
Cross-annotation constraints within a single parent record ("same bucket has both X and Y") match correctly.
Indexing stays idempotent across producer re-runs.

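A flat document per annotation loses the cross-annotation-within-parent semantics, because a single query cannot require two different documents to share a parent. A parent-child model, with children stored as separate documents and joined to their parent at query time, is flexible but query-expensive at scale.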

Solution¶

Model each parent record as an Elasticsearch nested document whose source_annotations field (typed nested) carries an array of independently-indexed child annotation documents. Nested-typed fields preserve the "within-same-child" semantics at query time via the nested query shape.
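As a rough illustration of the field typing described above, the mapping could be sketched as a Python dict that mirrors the index-creation request body. The field names follow the Netflix example in the next section; the concrete types (`long`, `keyword`) and the `INDEX_MAPPING` name are assumptions made for this sketch, not details from the source.

```python
import json

# Hypothetical index mapping. Only the "nested" typing of source_annotations
# comes from the pattern itself; the other types are illustrative assumptions.
INDEX_MAPPING = {
    "mappings": {
        "properties": {
            "time_bucket_start_ns": {"type": "long"},
            "time_bucket_end_ns": {"type": "long"},
            "source_annotations": {
                # "nested" makes Elasticsearch index each array element as its
                # own hidden document, so a child's fields stay correlated.
                "type": "nested",
                "properties": {
                    "annotation_type": {"type": "keyword"},
                    "label": {"type": "keyword"},
                },
            },
        }
    }
}

print(json.dumps(INDEX_MAPPING, indent=2))
```

Without the `"type": "nested"` line, Elasticsearch would treat the array as plain objects and pool the children's field values together, which is exactly the behaviour this pattern is meant to avoid.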

Canonical instance¶

Netflix's multimodal video-search index (sources/2026-04-04-netflix-powering-multimodal-intelligence-for-video-search), conceptually structured as:

Root document
├── associated_ids (MOVIE_ID, ASSET_ID)
├── time_bucket_start_ns
├── time_bucket_end_ns
└── source_annotations: [nested]
    ├── CHARACTER_SEARCH child { label: "Joey", ... }
    └── SCENE_SEARCH child    { label: "kitchen", embedding_vector: [...], ... }

Document _id is the composite (asset_id, time_bucket) — enabling composite-key upsert on producer re-runs. Root carries the bucket identity; children carry the per-modality payloads.
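The idempotency argument can be sketched without a cluster. In the snippet below an in-memory dict stands in for the index, and `bucket_doc_id`, `upsert_bucket`, the `asset:start` id format, and the dict layout of `associated_ids` are all hypothetical choices for illustration; the source only states that the `_id` is the composite of asset id and time bucket.

```python
from typing import Any, Dict


def bucket_doc_id(asset_id: str, bucket_start_ns: int) -> str:
    """Build a deterministic _id from the composite key (hypothetical format)."""
    return f"{asset_id}:{bucket_start_ns}"


def upsert_bucket(index: Dict[str, Dict[str, Any]], doc: Dict[str, Any]) -> str:
    """Write a root document under its composite _id, replacing any earlier version."""
    doc_id = bucket_doc_id(doc["associated_ids"]["ASSET_ID"], doc["time_bucket_start_ns"])
    index[doc_id] = doc  # the same key on a re-run overwrites instead of duplicating
    return doc_id


# An in-memory dict stands in for the Elasticsearch index.
fake_index: Dict[str, Dict[str, Any]] = {}
bucket = {
    "associated_ids": {"MOVIE_ID": "m1", "ASSET_ID": "a1"},
    "time_bucket_start_ns": 0,
    "time_bucket_end_ns": 1_000_000_000,
    "source_annotations": [{"annotation_type": "CHARACTER_SEARCH", "label": "Joey"}],
}
first_id = upsert_bucket(fake_index, bucket)
second_id = upsert_bucket(fake_index, bucket)  # simulated producer re-run
print(first_id, len(fake_index))
```

Against a real cluster the same effect comes from indexing with an explicit `_id`, which replaces any existing document that has that id.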

Why nested over alternatives¶

| Shape | Cross-annotation query | Write flexibility | Cost |
|---|---|---|---|
| Flat (one annotation per doc) | ❌ annotations of the same parent cannot be correlated | Trivial | Lowest |
| Parent-child (children as separate documents, joined at query time) | ✅ via join | Partial update | Highest query cost |
| Nested (this pattern) | ✅ via `nested` query | Full-parent reindex on child change | Middle |
| Denormalized (all annotations flattened into root fields) | ✅ but duplicates field names | Hard with heterogeneous shapes | N/A for heterogeneous |

Netflix's explicit framing: "The root level captures the overarching asset context, while associated child documents house the specific, multi-modal annotation data. This hierarchical data model is precisely what empowers users to execute highly efficient, cross-annotation queries at scale."
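The difference between flattened and nested matching can be illustrated with a small in-memory simulation. This is not Elasticsearch code: `matches_flattened` and `matches_nested` are hypothetical helpers that only mimic the two semantics, assuming each child carries an `annotation_type` and a `label` field.

```python
from typing import Dict, List

Annotation = Dict[str, str]


def matches_flattened(annotations: List[Annotation], conditions: Dict[str, str]) -> bool:
    """Object-array semantics: each field is checked against the pooled values of all children."""
    return all(
        any(child.get(field) == value for child in annotations)
        for field, value in conditions.items()
    )


def matches_nested(annotations: List[Annotation], conditions: Dict[str, str]) -> bool:
    """Nested semantics: one single child must satisfy every condition."""
    return any(
        all(child.get(field) == value for field, value in conditions.items())
        for child in annotations
    )


bucket = [
    {"annotation_type": "CHARACTER_SEARCH", "label": "Joey"},
    {"annotation_type": "SCENE_SEARCH", "label": "kitchen"},
]
# No scene annotation is labelled "Joey", so the correct answer is "no match".
wrong_pairing = {"annotation_type": "SCENE_SEARCH", "label": "Joey"}

print(matches_flattened(bucket, wrong_pairing))  # True: a false positive
print(matches_nested(bucket, wrong_pairing))     # False
```

The flattened variant reports a match because the type comes from one child and the label from another; keeping each child's fields together is what the nested shape buys.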

Query-shape example¶

"buckets where a character with label Joey co-occurs with a scene annotation with label kitchen":

{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "source_annotations",
            "query": { "bool": { "must": [
              { "term": { "source_annotations.annotation_type": "CHARACTER_SEARCH" } },
              { "term": { "source_annotations.label": "Joey" } }
            ] } }
          }
        },
        {
          "nested": {
            "path": "source_annotations",
            "query": { "bool": { "must": [
              { "term": { "source_annotations.annotation_type": "SCENE_SEARCH" } },
              { "term": { "source_annotations.label": "kitchen" } }
            ] } }
          }
        }
      ]
    }
  }
}

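Within each nested clause, the type and label conditions must hold on the same child annotation. The two clauses may be satisfied by different children, but both must match within the same parent document, which yields precisely the buckets where both modalities co-occurred.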
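The two-clause query above generalizes to any number of co-occurring annotations. A minimal sketch of a builder follows; `nested_clause` and `co_occurrence_query` are hypothetical helper names, and the code assumes the `annotation_type` and `label` fields are mapped so that `term` queries match exactly (for example as `keyword`).

```python
from typing import Any, Dict, List, Tuple


def nested_clause(annotation_type: str, label: str,
                  path: str = "source_annotations") -> Dict[str, Any]:
    """One nested clause: both conditions must hold on the same child."""
    return {
        "nested": {
            "path": path,
            "query": {
                "bool": {
                    "must": [
                        {"term": {f"{path}.annotation_type": annotation_type}},
                        {"term": {f"{path}.label": label}},
                    ]
                }
            },
        }
    }


def co_occurrence_query(pairs: List[Tuple[str, str]]) -> Dict[str, Any]:
    """AND together one nested clause per (annotation_type, label) pair."""
    return {"query": {"bool": {"must": [nested_clause(t, lbl) for t, lbl in pairs]}}}


query = co_occurrence_query([("CHARACTER_SEARCH", "Joey"), ("SCENE_SEARCH", "kitchen")])
print(len(query["query"]["bool"]["must"]))
```

The resulting dict has the same shape as the request body shown above and could be passed to a search call as the request body.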

Sibling patterns¶

patterns/three-stage-ingest-fusion-index — the larger pipeline this indexing step is the terminal stage of.
patterns/temporal-bucketed-intersection — the algorithm in stage 2 that produces the nested documents.

Caveats¶

Update cost — any change to a nested child requires reindexing the whole parent document. Model re-runs trigger full parent rewrites via composite-key upsert; this is acceptable but not free.
Nested-query cost scales with parent-child cardinality; Netflix doesn't disclose bucket fan-out (annotations per bucket), shard-size targets, or query p99.
Embedding vectors living on nested children complicate vector-search — Elasticsearch's k-NN on nested vectors is a relatively young feature; the Netflix post doesn't describe how/if they query by vector similarity inside the nested shape vs separately.
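Aggregations over fields on nested children must be wrapped in an explicit nested aggregation. Without it, the aggregation runs against root documents, which do not carry the child fields, and typically returns empty or misleading buckets rather than an error.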
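For the aggregation caveat, a request body with an explicit `nested` aggregation might look like the sketch below. The aggregation names `annotations` and `by_type` and the `annotation_type` field are assumptions; the two `Counter` computations are an in-memory illustration of how child-level and parent-level counts diverge, not output from a cluster.

```python
from collections import Counter

# Hypothetical request body: step into the nested children, then bucket by type.
NESTED_AGG_REQUEST = {
    "size": 0,
    "aggs": {
        "annotations": {
            "nested": {"path": "source_annotations"},
            "aggs": {
                "by_type": {"terms": {"field": "source_annotations.annotation_type"}},
            },
        }
    },
}

# Two sample buckets, each a list of child annotations.
buckets = [
    [
        {"annotation_type": "CHARACTER_SEARCH", "label": "Joey"},
        {"annotation_type": "CHARACTER_SEARCH", "label": "Ross"},
        {"annotation_type": "SCENE_SEARCH", "label": "kitchen"},
    ],
    [{"annotation_type": "CHARACTER_SEARCH", "label": "Joey"}],
]

# Child-level: every annotation counts once.
child_level = Counter(a["annotation_type"] for b in buckets for a in b)
# Parent-level: every bucket counts at most once per annotation type.
parent_level = Counter(t for b in buckets for t in {a["annotation_type"] for a in b})

print(child_level["CHARACTER_SEARCH"], parent_level["CHARACTER_SEARCH"])  # 3 2
```

Inside a `nested` aggregation the bucket counts refer to children; when counts per parent are wanted, Elasticsearch's `reverse_nested` aggregation can be used to step back to the root documents.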