PATTERN Cited by 1 source

Nested Elasticsearch for multimodal query

Problem

Index records that each carry multiple heterogeneous annotations (different modalities, different schemas) into Elasticsearch such that:

  • Each annotation can be queried on its own fields.
  • Cross-annotation constraints within a single parent record ("same bucket has both X and Y") match correctly.
  • Indexing stays idempotent across producer re-runs: re-writing a record replaces it rather than duplicating it.

A flat document per annotation loses the cross-annotation-within-parent semantics. A parent-child model that stores each annotation type as separate child documents and joins them at query time is flexible but query-expensive at scale.

Solution

Model each parent record as a single Elasticsearch document whose source_annotations field (mapped as type nested) carries an array of child annotation objects. Elasticsearch indexes each element of a nested field as its own hidden document, so a nested query evaluates its conditions against one child at a time; this preserves the "within-same-child" semantics that a plain object array would lose.
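
To make the "within-same-child" point concrete, here is a minimal pure-Python simulation; it is not Elasticsearch code. It assumes the documented behavior that an array of plain objects is flattened into multi-valued fields on the parent, whereas nested children are matched one at a time. The function names are illustrative, and the sample field names are borrowed from the examples further down this page.

```python
from typing import Dict, List

Child = Dict[str, str]


def matches_flattened(children: List[Child], terms: Dict[str, str]) -> bool:
    """Simulate a plain object array: every field becomes one multi-valued
    field on the parent, so which child supplied which value is lost."""
    flat: Dict[str, set] = {}
    for child in children:
        for field, value in child.items():
            flat.setdefault(field, set()).add(value)
    return all(value in flat.get(field, set()) for field, value in terms.items())


def matches_nested(children: List[Child], terms: Dict[str, str]) -> bool:
    """Simulate a nested field: all terms must hold on one and the same child."""
    return any(
        all(child.get(field) == value for field, value in terms.items())
        for child in children
    )


children = [
    {"annotation_type": "CHARACTER_SEARCH", "label": "Joey"},
    {"annotation_type": "SCENE_SEARCH", "label": "kitchen"},
]

# No single child is a CHARACTER_SEARCH annotation labelled "kitchen".
mixed = {"annotation_type": "CHARACTER_SEARCH", "label": "kitchen"}
false_positive = matches_flattened(children, mixed)   # flattened view matches anyway
correct_rejection = matches_nested(children, mixed)   # nested view rejects it
```

The flattened view accepts the mixed query because "CHARACTER_SEARCH" and "kitchen" both occur somewhere in the parent; the nested view rejects it because no individual child carries both values.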

Canonical instance

Netflix's multimodal video-search index (sources/2026-04-04-netflix-powering-multimodal-intelligence-for-video-search), conceptually structured as:

Root document
├── associated_ids (MOVIE_ID, ASSET_ID)
├── time_bucket_start_ns
├── time_bucket_end_ns
└── source_annotations: [nested]
    ├── CHARACTER_SEARCH child { label: "Joey", ... }
    └── SCENE_SEARCH child    { label: "kitchen", embedding_vector: [...], ... }

Document _id is the composite (asset_id, time_bucket) — enabling composite-key upsert on producer re-runs. Root carries the bucket identity; children carry the per-modality payloads.
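
A possible shape for the mapping and the idempotent write, sketched as plain Python dictionaries so it runs without a cluster. Several details are assumptions that the source does not state: the field types, the vector dimension, the ":" encoding of the composite _id, the index name, and the use of a plain bulk index action (which creates or replaces the whole document) to stand in for the "composite-key upsert".

```python
import json
from typing import Any, Dict


def build_mapping(vector_dims: int = 512) -> Dict[str, Any]:
    """Index body with a nested field for child annotations.
    vector_dims is a placeholder; the real dimension is not disclosed."""
    return {
        "mappings": {
            "properties": {
                "associated_ids": {
                    "properties": {
                        "MOVIE_ID": {"type": "keyword"},  # assumed type
                        "ASSET_ID": {"type": "keyword"},  # assumed type
                    }
                },
                "time_bucket_start_ns": {"type": "long"},
                "time_bucket_end_ns": {"type": "long"},
                "source_annotations": {
                    "type": "nested",
                    "properties": {
                        "annotation_type": {"type": "keyword"},
                        "label": {"type": "keyword"},
                        # Support for dense_vector inside nested fields
                        # depends on the Elasticsearch version in use.
                        "embedding_vector": {"type": "dense_vector", "dims": vector_dims},
                    },
                },
            }
        }
    }


def bucket_doc_id(asset_id: str, time_bucket_start_ns: int) -> str:
    """Deterministic _id from (asset_id, time_bucket); the encoding is assumed."""
    return f"{asset_id}:{time_bucket_start_ns}"


def bulk_index_lines(index: str, doc: Dict[str, Any]) -> str:
    """Two NDJSON lines for the _bulk API: an index action with an explicit
    _id, then the document source. Re-sending a bucket overwrites it."""
    doc_id = bucket_doc_id(doc["associated_ids"]["ASSET_ID"], doc["time_bucket_start_ns"])
    action = {"index": {"_index": index, "_id": doc_id}}
    return json.dumps(action) + "\n" + json.dumps(doc) + "\n"


doc = {
    "associated_ids": {"MOVIE_ID": "movie-1", "ASSET_ID": "asset-1"},
    "time_bucket_start_ns": 0,
    "time_bucket_end_ns": 1_000_000_000,
    "source_annotations": [
        {"annotation_type": "CHARACTER_SEARCH", "label": "Joey"},
        {"annotation_type": "SCENE_SEARCH", "label": "kitchen"},
    ],
}
payload = bulk_index_lines("video-annotations", doc)  # hypothetical index name
```

Because the _id is derived only from the bucket identity, a producer re-run emits the same action line and replaces the earlier document instead of adding a second one.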

Why nested over alternatives

Shape | Cross-annotation query | Write flexibility | Cost
Flat (one annotation per doc) | ❌ cannot match across a parent's annotations | Trivial | Lowest
Parent-child (children stored as separate documents) | ✅ via join | Partial update | Highest query cost
Nested (this pattern) | ✅ via nested query | Full-parent reindex on child change | Middle
Denormalized (all annotations flattened into root fields) | ✅ but duplicates field names | Hard with heterogeneous shapes | N/A for heterogeneous

Netflix's explicit framing: "The root level captures the overarching asset context, while associated child documents house the specific, multi-modal annotation data. This hierarchical data model is precisely what empowers users to execute highly efficient, cross-annotation queries at scale."

Query-shape example

"buckets where a character with label Joey co-occurs with a scene annotation with label kitchen":

{
  "query": { "bool": { "must": [
    { "nested": {
        "path": "source_annotations",
        "query": { "bool": { "must": [
          { "term": { "source_annotations.annotation_type": "CHARACTER_SEARCH" } },
          { "term": { "source_annotations.label": "Joey" } }
        ] } }
    } },
    { "nested": {
        "path": "source_annotations",
        "query": { "bool": { "must": [
          { "term": { "source_annotations.annotation_type": "SCENE_SEARCH" } },
          { "term": { "source_annotations.label": "kitchen" } }
        ] } }
    } }
  ] } }
}

Both nested clauses must match the same parent document — yielding precisely the buckets where both modalities co-occurred.
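
The same query can be composed programmatically. The sketch below builds the request body as a Python dictionary; the helper names are hypothetical, and it assumes annotation_type and label are mapped as keyword fields so that term matches exact values.

```python
from typing import Any, Dict, List, Tuple

NESTED_PATH = "source_annotations"


def nested_annotation_clause(annotation_type: str, label: str) -> Dict[str, Any]:
    """One nested clause: both terms must match on the same child annotation."""
    return {
        "nested": {
            "path": NESTED_PATH,
            "query": {
                "bool": {
                    "filter": [
                        {"term": {f"{NESTED_PATH}.annotation_type": annotation_type}},
                        {"term": {f"{NESTED_PATH}.label": label}},
                    ]
                }
            },
        }
    }


def co_occurrence_query(pairs: List[Tuple[str, str]]) -> Dict[str, Any]:
    """AND together one nested clause per (annotation_type, label) pair.
    Each clause may be satisfied by a different child of the same parent."""
    clauses = [nested_annotation_clause(kind, label) for kind, label in pairs]
    return {"query": {"bool": {"filter": clauses}}}


body = co_occurrence_query([
    ("CHARACTER_SEARCH", "Joey"),
    ("SCENE_SEARCH", "kitchen"),
])
```

This sketch uses filter rather than must because the conditions are exact matches that need no relevance scoring; swapping in must gives the same set of matching buckets with scores attached.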

Caveats

  • Update cost — any change to a nested child requires reindexing the whole parent document. Model re-runs trigger full parent rewrites via composite-key upsert; this is acceptable but not free.
  • Nested-query cost scales with parent-child cardinality; Netflix doesn't disclose bucket fan-out (annotations per bucket), shard-size targets, or query p99.
  • Embedding vectors living on nested children complicate vector search: Elasticsearch's k-NN on nested vectors is a relatively young feature, and the Netflix post doesn't describe whether or how they query by vector similarity inside the nested shape versus separately.
  • Aggregations over fields of nested children require an explicit nested aggregation. Without one, the aggregation runs in the root-document context, where the children's fields are not visible, so it silently returns empty or parent-level results instead of child-level aggregates.
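
For the aggregation caveat, a minimal sketch of a request body that counts child annotations per label. The aggregation names (annotations, by_label, parent_buckets) and the helper are hypothetical, and label is assumed to be a keyword field.

```python
from typing import Any, Dict


def label_counts_agg(path: str = "source_annotations", size: int = 10) -> Dict[str, Any]:
    """Request body that aggregates over nested children instead of parents."""
    return {
        "size": 0,  # return aggregation results only, no hits
        "aggs": {
            "annotations": {
                # Step into the nested children before aggregating their fields.
                "nested": {"path": path},
                "aggs": {
                    "by_label": {
                        "terms": {"field": f"{path}.label", "size": size},
                        # Step back out to count distinct parent buckets per label.
                        "aggs": {"parent_buckets": {"reverse_nested": {}}},
                    }
                },
            }
        },
    }


agg_body = label_counts_agg()
```

In the response, each by_label bucket's doc_count is the number of child annotations with that label, while parent_buckets.doc_count is the number of parent buckets containing at least one such child.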