
PATTERN

Temporal bucketed intersection

Problem

Compute the intersection of continuous-time annotations from independent producers (multiple ML models, multiple modalities) in a way that is efficient to ingest, store, and query at catalog scale.

Naïve interval arithmetic is O(N²) per asset — every annotation pair checked for overlap. With dozens of modalities × hundreds of detections per asset × catalog-scale asset counts, this is not tractable online.

Solution

Three-step bucket-then-intersect algorithm:

  1. Bucket Mapping — segment every continuous annotation [t_start, t_end] into a set of fixed-size time buckets (e.g. 1-second). A [2s, 8s] annotation expands to buckets [2-3, 3-4, 4-5, 5-6, 6-7, 7-8]. See concepts/temporal-bucket-discretization.
  2. Annotation Intersection — for each bucket, collect all annotations (across all modalities) that include it. The bucket index is the join key, so interval intersection reduces to equality-grouping: any two annotations sharing a bucket overlap in time.
  3. Optimised Persistence — write a single enriched record per (asset_id, time_bucket) containing all co-occurring annotations. The record is the output; the asset + bucket tuple is the natural composite key. See concepts/composite-key-upsert.
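Steps 1 and 2 can be sketched in a few lines. This is a minimal illustration, not Netflix's implementation; the names `to_buckets` and `bucket_join` are invented, and the 1-second bucket size comes from the post's worked example.

```python
import math
from collections import defaultdict

BUCKET_SIZE = 1.0  # seconds; the worked example uses 1 s buckets

def to_buckets(t_start, t_end, size=BUCKET_SIZE):
    """Step 1: expand a continuous annotation [t_start, t_end] into
    fixed-size bucket indices (bucket b covers [b*size, (b+1)*size))."""
    return range(math.floor(t_start / size), math.ceil(t_end / size))

def bucket_join(annotations):
    """Step 2: group annotations by bucket. `annotations` is an iterable of
    (annotation_id, t_start, t_end); the bucket index is the join key."""
    index = defaultdict(set)
    for ann_id, t0, t1 in annotations:
        for b in to_buckets(t0, t1):
            index[b].add(ann_id)
    return dict(index)
```

Buckets whose set contains more than one annotation id are exactly the intervals of co-occurrence; no pairwise overlap test is ever run.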

Canonical instance

Netflix's multimodal video-search pipeline (sources/2026-04-04-netflix-powering-multimodal-intelligence-for-video-search), worked example:

  • Character annotation: "Joey" from seconds 2 through 8 → 6 one-second buckets [2-3 .. 7-8].
  • Scene annotation: "kitchen" from seconds 4 through 9 → 5 one-second buckets [4-5 .. 8-9].
  • Intersection (shared buckets): [4-5, 5-6, 6-7, 7-8] — the four seconds where both are simultaneously true.
  • Output: 4 enriched records, one per shared bucket, each carrying both the CHARACTER_SEARCH and SCENE_SEARCH child annotations.
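The worked example above can be reproduced end-to-end. The function name `enriched_records` and the asset id `"ep-101"` are invented for illustration; the annotation values and bucket arithmetic follow the post's numbers.

```python
from collections import defaultdict

def enriched_records(asset_id, annotations, bucket_size=1):
    """annotations: list of (label, search_kind, t_start, t_end) in whole seconds.
    Returns one record per (asset_id, bucket) carrying every co-occurring annotation."""
    by_bucket = defaultdict(list)
    for label, kind, t0, t1 in annotations:
        for b in range(t0 // bucket_size, -(-t1 // bucket_size)):  # ceil division
            by_bucket[b].append((label, kind))
    return {(asset_id, b): anns for b, anns in sorted(by_bucket.items())}

records = enriched_records("ep-101", [      # "ep-101" is an invented asset id
    ("Joey", "CHARACTER_SEARCH", 2, 8),     # 6 buckets: 2-3 .. 7-8
    ("kitchen", "SCENE_SEARCH", 4, 9),      # 5 buckets: 4-5 .. 8-9
])
shared = {k: v for k, v in records.items() if len(v) == 2}
# shared holds 4 records, one each for buckets 4-5, 5-6, 6-7, 7-8
```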

Why it works

  • Complexity collapses from O(N²) to O(N × B) where B is the bucket count per annotation. For bounded-length annotations, B is small and constant.
  • Bucket-keyed storage is naturally shardable — partition by (asset_id, bucket) and parallelism falls out.
  • Multi-producer friendly — new modalities just contribute more entries per bucket without changing the algorithm.
  • Model re-runs are trivially idempotent under composite-key upsert on (asset_id, bucket).
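The idempotency point in the last bullet is simple enough to show directly. A dict stands in for the real store here; the composite key makes a model re-run overwrite its own rows rather than append duplicates.

```python
# Hypothetical in-memory store keyed by the composite (asset_id, bucket) key.
store = {}

def upsert(asset_id, bucket, annotations):
    """Last write wins per composite key, so re-ingestion is idempotent."""
    store[(asset_id, bucket)] = annotations

upsert("ep-101", 4, [("Joey", "CHARACTER_SEARCH")])
# Model re-run with richer output: the same row is replaced, not duplicated.
upsert("ep-101", 4, [("Joey", "CHARACTER_SEARCH"), ("kitchen", "SCENE_SEARCH")])
```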

Granularity trade-off

  • Bucket too coarse → loses timing precision; annotations shorter than the bucket merge with neighbours.
  • Bucket too fine → amplifies storage cardinality and index size for no expressive gain.

Netflix uses one-second buckets in the worked example; the production bucket size isn't disclosed in the post.

Sibling pattern

patterns/sliding-window-rollup-aggregation — Netflix's Distributed Counter. Same fixed-window discretization applied to events (counter increments) instead of interval annotations; same bucket-as-idempotent-aggregate record shape.

Caveats

  • Bucket size is a global design choice; non-uniform bucket sizes across different modality pairs aren't expressible in this scheme.
  • Intersections that require sub-bucket resolution (e.g. "character present for at least 0.5 s within the bucket") aren't computed — each child annotation still carries its original time_range, but the enriched record only asserts co-occurrence at bucket granularity.
  • Bucket-level records lose native interval-algebra expressive power; queries for "how long did the intersection last?" need downstream aggregation across contiguous buckets.
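The downstream aggregation the last caveat calls for is a straightforward run-length merge over sorted bucket indices. This sketch (the name `merge_contiguous` is invented) recovers interval extent, and hence duration, from bucket-level records.

```python
def merge_contiguous(buckets):
    """Collapse bucket indices into half-open contiguous runs [start, end);
    each run's length is the intersection's duration in bucket units."""
    runs = []
    for b in sorted(buckets):
        if runs and b == runs[-1][1]:
            runs[-1][1] = b + 1      # extends the current run
        else:
            runs.append([b, b + 1])  # starts a new run
    return [tuple(run) for run in runs]

merge_contiguous([4, 5, 6, 7])  # → [(4, 8)]: one 4-bucket-long intersection
```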