
CONCEPT Cited by 1 source

Temporal bucket discretization

Definition

Temporal bucket discretization is the technique of segmenting a continuous time range into a contiguous sequence of fixed-size discrete intervals so that independently-produced events, detections or annotations over overlapping but distinct continuous ranges can be compared, intersected, or joined at a common granularity.

Canonical wiki instance: Netflix's multimodal video-search pipeline maps per-model detections (e.g. "character 'Joey' from second 2 through 8") into seven distinct one-second buckets (Source: sources/2026-04-04-netflix-powering-multimodal-intelligence-for-video-search).

Why discretize

Continuous time ranges are hard to intersect at query time:

  • Per-model detections are native continuous [t_start, t_end] intervals. A scene-detection model might emit "kitchen from 4.0 s to 9.0 s"; a character model "Joey from 2.0 s to 8.0 s".
  • Computing "is there a second where both were true?" naively requires an interval-overlap test on every pair: O(M × N) for M character annotations and N scene annotations on one asset.
  • At Netflix catalog scale (thousands of titles × hundreds of shots × tens of models producing annotations), pairwise interval-overlap joins are not tractable online.
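For contrast, the naive pairwise test described above can be sketched as follows. This is a minimal illustration, not Netflix's code: the `Interval` type and function names are hypothetical, and intervals are assumed to be half-open `[start, end)`.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical annotation record: a label over a continuous time range."""
    label: str
    start: float  # seconds, inclusive
    end: float    # seconds, exclusive (half-open convention is an assumption)


def overlaps(a: Interval, b: Interval) -> bool:
    # Two half-open intervals overlap iff each one starts before the other ends.
    return a.start < b.end and b.start < a.end


def pairwise_overlaps(xs: list[Interval], ys: list[Interval]) -> list[tuple[Interval, Interval]]:
    # Every x is tested against every y: len(xs) * len(ys) comparisons.
    return [(x, y) for x in xs for y in ys if overlaps(x, y)]


characters = [Interval("Joey", 2.0, 8.0)]
scenes = [Interval("kitchen", 4.0, 9.0)]
pairs = pairwise_overlaps(characters, scenes)
```

The nested comprehension is what makes the cost grow with the product of the two annotation counts; discretization replaces it with a lookup on a shared key.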

Discretizing to fixed-size buckets collapses the interval-overlap problem to bucket-key equality:

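  • The "Joey 2-8 s" annotation, treated as the half-open interval [2, 8), expands to six buckets [2-3, 3-4, 4-5, 5-6, 6-7, 7-8]. (The source's count of seven for this example presumably counts second 8 inclusively, which would add bucket 8-9.)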
  • The "kitchen 4-9 s" annotation expands to [4-5, 5-6, 6-7, 7-8, 8-9].
  • Intersection is the set [4-5, 5-6, 6-7, 7-8] — found by key-equality on the bucket identifier, not interval arithmetic.
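The expansion and intersection above can be sketched in a few lines. This assumes half-open intervals and keys each bucket by an integer index (bucket `i` covers `[i * size, (i + 1) * size)`); the function name `bucket_indexes` is illustrative, not taken from the source.

```python
import math


def bucket_indexes(start: float, end: float, size: float = 1.0) -> set[int]:
    """Return the indexes of every fixed-size bucket touched by [start, end)."""
    if end <= start:
        raise ValueError("end must be greater than start")
    first = math.floor(start / size)  # bucket containing the start
    last = math.ceil(end / size)      # one past the bucket containing the end
    return set(range(first, last))


joey = bucket_indexes(2.0, 8.0)      # {2, 3, 4, 5, 6, 7}
kitchen = bucket_indexes(4.0, 9.0)   # {4, 5, 6, 7, 8}

# Co-occurrence is plain set intersection on the bucket key.
both = sorted(joey & kitchen)        # [4, 5, 6, 7]

# Recover the human-readable bounds for each shared bucket.
bounds = [(i * 1.0, (i + 1) * 1.0) for i in both]
```

Integer indexes are used as keys here because they compare exactly; using float start times directly as keys can be fragile when the bucket size is not exactly representable in binary.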

The trade-off is bucket-granularity precision: a bucket coarser than the shortest detection loses timing resolution; a bucket finer than needed amplifies storage and index cardinality for no gain.
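A small sketch of both sides of that trade-off, under the same half-open, integer-index assumptions. The detection times and bucket sizes below are made up for illustration; they are not from the source.

```python
import math


def bucket_indexes(start: float, end: float, size: float) -> set[int]:
    """Indexes of the fixed-size buckets touched by the half-open range [start, end)."""
    return set(range(math.floor(start / size), math.ceil(end / size)))


# Two short detections that never overlap in continuous time.
a = (2.2, 2.4)
b = (2.7, 2.9)

# Coarse buckets: both land in bucket 2, so they appear to co-occur.
coarse = bucket_indexes(*a, size=1.0) & bucket_indexes(*b, size=1.0)

# Finer buckets: they land in different buckets and stay distinct.
fine = bucket_indexes(*a, size=0.5) & bucket_indexes(*b, size=0.5)

# The cost of finer buckets: one 60-second annotation produces more keys.
rows_coarse = len(bucket_indexes(0.0, 60.0, size=1.0))   # 60
rows_fine = len(bucket_indexes(0.0, 60.0, size=0.25))    # 240
```

Halving the bucket size doubles the number of keys per annotation, so the choice is usually driven by the shortest detection that queries need to distinguish.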

Bucket identity as composite key

In Netflix's pipeline each bucket's identity is (asset_id, time_bucket_start, time_bucket_end) — which feeds the downstream composite-key upsert into Elasticsearch and keeps the fusion pipeline idempotent across model re-runs.
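Why a composite key makes re-runs idempotent can be illustrated with an in-memory dictionary standing in for the index. Everything here is a hypothetical sketch: the document shape, the `upsert_detection` and `ingest` helpers, and the `"asset-123"` identifier are assumptions, and no Elasticsearch API is used.

```python
# (asset_id, time_bucket_start, time_bucket_end), as named in the text above.
BucketKey = tuple[str, int, int]
# Hypothetical document shape: model name -> set of labels seen in that bucket.
BucketDoc = dict[str, set[str]]


def upsert_detection(index: dict[BucketKey, BucketDoc], asset_id: str,
                     start: int, end: int, model: str, label: str) -> None:
    """Create the bucket document if missing, then merge the label into it."""
    key = (asset_id, start, end)
    doc = index.setdefault(key, {})
    # Adding to a set is a no-op when the label is already present.
    doc.setdefault(model, set()).add(label)


def ingest(index: dict[BucketKey, BucketDoc], asset_id: str, model: str,
           label: str, start_s: int, end_s: int) -> None:
    """Expand a half-open [start_s, end_s) detection into one-second buckets."""
    for s in range(start_s, end_s):
        upsert_detection(index, asset_id, s, s + 1, model, label)


index: dict[BucketKey, BucketDoc] = {}
for _ in range(2):  # simulate the same model outputs being ingested twice
    ingest(index, "asset-123", "character", "Joey", 2, 8)
    ingest(index, "asset-123", "scene", "kitchen", 4, 9)
```

After the second pass the index holds the same seven bucket documents as after the first, because each write targets a key derived only from the asset and the bucket bounds. This sketch does not handle a re-run that removes a label; that would need a replace-per-model policy instead of a merge.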

Seen in

  • sources/2026-04-04-netflix-powering-multimodal-intelligence-for-video-search

Caveats

  • The one-second bucket size is used in Netflix's worked example; the production bucket size is not explicitly disclosed.
  • Bucket discretization trades resolution for tractability: detections shorter than the bucket become indistinguishable in time from other detections that touch the same bucket, so two events that never overlapped can appear to co-occur.
  • Embeddings attached to continuous-interval annotations present a design choice when discretized: keep the original interval on the child record (Netflix's approach in the intersection-record example) or recompute per-bucket embeddings.