CONCEPT Cited by 1 source
Temporal bucket discretization¶
Definition¶
Temporal bucket discretization is the technique of segmenting a continuous time range into a contiguous sequence of fixed-size discrete intervals, so that independently produced events, detections, or annotations over overlapping but distinct continuous ranges can be compared, intersected, or joined at a common granularity.
Canonical wiki instance: Netflix's multimodal video-search pipeline maps per-model detections (e.g. "character 'Joey' from second 2 through 8") into seven distinct one-second buckets (Source: sources/2026-04-04-netflix-powering-multimodal-intelligence-for-video-search).
Why discretize¶
Continuous time ranges are hard to intersect at query time:
- Per-model detections are natively continuous [t_start, t_end] intervals. A scene-detection model might emit "kitchen from 4.0 s to 9.0 s"; a character model "Joey from 2.0 s to 8.0 s".
- Computing "is there a second where both were true?" naively requires an interval-overlap test on every pair: O(M × N) for M character annotations and N scene annotations on one asset.
- At Netflix catalog scale (thousands of titles × hundreds of shots × tens of models producing annotations) interval-overlap joins are not tractable online.
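The pairwise cost described in the list above can be made concrete with a minimal sketch. The function names (`overlaps`, `pairwise_overlaps`) and the tuple representation of an interval are illustrative assumptions, not part of Netflix's pipeline.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (t_start, t_end) in seconds; hypothetical representation


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True when the two intervals share a span of positive length."""
    return max(a[0], b[0]) < min(a[1], b[1])


def pairwise_overlaps(
    chars: List[Interval], scenes: List[Interval]
) -> List[Tuple[Interval, Interval]]:
    # Every character interval is tested against every scene interval,
    # so the number of overlap tests is len(chars) * len(scenes).
    return [(c, s) for c in chars for s in scenes if overlaps(c, s)]


character_hits = [(2.0, 8.0)]  # "Joey from 2.0 s to 8.0 s"
scene_hits = [(4.0, 9.0)]      # "kitchen from 4.0 s to 9.0 s"
pairs = pairwise_overlaps(character_hits, scene_hits)
```

With one annotation per model the cost is trivial; the nested loop is what grows as each model emits more intervals per asset.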
Discretizing to fixed-size buckets collapses the interval-overlap problem to bucket-key equality:
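- The "Joey 2-8 s" annotation expands to buckets [2-3, 3-4, 4-5, 5-6, 6-7, 7-8]. This is six buckets when each bucket is written as a start-end pair; the source's count of seven corresponds to counting seconds 2 through 8 inclusively.
- The "kitchen 4-9 s" annotation expands to [4-5, 5-6, 6-7, 7-8, 8-9].
- Intersection is the set [4-5, 5-6, 6-7, 7-8], found by key-equality on the bucket identifier, not interval arithmetic.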
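A minimal sketch of the expansion and key-equality intersection, assuming half-open `[start, end)` buckets aligned to multiples of the bucket size. The helper `to_buckets` and its `size` parameter are hypothetical names chosen for illustration.

```python
import math
from typing import Set, Tuple

Bucket = Tuple[int, int]  # (bucket_start, bucket_end) in seconds


def to_buckets(t_start: float, t_end: float, size: int = 1) -> Set[Bucket]:
    """Expand a continuous interval into the fixed-size buckets it touches."""
    first = math.floor(t_start / size)  # index of the bucket containing t_start
    last = math.ceil(t_end / size)      # one past the last bucket touched
    return {(i * size, (i + 1) * size) for i in range(first, last)}


joey = to_buckets(2.0, 8.0)     # (2, 3) ... (7, 8)
kitchen = to_buckets(4.0, 9.0)  # (4, 5) ... (8, 9)

# Set intersection compares bucket keys for equality; no interval arithmetic.
both = sorted(joey & kitchen)
```

Here `both` holds the four buckets from 4 s to 8 s, matching the intersection listed above.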
The trade-off is bucket-granularity precision: a bucket coarser than the shortest detection loses timing resolution, while a bucket finer than needed inflates storage and index cardinality for no gain.
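To illustrate the trade-off, the sketch below counts how many aligned buckets the "Joey 2-8 s" detection occupies at several bucket sizes. The sizes and the helper `bucket_count` are assumptions for illustration only; they are not values from the source.

```python
import math


def bucket_count(t_start: float, t_end: float, size: float) -> int:
    """Number of size-aligned, half-open buckets touched by [t_start, t_end]."""
    return math.ceil(t_end / size) - math.floor(t_start / size)


# Same 6-second detection, bucketed at four hypothetical granularities.
counts = {size: bucket_count(2.0, 8.0, size) for size in (0.5, 1.0, 2.0, 4.0)}
```

At 0.5 s the detection produces 12 index entries; at 4 s it produces only 2, but those two buckets span 0 s to 8 s, so the index now claims two seconds in which the detection was not actually present.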
Bucket identity as composite key¶
In Netflix's pipeline each bucket's identity is (asset_id, time_bucket_start, time_bucket_end), which feeds the downstream composite-key upsert into Elasticsearch and keeps the fusion pipeline idempotent across model re-runs.
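The idempotency property can be sketched with an in-memory dictionary standing in for the index. This is not the Elasticsearch API: the `upsert` helper, the `"asset-1"` identifier, and the simplification of one label per model per bucket are all assumptions made for illustration.

```python
from typing import Dict, Tuple

# (asset_id, time_bucket_start, time_bucket_end), as described above.
BucketKey = Tuple[str, int, int]


def upsert(
    store: Dict[BucketKey, Dict[str, str]], key: BucketKey, model: str, label: str
) -> None:
    # Writes to the same composite key merge into one record. Re-writing the
    # same (model, label) pair overwrites instead of appending, so a model
    # re-run leaves the store unchanged.
    store.setdefault(key, {})[model] = label


store: Dict[BucketKey, Dict[str, str]] = {}
for _ in range(2):  # simulate running both models twice
    for start in range(2, 8):
        upsert(store, ("asset-1", start, start + 1), "character", "Joey")
    for start in range(4, 9):
        upsert(store, ("asset-1", start, start + 1), "scene", "kitchen")
```

After both passes the store still holds seven bucket records (starts 2 through 8), and the buckets from 4 s to 8 s carry both the character and the scene label.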
Seen in¶
- sources/2026-04-04-netflix-powering-multimodal-intelligence-for-video-search — canonical wiki instance. Netflix uses one-second buckets for multimodal video-search ingestion; discretized buckets are the unit of cross-model intersection, of composite-key upsert into Elasticsearch, and of the root-asset + child-annotation nested document shape.
Caveats¶
- The one-second bucket size is used in Netflix's worked example; the production bucket size is not explicitly disclosed.
- Bucket discretization trades resolution for tractability — detections shorter than the bucket get merged with overlapping detections in the same bucket.
- Embeddings attached to continuous-interval annotations present a design choice when discretized: keep the original interval on the child record (Netflix's approach in the intersection-record example) vs re-computing per-bucket embeddings.
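The resolution caveat can be seen in a small sketch: two detections that never overlap in continuous time still land in the same one-second bucket and therefore look like a co-occurrence. The timestamps and the `to_buckets` helper are hypothetical, and half-open aligned buckets are assumed.

```python
import math


def to_buckets(t_start, t_end, size=1):
    """Expand an interval into the size-aligned, half-open buckets it touches."""
    first = math.floor(t_start / size)
    last = math.ceil(t_end / size)
    return {(i * size, (i + 1) * size) for i in range(first, last)}


a = (2.1, 2.3)  # a short detection from one model
b = (2.6, 2.9)  # a short detection from another model, 0.3 s later

# Exact interval test: the two detections do not overlap.
truly_overlap = max(a[0], b[0]) < min(a[1], b[1])

# Bucketed test: both collapse to bucket (2, 3), so they appear to co-occur.
share_bucket = bool(to_buckets(*a) & to_buckets(*b))
```

Keeping the original interval on the child record, as in the design choice noted above, leaves room to re-check exact overlap after the bucket-level match.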