PATTERN Cited by 1 source
Time-window aggregator for temporal graph¶
Time-window aggregator for temporal graph is the pattern of storing time-windowed accumulated aggregates of a continuously- mutating graph instead of per-time-slice snapshots, so that historical point-in-time views can be reconstructed at query time without paying linear-in-time storage cost.
The wiki's canonical instance is Netflix's Service Topology, which uses this pattern to make time-travel topology query affordable on a graph of "thousands of microservices" updated "as services deploy multiple times per day."
The cost problem this solves¶
The naive approach to historical graph queries — "snapshot the whole graph every minute and keep every snapshot" — has cost:
For a Netflix-scale topology graph at minute granularity over a year of retention:
Even with compression, this is prohibitively large, and most of it is redundant — the vast majority of edges are stable across consecutive windows.
The Netflix post calls out this exact cost framing:
"This time-travel capability is powered by time-window aggregation — instead of storing every time slice separately, we use layer-specific aggregators that accumulate topology data across windows, allowing us to reconstruct historical views efficiently without exploding storage costs." (sources/2026-05-29-netflix-from-silos-to-service-topology-why-netflix-built-a-real-time-service-map)
Pattern shape¶
Real-time graph mutations
│
▼
┌──────────────────────────────────────────────┐
│ Per-window aggregator │
│ - For each time window W: │
│ - Accumulate edge presence across W │
│ - Accumulate per-edge aggregates │
│ (count, error rate, latency, …) │
│ - Record edge-state changes │
└──────────────────────────────────────────────┘
│
▼
Window-keyed store (one record per window per (edge, layer))
│
▼
Historical view reconstruction:
Given target time T, fetch the windows covering T,
apply accumulated state forward from a baseline,
return the reconstructed graph at T.
The key insight: accumulated state per window + baseline + deltas = much cheaper than full snapshots, while still supporting point-in-time reconstruction.
When to apply¶
- The graph being archived is highly correlated across time — most edges persist for many windows; only a small fraction change per window.
- Approximate point-in-time accuracy is acceptable — the reconstructed view at time T is accurate to within window granularity W, not microsecond-precise.
- The query workload is historical inspection, not high-frequency time-series rollback (e.g. "what did the graph look like 6 hours before the incident?" rather than "give me every state every 10 ms").
When NOT to apply¶
- The graph mutates so rapidly that most edges change per window — there's no compression benefit over per-slice snapshots.
- Sub-window precision is required (e.g. forensic analysis at millisecond granularity).
- Per-edge auditing (every state change must be recoverable exactly, not approximately).
Mechanics¶
Window granularity choice¶
The window granularity W is the resolution of historical queries and a fundamental trade-off:
- Shorter W → finer historical resolution, more storage.
- Longer W → coarser historical resolution, less storage.
Typical choices: 1 minute / 5 minutes / 1 hour, with multi-tier windows (recent at 1-min granularity, older at 1-hour) common.
Netflix doesn't disclose the window granularity in this post.
Per-window accumulation¶
For each window W, the aggregator accumulates:
- Edge presence — did this edge appear in W?
- Per-edge aggregates — flow count, byte volume, error rate, latency distribution, etc.
- Edge-state changes (delta from prior window) — new edges, disappeared edges, edges with significant property change.
Layer-specific aggregators¶
Netflix's framing names this explicitly: "layer-specific aggregators that accumulate topology data across windows." Each layer (network, IPC, tracing) has its own aggregator tuned for its data shape:
- Network layer (eBPF) — likely accumulates edge presence + flow volume per window.
- IPC layer — likely accumulates per-edge endpoint set, error rate, latency distribution per window.
- Tracing layer — already columnar/analytical; native time-window aggregation fits the substrate naturally.
The per-layer choice mirrors the per-layer storage choice: each layer's aggregator targets its own access pattern.
Reconstruction at query time¶
For a query "give me the topology at time T":
- Find the window W containing T.
- Fetch W's accumulated state.
- (Optional) Combine with a baseline + N preceding window deltas if reconstruction needs to be more precise than W-granularity.
- Return the reconstructed graph.
The reconstruction cost is O(graph size at T) + O(window deltas) — much cheaper than scanning per-slice snapshots over the retention horizon.
Generalisation beyond service topology¶
The pattern applies to any highly-correlated time-evolving graph where historical inspection is needed:
- Social graphs — friendship edges change slowly relative to message events.
- Permission graphs — IAM permissions evolve much more slowly than the requests that flow through them.
- Knowledge graphs — facts in a knowledge graph evolve slowly relative to queries.
- Asset-inventory graphs — cloud assets persist for hours/days in a typical fleet.
The shared property: the graph state at time T is more efficiently encoded as a baseline + N deltas than as a snapshot.
Sibling patterns¶
- patterns/sliding-window-rollup-aggregation — sibling at the rollup-window level; per-time-bucket aggregation rather than per-graph-edge aggregation.
- patterns/lazy-aggregate-from-monotonic-local-state — sibling at the read-side reconstruction level.
- patterns/intermediate-result-snapshot-for-resume — sibling at the checkpoint-and-replay level (different intent: resume a computation rather than time-travel a graph).
Seen in¶
- sources/2026-05-29-netflix-from-silos-to-service-topology-why-netflix-built-a-real-time-service-map — canonical wiki source. Time-travel topology query is named as a first-class capability of Service Topology, with the time- window-aggregation technique sketched at one paragraph's depth.
Related¶
- systems/netflix-service-topology — canonical instance
- concepts/temporal-topology-query — concept-level framing
- concepts/service-dependency-graph — the artifact whose history is being queried
- companies/netflix