PATTERN Cited by 1 source

Time-window aggregator for temporal graph¶

Time-window aggregator for temporal graph is the pattern of storing time-windowed, accumulated aggregates of a continuously mutating graph instead of per-time-slice snapshots, so that historical point-in-time views can be reconstructed at query time without paying the storage cost of a full graph copy for every time slice.

The wiki's canonical instance is Netflix's Service Topology, which uses this pattern to make time-travel topology queries affordable on a graph of "thousands of microservices" updated "as services deploy multiple times per day."

The cost problem this solves¶

The naive approach to historical graph queries — "snapshot the whole graph every minute and keep every snapshot" — has cost:

Storage = (graph size) × (slice frequency) × (retention horizon)

For a Netflix-scale topology graph, assuming on the order of 10⁵ edges, snapshotted at minute granularity over a year of retention:

~10⁵ edges × 525,600 minutes ≈ 5×10¹⁰ edge-snapshots

Even with compression, this is prohibitively large, and most of it is redundant — the vast majority of edges are stable across consecutive windows.
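The arithmetic above can be checked with a short sketch. It also contrasts full snapshots with a baseline-plus-changes encoding. The 10⁵ edge count is the order of magnitude used in the text; the 0.1% per-minute churn rate and both function names are assumptions made purely for illustration, not figures or APIs from the Netflix post.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 one-minute slices in a year


def snapshot_records(edges: int, slices: int) -> int:
    """Edge-records stored when every slice keeps a full copy of the graph."""
    return edges * slices


def baseline_plus_delta_records(edges: int, slices: int, churn: float) -> int:
    """One full baseline, then only the changed edges for each later slice."""
    changed_per_slice = round(edges * churn)
    return edges + changed_per_slice * (slices - 1)


EDGES = 10**5   # order of magnitude used in the text
CHURN = 0.001   # ASSUMPTION: 0.1% of edges change per minute (illustrative)

full = snapshot_records(EDGES, MINUTES_PER_YEAR)
delta = baseline_plus_delta_records(EDGES, MINUTES_PER_YEAR, CHURN)

print(f"full snapshots   : {full:.2e} edge-records")
print(f"baseline + deltas: {delta:.2e} edge-records")
print(f"reduction        : about {full / delta:.0f}x")
```

The reduction factor is roughly the inverse of the churn rate, which is why the pattern only pays off when most edges are stable between consecutive windows.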

The Netflix post calls out this exact cost framing:

"This time-travel capability is powered by time-window aggregation — instead of storing every time slice separately, we use layer-specific aggregators that accumulate topology data across windows, allowing us to reconstruct historical views efficiently without exploding storage costs." (sources/2026-05-29-netflix-from-silos-to-service-topology-why-netflix-built-a-real-time-service-map)

Pattern shape¶

Real-time graph mutations
  │
  ▼
┌──────────────────────────────────────────────┐
│  Per-window aggregator                       │
│  - For each time window W:                   │
│    - Accumulate edge presence across W       │
│    - Accumulate per-edge aggregates          │
│      (count, error rate, latency, …)         │
│    - Record edge-state changes               │
└──────────────────────────────────────────────┘
  │
  ▼
Window-keyed store (one record per window per (edge, layer))
  │
  ▼
Historical view reconstruction:
  Given target time T, fetch the windows covering T,
  apply accumulated state forward from a baseline,
  return the reconstructed graph at T.

The key insight: storing a baseline plus per-window accumulated state and deltas is much cheaper than storing full snapshots, yet it still supports point-in-time reconstruction.

When to apply¶

The graph being archived is highly correlated across time — most edges persist for many windows; only a small fraction change per window.
Approximate point-in-time accuracy is acceptable — the reconstructed view at time T is accurate to within window granularity W, not microsecond-precise.
The query workload is historical inspection, not high-frequency time-series rollback (e.g. "what did the graph look like 6 hours before the incident?" rather than "give me every state every 10 ms").
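The first criterion can be checked empirically before adopting the pattern. The sketch below measures what fraction of edges change between consecutive windows. The function names and the sample edge sets are hypothetical, and the choice of symmetric difference over union as the churn metric is an assumption rather than anything the source prescribes.

```python
from typing import Hashable, Sequence, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def churn_fraction(prev: Set[Edge], curr: Set[Edge]) -> float:
    """Fraction of edges seen in either window that were added or removed."""
    union = prev | curr
    if not union:
        return 0.0
    return len(prev ^ curr) / len(union)


def mean_churn(windows: Sequence[Set[Edge]]) -> float:
    """Average churn over every pair of consecutive windows."""
    pairs = list(zip(windows, windows[1:]))
    if not pairs:
        return 0.0
    return sum(churn_fraction(a, b) for a, b in pairs) / len(pairs)


# Hypothetical edge sets for three consecutive windows.
windows = [
    {("api", "auth"), ("api", "db"), ("web", "api")},
    {("api", "auth"), ("api", "db"), ("web", "api")},
    {("api", "auth"), ("api", "cache"), ("web", "api")},
]

print(f"mean churn per window: {mean_churn(windows):.2f}")
```

A mean churn close to 0 suggests the graph is highly correlated across time; a value approaching 1 means most edges change every window and the pattern offers little over per-slice snapshots.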

When NOT to apply¶

The graph mutates so rapidly that most edges change per window — there's no compression benefit over per-slice snapshots.
Sub-window precision is required (e.g. forensic analysis at millisecond granularity).
Per-edge auditing is required (every state change must be recoverable exactly, not approximately).

Mechanics¶

Window granularity choice¶

The window granularity, that is, the length of each window W, sets the resolution of historical queries and is a fundamental trade-off:

Shorter W → finer historical resolution, more storage.
Longer W → coarser historical resolution, less storage.

Typical choices: 1 minute / 5 minutes / 1 hour, with multi-tier windows (recent at 1-min granularity, older at 1-hour) common.

Netflix doesn't disclose the window granularity in this post.
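As a minimal sketch of multi-tier windows, the following maps a timestamp to the start of the window that would hold it, using coarser windows for older data. The tier boundaries (1 day, 30 days), the window sizes, and the function names are all assumptions chosen for illustration; they are not Netflix's values.

```python
from datetime import datetime, timedelta, timezone

# ASSUMPTION: illustrative tiers as (maximum age, window size) pairs.
TIERS = [
    (timedelta(days=1), timedelta(minutes=1)),   # last day: 1-minute windows
    (timedelta(days=30), timedelta(minutes=5)),  # last month: 5-minute windows
]
COARSEST = timedelta(hours=1)                    # anything older: 1-hour windows

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_size_for(age: timedelta) -> timedelta:
    """Pick the window size for data of the given age."""
    for max_age, size in TIERS:
        if age <= max_age:
            return size
    return COARSEST


def window_start(t: datetime, now: datetime) -> datetime:
    """Floor t to the start of its window; both datetimes must be timezone-aware."""
    size = window_size_for(now - t)
    index = (t - EPOCH) // size  # timedelta // timedelta yields an int
    return EPOCH + index * size


now = datetime(2026, 6, 1, 12, 0, tzinfo=timezone.utc)
recent = now - timedelta(hours=2, minutes=3, seconds=30)
old = now - timedelta(days=90, minutes=17)
print(window_start(recent, now), window_start(old, now))
```

Aligning every tier to the same epoch keeps coarse windows as exact unions of finer ones, which makes downsampling older data a matter of merging records.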

Per-window accumulation¶

For each window W, the aggregator accumulates:

Edge presence — did this edge appear in W?
Per-edge aggregates — flow count, byte volume, error rate, latency distribution, etc.
Edge-state changes (delta from prior window) — new edges, disappeared edges, edges with significant property change.
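The three kinds of accumulated state listed above can be sketched as a small in-memory aggregator. All class, field, and method names here are hypothetical, and the sketch tracks only call count, error count, and total latency; detecting "significant property change" is left out because the source does not say how it would be defined.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # (caller, callee)


@dataclass
class EdgeStats:
    calls: int = 0
    errors: int = 0
    total_latency_ms: float = 0.0

    @property
    def error_rate(self) -> float:
        return self.errors / self.calls if self.calls else 0.0


@dataclass
class WindowRecord:
    stats: Dict[Edge, EdgeStats] = field(default_factory=dict)  # presence + aggregates
    added: Set[Edge] = field(default_factory=set)               # new since prior window
    removed: Set[Edge] = field(default_factory=set)             # gone since prior window


class WindowAggregator:
    """Accumulates observations for one window at a time (hypothetical sketch)."""

    def __init__(self) -> None:
        self._prev_edges: Set[Edge] = set()
        self._current = WindowRecord()

    def observe(self, src: str, dst: str, latency_ms: float, error: bool = False) -> None:
        s = self._current.stats.setdefault((src, dst), EdgeStats())
        s.calls += 1
        s.errors += int(error)
        s.total_latency_ms += latency_ms

    def close_window(self) -> WindowRecord:
        """Finalize the window: compute deltas against the prior window, then reset."""
        record = self._current
        present = set(record.stats)
        record.added = present - self._prev_edges
        record.removed = self._prev_edges - present
        self._prev_edges = present
        self._current = WindowRecord()
        return record


agg = WindowAggregator()
agg.observe("web", "api", 12.0)
agg.observe("web", "api", 20.0, error=True)
agg.observe("api", "db", 3.0)
w1 = agg.close_window()
agg.observe("web", "api", 10.0)
w2 = agg.close_window()
print(sorted(w1.added), sorted(w2.removed))
```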

Layer-specific aggregators¶

Netflix's framing names this explicitly: "layer-specific aggregators that accumulate topology data across windows." Each layer (network, IPC, tracing) has its own aggregator tuned for its data shape:

Network layer (eBPF) — likely accumulates edge presence + flow volume per window.
IPC layer — likely accumulates per-edge endpoint set, error rate, latency distribution per window.
Tracing layer — already columnar/analytical; native time-window aggregation fits the substrate naturally.

The per-layer choice mirrors the per-layer storage choice: each layer's aggregator targets its own access pattern.
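One way to picture layer-specific aggregators is a shared interface with a different accumulator behind each layer. This is a sketch only: the `LayerAggregator` protocol, the two classes, the event field names (`bytes`, `endpoint`, `error`), and the `route` helper are all hypothetical, and the per-layer contents follow the "likely" guesses above rather than anything Netflix documents.

```python
from collections import defaultdict
from typing import Dict, Protocol, Set, Tuple

Edge = Tuple[str, str]


class LayerAggregator(Protocol):
    def observe(self, edge: Edge, event: dict) -> None: ...
    def flush(self) -> dict: ...


class NetworkAggregator:
    """Edge presence plus byte volume per window (assumed data shape)."""

    def __init__(self) -> None:
        self._bytes: Dict[Edge, int] = defaultdict(int)

    def observe(self, edge: Edge, event: dict) -> None:
        self._bytes[edge] += event.get("bytes", 0)

    def flush(self) -> dict:
        out = dict(self._bytes)
        self._bytes.clear()
        return out


class IpcAggregator:
    """Per-edge endpoint set and error count per window (assumed data shape)."""

    def __init__(self) -> None:
        self._endpoints: Dict[Edge, Set[str]] = defaultdict(set)
        self._errors: Dict[Edge, int] = defaultdict(int)

    def observe(self, edge: Edge, event: dict) -> None:
        self._endpoints[edge].add(event["endpoint"])
        self._errors[edge] += int(event.get("error", False))

    def flush(self) -> dict:
        out = {
            edge: {"endpoints": sorted(eps), "errors": self._errors[edge]}
            for edge, eps in self._endpoints.items()
        }
        self._endpoints.clear()
        self._errors.clear()
        return out


# One aggregator instance per layer; events are routed by layer name.
AGGREGATORS: Dict[str, LayerAggregator] = {
    "network": NetworkAggregator(),
    "ipc": IpcAggregator(),
}


def route(layer: str, edge: Edge, event: dict) -> None:
    AGGREGATORS[layer].observe(edge, event)
```

Calling `flush()` on each aggregator at a window boundary would yield that layer's record for the window and reset its state for the next one.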

Reconstruction at query time¶

For a query "give me the topology at time T":

Find the window W containing T.
Fetch W's accumulated state.
(Optional) If a window record holds only changes rather than full edge state, combine a baseline with the N preceding window deltas to rebuild the complete graph as of W.
Return the reconstructed graph.

The reconstruction cost is O(graph size at T) + O(window deltas) — much cheaper than scanning per-slice snapshots over the retention horizon.
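The baseline-plus-deltas variant of reconstruction can be sketched as follows. The `WindowDelta` type, the `reconstruct` signature, and the use of integer epoch seconds for window starts are assumptions for illustration. The sketch rebuilds edge presence only, not per-edge aggregates.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


@dataclass(frozen=True)
class WindowDelta:
    start: int                # window start, epoch seconds
    added: FrozenSet[Edge]    # edges that appeared in this window
    removed: FrozenSet[Edge]  # edges that disappeared in this window


def reconstruct(
    baseline_start: int,
    baseline: Iterable[Edge],
    deltas: List[WindowDelta],
    t: int,
) -> Set[Edge]:
    """Edge set as of the window containing t.

    `deltas` must be sorted by `start`, and every delta must start after
    `baseline_start`.
    """
    if t < baseline_start:
        raise ValueError("t precedes the baseline; an earlier baseline is needed")
    starts = [d.start for d in deltas]
    upto = bisect_right(starts, t)  # number of windows that started at or before t
    edges = set(baseline)
    for d in deltas[:upto]:
        edges -= d.removed
        edges |= d.added
    return edges


baseline = {("web", "api"), ("api", "db")}
deltas = [
    WindowDelta(60, frozenset({("api", "cache")}), frozenset({("api", "db")})),
    WindowDelta(120, frozenset({("web", "cdn")}), frozenset()),
]
print(sorted(reconstruct(0, baseline, deltas, t=90)))
```

Writing a fresh baseline periodically bounds the number of deltas replayed per query, at the price of storing more full copies.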

Generalisation beyond service topology¶

The pattern applies to any highly-correlated time-evolving graph where historical inspection is needed:

Social graphs — friendship edges change slowly relative to message events.
Permission graphs — IAM permissions evolve much more slowly than the requests that flow through them.
Knowledge graphs — facts in a knowledge graph evolve slowly relative to queries.
Asset-inventory graphs — cloud assets persist for hours/days in a typical fleet.

The shared property: the graph state at time T is more efficiently encoded as a baseline + N deltas than as a snapshot.

Sibling patterns¶

patterns/sliding-window-rollup-aggregation — sibling at the rollup-window level; per-time-bucket aggregation rather than per-graph-edge aggregation.
patterns/lazy-aggregate-from-monotonic-local-state — sibling at the read-side reconstruction level.
patterns/intermediate-result-snapshot-for-resume — sibling at the checkpoint-and-replay level (different intent: resume a computation rather than time-travel a graph).

Seen in¶

sources/2026-05-29-netflix-from-silos-to-service-topology-why-netflix-built-a-real-time-service-map — canonical wiki source. Time-travel topology query is named as a first-class capability of Service Topology, with the time-window-aggregation technique sketched at one paragraph's depth.

systems/netflix-service-topology — canonical instance
concepts/temporal-topology-query — concept-level framing
concepts/service-dependency-graph — the artifact whose history is being queried
companies/netflix