CONCEPT Cited by 1 source

Multi-source topology fusion

Multi-source topology fusion is the architectural choice to build a service dependency graph from multiple independent capture substrates, each producing a complete graph of its own, and to merge them only at query time rather than at write time. The wiki's first canonical instance is Netflix's Service Topology, which uses three substrates (eBPF flows, IPC metrics, end-to-end traces) and three physically separate graphs.

The structural insight

"No single source tells the complete story." (sources/2026-05-29-netflix-from-silos-to-service-topology-why-netflix-built-a-real-time-service-map)

Every capture substrate has a structural blind spot:

Substrate | What it sees | What it can't see
Network-level (eBPF / packet capture) | Every flow, including for uninstrumented services | Application semantics: endpoint, path, protocol verb
Application-level (IPC / RPC metrics) | Endpoints, error rates, latency, request/response detail | Anything from uninstrumented services
Request-level (distributed tracing) | Actual runtime call paths per request | Sampled, so rare or low-frequency paths are often missed

The blind-spot list is not coincidental — each substrate's capture point is structurally different (kernel vs application runtime vs request-context propagation), and the blind spot follows from the capture point.

Why fusion at query time, not write time

Netflix's design names two reasons each layer is a physically separate graph instead of a write-time-fused single graph:

"Each source creates its own graph that is physically separate — the network layer in one graph database partition, the IPC layer in another partition, and the tracing layer using columnar storage optimized for analytical queries. This physical separation allows each layer to evolve independently and be queried in parallel."

Two payoffs:

  • Independent evolution. Substrates have different update cadences, ownership boundaries, and schema shapes. Rolling out a change to IPC instrumentation doesn't touch the eBPF graph, and tracing-sampling changes don't touch the application-level (IPC) graph.
  • Substrate fit per query shape. Graph DB for path traversal on the network and IPC layers; columnar storage for trace-shape analytical queries. A write-time fusion would force a uniform storage choice that's wrong for at least one of the layers.

Query-time fusion is implemented as parallel traversal across all layers + result merge — see patterns/three-layer-graph-merge-on-query:

"When users request a unified view, we execute traversal queries across all layers simultaneously and merge results, achieving sub-second response times even when combining all three layers."

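The parallel-traversal-plus-merge shape can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Netflix's implementation: the in-memory dictionaries stand in for the three physically separate stores, and the names `query_layer` and `unified_view` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the three separate stores.
# Each maps a caller service to the set of services it calls.
NETWORK = {"app-a": {"app-b", "app-c"}, "app-b": {"app-d"}}
IPC = {"app-a": {"app-b"}}
TRACES = {"app-a": {"app-b"}, "app-b": {"app-d"}}


def query_layer(name, store, root):
    """Traverse one layer from `root`; return the layer name and its edges."""
    edges, frontier, seen = set(), [root], {root}
    while frontier:
        src = frontier.pop()
        for dst in store.get(src, ()):
            edges.add((src, dst))
            if dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    return name, edges


def unified_view(root):
    """Query every layer in parallel, then merge edges keyed by (src, dst)."""
    layers = {"network": NETWORK, "ipc": IPC, "tracing": TRACES}
    merged = {}
    with ThreadPoolExecutor(max_workers=len(layers)) as pool:
        futures = [
            pool.submit(query_layer, name, store, root)
            for name, store in layers.items()
        ]
        for future in futures:
            name, edges = future.result()
            for edge in edges:
                # Record which layers observed this edge.
                merged.setdefault(edge, set()).add(name)
    return merged


view = unified_view("app-a")
```

In this sketch each layer is traversed independently, so a slow or unavailable layer could be given its own timeout without changing how the other two are queried.
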
How the layers compensate for each other

The Netflix post names the complementarity explicitly:

"Network flows ensure completeness — we don't miss anything. IPC metrics provide application details — we understand the 'how' and 'what'. Tracing shows actual behavior — we see real request patterns. Each source compensates for the limitations of the others."

Concrete examples of compensation:

  • A service emitting no IPC metrics still appears in the network graph from eBPF capture — the "why is this service showing as 'Unknown'" support ticket is answered.
  • A network-only edge (App A → App B at the IP layer) gets enriched with endpoint detail when the IPC layer also has the same edge — the unified view shows both layers' data on the same edge.
  • A rarely-used code path that tracing missed is still visible in the network and IPC layers; tracing then provides the runtime behaviour on the more-common paths.
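
The three compensation cases above can be made concrete with a small merge that keeps each layer's attributes on the shared edge. This is a sketch only: the `UnifiedEdge` type, the `fuse` function, and the attribute names (`bytes`, `endpoint`, `sampled_requests`) are assumptions for illustration and do not come from the Netflix post.

```python
from dataclasses import dataclass, field


@dataclass
class UnifiedEdge:
    src: str
    dst: str
    layers: set = field(default_factory=set)   # which layers saw the edge
    attrs: dict = field(default_factory=dict)  # per-layer attributes


def fuse(layer_edges):
    """Merge {layer: {(src, dst): attrs}} into one edge per (src, dst)."""
    unified = {}
    for layer, edges in layer_edges.items():
        for (src, dst), attrs in edges.items():
            edge = unified.setdefault((src, dst), UnifiedEdge(src, dst))
            edge.layers.add(layer)
            # Namespace attributes by layer so no layer overwrites another.
            edge.attrs[layer] = dict(attrs)
    return unified


layer_edges = {
    "network": {
        ("app-a", "app-b"): {"bytes": 1200},
        ("app-a", "legacy-x"): {"bytes": 300},  # emits no IPC metrics
        ("app-b", "app-c"): {"bytes": 40},      # rarely-used path
    },
    "ipc": {
        ("app-a", "app-b"): {"endpoint": "/v1/items"},
        ("app-b", "app-c"): {"endpoint": "/v1/rare"},
    },
    "tracing": {
        ("app-a", "app-b"): {"sampled_requests": 17},
    },
}

unified = fuse(layer_edges)
```

Here `legacy-x` appears only through the network layer, the `app-a` to `app-b` edge carries data from all three layers, and the rare `app-b` to `app-c` path is present even though no trace sampled it.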

The orthogonal view: each layer is queryable on its own

A consequence of physical separation: engineers can toggle layer visibility based on what they're investigating:

  • "Pure network connectivity" (eBPF only) — for ground-truth questions about what's actually talking to what at the IP layer.
  • "Application-level calls" (IPC only) — for endpoint / protocol-level investigation.
  • "Traced request flows" (tracing only) — for understanding specific request behaviour.
  • "Unified" — for the whole picture.
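
Because the layers are stored separately, a view can be modelled as nothing more than a choice of which layers to read. The sketch below assumes hypothetical view names and toy edge sets; it is meant to show the shape of the toggle, not any real API.

```python
# Hypothetical per-layer edge sets, kept separate as in the design above.
LAYER_EDGES = {
    "network": {("app-a", "app-b"), ("app-a", "legacy-x")},
    "ipc": {("app-a", "app-b")},
    "tracing": {("app-a", "app-b")},
}

# Each view is just a selection of layers (names are illustrative).
VIEWS = {
    "pure-network": ("network",),
    "application-calls": ("ipc",),
    "traced-flows": ("tracing",),
    "unified": ("network", "ipc", "tracing"),
}


def edges_for_view(view):
    """Return the union of edges from the layers selected by `view`."""
    edges = set()
    for layer in VIEWS[view]:
        edges |= LAYER_EDGES[layer]
    return edges
```

For example, `edges_for_view("pure-network")` includes the `legacy-x` edge, while `edges_for_view("application-calls")` does not, since that service emits no IPC metrics in this toy data.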

Designs that fuse every source into a single graph at write time lose this: once the layers are merged, there is no way to look at "just the network layer".

Generalisation: the structural recipe

The pattern recurs across observability and distributed-systems domains where any single capture mechanism leaves blind spots. Sibling instances on the wiki:

The shared shape: complementary substrates → independent storage → query-time merge.

Seen in
