System Design — Overview

Synthesis across the wiki corpus. Snapshot (2026-06-18T03:00): 536 sources / 40 companies / 1,684 systems / 2,861 concepts / 1,771 patterns. No new ingests since the 2026-06-13 synth (5 days). 9 skips on 2026-06-17 (product announcements, marketing, out-of-scope research, or duplicate re-fetches). Wiki stable; corpus growth paused while Data + AI Summit noise is filtered out.

What changed since 2026-06-13 synth

Quiet window. RSS poller fetched 9 new articles on 2026-06-17 but all were filtered out:

  • 1× Cloudflare product announcement (Cloudflare One agent toolkit — no architecture)
  • 5× Databricks Data + AI Summit marketing/product posts (dashboards, partner frameworks, security roundup, ML engineering agents, ecosystem pitch)
  • 1× Google Earth AI research (geospatial ML, no serving-infra)
  • 2× Netflix duplicate re-fetches (Human Infrastructure, State of Routing — both already ingested)

Last substantive batch (2026-06-11 → 2026-06-13) — 8 ingests across 6 companies:

  1. Airbnb: Scaling beyond one data architecture (T2). Data modeling framework for multi-product evolution: foundational principles (no hybrids, consistent naming, clear namespaces) + decentralized domain choice. New: 6 concepts (separate-vs-monolithic-data-models etc), 5 patterns. 14 pages touched.
  2. Atlassian: Architecting Scalable ML Platforms (T2). ML Studio: composable workflow modules, deterministic task caching, hot-cluster reuse, column-level access control. New: systems/atlassian-ml-studio, 7 concepts, 6 patterns. 16 pages touched.
  3. Databricks: AI Serving Platform That Adapts to Your Model (T3, architectural). AutoPilot Pod Autoscaler: two-axis horizontal+vertical autoscaling, warm node pools, model-aware concurrency tuning. 300K+ QPS with no customer-tuning knobs. New: 2 systems, 4 concepts, 3 patterns. 14 pages touched.
  4. Lyft: Metric Semantic Layer (T2). Metrics-as-code: YAML + Jinja → SQL generation, dual-owner governance, MCP integration for AI agents. Key insight: only "Golden Metrics" with ≥2 consumers qualify. New: systems/lyft-metric-semantic-layer, 4 concepts, 4 patterns. 14 pages touched.
  5. Cloudflare: Scaling Security Insights (T1). 10× scanning throughput via five architecture-only fixes (no infra adds): batch-parallel Kafka consumption, fast/slow consumer split, hybrid bulk INSERT, active-passive API collocation, adaptive rate-limited scheduling. New: systems/cloudflare-security-insights, 3 concepts, 3 patterns. 15 pages touched.
  6. Databricks: Zerobus Ingest — Petabyte-Scale (T3, deep). Zerobus architecture: stream-connection-level ordering (not partition-level), zero-copy protobuf at ~1 GB/s/core, WAL-before-lakehouse-publish. 12 GB/s sustained / 1 PB in 24h to single Delta table. New: 2 concepts, 4 patterns. 13 pages touched.
  7. Dropbox: MCP + Dash for design-to-code security (T2). Only 12% of PRs link back to threat models. MCP-as-context-bridge achieves 80% linkage via semantic search. Advisory-over-blocking principle. 11 pages touched.
  8. Databricks: Evolutionary DB Dev Part 3 (T3, conceptual). Team-scale database branching: tier topology as long-running branches, SCM state machine with blocking gates, agents-as-junior-developers, DBA→platform engineer. Neon reports ~500K branches/day, 80%+ agent-created. New: systems/lakebase-app-dev-kit, 3 concepts, 5 patterns. 16 pages touched.
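
The Lyft entry above describes metrics-as-code: declarative definitions rendered into SQL, with only metrics that have at least two consumers qualifying as Golden. A minimal sketch of that flow follows, using only the Python standard library. A dict stands in for the YAML file and `string.Template` stands in for Jinja. Every field name, the example metric, and the reading of "dual-owner" as "exactly two owners" are assumptions for illustration, not Lyft's schema.

```python
from string import Template

# Stand-in for a YAML metric definition; every field name here is hypothetical.
METRIC = {
    "name": "completed_rides",
    "table": "analytics.rides",
    "expression": "COUNT(*)",
    "filter": "status = 'completed'",
    "owners": ["data-eng", "marketplace-analytics"],  # assumed dual-owner rule
    "consumers": ["exec-dashboard", "pricing-agent"],
}

# Stand-in for a Jinja template that renders a definition into SQL.
SQL_TEMPLATE = Template(
    "SELECT ds, $expression AS $name\n"
    "FROM $table\n"
    "WHERE $filter\n"
    "GROUP BY ds"
)


def is_golden(metric: dict, min_consumers: int = 2) -> bool:
    """Golden Metrics need at least two consumers (threshold from the source)."""
    return len(metric["consumers"]) >= min_consumers


def validate(metric: dict) -> None:
    """Reject definitions that break the assumed governance rules."""
    if len(metric["owners"]) != 2:
        raise ValueError(f"{metric['name']}: expected exactly two owners")
    if not is_golden(metric):
        raise ValueError(f"{metric['name']}: fewer than two consumers")


def render_sql(metric: dict) -> str:
    """Validate the definition, then render it to a SQL string."""
    validate(metric)
    return SQL_TEMPLATE.substitute(
        name=metric["name"],
        table=metric["table"],
        expression=metric["expression"],
        filter=metric["filter"],
    )


if __name__ == "__main__":
    print(render_sql(METRIC))
```

Because validation runs before rendering, a definition with a single consumer never produces SQL, which is one way the "governance in versioned config" idea could be enforced at build time.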

Corpus shape

Metric Count
Source pages 536
Company pages 40
System pages 1,684
Concept pages 2,861
Pattern pages 1,771
Total wiki pages ~6,892

By company (top 21, source count)

Company Sources Tier
companies/cloudflare 58 T1
companies/databricks 47 T3
companies/redpanda 38 T3
companies/meta 31 T1
companies/aws 31 T1
companies/netflix 30 T1
companies/flyio 30 T3
companies/planetscale 28 T3
companies/figma 23 T2
companies/zalando 20 T2
companies/google 19 T1
companies/pinterest 15 T2
companies/mongodb 14 T3
companies/slack 13 T2
companies/instacart 13 T2
companies/yelp 12 T3
companies/vercel 12 T3
companies/dropbox 12 T2
companies/airbnb 12 T2
companies/github 11 T2
companies/datadog 10 T2

Publication timeline

~61% of sources were published in 2026 (325), ~26% in 2025 (140), ~11% in 2024 (57). The remaining 14 are canonical older posts (Figma multiplayer 2019, Zalando SRE 2021–2023). Ingestion rate: 144 sources in April 2026 (backfill), settling to ~12–15/month steady-state. June 2026 has gone quiet: 23 sources in the first 13 days, then a pause while summit noise is filtered out.
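
The yearly shares can be re-derived from the raw counts and the 536-source total in the snapshot. The short check below is only a sanity calculation; the variable names are illustrative.

```python
TOTAL_SOURCES = 536
BY_YEAR = {2026: 325, 2025: 140, 2024: 57}

# Percentage share per year, rounded to whole percent as in the text.
shares = {year: round(100 * count / TOTAL_SOURCES) for year, count in BY_YEAR.items()}

# Sources published before 2024 are whatever the three years do not cover.
older = TOTAL_SOURCES - sum(BY_YEAR.values())

if __name__ == "__main__":
    print(shares, older)
```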

Under-sampled: Uber (HTML scraper pending), LinkedIn (stub), Apple / ByteDance / Microsoft Engineering.

Recurring architectural themes (by citation density)

  1. Blast radius containment — 18+ source refs, 283 inbound links. Cell architecture, staged rollouts, fault-domain isolation, incremental validation. Every Tier-1 company writes about it. Key: concepts/blast-radius, patterns/staged-rollout, patterns/incremental-blast-radius-validation.
  2. Control-plane / data-plane separation — 22 source refs, 160 inbound links. Extended by "control plane as the new data plane" under agentic workloads. Key: concepts/control-plane-data-plane-separation, systems/vitess, systems/lakebase.
  3. LLM-as-judge — 15 source refs, 200 inbound links. Dominant offline-eval pattern 2025–2026. Meta BVT, Zalando search quality, Instacart relevance, Cloudflare code review, Netflix synopses. Key: concepts/llm-as-judge.
  4. MCP / agent-native infrastructure — 31+ source tags, 289 inbound links. Fastest-growing theme. This window adds Dropbox MCP-for-security, Lyft MCP-for-metrics-agents. Key: systems/model-context-protocol, patterns/wrap-cli-as-mcp-server, patterns/specialized-agent-decomposition, patterns/mcp-as-context-bridge.
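
Staged rollout, listed under blast radius containment above, can be sketched as a loop that widens exposure one stage at a time and reverts at the first failed health gate. This is a minimal illustration, not any company's deployment system: the stage fractions and the `deploy`, `healthy`, and `rollback` callbacks are hypothetical.

```python
from typing import Callable, Sequence


def staged_rollout(
    stages: Sequence[float],
    deploy: Callable[[float], None],
    healthy: Callable[[float], bool],
    rollback: Callable[[], None],
) -> bool:
    """Widen exposure stage by stage; roll back at the first failed gate."""
    for fraction in stages:
        deploy(fraction)
        if not healthy(fraction):
            # The blast radius is capped at the stage that failed.
            rollback()
            return False
    return True


if __name__ == "__main__":
    log = []
    ok = staged_rollout(
        stages=[0.01, 0.10, 0.50, 1.00],
        deploy=lambda f: log.append(f"deploy {f:.0%}"),
        healthy=lambda f: f < 0.50,  # simulate a regression that shows up at 50%
        rollback=lambda: log.append("rollback"),
    )
    print(ok, log)
```

In the simulated run the regression never reaches the final stage, which is the point of the pattern: a bad change is exposed to at most the fraction of traffic in the stage where it was caught.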

Tier B — structural (15–40 source refs)

  1. Change data capture — 17 source tags, 199 inbound links. Redpanda Connect, Debezium, Kafka Connect, Delta CDF, Oracle CDC. The plumbing connecting OLTP → analytics → lakehouse. Key: concepts/change-data-capture, systems/debezium, systems/redpanda-connect.
  2. Observability — 41 source tags, 192 inbound links. Shifting from push-to-TSDB to lakehouse-resident telemetry. OTel becoming universal. Key: concepts/observability, systems/opentelemetry, patterns/telemetry-to-lakehouse.
  3. Compute-storage separation — 16 source refs, 125 inbound links. The defining storage architecture: Lakebase, Snowflake, Neon, PlanetScale, Redpanda Cloud Topics. Key: concepts/compute-storage-separation.
  4. Horizontal sharding — appears in 162 sources (by content). PlanetScale/Vitess consensus series, DynamoDB, Netflix wide-partition splits. Key: concepts/horizontal-sharding, systems/vitess, concepts/wide-partition-problem.
  5. Schema evolution / database branching — 13 source refs (growing). Vitess online DDL, PlanetScale deploy requests, Lakebase three-part series, Iceberg schema evolution. Database branching moving from dev convenience → production substrate. Key: concepts/schema-evolution, concepts/database-branching, concepts/evolutionary-database-design.
  6. Autoscaling as system design — 12+ sources. Databricks two-axis autoscaler, Netflix container mount, Cloudflare adaptive rate-limited scheduling, Redpanda elastic partitioning. The theme: autoscaling is architectural, not operational. Key: concepts/cold-start, patterns/asymmetric-aggressive-up-conservative-down-autoscaling, patterns/two-axis-horizontal-plus-vertical-autoscaling.
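
The pattern name patterns/asymmetric-aggressive-up-conservative-down-autoscaling suggests a controller that grows immediately but shrinks slowly. A minimal sketch of one such policy is shown here under stated assumptions: the per-replica target, the cooldown length, and the one-replica-at-a-time shrink step are all hypothetical choices, not values from any of the cited systems.

```python
import math
from dataclasses import dataclass


@dataclass
class AsymmetricAutoscaler:
    target_per_replica: float  # load one replica should carry (hypothetical)
    replicas: int = 1
    cooldown_ticks: int = 3    # consecutive quiet ticks required before shrinking
    _quiet: int = 0

    def observe(self, load: float) -> int:
        """Feed one load sample and return the replica count after the decision."""
        desired = max(1, math.ceil(load / self.target_per_replica))
        if desired > self.replicas:
            # Aggressive up: jump straight to the desired size.
            self.replicas = desired
            self._quiet = 0
        elif desired < self.replicas:
            # Conservative down: wait out the cooldown, then drop one replica.
            self._quiet += 1
            if self._quiet >= self.cooldown_ticks:
                self.replicas -= 1
                self._quiet = 0
        else:
            self._quiet = 0
        return self.replicas


if __name__ == "__main__":
    scaler = AsymmetricAutoscaler(target_per_replica=100)
    print([scaler.observe(load) for load in [50, 450, 100, 100, 100, 100]])
```

The asymmetry trades some idle capacity for protection against oscillation: a spike is absorbed in one step, while a lull has to persist before any capacity is released.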

Tier C — emergent / fast-growing (5–15 source refs)

  1. AI/LLM serving at scale — 26 LLM + 22 agents tags. Databricks 300K+ QPS inference (new this window), Slack multi-cloud routing, Netflix model-serving, Cloudflare Workers AI. Key: concepts/cold-start, concepts/context-engineering, patterns/multi-cloud-llm-serving, systems/databricks-model-serving.
  2. Durable execution — 8 source refs, 149 inbound links on Cloudflare Durable Objects alone. Implementations: embedded (Temporal), external (Step Functions), workflow-as-code (Cloudflare), DB-backed (Maestro). Key: concepts/durable-execution, systems/cloudflare-durable-objects.
  3. Post-quantum cryptography — 5 sources. Cloudflare (IPsec ML-KEM GA, TLS 1.3 PQ), Meta (migration framework), Google (quantum vulnerability disclosure). Key: concepts/post-quantum-cryptography.
  4. Generative retrieval — 3 sources (Instacart TIGER, Meta SilverTorch, Instacart ads). Autoregressive token generation replacing two-tower + ANN scoring. Key: concepts/generative-retrieval, systems/silvertorch.
  5. Defense-in-depth / zero-trust — 29 security tags. Cloudflare customer-zero, Yelp zero-trust access, GitHub eBPF deployment safety, Dropbox MCP-for-security (new). Key: concepts/defense-in-depth, concepts/positive-security-model.
  6. Metrics-as-code / semantic layer — 5 sources (new theme this window). Lyft MSL, Databricks BI Serving, Airbnb data architecture, Pinterest PiQaMa. Governance shifting left into versioned config. Key: concepts/headless-bi-semantic-layer, patterns/yaml-config-driven-metric-definitions.

Most-linked systems (top 20)

System Links Primary source
systems/vitess 456 PlanetScale
systems/mysql 432 (ubiquitous)
systems/aws-s3 335 AWS
systems/model-context-protocol 289 Anthropic/ecosystem
systems/planetscale 274 PlanetScale
systems/kafka 265 (ubiquitous)
systems/redpanda 246 Redpanda
systems/postgresql 239 (ubiquitous)
systems/unity-catalog 235 Databricks
systems/cloudflare-workers 234 Cloudflare
systems/apache-iceberg 221 (ecosystem)
systems/kubernetes 211 (ubiquitous)
systems/lakebase 210 Databricks
systems/apache-spark 199 Databricks
systems/delta-lake 175 Databricks
systems/innodb 174 MySQL/Oracle
systems/dynamodb 164 AWS
systems/aws-lambda 161 AWS
systems/fly-machines 157 Fly.io
systems/cloudflare-durable-objects 149 Cloudflare

Most-cited concepts (top 20)

Concept Source refs Links
concepts/blast-radius 18 283
concepts/llm-as-judge 15 200
concepts/change-data-capture 14 199
concepts/observability 21 192
concepts/control-plane-data-plane-separation 22 160
concepts/horizontal-sharding 8 129
concepts/compute-storage-separation 16 125
concepts/defense-in-depth 10 28
concepts/scale-to-zero 11 28
concepts/tail-latency-at-scale 8 25
concepts/cold-start 11 22
concepts/durable-execution 8 20
concepts/context-engineering 12 19
concepts/post-quantum-cryptography 5 18
concepts/vector-similarity-search 11 17
concepts/database-branching 6 17
concepts/schema-evolution 10 16
concepts/tenant-isolation 9 16
concepts/medallion-architecture 7 16
concepts/evolutionary-database-design 4 15

Most-cited patterns (top 15)

Pattern Links Description
patterns/upstream-the-fix 46 Fix at source, not downstream
patterns/specialized-agent-decomposition 27 Decompose agent into specialist sub-agents
patterns/wrap-cli-as-mcp-server 22 Expose CLI tools via MCP
patterns/disposable-vm-for-agentic-loop 17 Sandbox agent in ephemeral VMs
patterns/ai-gateway-provider-abstraction 17 Unified gateway across LLM providers
patterns/tool-surface-minimization 17 Fewer tools = more reliable agents
patterns/staged-rollout 15 Progressive deploy with rollback gates
patterns/measurement-driven-micro-optimization 13 Profile → optimize → measure loop
patterns/cheap-approximator-with-expensive-fallback 13 Fast heuristic + slow precise backup
patterns/ltx-compaction 13 LTX file compaction strategy
patterns/partner-managed-service-as-native-binding 13 Third-party as first-class binding
patterns/streaming-broker-as-lakehouse-bronze-sink 11 Stream → lakehouse bronze layer
patterns/fast-rollback 11 Instant revert in deployments
patterns/dynamic-partition-split-async-pipeline 10 Netflix wide-partition auto-split
patterns/mcp-as-context-bridge 10 MCP for cross-system context retrieval

Trade-offs and contradictions

Documented tensions across sources

  • Monolith ↔ microservice. Meta SilverTorch collapses retrieval microservices → unified PyTorch model. Airbnb expands from monolithic vendor → distributed graph. Rule: compute-bound consolidates; IO-bound + multi-team distributes.
  • Generative retrieval vs two-tower + ANN. Instacart chose generative for large catalogs + cold-start. Meta/Pinterest retain two-tower for real-time personalization. Split on: latency tolerance × catalog size × cold-start severity.
  • Partition-level vs stream-level ordering. Kafka: partition = ordering unit (scaling requires rebalance). Zerobus: stream-connection = ordering unit (dynamic scaling). Trade-off: Kafka's design is simpler but inflexible; Zerobus enables elastic autoscaling but requires custom client semantics.
  • Database branching: copy-on-write vs schema-only. Lakebase uses CoW forks (instant, data-included). PlanetScale deploy requests branch schema only. Neon pairs CoW with ephemeral compute. The designs serve different DX goals (testing vs migration safety vs full-environment parity).
  • Agent governance: centralized vs per-tool. Unity Catalog (centralized ACL) vs Cloudflare (per-tool Zod schema + egress policy) vs Lakebase (SCM state machine with blocking gates). Spectrum, not binary — all converging on "agents-as-junior-developers" with structural constraints.
  • Separate vs monolithic data models. Airbnb's explicit framework: product teams with unique attributes → separate; cross-cutting services → monolithic. Neither universally superior.
  • Signature-matching vs ML-scoring for security. Cloudflare: ML anomaly scoring over WAF signatures for frontier-model threats. Dropbox: LLM-based semantic gap detection between design and code. Both moving from pattern-match to reasoning-based security.
  • Active-active vs active-passive API. Cloudflare discovered active-active API with single-region DB primary causes cross-region latency → pool exhaustion. Active-passive (collocate API with primary) wins for write-heavy workloads.
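
The partition-level vs stream-level ordering tension above can be made concrete with a small sketch. In the partition model a key's ordering unit is derived from the partition count, so changing that count moves keys. In the stream-connection model the client connection itself carries the sequence. This is a simplified illustration: `zlib.crc32` is a stand-in for a real partitioner hash, and `StreamConnection` is a hypothetical class, not the Zerobus client API.

```python
import itertools
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Partition-style placement: ordering holds only within one partition."""
    return zlib.crc32(key.encode()) % num_partitions


class StreamConnection:
    """Hypothetical client handle: order is a per-connection sequence number."""

    def __init__(self, stream_id: str):
        self.stream_id = stream_id
        self._seq = itertools.count()

    def send(self, payload: str) -> tuple:
        # Each record carries (stream, sequence), independent of server fan-out.
        return (self.stream_id, next(self._seq), payload)


keys = [f"device-{i}" for i in range(100)]
# Keys whose ordering unit changes when the topic grows from 4 to 6 partitions.
moved = [k for k in keys if partition_for(k, 4) != partition_for(k, 6)]

if __name__ == "__main__":
    conn = StreamConnection("sensor-stream")
    print(len(moved), [conn.send(p) for p in ("a", "b", "c")])
```

The sketch deliberately leaves out what happens to a connection's sequence on crash or failover, since the source lists that as an open question.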

High confidence (5+ sources, clear trajectory)

  • MCP becoming de-facto agent-tool protocol. 31+ sources. Cloudflare, MongoDB, Redpanda, Databricks, Pinterest, Fly.io, Dropbox, Lyft all publishing MCP integrations. Use cases expanding: from CLI tooling → security context → metrics governance.
  • Observability data → lakehouse. Databricks OTel + Unity Catalog, Airbnb statsd→OTel, Yelp S3 access logs, Pinterest PerfView. Cost-driven: TSDB retention expensive; Iceberg/Delta cheap at petabyte scale.
  • AI code review / AI-in-CI standard. Cloudflare (coordinator + sub-reviewers), Meta (BVT), Zalando (AI-as-judge quality gates), Atlassian Rovo Dev, Dropbox MCP security review.
  • Security shifting to architecture-over-patching. Cloudflare customer-zero, positive-security-model, continuous red-team. Dropbox: advisory LLM-based gap detection. Assumption: AI-assisted attackers make signature-based defense insufficient.
  • Database branching as substrate, not feature. Lakebase 3-part series, Neon (~500K branches/day), PlanetScale deploy requests. Databases becoming "branchable by default" for agents and CI. Key: concepts/evolutionary-database-design, patterns/per-developer-database-branch-paired-with-code-branch.
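
Copy-on-write branching, which underlies the "branchable by default" claim above, can be illustrated with a key-level overlay: a branch starts empty, reads fall through to the parent, and writes land only in the branch. This is a toy model built on `collections.ChainMap`, not how Lakebase or Neon implement it; real systems work on storage pages and pin a snapshot, whereas this overlay would still see later writes to the parent.

```python
from collections import ChainMap


def create_branch(parent: dict) -> ChainMap:
    """Create an instant branch: no data is copied, only an empty overlay is added."""
    return ChainMap({}, parent)


main = {"users:1": "ada", "users:2": "grace"}
branch = create_branch(main)

branch["users:2"] = "hopper"  # the write lands in the overlay only
branch["users:3"] = "lin"

if __name__ == "__main__":
    print(dict(branch))    # merged view seen by the branch
    print(main)            # parent is untouched
    print(branch.maps[0])  # only the keys the branch actually changed
```

Branch creation costs the same regardless of parent size, which is what makes very high branch-creation rates by agents and CI plausible.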

Medium confidence (3–4 sources, emerging)

  • Generative retrieval → production. Instacart (TIGER + ads) + Meta (SilverTorch). Both shift from "score candidates" to "generate candidate IDs."
  • Bare-metal fleet ops as system design. Cloudflare boot-time optimization (hours→minutes) + Meta PowerLoss Storm + Redpanda Cloud Topics metastore. Firmware/UEFI/iPXE is load-bearing.
  • Contract-driven multi-agent coordination. OmniNode topic-naming, Atlassian Jira triggers, MongoDB MCP registry, Lakebase artifact-as-API. Inter-agent channel naming/schema as dominant failure mode.
  • Metrics-as-code with dual governance. Lyft MSL, Pinterest PiQaMa, Databricks BI Serving. Pattern: YAML config + template SQL + dual-owner approval + MCP exposure.
  • Zero-copy high-throughput ingestion. Zerobus (~1 GB/s/core via Rust zero-copy protobuf), Redpanda (zero-copy Kafka), Cloudflare scanning (batch-parallel Kafka). Memory allocation is the enemy of throughput.
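
The zero-copy idea above (avoid allocating and copying per message) can be shown in miniature with Python's `memoryview`, which slices a buffer without copying the underlying bytes. This is only an analogy to the Rust zero-copy protobuf path the source attributes to Zerobus; the 4-byte length-prefixed framing here is a hypothetical wire format.

```python
import struct

HEADER = struct.Struct("<I")  # hypothetical framing: 4-byte little-endian length


def frames(buffer):
    """Yield each payload as a memoryview slice; no payload bytes are copied."""
    view = memoryview(buffer)
    offset = 0
    while offset < len(view):
        (length,) = HEADER.unpack_from(view, offset)
        start = offset + HEADER.size
        yield view[start:start + length]
        offset = start + length


if __name__ == "__main__":
    buf = bytearray()
    for payload in (b"alpha", b"be"):
        buf += HEADER.pack(len(payload)) + payload
    for frame in frames(buf):
        # bytes() copies here only for printing; the frame itself is a view.
        print(bytes(frame))
```

Each yielded frame shares memory with the receive buffer, so parsing cost stays proportional to the number of headers rather than the number of payload bytes.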

Low confidence (1–2 sources, watch)

  • Region-scale chaos engineering. Only Meta (PowerLoss Storm). Netflix operates at AZ level. Requires enormous maturity.
  • Post-quantum at scale. Only Cloudflare + Meta have published migration frameworks. Most companies haven't started.
  • Raft log ≡ LSM WAL unification. Redpanda Cloud Topics metastore. Elegant but only one implementation so far.
  • Stream-connection-level ordering replacing partition-level. Only Zerobus so far. Kafka's partition model still dominates.

Language / runtime observations

From source tags: Rust (15+ sources — Cloudflare Pingora, Meta WhatsApp, Aurora DSQL, Figma memory, Zerobus zero-copy), TypeScript (13 — Vercel, Cloudflare Workers, Zalando), Go (10 — Fly.io, Instacart serving, Datadog agent, Cloudflare scanning), Kotlin/Java (5 — Meta Kotlinator, Netflix JDK Vector API, Slack), Python (Lyft MSL, Atlassian ML Studio). Rust adoption concentrated at hot-path / memory-safety boundary. Go dominates network-intensive microservices. TypeScript dominates edge/serverless.

Open questions

  • Zerobus ordering guarantees under failure. Stream-connection-level ordering during failover/rebalance — docs mention graceful drain but not crash recovery semantics.
  • Lakebase branching production adoption. Neon discloses ~500K branches/day; Lakebase production-scale numbers remain undisclosed.
  • Lyft MSL scale metrics. How many Golden Metrics? How many consumers? Cost of Python package approach vs service approach as org scales?
  • Cloudflare Security Insights cross-DC replication. Active-passive solved the problem; what happens when primary fails over?
  • Dropbox MCP security coverage. 80% linkage via semantic search — what's the false positive rate on gap detection?
  • Netflix dynamic partition split cross-DC. Whether split metadata replicates cross-DC or is region-local.
  • Meta PowerLoss Storm frequency. How often region-scale tests run is undisclosed.
Last updated · 542 distilled / 1,571 read