System Design — Overview
Synthesis across the wiki corpus. Snapshot (2026-06-18T03:00): 536 sources / 40 companies / 1,684 systems / 2,861 concepts / 1,771 patterns. No new ingests since the 2026-06-13 synth (5 days). 9 skips on 2026-06-17 (product announcements, marketing, out-of-scope research, or duplicate re-fetches). Wiki stable; corpus growth paused while Data + AI Summit noise is filtered out.
What changed since 2026-06-13 synth
Quiet window. RSS poller fetched 9 new articles on 2026-06-17 but all were filtered out:
- 1× Cloudflare product announcement (Cloudflare One agent toolkit — no architecture)
- 5× Databricks Data + AI Summit marketing/product posts (dashboards, partner frameworks, security roundup, ML engineering agents, ecosystem pitch)
- 1× Google Earth AI research (geospatial ML, no serving-infra)
- 2× Netflix duplicate re-fetches (Human Infrastructure, State of Routing — both already ingested)
Last substantive batch (2026-06-11 → 2026-06-13) — 8 ingests across 6 companies:
- Airbnb: Scaling beyond one data architecture (T2). Data modeling framework for multi-product evolution: foundational principles (no hybrids, consistent naming, clear namespaces) + decentralized domain choice. New: 6 concepts (separate-vs-monolithic-data-models etc), 5 patterns. 14 pages touched.
- Atlassian: Architecting Scalable ML Platforms (T2). ML Studio: composable workflow modules, deterministic task caching, hot-cluster reuse, column-level access control. New: systems/atlassian-ml-studio, 7 concepts, 6 patterns. 16 pages touched.
- Databricks: AI Serving Platform That Adapts to Your Model (T3, architectural). AutoPilot Pod Autoscaler: two-axis horizontal+vertical autoscaling, warm node pools, model-aware concurrency tuning. 300K+ QPS with no customer-tuning knobs. New: 2 systems, 4 concepts, 3 patterns. 14 pages touched.
- Lyft: Metric Semantic Layer (T2). Metrics-as-code: YAML + Jinja → SQL generation, dual-owner governance, MCP integration for AI agents. Key insight: only "Golden Metrics" with ≥2 consumers qualify. New: systems/lyft-metric-semantic-layer, 4 concepts, 4 patterns. 14 pages touched.
- Cloudflare: Scaling Security Insights (T1). 10× scanning throughput via five architecture-only fixes (no infra adds): batch-parallel Kafka consumption, fast/slow consumer split, hybrid bulk INSERT, active-passive API collocation, adaptive rate-limited scheduling. New: systems/cloudflare-security-insights, 3 concepts, 3 patterns. 15 pages touched.
- Databricks: Zerobus Ingest — Petabyte-Scale (T3, deep). Zerobus architecture: stream-connection-level ordering (not partition-level), zero-copy protobuf at ~1 GB/s/core, WAL-before-lakehouse-publish. 12 GB/s sustained / 1 PB in 24h to single Delta table. New: 2 concepts, 4 patterns. 13 pages touched.
- Dropbox: MCP + Dash for design-to-code security (T2). Only 12% of PRs link back to threat models. MCP-as-context-bridge achieves 80% linkage via semantic search. Advisory-over-blocking principle. 11 pages touched.
- Databricks: Evolutionary DB Dev Part 3 (T3, conceptual). Team-scale database branching: tier topology as long-running branches, SCM state machine with blocking gates, agents-as-junior-developers, DBA→platform engineer. Neon reports ~500K branches/day, 80%+ agent-created. New: systems/lakebase-app-dev-kit, 3 concepts, 5 patterns. 16 pages touched.
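
The Zerobus entry above mentions WAL-before-lakehouse-publish. A minimal sketch of that ordering, assuming a line-delimited JSON file as the write-ahead log and an in-memory list as a stand-in for the Delta table; the class and method names are hypothetical and say nothing about how Zerobus itself is implemented:

```python
import json
import os
import tempfile


class WalIngest:
    """Toy ingest path: a record is acknowledged only once it is durable in the WAL."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.published = []      # stand-in for the lakehouse table (assumption)
        self.published_upto = 0  # count of WAL records already published

    def append(self, record):
        # Durability first: write and fsync before acknowledging the producer.
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(json.dumps(record) + "\n")
            wal.flush()
            os.fsync(wal.fileno())
        return True  # the ack

    def publish_pending(self):
        # Publish, in WAL order, every record not yet visible in the table.
        with open(self.wal_path, encoding="utf-8") as wal:
            records = [json.loads(line) for line in wal]
        pending = records[self.published_upto:]
        self.published.extend(pending)
        self.published_upto = len(records)
        return len(pending)


wal_file = os.path.join(tempfile.mkdtemp(), "ingest.wal")
writer = WalIngest(wal_file)
writer.append({"id": 1})
writer.append({"id": 2})

# Simulated crash before publish: a fresh instance replays the WAL from disk.
recovered = WalIngest(wal_file)
recovered.publish_pending()
```

In this toy the published offset lives in memory, so a restart replays the whole log; a real system would persist that offset or make the publish step idempotent.
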
Corpus shape
| Metric | Count |
|---|---|
| Source pages | 536 |
| Company pages | 40 |
| System pages | 1,684 |
| Concept pages | 2,861 |
| Pattern pages | 1,771 |
| Total wiki pages | ~6,892 |
By company (top 21, source count)
| Company | Sources | Tier |
|---|---|---|
| companies/cloudflare | 58 | T1 |
| companies/databricks | 47 | T3 |
| companies/redpanda | 38 | T3 |
| companies/meta | 31 | T1 |
| companies/aws | 31 | T1 |
| companies/netflix | 30 | T1 |
| companies/flyio | 30 | T3 |
| companies/planetscale | 28 | T3 |
| companies/figma | 23 | T2 |
| companies/zalando | 20 | T2 |
| companies/google | 19 | T1 |
| companies/pinterest | 15 | T2 |
| companies/mongodb | 14 | T3 |
| companies/slack | 13 | T2 |
| companies/instacart | 13 | T2 |
| companies/yelp | 12 | T3 |
| companies/vercel | 12 | T3 |
| companies/dropbox | 12 | T2 |
| companies/airbnb | 12 | T2 |
| companies/github | 11 | T2 |
| companies/datadog | 10 | T2 |
Publication timeline
~61% of sources published in 2026 (325), ~26% in 2025 (140), ~11% in 2024 (57). Remainder are canonical older posts (Figma multiplayer 2019, Zalando SRE 2021–2023). Ingestion rate: 144 sources in April 2026 (backfill), settling to ~12–15/month steady-state. June 2026 trending quiet: 23 sources in first 13 days, then pause due to summit noise filtering.
Under-sampled: Uber (HTML scraper pending), LinkedIn (stub), Apple / ByteDance / Microsoft Engineering.
Recurring architectural themes (by citation density)
Tier A — pervasive (40+ source references or 200+ inbound links)
- Blast radius containment — 18+ source refs, 283 inbound links. Cell architecture, staged rollouts, fault-domain isolation, incremental validation. Every Tier-1 company writes about it. Key: concepts/blast-radius, patterns/staged-rollout, patterns/incremental-blast-radius-validation.
- Control-plane / data-plane separation — 22 source refs, 160 inbound links. Extended by "control plane as the new data plane" under agentic workloads. Key: concepts/control-plane-data-plane-separation, systems/vitess, systems/lakebase.
- LLM-as-judge — 15 source refs, 200 inbound links. Dominant offline-eval pattern 2025–2026. Meta BVT, Zalando search quality, Instacart relevance, Cloudflare code review, Netflix synopses. Key: concepts/llm-as-judge.
- MCP / agent-native infrastructure — 31+ source tags, 289 inbound links. Fastest-growing theme. This window adds Dropbox MCP-for-security, Lyft MCP-for-metrics-agents. Key: systems/model-context-protocol, patterns/wrap-cli-as-mcp-server, patterns/specialized-agent-decomposition, patterns/mcp-as-context-bridge.
Tier B — structural (15–40 source refs)
- Change data capture — 17 source tags, 199 inbound links. Redpanda Connect, Debezium, Kafka Connect, Delta CDF, Oracle CDC. The plumbing connecting OLTP → analytics → lakehouse. Key: concepts/change-data-capture, systems/debezium, systems/redpanda-connect.
- Observability — 41 source tags, 192 inbound links. Shifting from push-to-TSDB to lakehouse-resident telemetry. OTel becoming universal. Key: concepts/observability, systems/opentelemetry, patterns/telemetry-to-lakehouse.
- Compute-storage separation — 16 source refs, 125 inbound links. The defining storage architecture: Lakebase, Snowflake, Neon, PlanetScale, Redpanda Cloud Topics. Key: concepts/compute-storage-separation.
- Horizontal sharding — appears in 162 sources (by content). PlanetScale/Vitess consensus series, DynamoDB, Netflix wide-partition splits. Key: concepts/horizontal-sharding, systems/vitess, concepts/wide-partition-problem.
- Schema evolution / database branching — 13 source refs (growing). Vitess online DDL, PlanetScale deploy requests, Lakebase three-part series, Iceberg schema evolution. Database branching moving from dev convenience → production substrate. Key: concepts/schema-evolution, concepts/database-branching, concepts/evolutionary-database-design.
- Autoscaling as system design — 12+ sources. Databricks two-axis autoscaler, Netflix container mount, Cloudflare adaptive rate-limited scheduling, Redpanda elastic partitioning. The theme: autoscaling is architectural, not operational. Key: concepts/cold-start, patterns/asymmetric-aggressive-up-conservative-down-autoscaling, patterns/two-axis-horizontal-plus-vertical-autoscaling.
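
The asymmetric-aggressive-up-conservative-down pattern named in the autoscaling bullet can be sketched as a small decision function. This is an illustration only: the parameter names, the patience counter, and the one-replica step-down are assumptions, not details taken from any of the cited sources.

```python
import math
from dataclasses import dataclass, field


@dataclass
class AsymmetricScaler:
    target_per_replica: float   # load one replica should carry (hypothetical unit: QPS)
    min_replicas: int = 1
    max_replicas: int = 100
    down_step: int = 1          # at most this many replicas removed per decision
    down_patience: int = 3      # consecutive low-load ticks required before shrinking
    _low_ticks: int = field(default=0, init=False, repr=False)

    def decide(self, current: int, load: float) -> int:
        wanted = math.ceil(load / self.target_per_replica)
        desired = max(self.min_replicas, min(self.max_replicas, wanted))
        if desired > current:
            self._low_ticks = 0
            return desired  # scale up at once, in a single jump
        if desired < current:
            self._low_ticks += 1
            if self._low_ticks >= self.down_patience:
                self._low_ticks = 0
                return max(desired, current - self.down_step)  # shrink slowly
            return current
        self._low_ticks = 0
        return current


scaler = AsymmetricScaler(target_per_replica=100)
after_spike = scaler.decide(current=2, load=950)
quiet_ticks = [scaler.decide(current=10, load=100) for _ in range(3)]
```

A spike moves the replica count to the full target immediately, while a drop has to persist for `down_patience` decisions before a single step down is taken. That asymmetry is the whole pattern; the thresholds are tuning choices.
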
Tier C — emergent / fast-growing (5–15 source refs)
- AI/LLM serving at scale — 26 LLM + 22 agents tags. Databricks 300K+ QPS inference (new this window), Slack multi-cloud routing, Netflix model-serving, Cloudflare Workers AI. Key: concepts/cold-start, concepts/context-engineering, patterns/multi-cloud-llm-serving, systems/databricks-model-serving.
- Durable execution — 8 source refs, 149 inbound links on Cloudflare Durable Objects alone. Implementations: embedded (Temporal), external (Step Functions), workflow-as-code (Cloudflare), DB-backed (Maestro). Key: concepts/durable-execution, systems/cloudflare-durable-objects.
- Post-quantum cryptography — 5 sources. Cloudflare (IPsec ML-KEM GA, TLS 1.3 PQ), Meta (migration framework), Google (quantum vulnerability disclosure). Key: concepts/post-quantum-cryptography.
- Generative retrieval — 3 sources (Instacart TIGER, Meta SilverTorch, Instacart ads). Autoregressive token generation replacing two-tower + ANN scoring. Key: concepts/generative-retrieval, systems/silvertorch.
- Defense-in-depth / zero-trust — 29 security tags. Cloudflare customer-zero, Yelp zero-trust access, GitHub eBPF deployment safety, Dropbox MCP-for-security (new). Key: concepts/defense-in-depth, concepts/positive-security-model.
- Metrics-as-code / semantic layer — 5 sources (new theme this window). Lyft MSL, Databricks BI Serving, Airbnb data architecture, Pinterest PiQaMa. Governance shifting left into versioned config. Key: concepts/headless-bi-semantic-layer, patterns/yaml-config-driven-metric-definitions.
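
A minimal sketch of the metrics-as-code idea from the last bullet: a versioned definition is validated, then rendered to SQL. To stay dependency-free it swaps in a plain dict for the parsed YAML and `string.Template` for Jinja. Every field name, table name, and validation message is hypothetical, and reading "dual-owner" as "at least two listed owners" is an assumption.

```python
from string import Template

# Stand-in for a parsed YAML metric definition (all names are illustrative).
METRIC = {
    "name": "completed_rides",
    "owners": ["team-data", "team-marketplace"],
    "consumers": ["exec-dashboard", "pricing-agent"],
    "table": "fact_rides",
    "expression": "COUNT(*)",
    "filter": "status = 'completed'",
    "dimensions": ["region", "ride_date"],
}

SQL_TEMPLATE = Template(
    "SELECT $dims, $expression AS $name\n"
    "FROM $table\n"
    "WHERE $filter\n"
    "GROUP BY $dims"
)


def validate(metric):
    """Return a list of governance violations; an empty list means the metric qualifies."""
    errors = []
    if len(metric["owners"]) < 2:
        errors.append("needs two owners")
    if len(metric["consumers"]) < 2:
        errors.append("needs at least two consumers")
    return errors


def render_sql(metric):
    """Render the metric to SQL, refusing definitions that fail validation."""
    problems = validate(metric)
    if problems:
        raise ValueError("; ".join(problems))
    return SQL_TEMPLATE.substitute(
        dims=", ".join(metric["dimensions"]),
        expression=metric["expression"],
        name=metric["name"],
        table=metric["table"],
        filter=metric["filter"],
    )


sql = render_sql(METRIC)
```

Because the definition is data, the governance checks run in code review or CI before any SQL is generated, which is what "shifting left into versioned config" amounts to in practice.
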
Most-cited systems (by inbound wiki-links, top 20)
| System | Links | Primary source |
|---|---|---|
| systems/vitess | 456 | PlanetScale |
| systems/mysql | 432 | (ubiquitous) |
| systems/aws-s3 | 335 | AWS |
| systems/model-context-protocol | 289 | Anthropic/ecosystem |
| systems/planetscale | 274 | PlanetScale |
| systems/kafka | 265 | (ubiquitous) |
| systems/redpanda | 246 | Redpanda |
| systems/postgresql | 239 | (ubiquitous) |
| systems/unity-catalog | 235 | Databricks |
| systems/cloudflare-workers | 234 | Cloudflare |
| systems/apache-iceberg | 221 | (ecosystem) |
| systems/kubernetes | 211 | (ubiquitous) |
| systems/lakebase | 210 | Databricks |
| systems/apache-spark | 199 | Databricks |
| systems/delta-lake | 175 | Databricks |
| systems/innodb | 174 | MySQL/Oracle |
| systems/dynamodb | 164 | AWS |
| systems/aws-lambda | 161 | AWS |
| systems/fly-machines | 157 | Fly.io |
| systems/cloudflare-durable-objects | 149 | Cloudflare |
Most-cited concepts (top 20)
| Concept | Source refs | Links |
|---|---|---|
| concepts/blast-radius | 18 | 283 |
| concepts/llm-as-judge | 15 | 200 |
| concepts/change-data-capture | 14 | 199 |
| concepts/observability | 21 | 192 |
| concepts/control-plane-data-plane-separation | 22 | 160 |
| concepts/horizontal-sharding | 8 | 129 |
| concepts/compute-storage-separation | 16 | 125 |
| concepts/defense-in-depth | 10 | 28 |
| concepts/scale-to-zero | 11 | 28 |
| concepts/tail-latency-at-scale | 8 | 25 |
| concepts/cold-start | 11 | 22 |
| concepts/durable-execution | 8 | 20 |
| concepts/context-engineering | 12 | 19 |
| concepts/post-quantum-cryptography | 5 | 18 |
| concepts/vector-similarity-search | 11 | 17 |
| concepts/database-branching | 6 | 17 |
| concepts/schema-evolution | 10 | 16 |
| concepts/tenant-isolation | 9 | 16 |
| concepts/medallion-architecture | 7 | 16 |
| concepts/evolutionary-database-design | 4 | 15 |
Most-cited patterns (top 15)
| Pattern | Links | Description |
|---|---|---|
| patterns/upstream-the-fix | 46 | Fix at source, not downstream |
| patterns/specialized-agent-decomposition | 27 | Decompose agent into specialist sub-agents |
| patterns/wrap-cli-as-mcp-server | 22 | Expose CLI tools via MCP |
| patterns/disposable-vm-for-agentic-loop | 17 | Sandbox agent in ephemeral VMs |
| patterns/ai-gateway-provider-abstraction | 17 | Unified gateway across LLM providers |
| patterns/tool-surface-minimization | 17 | Fewer tools = more reliable agents |
| patterns/staged-rollout | 15 | Progressive deploy with rollback gates |
| patterns/measurement-driven-micro-optimization | 13 | Profile → optimize → measure loop |
| patterns/cheap-approximator-with-expensive-fallback | 13 | Fast heuristic + slow precise backup |
| patterns/ltx-compaction | 13 | LTX file compaction strategy |
| patterns/partner-managed-service-as-native-binding | 13 | Third-party as first-class binding |
| patterns/streaming-broker-as-lakehouse-bronze-sink | 11 | Stream → lakehouse bronze layer |
| patterns/fast-rollback | 11 | Instant revert in deployments |
| patterns/dynamic-partition-split-async-pipeline | 10 | Netflix wide-partition auto-split |
| patterns/mcp-as-context-bridge | 10 | MCP for cross-system context retrieval |
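
As one illustration of patterns/cheap-approximator-with-expensive-fallback from the table above, the sketch below answers with a fast check whenever that check is conclusive and only otherwise pays for the precise one. Primality testing is an arbitrary example chosen for being easy to verify; it does not come from the cited sources, and all function names are hypothetical.

```python
from math import isqrt

SMALL_PRIMES = (2, 3, 5, 7, 11, 13)
expensive_calls = []  # records which inputs needed the slow path


def cheap_is_prime(n):
    """Return True/False when small-prime trial division is conclusive, else None."""
    if n < 2:
        return False
    for p in SMALL_PRIMES:
        if n == p:
            return True
        if n % p == 0:
            return False
    if n < 17 * 17:
        return True  # no prime factor <= 13 and n < 17^2, so n must be prime
    return None      # inconclusive: defer to the precise check


def exact_is_prime(n):
    """Full trial division up to sqrt(n): always correct, but slower."""
    expensive_calls.append(n)
    if n < 2:
        return False
    return all(n % d for d in range(2, isqrt(n) + 1))


def is_prime(n):
    verdict = cheap_is_prime(n)
    return verdict if verdict is not None else exact_is_prime(n)


results = {n: is_prime(n) for n in (91, 97, 323, 10007)}
```

The property that matters is that the cheap path never guesses: it either returns an answer it can stand behind or signals that it cannot decide.
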
Trade-offs and contradictions
Documented tensions across sources
- Monolith ↔ microservice. Meta SilverTorch collapses retrieval microservices → unified PyTorch model. Airbnb expands from monolithic vendor → distributed graph. Rule: compute-bound consolidates; IO-bound + multi-team distributes.
- Generative retrieval vs two-tower + ANN. Instacart chose generative for large catalogs + cold-start. Meta/Pinterest retain two-tower for real-time personalization. Split on: latency tolerance × catalog size × cold-start severity.
- Partition-level vs stream-level ordering. Kafka: partition = ordering unit (scaling requires rebalance). Zerobus: stream-connection = ordering unit (dynamic scaling). Trade-off: Kafka's design is simpler but inflexible; Zerobus enables elastic autoscaling but requires custom client semantics.
- Database branching: copy-on-write vs schema-only. Lakebase uses CoW forks (instant, data-included). PlanetScale deploy requests branch schema only. Neon CoW with ephemeral compute. Different DX goals (testing vs migration safety vs full-environment parity).
- Agent governance: centralized vs per-tool. Unity Catalog (centralized ACL) vs Cloudflare (per-tool Zod schema + egress policy) vs Lakebase (SCM state machine with blocking gates). Spectrum, not binary — all converging on "agents-as-junior-developers" with structural constraints.
- Separate vs monolithic data models. Airbnb's explicit framework: product teams with unique attributes → separate; cross-cutting services → monolithic. Neither universally superior.
- Signature-matching vs ML-scoring for security. Cloudflare: ML anomaly scoring over WAF signatures for frontier-model threats. Dropbox: LLM-based semantic gap detection between design and code. Both moving from pattern-match to reasoning-based security.
- Active-active vs active-passive API. Cloudflare discovered active-active API with single-region DB primary causes cross-region latency → pool exhaustion. Active-passive (collocate API with primary) wins for write-heavy workloads.
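
The pool-exhaustion mechanism in the last bullet can be made concrete with Little's law: connections in use are roughly the request rate times how long each request holds a connection. The sketch below is a back-of-envelope model under stated assumptions (queries run sequentially, one round trip per query, the connection is held for the whole request). All numbers are illustrative and none come from the Cloudflare post.

```python
def connections_in_use(requests_per_s, queries_per_request, rtt_ms, exec_ms):
    """Little's law estimate: arrival rate times time a connection is held."""
    hold_s = queries_per_request * (rtt_ms + exec_ms) / 1000.0
    return requests_per_s * hold_s


def pool_exhausted(pool_size, **workload):
    """True when the steady-state demand exceeds the pool size."""
    return connections_in_use(**workload) > pool_size


# Hypothetical workload: 200 requests/s, 5 sequential queries of 2 ms each.
workload = dict(requests_per_s=200, queries_per_request=5, exec_ms=2)

collocated = connections_in_use(rtt_ms=1, **workload)     # API next to the primary
cross_region = connections_in_use(rtt_ms=70, **workload)  # API in a remote region
```

With these made-up numbers the collocated API needs about 3 connections and the remote one about 72, so a pool of 50 is fine in one layout and saturated in the other, with no change in traffic.
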
Trends (confidence-weighted)
High confidence (5+ sources, clear trajectory)
- MCP becoming de-facto agent-tool protocol. 31+ sources. Cloudflare, MongoDB, Redpanda, Databricks, Pinterest, Fly.io, Dropbox, Lyft all publishing MCP integrations. Use cases expanding: from CLI tooling → security context → metrics governance.
- Observability data → lakehouse. Databricks OTel + Unity Catalog, Airbnb statsd→OTel, Yelp S3 access logs, Pinterest PerfView. Cost-driven: TSDB retention expensive; Iceberg/Delta cheap at petabyte scale.
- AI code review / AI-in-CI standard. Cloudflare (coordinator + sub-reviewers), Meta (BVT), Zalando (AI-as-judge quality gates), Atlassian Rovo Dev, Dropbox MCP security review.
- Security shifting to architecture-over-patching. Cloudflare customer-zero, positive-security-model, continuous red-team. Dropbox: advisory LLM-based gap detection. Assumption: AI-assisted attackers make signature-based defense insufficient.
- Database branching as substrate, not feature. Lakebase 3-part series, Neon (~500K branches/day), PlanetScale deploy requests. Databases becoming "branchable by default" for agents and CI. Key: concepts/evolutionary-database-design, patterns/per-developer-database-branch-paired-with-code-branch.
Medium confidence (3–4 sources, emerging)
- Generative retrieval → production. Instacart (TIGER + ads) + Meta (SilverTorch). Both shift from "score candidates" to "generate candidate IDs."
- Bare-metal fleet ops as system design. Cloudflare boot-time optimization (hours→minutes) + Meta PowerLoss Storm + Redpanda Cloud Topics metastore. Firmware/UEFI/iPXE is load-bearing.
- Contract-driven multi-agent coordination. OmniNode topic-naming, Atlassian Jira triggers, MongoDB MCP registry, Lakebase artifact-as-API. Inter-agent channel naming/schema as dominant failure mode.
- Metrics-as-code with dual governance. Lyft MSL, Pinterest PiQaMa, Databricks BI Serving. Pattern: YAML config + template SQL + dual-owner approval + MCP exposure.
- Zero-copy high-throughput ingestion. Zerobus (~1 GB/s/core via Rust zero-copy protobuf), Redpanda (zero-copy Kafka), Cloudflare scanning (batch-parallel Kafka). Memory allocation is the enemy of throughput.
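
To illustrate what "zero-copy" means in the ingestion bullet, the sketch below parses length-prefixed frames and hands out `memoryview` slices of the receive buffer instead of copying each payload. The framing format (4-byte big-endian length prefix) is an assumption for the example. The cited systems do this in Rust or C++ on protobuf or Kafka batches, so this is only the idea, not their code.

```python
import struct


def encode(payloads):
    """Build a buffer of frames: 4-byte big-endian length, then the payload."""
    return b"".join(struct.pack(">I", len(p)) + p for p in payloads)


def iter_frames(buffer):
    """Yield each payload as a memoryview slice of `buffer` (no payload copy)."""
    view = memoryview(buffer)
    offset = 0
    while offset < len(view):
        (length,) = struct.unpack_from(">I", view, offset)
        start = offset + 4
        end = start + length
        if end > len(view):
            raise ValueError("truncated frame")
        yield view[start:end]
        offset = end


buf = bytearray(encode([b"abc", b"", b"hello"]))
frames = list(iter_frames(buf))

# The slices alias the original buffer: writing to it is visible through them.
buf[4] = ord("X")
first_payload = bytes(frames[0])
```

Each `bytes(...)` call is an explicit copy, so a hot path would pass the views straight to the next stage and copy only at the boundary that truly needs ownership.
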
Low confidence (1–2 sources, watch)
- Region-scale chaos engineering. Only Meta (PowerLoss Storm). Netflix operates at AZ level. Requires enormous maturity.
- Post-quantum at scale. Only Cloudflare + Meta have published migration frameworks. Most companies haven't started.
- Raft log ≡ LSM WAL unification. Redpanda Cloud Topics metastore. Elegant but only one implementation so far.
- Stream-connection-level ordering replacing partition-level. Only Zerobus so far. Kafka's partition model still dominates.
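
The difference between the two ordering units can be sketched as follows. The first half shows why changing the partition count disturbs key-hashed ordering. It uses `zlib.crc32` as a deterministic stand-in, whereas Kafka's default partitioner hashes keys with murmur2. The second half is a hypothetical per-connection sequence check; the source does not describe Zerobus's actual protocol, so treat it purely as an assumption about what "the connection is the ordering unit" could look like.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    """Key-hash partitioning: the partition is the ordering unit."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def moved_keys(keys, old_n, new_n):
    """Keys whose ordering unit changes when the partition count changes."""
    return [k for k in keys if partition_for(k, old_n) != partition_for(k, new_n)]


class ConnectionOrderedSink:
    """Accepts records only in per-connection sequence order (hypothetical protocol)."""

    def __init__(self):
        self.next_seq = defaultdict(int)
        self.log = []

    def accept(self, conn_id, seq, record):
        if seq != self.next_seq[conn_id]:
            return False  # gap or duplicate on this connection
        self.next_seq[conn_id] += 1
        self.log.append((conn_id, seq, record))
        return True


keys = [f"user-{i}" for i in range(1000)]
moved = moved_keys(keys, old_n=8, new_n=9)

sink = ConnectionOrderedSink()
outcomes = [
    sink.accept("conn-a", 0, "x"),
    sink.accept("conn-b", 0, "y"),  # a new connection needs no remapping
    sink.accept("conn-a", 2, "z"),  # rejected: sequence 1 is missing
    sink.accept("conn-a", 1, "z"),
]
```

Growing from 8 to 9 partitions reassigns many keys, which is why partition-based scaling involves a rebalance. Adding a connection in the second model touches no existing ordering state, at the cost of pushing sequence tracking into the client.
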
Language / runtime observations
From source tags:
- Rust: 15+ sources (Cloudflare Pingora, Meta WhatsApp, Aurora DSQL, Figma memory, Zerobus zero-copy). Adoption concentrated at the hot-path / memory-safety boundary.
- TypeScript: 13 sources (Vercel, Cloudflare Workers, Zalando). Dominates edge/serverless.
- Go: 10 sources (Fly.io, Instacart serving, Datadog agent, Cloudflare scanning). Dominates network-intensive microservices.
- Kotlin/Java: 5 sources (Meta Kotlinator, Netflix JDK Vector API, Slack).
- Python: Lyft MSL, Atlassian ML Studio.
Open questions
- Zerobus ordering guarantees under failure. Stream-connection-level ordering during failover/rebalance — docs mention graceful drain but not crash recovery semantics.
- Lakebase branching production adoption. Neon discloses ~500K branches/day; Lakebase production-scale numbers remain undisclosed.
- Lyft MSL scale metrics. How many Golden Metrics? How many consumers? Cost of Python package approach vs service approach as org scales?
- Cloudflare Security Insights cross-DC replication. Active-passive solved the problem; what happens when primary fails over?
- Dropbox MCP security coverage. 80% linkage via semantic search — what's the false positive rate on gap detection?
- Netflix dynamic partition split cross-DC. Whether split metadata replicates cross-DC or is region-local.
- Meta PowerLoss Storm frequency. How often region-scale tests run remains undisclosed.
Navigation hints
- Fundamentals → concepts/horizontal-sharding, concepts/compute-storage-separation, concepts/change-data-capture, concepts/eventual-consistency, concepts/blast-radius.
- Storage deep-dive → systems/vitess, systems/lakebase, systems/apache-iceberg, systems/liquid-clustering, concepts/wide-partition-problem, patterns/expand-and-contract-schema-migration.
- LLM serving → concepts/cold-start, concepts/scale-to-zero, patterns/multi-cloud-llm-serving, systems/databricks-model-serving, patterns/two-axis-horizontal-plus-vertical-autoscaling.
- Agent infra → systems/model-context-protocol, systems/cloudflare-agents-sdk, patterns/specialized-agent-decomposition, patterns/wrap-cli-as-mcp-server, patterns/mcp-as-context-bridge, concepts/context-engineering.
- Data architecture → concepts/separate-vs-monolithic-data-models, concepts/headless-bi-semantic-layer, patterns/domain-driven-data-modeling-choice, concepts/data-lakehouse.
- Observability → concepts/observability, systems/opentelemetry, systems/prometheus, patterns/telemetry-to-lakehouse, concepts/observability-stack-partial-dependency.
- Security / crypto → concepts/defense-in-depth, concepts/post-quantum-cryptography, concepts/positive-security-model, patterns/require-access-before-reachability, patterns/mcp-as-context-bridge.
- Streaming / CDC → systems/redpanda, systems/kafka, concepts/change-data-capture, systems/zerobus-ingest, systems/redpanda-cloud-topics, patterns/streaming-broker-as-lakehouse-bronze-sink.
- Retrieval / RecSys → concepts/generative-retrieval, concepts/two-tower-architecture, systems/silvertorch, systems/instacart-generative-ads-retrieval, concepts/semantic-id.
- Graph at scale → concepts/knowledge-graph, systems/meta-tao, systems/janusgraph, systems/netflix-graph-abstraction, concepts/identity-graph.
- Reliability → concepts/chaos-engineering, concepts/instantaneous-power-loss, patterns/staged-rollout, patterns/fast-rollback, concepts/bootstrapping-circular-dependency.
- Database ops → concepts/database-branching, concepts/evolutionary-database-design, systems/lakebase, systems/planetscale, patterns/per-developer-database-branch-paired-with-code-branch, patterns/scm-workflow-state-machine.
- Autoscaling → patterns/asymmetric-aggressive-up-conservative-down-autoscaling, patterns/two-axis-horizontal-plus-vertical-autoscaling, concepts/cold-start, systems/databricks-autopilot-pod-autoscaler.
- Audit → wiki/index.md, wiki/log.md, wiki/analyses/.