Airbnb
Airbnb Engineering blog. Tier-2 source on the sysdesign-wiki. Historically strong on marketplace infra, Kubernetes tooling, data platform, and service mesh; recent posts cover dynamic configuration, developer platform, incident tooling, and (2026-05) the OSS 1.0 release of the Viaduct GraphQL multi-tenant runtime.
Key systems
- systems/airbnb-knowledge-graph-infrastructure — internally managed, multi-tenant knowledge graph platform built on JanusGraph + DynamoDB + OpenSearch; namespace-isolated tenants include identity graph (7B nodes, 11B edges), inventory knowledge graph, fraud detection, and data lineage; management service handles schema enforcement + index lifecycle + Thrift API generation; internal JanusGraph fork with custom transaction strategy (DynamoDB conditional writes), parallel multi-slice fetching, and distributed tracing
- systems/viaduct — Airbnb's GraphQL-based "data-oriented service mesh": a multi-tenant runtime that hosts independently developed and tested tenant modules, each owning a portion of the schema. Used internally for years; OSS 1.0 released 2026-05-13 to Maven Central with full @StableApi / @ExperimentalApi / @InternalApi discipline + Kotlin binary compatibility validator in CI + Dokka docs + open RFC community process. The third topology on the wiki for decentralized development of a central GraphQL schema, positioned as complementary to Apollo Federation (Viaduct instances can participate as Federation subgraphs, collapsing per-team server cost while preserving cross-org composition). Load-bearing framing: "Federation distributes development by distributing servers. Viaduct distributes development by distributing modules."
- systems/airbnb-skipper — embedded Java/Kotlin workflow engine providing durable execution as a library dependency rather than an external orchestration cluster; uses the host service's existing database (MySQL / UDS / DynamoDB) for workflow state, replays the workflow method on crash while short-circuiting previously completed actions via stored results; 5-annotation programming model (@WorkflowMethod, @StateField, @SignalMethod, @Execute(checkpoint = true), @Compensate); in production >1 year across 15+ teams (insurance, payments, media, infrastructure, incentives, wallet) with peak 10,000 workflows / second on DynamoDB
- systems/airbnb-uds — Airbnb's internal Unified Data Store (stub); named as one of Skipper's pluggable persistence backends alongside MySQL
- systems/sitar — internal dynamic configuration platform (control plane + data plane + sidecar agent + GitHub-based config workflow)
- systems/airbnb-observability-platform — in-house Prometheus/PromQL metrics platform (1,000 services, 300M timeseries, 3,100 dashboards, 300K+ alerts) replacing a vendor stack after a ~5-year migration; OTLP collection + vmagent streaming aggregation at 100M+ samples/sec; reliability plane (2026-05-05): dedicated-but-managed K8s clusters + custom Envoy L7 ingress tier (independent of Istio) + meta-monitoring HA Prometheus–Alertmanager pairs terminated by a dead-man's switch on AWS SNS + CloudWatch
- systems/airbnb-metrics-storage — the storage plane under the observability platform: multi-cluster, multi-tenant time-series storage fleet at 50M samples/sec / 1.3B active series / 2.5 PB logical data; tenant-per-application with shuffle sharding on read + write paths; three-zone stateful deploys; multi-cluster federation via custom Promxy with native-histogram support + query-fanout optimization; progressive cluster rollout (test → internal → app → infra) for >99.9% availability
- systems/vmagent — VictoriaMetrics agent used as Airbnb's sharded two-tier (router + aggregator) streaming-aggregation tier
- systems/himeji — centralized authorization system enforcing access at the data layer; write-time relation denormalization for fast read-time permission checks
- systems/airbnb-destination-recommendation — transformer-based sequence model predicting user travel destinations; user actions as tokens (summed city + region + days-to-today embeddings); multi-task region + city heads; serves autosuggest + abandoned-search email notifications
Key patterns / concepts
- concepts/separate-vs-monolithic-data-models — the core trade-off for multi-product data warehouses: per-product tables vs. unified tables; Airbnb's framework empowers each domain team to choose based on attribute commonality, enforced by three foundational principles (no hybrid models, consistent identifier naming, namespace organization)
- concepts/offline-data-warehouse-as-translation-layer — the warehouse transforms raw OLTP data into a standardized analytical source of truth
- patterns/foundational-principles-with-decentralized-guidelines — central guardrails + domain-team autonomy as an organizational scaling pattern
- concepts/embedded-workflow-engine — Skipper canonicalises the library-in-service shape of durable execution, explicitly rejecting external orchestration clusters (Temporal, Cadence, Step Functions) for Tier 0 services to avoid adding a new critical dependency
- concepts/workflow-replay-from-checkpointed-actions — Skipper's durability mechanism; state fields persisted directly, previously completed actions short-circuit on replay via stored results; no event log (trades auditability for leaner execution)
- concepts/workflow-determinism-requirement — the correctness invariant Skipper's replay model requires (side effects must live in actions, never in the workflow method)
- concepts/workflow-compensation-action — Skipper's @Compensate annotation elevates saga-style compensating actions to a first-class programming primitive with reverse-order orchestration
- concepts/workflow-signal — Skipper's @SignalMethod for externally-mutating workflow state while the workflow hibernates on waitUntil { cond }
- patterns/workflow-primitives-as-annotated-classes — Skipper's 5-annotation contract (@WorkflowMethod / @StateField / @SignalMethod / @Execute(checkpoint = true) / @Compensate) as a minimal, codegen-free, DSL-free workflow definition surface
- patterns/delayed-timeout-task-as-crash-safety-net — Skipper's happy-path-near-zero-overhead mechanism: 2 DB writes at start, batched checkpoints, scheduled timeout task fires harmlessly on completion or triggers replay on crash
- patterns/saga-over-long-transaction — the distributed-systems pattern Skipper's @Compensate annotation operationalises
- patterns/staged-rollout — first-class platform feature in Sitar (env / zone / pod-%)
- patterns/sidecar-agent — per-pod config fetcher with local cache
- patterns/git-based-config-workflow — PRs as the default config change path; emergency portal as override
- concepts/control-plane-data-plane-separation — explicit "decide" vs "deliver" split in Sitar
- concepts/observability — own-the-interaction-layer thesis; vendor pricing / feedback-loop motivations for going in-house
- concepts/metric-type-metadata — _otel_metric_type_-driven engine that replaces Prometheus naming-based type inference
- patterns/intent-preserving-query-translation — map query intent (e.g., canonical histogram for any p95), not literal queries
- patterns/alerts-as-code — Reliability XP alert framework with autocomplete, backtesting, and diffing
- patterns/alert-backtesting — replay proposed alerts against historical metric data at PR-diff granularity, with "noisiness" scoring + per-alert inspection; hooks into Prometheus's rule manager
- patterns/achievable-target-first-migration — start migration with a tractable, well-aligned service, not the hardest one
- concepts/identity-decoupling — User ID vs. per-context Profile IDs as a privacy primitive; different types, not just different values
- concepts/least-privileged-access — enforced at the data layer via Himeji, not bolted on per endpoint
- patterns/audit-then-refactor-migration — audit scripts → team ownership map → manual review → AI-assisted refactor → type safety, used for the User/Profile ID migration
- patterns/dual-write-migration — shared metrics library dual-emits StatsD + OTLP to migrate ~40% of services with one config change
- patterns/zero-injection-counter — vmagent tweak that fixes Prometheus rate() undercounting of sparse counters
- concepts/streaming-aggregation — in-transit metric aggregation (vmagent routers + aggregators) to collapse per-instance cardinality before storage
- concepts/metric-temporality — delta vs. cumulative; Airbnb moved top-cardinality emitters to delta to bound SDK memory
- concepts/user-action-as-token — language-modeling framing for recommendation: chronological user actions as transformer tokens; per-action embedding = sum of attribute embeddings (city / region / days-to-today)
- patterns/active-dormant-user-training-split — generate N+M training examples per positive outcome — N recent with full history, M dormant with long-term history only — to keep a single model accurate for both recently-active and long-dormant users (Airbnb: 14 examples per booking = 7 active + 7 dormant)
- patterns/hierarchical-multitask-geo-prediction — attach multiple prediction heads at different geographic-hierarchy levels (region + city) and train jointly so the encoder learns the taxonomy via auxiliary-task regularization
- concepts/shuffle-sharding — randomly-chosen subset of backend nodes per tenant so a bad tenant's blast radius is their K-node shuffle set, not the whole fleet; used on both write and read paths of the metrics storage system
- concepts/active-multi-cluster-blast-radius — Airbnb runs dedicated + application clusters in parallel so cluster-scoped failures affect at most 1/N of tenants
- concepts/cross-cluster-federated-query-cost — Airbnb measured cross-cluster queries at 5–10× the cost of single-cluster ones; forced adjustments in tenant-consolidation strategy around hot read patterns
- patterns/tenant-per-application — reject tenant-per-team (ownership changes frequently) for tenant-per-service (stable, attributable, ready for chargeback); ~1,000 services = ~1,000 tenants
- patterns/progressive-cluster-rollout — rollout sequence by criticality (test → internal → application → infrastructure), with infrastructure last so observability remains intact through upstream regressions
- patterns/multi-tenant-graphql-runtime — Viaduct's contribution to the wiki: one shared GraphQL runtime hosts many independently developed and tested tenant modules; the third topology for decentralized GraphQL schema development alongside UBFF (one service, one module) and Federation (many services, one module each). Complementary to Federation: Viaduct instances can participate as Federation subgraphs.
- patterns/module-based-graphql-decentralization — Viaduct's contribution model: distribute schema development through modules-in-runtime rather than subgraphs-in-services. A module = directory + SDL + resolvers; no per-team server.
- patterns/api-stability-annotations — Viaduct 1.0's OSS-readiness discipline: @StableApi / @ExperimentalApi / @InternalApi annotations across all public surfaces + Kotlin's binary compatibility validator in CI.
- concepts/data-oriented-service-mesh — Viaduct's self-description; the schema graph (not the network proxy graph) is the mesh substrate. Disambiguates from Envoy/Istio's RPC-oriented service mesh.
- concepts/decentralized-development-of-central-schema — the problem space all three GraphQL topologies attack; central schema for discoverability + governance, decentralized development so domain experts can ship.
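
The shuffle-sharding entry in the list above (a randomly chosen subset of backend nodes per tenant, so that a bad tenant's blast radius is its K-node set) can be illustrated with a small sketch. This is an assumption-laden Python example, not Airbnb's implementation: the function name `shuffle_shard`, the SHA-256 seeding, and the node and tenant names are all hypothetical.

```python
import hashlib
import random
from typing import List


def shuffle_shard(tenant: str, nodes: List[str], k: int) -> List[str]:
    """Pick a stable pseudo-random subset of k nodes for a tenant."""
    if not 0 < k <= len(nodes):
        raise ValueError("k must be between 1 and len(nodes)")
    # Seed a private RNG from the tenant name so the same tenant always
    # maps to the same subset, independent of call order.
    digest = hashlib.sha256(tenant.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    # Sort first so the result does not depend on the input ordering.
    return sorted(rng.sample(sorted(nodes), k))


nodes = [f"node-{i}" for i in range(12)]
checkout_nodes = shuffle_shard("checkout-service", nodes, 3)
search_nodes = shuffle_shard("search-service", nodes, 3)
```

Two tenants may still overlap on some nodes; the isolation argument is that a full overlap of all K nodes becomes unlikely as the fleet grows relative to K.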
Recent articles
- 2026-06-09 — sources/2026-06-09-airbnb-scaling-beyond-one-data-architecture (Patrick Lam, Namrata Lamba, Jamie Stober on "Scaling beyond one: How Airbnb evolved its data architecture for a multi-product world" — framework for evolving a decade-old offline data warehouse from single-product (Homes) to three-product (Homes, Experiences, Services). Three foundational principles: no hybrid models, consistent identifier naming, namespace organization. Domain-driven modeling choice: product-facing domains chose separate models (listings, availability, location, guests), cross-cutting domains chose monolithic (messaging, payments, support). Data debt migration via dual-pipeline deprecation.)
- 2026-05-19 — sources/2026-05-19-airbnb-scaling-identity-graph-unified-knowledge-graph-infrastructure (Lucen Zhao, Shukun Yang, Ashish Jain on "Scaling Airbnb's identity graph with a unified knowledge graph infrastructure" — internal multi-tenant graph platform on JanusGraph + DynamoDB replacing third-party vendor. Identity graph at 7B nodes / 11B edges / 5M edges per day / 4–8 hop queries. Migration delivered 10× write QPS, significant P99 latency reduction, elimination of periodic reboots. Key optimizations: DynamoDB conditional-write transactions, parallel getMultiSlices, client-side Gremlin query rewriting, distributed tracing integration.)
- 2026-05-13 — sources/2026-05-13-airbnb-viaduct-1-0-and-the-future-of-airbnbs-data-mesh (Ryan Tanner, Raymie Stata, Adam Miskiewicz on "Viaduct 1.0 and the future of Airbnb's data mesh" — OSS 1.0 release of Viaduct, Airbnb's GraphQL-based data-oriented service mesh used internally for years. Architectural contribution: third topology for decentralized development of a central GraphQL schema alongside UBFF (one service, one module) and Apollo Federation (many services, one module each): Viaduct is few runtimes, many tenant modules per runtime. Load-bearing framing quote: "Federation distributes development by distributing servers. Viaduct distributes development by distributing modules." Tenant module contract is intentionally minimal: directory + SDL + resolvers; the platform handles execution / scaling / integration. Complementary to Federation, not alternative: a Viaduct instance can participate as a subgraph in a federated supergraph, so a "large organization where hundreds of teams contribute to the overall graph" can run "a smaller number of Viaduct instances, each hosting many closely related tenant modules" and let Federation compose them — collapsing per-team server cost (factor of M for M modules per instance) while preserving cross-org composition. OSS 1.0 readiness substrate: @StableApi / @ExperimentalApi / @InternalApi annotations across all public surfaces + Kotlin binary compatibility validator in CI + Maven Central publication + Dokka-generated API docs + open community RFC process (first instance: the Connections RFC on GitHub). GraphQLConf 2026 talk teasers signpost forthcoming engineering-retrospective content on multi-tenant gateway observability with built-in ownership tags + cost-aware tracing (Vickey Yeh), gateway sharding for blast-radius reduction (Linquan Zhang & Cetin Sahin), probabilistic correctness testing on Viaduct (James Bellenger), and LLM-driven @generateMock data generation (Michael Rebello) — ingestion candidates when the recordings/write-ups appear.)
- 2026-05-05 — sources/2026-05-05-airbnb-monitoring-reliably-at-scale (Abdurrahman J. Allawala on "Monitoring reliably at scale" — Airbnb's Observability team breaks circular dependencies in its metrics platform along three axes: (1) dedicated-but-managed Kubernetes clusters for observability workloads (concepts/dedicated-but-managed-infrastructure + patterns/dedicated-observability-kubernetes-clusters) — "just right" middle option between shared-production (couples observability to its targets) and self-run K8s (too much ops burden on the small team); Cloud team still administers; coordinated-change discipline enforced; (2) custom Envoy L7 ingress tier for telemetry, independent of the shared Istio mesh (patterns/custom-l7-proxy-for-telemetry-over-service-mesh), with header-based tenant routing mapping ~1,000 services → cluster backends; motivated by "orders of magnitude more observability traffic than business traffic" + circular dependency of mesh-metrics on the mesh + two-way noisy-neighbour hazard with Airbnb.com traffic; adds an eighth Envoy role on the wiki (telemetry-ingress); extensibility hooks for metric mirroring + fine-grained ACLs; (3) meta-monitoring (concepts/meta-monitoring) — dedicated systems/prometheus + systems/alertmanager HA pairs pinned to nodes/AZs disjoint from the primary stack with pair-level anti-affinity; terminated by a dead-man's switch (patterns/heartbeat-absence-as-alert-trigger) — always-firing alert → SNS → CloudWatch rate alarm on an AWS control plane distinct from the K8s-hosted stack; design bar stated verbatim: "treat monitoring as a production system whose availability must exceed that of what it observes." Compute-vs-networking own-vs-adopt asymmetry articulated explicitly — Kubernetes adopted because the shared foundation fits; networking owned because telemetry's requirements (prioritisation / isolation / custom routing) diverge from what a business-traffic-shaped mesh can cleanly provide.)
- 2026-04-28 — sources/2026-04-28-airbnb-skipper-building-airbnbs-embedded-workflow-engine (Skipper: embedded Java/Kotlin workflow engine for durable execution; library-in-service shape rather than external orchestration cluster (concepts/embedded-workflow-engine); shares host service's DB for workflow state (MySQL / UDS / DynamoDB); 5-annotation programming model (patterns/workflow-primitives-as-annotated-classes); state-field replay not event history; near-zero happy-path overhead via delayed timeout task; determinism invariant + at-least-once action execution; @Compensate reverse-order walk-back elevates saga compensations to first-class primitive (concepts/workflow-compensation-action); signals via @SignalMethod + durable waitUntil { cond }; in production >1 year across 15+ teams — peak 10,000 workflows / second on DynamoDB; multi-hour Media Foundation video-processing jobs survive pod restarts; Infrastructure team uses it for durable Flink job lifecycle; explicit rejection of external clusters for Tier 0 services to avoid new critical dependencies)
- 2026-04-21 — sources/2026-04-21-airbnb-building-a-fault-tolerant-metrics-storage-system (storage-plane deep-dive: 50M samples/sec, 1.3B series, 2.5 PB; shuffle-sharding for per-tenant read/write isolation; tenant-per-application for ~1,000 services; single-cluster reliability → multi-cluster federation via custom Promxy; progressive cluster rollout for >99.9% availability; 5–10× federated-query cost tax; Grafana K8s rollout operators replacing multi-day manual deploys; clusters as cattle not pets)
- 2026-04-16 — sources/2026-04-16-airbnb-statsd-to-otel-metrics-pipeline (StatsD → OTLP migration via shared-library dual-write; two-tier vmagent streaming aggregation at 100M+ samples/sec; delta temporality for top emitters; zero-injection for sparse counters)
- 2026-04-14 — sources/2026-04-14-airbnb-privacy-first-connections (privacy-first identity model for social Experiences: User ID ↔ many context-scoped Profile IDs, Himeji authorization with write-time relation denormalization, AI-assisted audit+refactor migration)
- 2026-03-17 — sources/2026-03-17-airbnb-observability-ownership-migration (5-year vendor → in-house Prometheus/PromQL metrics migration; intent-preserving translation, metadata engine, alerts-as-code, own the interaction layer)
- 2026-03-12 — sources/2026-03-12-airbnb-destination-recommendation-transformer (transformer-based destination recommendation model; user actions as tokens with summed city + region + days-to-today embeddings; 14 training examples per booking = 7 active + 7 dormant to balance short-term and long-term intent; multi-task region + city heads to inject geolocation hierarchy; autosuggest + abandoned-search email applications, A/B wins in non-English-primary regions)
- 2026-03-04 — sources/2026-03-04-airbnb-alert-backtesting-change-reports (deep-dive on the Reliability XP alert-authoring platform: local-first dev + Change Reports + bulk alert backtesting hooking Prometheus's rules/manager.go; per-backtest K8s pod isolation; 300K alerts migrated, 90% alert-noise reduction, month → afternoon iteration cycle)
- 2026-02-18 — sources/2026-02-18-airbnb-sitar-dynamic-configuration (Sitar: dynamic config platform architecture)
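
The dead-man's switch in the 2026-05-05 entry above (an always-firing alert whose absence is the real signal) reduces to a simple check: alarm when too few heartbeats arrived in a recent window. The sketch below is a minimal Python illustration under stated assumptions. The function name `heartbeat_missing`, its parameters, and the timestamps are hypothetical, and it models only the decision logic, not SNS or CloudWatch.

```python
from typing import Iterable


def heartbeat_missing(
    heartbeats: Iterable[float],
    now: float,
    window_s: float,
    min_count: int = 1,
) -> bool:
    """Return True when fewer than min_count heartbeats fall in the window.

    heartbeats: arrival times (seconds) of the always-firing alert.
    now: current time in the same unit.
    window_s: how far back to look.
    """
    recent = sum(1 for t in heartbeats if now - window_s < t <= now)
    return recent < min_count


# Heartbeats arrive every 60 s, then the monitoring stack goes silent.
arrivals = [0.0, 60.0, 120.0]
healthy = heartbeat_missing(arrivals, now=130.0, window_s=120.0)
silent = heartbeat_missing(arrivals, now=400.0, window_s=120.0)
```

The point of evaluating this on a separate control plane, as the entry describes, is that the check keeps working when the system emitting the heartbeats is the thing that failed.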