Meta TAO

What it is

TAO (The Associations and Objects) is Meta's production social-graph storage system, originally introduced at USENIX ATC 2013 ("TAO: Facebook's Distributed Data Store for the Social Graph"). TAO exposes objects (entities such as users, posts, pictures, pages, comments) and associations (typed relations between entities, such as LIKES, FRIEND, AUTHORED_BY, TAGGED) as first-class API primitives. It is backed by a sharded MySQL substrate, with a geo-distributed caching tier in front.

Why it shows up on this wiki

TAO is referenced on this wiki primarily as the workload source for TAOBench, the open-source benchmark by Audrey Cheng (UC Berkeley) and Meta engineers that synthesises TAO's access patterns into a runnable workload for third-party databases. TAOBench is introduced on the wiki via a post by Liz van Dijk (2022-09-08); this page exists to anchor TAO as the origin of the social-graph workload shape being benchmarked.

Two VLDB papers from Cheng's team are the authoritative workload characterisation:

VLDB 2021 — "RAMP-TAO: Layering Atomic Transactions on Facebook's Online TAO Data Store" — describes TAO's read-dominant production workload and measures the read anomalies that occur under its existing isolation guarantees.
VLDB 2022 — "TAOBench: An End-to-End Benchmark for Social Network Workloads" — proposes TAOBench as a benchmarkable synthesis of the TAO workload.

Minimal-viable page for now: TAO has enough architectural substance (MySQL-backed sharded social graph, geo-distributed caching tier, API-level objects + associations abstraction, read-heavy workload characterisation) to deserve a deeper treatment if a Meta engineering post about its internals is ingested in future. Until then this page exists to anchor cross-references from TAOBench and from social-graph-related wiki concepts.

Architectural shape

API: objects (typed entities with IDs + attribute blobs) + associations (typed binary relations with an optional time + data payload).
Read path: geo-distributed caching tier fronts the MySQL backing store. Most reads hit cache; writes go through to the backing MySQL master for the shard.
Write fan-out: a single write against an association cascades to cache invalidation across regions; hot associations (viral posts) fan out widely.
Consistency: eventual across regions, with read-your-writes within a region. Not serialisable.
Workload skew: read-heavy (~99.8% reads, as reported in the production workload measurements of the 2013 USENIX ATC paper), long-tail popularity, burst write-fanout on viral content. These properties drive TAOBench's design.
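
The read and write paths in the list above can be illustrated with a toy, single-process sketch. Everything in it is hypothetical: `BackingStore`, `RegionCache`, `write_assoc`, and their method names are invented for illustration and are not TAO's API. Dictionaries stand in for the sharded MySQL tier, and invalidation is applied synchronously here, whereas a geo-distributed system would propagate it asynchronously.

```python
from dataclasses import dataclass, field


@dataclass
class BackingStore:
    """Stand-in for the sharded MySQL tier (hypothetical: a dict, not a database)."""
    assocs: dict = field(default_factory=dict)  # (id1, atype) -> [(id2, time), ...]
    reads: int = 0                              # counts reads that reach the store

    def assoc_range(self, id1, atype):
        self.reads += 1
        return list(self.assocs.get((id1, atype), []))

    def assoc_add(self, id1, atype, id2, time):
        self.assocs.setdefault((id1, atype), []).append((id2, time))


@dataclass
class RegionCache:
    """Stand-in for one region's caching tier (hypothetical)."""
    store: BackingStore
    entries: dict = field(default_factory=dict)

    def assoc_range(self, id1, atype):
        key = (id1, atype)
        if key not in self.entries:  # cache miss: fill from the backing store
            self.entries[key] = self.store.assoc_range(id1, atype)
        return self.entries[key]

    def invalidate(self, id1, atype):
        self.entries.pop((id1, atype), None)


def write_assoc(store, caches, id1, atype, id2, time):
    """Write through to the store, then fan invalidations out to every region."""
    store.assoc_add(id1, atype, id2, time)
    for cache in caches:
        cache.invalidate(id1, atype)
    return len(caches)  # number of invalidations this single write produced


store = BackingStore()
us, eu = RegionCache(store), RegionCache(store)

write_assoc(store, [us, eu], "post:1", "LIKES", "user:7", time=100)
us.assoc_range("post:1", "LIKES")  # miss: one store read
us.assoc_range("post:1", "LIKES")  # hit: served from cache
eu.assoc_range("post:1", "LIKES")  # miss in the other region: one store read
write_assoc(store, [us, eu], "post:1", "LIKES", "user:8", time=101)
likes = us.assoc_range("post:1", "LIKES")  # miss again after invalidation

print(store.reads, likes)
```

In this sketch, four cache reads cost three store reads, and each write invalidates one entry per region. That per-region cost is the fan-out that grows when many writes land on one hot association.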

Canonical relational encoding

When TAO's workload is ported to a relational substrate (for benchmarking or for non-Meta systems modelling social-graph-shaped workloads), the canonical encoding is concepts/social-graph-objects-and-edges:

An objects table — one row per entity.
An edges table — one row per relation; it acts as the many-to-many junction between two objects rows.

TAOBench uses exactly this two-table shape.
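
A minimal sketch of that two-table shape, using SQLite through Python's standard `sqlite3` module. The column names (`id`, `otype`, `data`, `id1`, `id2`, `atype`, `time`) and the sample rows are assumptions chosen for illustration; they are not taken from TAOBench's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE objects (
    id    INTEGER PRIMARY KEY,
    otype TEXT NOT NULL,          -- e.g. 'user', 'post'
    data  TEXT                    -- attribute blob
);
CREATE TABLE edges (
    id1   INTEGER NOT NULL REFERENCES objects(id),  -- source object
    id2   INTEGER NOT NULL REFERENCES objects(id),  -- destination object
    atype TEXT    NOT NULL,                         -- e.g. 'LIKES'
    time  INTEGER NOT NULL,
    data  TEXT,
    PRIMARY KEY (id1, atype, id2)
);
""")

conn.executemany(
    "INSERT INTO objects (id, otype, data) VALUES (?, ?, ?)",
    [(1, "user", "alice"), (2, "user", "bob"),
     (3, "post", "hello"), (4, "post", "world")],
)
conn.executemany(
    "INSERT INTO edges (id1, id2, atype, time) VALUES (?, ?, ?, ?)",
    [(1, 3, "LIKES", 100), (1, 4, "LIKES", 105),
     (2, 3, "LIKES", 101), (3, 1, "AUTHORED_BY", 90)],
)

# Range read over one (source object, association type) pair, newest first.
liked = conn.execute(
    """
    SELECT o.data
    FROM edges AS e
    JOIN objects AS o ON o.id = e.id2
    WHERE e.id1 = ? AND e.atype = ?
    ORDER BY e.time DESC
    """,
    (1, "LIKES"),
).fetchall()

print(liked)  # [('world',), ('hello',)]
```

The composite primary key on `(id1, atype, id2)` is one plausible choice: it allows at most one edge of a given type between an ordered pair of objects, and lets the range read above be served from the key's index.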

Seen in

— Van Dijk references TAO as the workload origin for TAOBench: "Workload A (short for Application) focuses specifically on a transactional subset of the queries. Workload O (short for Overall) encompasses a more generalized profile of the TAO workload." The post does not disclose TAO internals beyond naming it as the upstream workload source — for that depth, the VLDB papers and the 2013 USENIX ATC paper are the canonical sources.

systems/taobench — the benchmark that synthesises TAO's workload.
concepts/social-graph-objects-and-edges — the relational encoding of TAO's object-association model.
concepts/hot-row-problem — the substrate-side problem that viral-content-on-TAO creates for any backing store.
concepts/thundering-herd — the read-burst shape on viral content.
companies/meta — the company operating TAO in production.