Meta TAO
What it is
TAO (The Associations and Objects) is Meta's production
social-graph storage system, originally introduced at
USENIX ATC 2013
("TAO: Facebook's Distributed Data Store for the Social Graph").
TAO exposes objects (entities like users, posts, pictures,
pages, comments) and associations (typed relations between
entities, like LIKES, FRIEND, AUTHORED_BY, TAGGED) as
first-class API primitives, backed by a sharded MySQL substrate
with a geo-distributed caching tier in front.
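As a rough illustration of the objects-and-associations model described above, the following Python sketch keeps both in memory. The class and method names (`ToyGraph`, `obj_add`, `assoc_add`, `assoc_range`, `assoc_count`) and the newest-first ordering are assumptions made for this example, not an API documented on this page.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Obj:
    id: int
    otype: str                      # e.g. "user", "post"
    data: dict = field(default_factory=dict)


@dataclass
class Assoc:
    id1: int                        # source object
    atype: str                      # e.g. "LIKES", "AUTHORED_BY"
    id2: int                        # destination object
    time: int = 0
    data: dict = field(default_factory=dict)


class ToyGraph:
    """Hypothetical in-memory stand-in for an object/association store."""

    def __init__(self):
        self.objects = {}
        self.assocs = defaultdict(list)   # (id1, atype) -> list of Assoc
        self._next_id = 1

    def obj_add(self, otype, **data):
        obj = Obj(self._next_id, otype, data)
        self.objects[obj.id] = obj
        self._next_id += 1
        return obj.id

    def assoc_add(self, id1, atype, id2, time=0, **data):
        if id1 not in self.objects or id2 not in self.objects:
            raise KeyError("both endpoints must be existing objects")
        edges = self.assocs[(id1, atype)]
        # At most one edge per (id1, atype, id2): drop any older copy.
        edges[:] = [a for a in edges if a.id2 != id2]
        edges.append(Assoc(id1, atype, id2, time, data))
        # Keep the list newest-first so range reads are a simple slice.
        edges.sort(key=lambda a: a.time, reverse=True)

    def assoc_range(self, id1, atype, limit=10):
        return self.assocs.get((id1, atype), [])[:limit]

    def assoc_count(self, id1, atype):
        return len(self.assocs.get((id1, atype), []))
```

In this sketch a "like" is just `g.assoc_add(user_id, "LIKES", post_id, time=...)`, and listing or counting a user's likes reads a single `(id1, atype)` list.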
Why it shows up on this wiki
TAO is referenced on this wiki primarily as the workload source for TAOBench, the open-source benchmark by Audrey Cheng (UC Berkeley) and Meta engineers that synthesises TAO's access patterns into a runnable workload for third-party databases. TAOBench is introduced on the wiki via PlanetScale's TAOBench post (Liz van Dijk, 2022-09-08); this page anchors TAO as the origin of the social-graph workload shape being benchmarked.
Two VLDB papers from Cheng's team are the authoritative workload characterisation:
- VLDB 2021 — "Workload Analysis of a Large-Scale Key-Value Store" — characterises TAO's access patterns (read-heavy, long-tail popularity, high write-fanout on viral content).
- VLDB 2022 — "TAOBench: An End-to-End Benchmark for Social Network Workloads" — proposes TAOBench as a benchmarkable synthesis of the TAO workload.
This is a minimal page for now. TAO has enough architectural substance (MySQL-backed sharded social graph, geo-distributed caching tier, API-level objects + associations abstraction, read-heavy workload characterisation) to deserve a deeper treatment if a Meta engineering post about its internals is ingested in future. Until then, the page anchors cross-references from TAOBench and from social-graph-related wiki concepts.
Architectural shape
- API: objects (typed entities with IDs + attribute blobs) + associations (typed binary relations with an optional time + data payload).
- Read path: geo-distributed caching tier fronts the MySQL backing store. Most reads hit cache; writes go through to the backing MySQL master for the shard.
- Write fan-out: a single write against an association cascades to cache invalidation across regions; hot associations (viral posts) fan out widely.
- Consistency: eventual across regions, with read-your-writes within a region. Not serialisable.
- Workload skew: read-heavy (~99.8% reads per Cheng's VLDB 2021 characterisation), long-tail popularity, burst write-fanout on viral content. These properties drive TAOBench's design.
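The read path and write fan-out listed above can be sketched in a few lines of Python. This is a toy model under stated assumptions: `BackingStore` and `RegionCache` are hypothetical names, the backing store is a list of dicts standing in for MySQL shards, and cross-region invalidation is done synchronously for simplicity, whereas the eventual consistency described above implies it would be asynchronous in a real deployment.

```python
class BackingStore:
    """Hypothetical stand-in for the sharded backing store (one dict per shard)."""

    def __init__(self, num_shards=4):
        self.shards = [dict() for _ in range(num_shards)]
        self.reads = 0                      # counts reads that reached the store

    def _shard(self, key):
        return self.shards[hash(key) % len(self.shards)]

    def get(self, key):
        self.reads += 1
        return self._shard(key).get(key)

    def put(self, key, value):
        self._shard(key)[key] = value


class RegionCache:
    """Hypothetical per-region cache: read-through, write-through, invalidate peers."""

    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.entries = {}
        self.peers = []                     # caches in other regions

    def read(self, key):
        if key in self.entries:             # cache hit: the store is not touched
            return self.entries[key]
        value = self.store.get(key)         # cache miss: fall through to the store
        self.entries[key] = value
        return value

    def write(self, key, value):
        self.store.put(key, value)          # the write goes through to the store
        self.entries[key] = value           # local region sees its own write
        for peer in self.peers:             # fan-out: one write, N invalidations
            peer.entries.pop(key, None)
```

The design point the sketch makes visible is that the cost of a read is paid once per region per invalidation, while the cost of a write grows with the number of regions caching the key, which is why hot associations are the expensive case.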
Canonical relational encoding
When TAO's workload is ported to a relational substrate (for benchmarking or for non-Meta systems modelling social-graph-shaped workloads), the canonical encoding is concepts/social-graph-objects-and-edges:
- An `objects` table: one row per entity.
- An `edges` table: one row per relation, linking two `objects` rows via a many-to-many junction.
TAOBench uses exactly this two-table shape.
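A minimal sketch of the two-table shape, using Python's built-in `sqlite3` module so it runs without a server. The column names (`otype`, `atype`, `id1`, `id2`, `time`, `data`), the composite primary key, and the helper functions are assumptions chosen for illustration; they are not taken from TAOBench's actual schema.

```python
import sqlite3

# Illustrative schema only: one row per entity, one row per typed relation.
SCHEMA = """
CREATE TABLE objects (
    id    INTEGER PRIMARY KEY,
    otype TEXT NOT NULL,
    data  TEXT
);
CREATE TABLE edges (
    id1   INTEGER NOT NULL REFERENCES objects(id),
    atype TEXT    NOT NULL,
    id2   INTEGER NOT NULL REFERENCES objects(id),
    time  INTEGER NOT NULL DEFAULT 0,
    data  TEXT,
    PRIMARY KEY (id1, atype, id2)
);
"""


def open_graph():
    """Create an in-memory database holding the two tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def likers(conn, post_id):
    """Return ids of objects with a LIKES edge to post_id, newest first."""
    rows = conn.execute(
        "SELECT o.id FROM edges AS e "
        "JOIN objects AS o ON o.id = e.id1 "
        "WHERE e.id2 = ? AND e.atype = 'LIKES' "
        "ORDER BY e.time DESC",
        (post_id,),
    )
    return [row[0] for row in rows]
```

Every row in `edges` references two rows in `objects`, so the table acts as the many-to-many junction; a viral post shows up as a large number of `edges` rows sharing the same `id2`.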
Seen in
- sources/2026-04-21-planetscale-taobench-running-social-media-workloads-on-planetscale — Van Dijk references TAO as the workload origin for TAOBench: "Workload A (short for Application) focuses specifically on a transactional subset of the queries. Workload O (short for Overall) encompasses a more generalized profile of the TAO workload." The post does not disclose TAO internals beyond naming it as the upstream workload source — for that depth, the VLDB papers and the 2013 USENIX ATC paper are the canonical sources.
Related
- systems/taobench — the benchmark that synthesises TAO's workload.
- concepts/social-graph-objects-and-edges — the relational encoding of TAO's object-association model.
- concepts/hot-row-problem — the substrate-side problem that viral-content-on-TAO creates for any backing store.
- concepts/thundering-herd — the read-burst shape on viral content.
- companies/meta — the company operating TAO in production.