SYSTEM Cited by 1 source

Meta TAO

What it is

TAO (The Associations and Objects) is Meta's production social-graph storage system, originally introduced at USENIX ATC 2013 ("TAO: Facebook's Distributed Data Store for the Social Graph"). TAO exposes objects (entities such as users, posts, pictures, pages, comments) and associations (typed relations between entities, such as LIKES, FRIEND, AUTHORED_BY, TAGGED) as first-class API primitives. It is backed by sharded MySQL, with a geo-distributed caching tier in front.

Why it shows up on this wiki

TAO is referenced on this wiki primarily as the workload source for TAOBench, the open-source benchmark by Audrey Cheng (UC Berkeley) and Meta engineers that synthesises TAO's access patterns into a runnable workload for third-party databases. TAOBench is introduced on the wiki via PlanetScale's TAOBench post (Liz van Dijk, 2022-09-08); this page anchors TAO as the origin of the social-graph workload shape being benchmarked.

Two VLDB papers from Cheng's team are the authoritative workload characterisations:

  • VLDB 2021, "Workload Analysis of a Large-Scale Key-Value Store": characterises TAO's access patterns (read-heavy, long-tail popularity, high write-fanout on viral content).
  • VLDB 2022, "TAOBench: An End-to-End Benchmark for Social Network Workloads": proposes TAOBench as a benchmarkable synthesis of the TAO workload.

This is a minimal page for now. TAO has enough architectural substance (MySQL-backed sharded social graph, geo-distributed caching tier, API-level objects and associations abstraction, read-heavy workload characterisation) to deserve a deeper treatment if a Meta engineering post about its internals is ingested in future. Until then, this page exists to anchor cross-references from TAOBench and from social-graph-related wiki concepts.

Architectural shape

  • API: objects (typed entities with IDs + attribute blobs) + associations (typed binary relations with an optional time + data payload).
  • Read path: a geo-distributed caching tier fronts the MySQL backing store. Most reads are served from cache; writes pass through the cache to the MySQL master for the owning shard.
  • Write fan-out: a single write against an association cascades to cache invalidation across regions; hot associations (viral posts) fan out widely.
  • Consistency: eventual across regions, with read-your-writes within a region. Not serialisable.
  • Workload skew: read-heavy (~99.8% reads per Cheng's VLDB 2021 characterisation), long-tail popularity, burst write-fanout on viral content. These properties drive TAOBench's design.
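
The API and read path above can be illustrated with a toy, single-process sketch. This is not TAO's implementation or its real API: the class names (BackingStore, CachedGraph), method names, and the dict-based "store" are all hypothetical stand-ins, and cross-region invalidation is reduced to dropping one local cache entry.

```python
from dataclasses import dataclass, field


@dataclass
class Assoc:
    """A typed binary relation id1 --atype--> id2 (hypothetical shape)."""
    id1: int
    atype: str
    id2: int
    time: int = 0
    data: dict = field(default_factory=dict)


class BackingStore:
    """Stand-in for the sharded MySQL tier; counts reads that miss the cache."""

    def __init__(self):
        self.objects = {}
        self.assocs = {}
        self.reads = 0

    def get_object(self, oid):
        self.reads += 1
        return self.objects.get(oid)

    def get_assocs(self, id1, atype):
        self.reads += 1
        return list(self.assocs.get((id1, atype), []))


class CachedGraph:
    """Read-through cache; writes go to the store and invalidate the cached entry."""

    def __init__(self, store):
        self.store = store
        self.cache = {}

    def obj_add(self, oid, otype, attrs):
        self.store.objects[oid] = {"type": otype, **attrs}
        self.cache.pop(("obj", oid), None)  # invalidate any stale copy

    def obj_get(self, oid):
        key = ("obj", oid)
        if key not in self.cache:  # cache miss: fall through to the store
            self.cache[key] = self.store.get_object(oid)
        return self.cache[key]

    def assoc_add(self, assoc):
        self.store.assocs.setdefault((assoc.id1, assoc.atype), []).append(assoc)
        self.cache.pop(("assoc", assoc.id1, assoc.atype), None)

    def assoc_get(self, id1, atype):
        key = ("assoc", id1, atype)
        if key not in self.cache:
            rows = self.store.get_assocs(id1, atype)
            # Newest first, a common access pattern for feeds.
            self.cache[key] = sorted(rows, key=lambda a: a.time, reverse=True)
        return self.cache[key]


store = BackingStore()
graph = CachedGraph(store)
graph.obj_add(1, "user", {"name": "alice"})
graph.obj_add(2, "post", {"text": "hello"})
graph.assoc_add(Assoc(1, "LIKES", 2, time=100))
first = graph.assoc_get(1, "LIKES")   # miss: one read against the store
second = graph.assoc_get(1, "LIKES")  # hit: served from the cache
```

In this sketch, repeated reads of the same association list cost one store read until the next write invalidates the entry, which is why a read-heavy workload benefits from the caching tier and why writes to hot associations are the expensive case.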

Canonical relational encoding

When TAO's workload is ported to a relational substrate (for benchmarking or for non-Meta systems modelling social-graph-shaped workloads), the canonical encoding is concepts/social-graph-objects-and-edges:

  • An objects table: one row per entity.
  • An edges table: one row per relation. Each row references two rows of the objects table, so the edges table acts as a many-to-many junction.

TAOBench uses exactly this two-table shape.
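
A minimal sketch of the two-table shape, using Python's built-in sqlite3 module so it runs without a server. The column names (otype, etype, time, data) and the likers helper are assumptions made for illustration; they are not taken from TAOBench's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE objects (
    id    INTEGER PRIMARY KEY,
    otype TEXT NOT NULL,          -- e.g. 'user', 'post'
    data  TEXT                    -- opaque attribute blob
);
CREATE TABLE edges (
    id1   INTEGER NOT NULL REFERENCES objects(id),
    etype TEXT    NOT NULL,       -- e.g. 'LIKES', 'FRIEND'
    id2   INTEGER NOT NULL REFERENCES objects(id),
    time  INTEGER NOT NULL,
    data  TEXT,
    PRIMARY KEY (id1, etype, id2)
);
""")

conn.executemany(
    "INSERT INTO objects (id, otype, data) VALUES (?, ?, ?)",
    [(1, "user", '{"name": "alice"}'),
     (2, "user", '{"name": "bob"}'),
     (3, "post", '{"text": "hello"}')],
)
conn.executemany(
    "INSERT INTO edges (id1, etype, id2, time, data) VALUES (?, ?, ?, ?, ?)",
    [(1, "LIKES", 3, 100, None),
     (2, "LIKES", 3, 200, None),
     (1, "FRIEND", 2, 50, None)],
)


def likers(conn, post_id):
    """Return IDs of objects with a LIKES edge to post_id, newest first."""
    rows = conn.execute(
        "SELECT o.id FROM edges e JOIN objects o ON o.id = e.id1 "
        "WHERE e.etype = 'LIKES' AND e.id2 = ? ORDER BY e.time DESC",
        (post_id,),
    ).fetchall()
    return [r[0] for r in rows]


result = likers(conn, 3)  # [2, 1]
```

The composite primary key on (id1, etype, id2) is one plausible choice: it makes "all edges of a given type from this object" a prefix lookup, which matches the association-list reads that dominate the workload.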

Seen in

  • sources/2026-04-21-planetscale-taobench-running-social-media-workloads-on-planetscale — Van Dijk references TAO as the workload origin for TAOBench: "Workload A (short for Application) focuses specifically on a transactional subset of the queries. Workload O (short for Overall) encompasses a more generalized profile of the TAO workload." The post does not disclose TAO internals beyond naming it as the upstream workload source — for that depth, the VLDB papers and the 2013 USENIX ATC paper are the canonical sources.