TAOBench

What it is

TAOBench is an open-source benchmark for relational and distributed databases that synthesises the workload shape of Meta's production TAO social-graph store. It was published at VLDB 2022 by Audrey Cheng and colleagues at UC Berkeley in collaboration with Meta engineers (paper: TAOBench: An End-to-End Benchmark for Social Network Workloads); the preceding VLDB 2021 paper ("Workload Analysis of a Large-Scale Key-Value Store") characterised Meta's TAO workload and motivated the benchmark.

Why it shows up on this wiki

TAOBench is introduced on the wiki via PlanetScale's Tech Solutions post TAOBench: Running social media workloads on PlanetScale (Liz van Dijk, 2022-09-08), which positions it as a social-graph-shaped complement to TPC-C / sysbench-tpcc for evaluating database substrates under workloads TPC-C doesn't cover: "The TPC-C benchmark has had a very long life, and has remained remarkably relevant until this day, but there are scenarios it doesn't cover. Audrey Cheng and her team at University of California, Berkeley identified a real gap when it comes to available synthetic benchmarks for a more recent, but highly pervasive workload type: social media networks."

Schema

Two tables — canonicalised as concepts/social-graph-objects-and-edges:

  • objects — the social-graph entities (users, posts, pictures, comments, pages).
  • edges — the many-to-many relations between entities (likes, shares, friendships, follows, reactions). The edges table is a classic many-to-many junction linking objects rows to other objects rows.

"In simple relational database terms: The edges table can be viewed as a 'many-to-many' relationship table that links rows in objects to other rows in objects." (Source: sources/2026-04-21-planetscale-taobench-running-social-media-workloads-on-planetscale.)

This is the relational encoding of the social graph — distinct from Meta's graph-native TAO API (objects + associations as first-class API primitives).
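
A minimal sketch of that relational encoding, runnable against SQLite. The column names and types here are illustrative assumptions; the benchmark's own setup scripts define the real DDL per backend:

```python
# Hypothetical two-table social-graph encoding, using sqlite3 so it runs
# anywhere. Columns are illustrative assumptions, not TAOBench's actual DDL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE objects (
    id    INTEGER PRIMARY KEY,   -- a user, post, picture, comment, page, ...
    data  TEXT                   -- opaque payload
);
CREATE TABLE edges (
    id1   INTEGER NOT NULL REFERENCES objects(id),  -- source object
    id2   INTEGER NOT NULL REFERENCES objects(id),  -- target object
    type  TEXT    NOT NULL,      -- 'like', 'follow', 'friend', ...
    data  TEXT,
    PRIMARY KEY (id1, id2, type) -- one edge of a given type per object pair
);
""")

# A 'like' is a many-to-many row linking two objects rows:
conn.execute("INSERT INTO objects VALUES (1, 'user:alice'), (2, 'post:cat-video')")
conn.execute("INSERT INTO edges VALUES (1, 2, 'like', NULL)")
print(conn.execute("SELECT COUNT(*) FROM edges WHERE id2 = 2").fetchone()[0])  # fan-in of the post
```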

Workload profiles

Two pre-configured workload scenarios ship with TAOBench:

  • Workload A (Application) — transactional subset of the queries; concentrates on the OLTP-like access patterns within Meta's workload.
  • Workload O (Overall) — generalised profile of the full TAO workload.

Critically, the statistical distribution of data in both objects and edges is baked into the load phase, not just the query phase: "data should be reloaded when switching between them" (Source: sources/2026-04-21-planetscale-taobench-running-social-media-workloads-on-planetscale). This couples the benchmark to Meta's production workload more tightly than sysbench-tpcc's Lua-script-level query shaping: TAOBench encodes the workload's storage shape, not just its query mix.
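
A sketch of what "baked into the load phase" means in practice, assuming a Zipf-style popularity skew as the scenario-specific parameter. The scenario names map to the two workloads above, but the skew values are invented placeholders, not TAOBench's published parameters:

```python
# Why data must be reloaded between workloads: the *loader*, not just the
# query mix, is parameterised per scenario. Skew values are placeholders.
import random

SCENARIOS = {
    "workload_a": {"zipf_s": 1.5},  # hypothetical: heavier skew
    "workload_o": {"zipf_s": 1.1},  # hypothetical: broader profile
}

def load_edges(scenario: str, n_objects: int, n_edges: int, seed: int = 0):
    """Generate edge rows whose target-object popularity follows the
    scenario's skew -- the distribution lives in the data itself."""
    rng = random.Random(seed)
    s = SCENARIOS[scenario]["zipf_s"]
    # Zipf-like sampling: weight object k proportionally to 1 / (k+1)**s.
    weights = [1 / (k + 1) ** s for k in range(n_objects)]
    targets = rng.choices(range(n_objects), weights=weights, k=n_edges)
    return [(rng.randrange(n_objects), t, "like") for t in targets]

edges_a = load_edges("workload_a", n_objects=1000, n_edges=10_000)
edges_o = load_edges("workload_o", n_objects=1000, n_edges=10_000)
# The same object id has a very different fan-in under each scenario, so
# experiments tuned for one dataset are meaningless against the other.
print(sum(1 for _, t, _ in edges_a if t == 0), sum(1 for _, t, _ in edges_o if t == 0))
```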

Three-phase protocol

TAOBench runs in three explicit phases:

  1. Load phase — bulk-insert rows into objects and edges according to the chosen workload scenario. Populates the dataset to the size dictated by the workload profile.
  2. Bulk-reads phase (unmeasured) — "very aggressive range scans across the entire dataset to serve as general 'warmup' to whichever caching mechanisms may be in place, and also aggregates the necessary statistical information to feed into the experiments themselves." Explicitly "not measured, but can be extremely punishing to the underlying infrastructure."
  3. Experiments phase (measured) — accepts predefined concurrency levels and runtime operation targets to scale the chosen workload to various infrastructure sizes.

The separation of warmup into its own unmeasured phase is a methodological improvement over single-phase benchmarks that conflate cold-cache ramp with measured steady state. The bulk-reads phase also tests range-scan capacity, a different substrate axis from the experiments phase's concurrency-driven point-operation load, so a single benchmark run exercises two substrate axes.
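
A skeleton of the three-phase shape, with only phase 3 under the stopwatch. FakeDB and its methods are hypothetical stand-ins for whatever driver the real benchmark binds to; the phase boundaries are the point here:

```python
import time
from concurrent.futures import ThreadPoolExecutor

class FakeDB:
    """Stand-in driver; real TAOBench binds to an actual database."""
    def __init__(self):
        self.rows = []
    def load_row(self, row):
        self.rows.append(row)
    def range_scan(self, start, end):
        # Full-dataset scan; returns the statistics the experiments need.
        return {"n_rows": len(self.rows)}
    def run_op(self, stats):
        return stats["n_rows"]

def run_benchmark(db, dataset, concurrency: int, op_target: int) -> float:
    # Phase 1: load -- bulk-insert objects/edges per the chosen scenario.
    for row in dataset:
        db.load_row(row)

    # Phase 2: bulk reads -- aggressive scans across the whole dataset.
    # Warms caches and gathers stats for the experiments; deliberately untimed.
    stats = db.range_scan(start=None, end=None)

    # Phase 3: experiments -- the only timed phase. Concurrency and operation
    # target are the knobs that scale the run to the infrastructure size.
    t0 = time.monotonic()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(lambda _: db.run_op(stats), range(op_target)))
    return op_target / (time.monotonic() - t0)  # measured ops/sec

print(run_benchmark(FakeDB(), dataset=range(10_000), concurrency=8, op_target=50_000))
```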

Hot-row / thundering-herd as design target

TAOBench's objects + edges model is chosen in part to explicitly stress hot-row behaviour and thundering-herd response. Van Dijk's framing: "Focusing the workload around these two simplified concepts allows the benchmark to simulate typical 'hot row' scenarios that can be particularly challenging for relational databases to handle. Think of what happens when something goes viral: a thundering herd of users comes through to interact with a specific piece of content posted somewhere."

This makes TAOBench the first named benchmark on this wiki that explicitly measures substrate behaviour under viral-content skew — distinct from sysbench-tpcc, whose access patterns are shard-key-aligned (i.e., no hot rows by construction).
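
A toy illustration of the access pattern (not TAOBench's driver code): every client converges on one row, so throughput is bounded by per-row contention rather than by total cores, which is exactly the regime shard-key-aligned benchmarks never enter:

```python
# Thundering-herd sketch: all writers serialise on one "viral" row's lock.
import threading

like_counts = {post_id: 0 for post_id in range(1000)}
row_locks = {post_id: threading.Lock() for post_id in like_counts}
VIRAL_POST = 42  # the herd converges on one row

def client(n_ops: int):
    for _ in range(n_ops):
        with row_locks[VIRAL_POST]:        # every writer queues here,
            like_counts[VIRAL_POST] += 1   # unlike shard-key-aligned load

threads = [threading.Thread(target=client, args=(10_000,)) for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
print(like_counts[VIRAL_POST])  # 80000, paid for with lock contention
```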

Positioning vs sysbench

| Axis | sysbench-tpcc | TAOBench |
| --- | --- | --- |
| Workload shape | OLTP / online-retail-adjacent | Social graph |
| Schema | TPC-C derivative (warehouses, districts, customers, orders) | objects + edges (many-to-many) |
| Access pattern | Shard-key-aligned; no hot rows | Skewed; explicit hot-row stress |
| Workload reloads | Not required between scenarios | Required between Workload A / O |
| Warmup | Implicit in ramp | Explicit unmeasured bulk-reads phase |
| Origin | TPC-C academic benchmark + Percona port | Meta-workload-derived, VLDB-published |
| Use on PlanetScale | 1M-QPS single-tenant capability | 48-core multi-tenant-serverless capability |

The two benchmarks are intentionally complementary in PlanetScale's published benchmarking work — sysbench-tpcc for shard-linear scaling demonstration, TAOBench for social-graph-shaped substrate maturity disclosure.

Seen in

  • sources/2026-04-21-planetscale-taobench-running-social-media-workloads-on-planetscale — Liz van Dijk (PlanetScale, 2022-09-08) introduces TAOBench to the PlanetScale benchmarking arsenal. Cheng's Berkeley/Meta team independently measured PlanetScale infrastructure against TAOBench; PlanetScale then verified internally using the public benchmark code. The published PlanetScale run uses a 48-CPU-core resource cap, allocated as 44 cores for the query path plus 4 cores for multi-tenant serverless overhead (edge load balancers) — see concepts/constrained-resource-benchmark for the methodology generalisation. The key takeaway van Dijk names is graceful saturation, not peak QPS: "sustained stability of PlanetScale clusters under even the most extreme resource pressure" — the benchmark-at-the-ceiling property that distinguishes mature substrates from ones that collapse past 100% CPU (concepts/graceful-saturation-vs-congestive-collapse).