Social graph (objects and edges)
Definition
The social graph as objects + edges is the canonical relational encoding of a social-media-style entity-and-relation model, where:
- `objects` holds the entities: users, posts, pictures, pages, comments, videos, any first-class thing the product exposes.
- `edges` holds the relations (likes, shares, friendships, follows, reactions) as a many-to-many junction table linking `objects` rows to other `objects` rows.
Canonical definition from PlanetScale's TAOBench post:
"The schema for TAOBench is straightforward: it consists of 2
tables, one called objects and one called edges, concepts that
loosely translate to the social graph of entities (think 'users',
'posts', 'pictures', etc.) and to the various types of relationships
they have with each other (think 'likes', 'shares', 'friendships',
etc.). In simple relational database terms: The edges table can
be viewed as a 'many-to-many' relationship table that links rows in
objects to other rows in objects."
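A minimal sketch of the two-table shape, using Python's built-in `sqlite3` as a stand-in for any SQL engine. The quoted post only names the two tables; the column names below (`type`, `data`, `id1`, `id2`, `created_at`) and the helper `make_graph_db` are assumptions chosen for illustration, not the TAOBench schema itself.

```python
import sqlite3

# Hypothetical column names: the source only specifies that there are two
# tables, `objects` and `edges`, with edges linking objects to objects.
SCHEMA = """
CREATE TABLE objects (
    id   INTEGER PRIMARY KEY,  -- one row per entity instance
    type TEXT NOT NULL,        -- 'user', 'post', 'picture', ...
    data TEXT                  -- opaque attribute payload
);
CREATE TABLE edges (
    id1        INTEGER NOT NULL REFERENCES objects(id),  -- source entity
    id2        INTEGER NOT NULL REFERENCES objects(id),  -- target entity
    type       TEXT    NOT NULL,                         -- 'like', 'share', ...
    created_at INTEGER NOT NULL,                         -- payload: event time
    data       TEXT,                                     -- payload: optional blob
    PRIMARY KEY (id1, id2, type)
);
"""

def make_graph_db() -> sqlite3.Connection:
    """Create an in-memory database holding the two-table social graph."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn

conn = make_graph_db()
conn.executemany(
    "INSERT INTO objects (id, type, data) VALUES (?, ?, ?)",
    [(1, "user", "alice"), (2, "user", "bob"), (3, "post", "hello world")],
)
conn.executemany(
    "INSERT INTO edges (id1, id2, type, created_at) VALUES (?, ?, ?, ?)",
    [(1, 2, "friendship", 100), (1, 3, "like", 101), (2, 3, "like", 102)],
)
conn.commit()
```

Both endpoints of every edge point back into `objects`, which is what makes `edges` a many-to-many junction table over a single entity table.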
Why it matters
This encoding is the reduction of the social graph to a shape that any relational database can serve, orthogonal to graph-native APIs like Meta's TAO. The two-table shape is load-bearing:
- Entities become rows in `objects`. Every first-class concept the product surfaces gets one row per instance, with a typed attribute payload.
- Relations become rows in `edges`. Every interaction (user-likes-post, user-follows-user, post-tagged-with-location) becomes one row linking two `objects` rows, typed by relation kind and optionally carrying a payload (timestamp, data blob, reaction flavour).
- Access patterns follow the graph shape. Reads are object-centred ("give me this user") or edge-centred ("give me the last N likes on this post"), with the latter being the read-heavy axis.
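The two read shapes in the list above can be sketched as a pair of queries. This is an illustrative example on SQLite, not TAOBench's actual query set; the column names and the helpers `get_object` and `recent_edges` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE objects (id INTEGER PRIMARY KEY, type TEXT NOT NULL, data TEXT);
CREATE TABLE edges (
    id1 INTEGER NOT NULL, id2 INTEGER NOT NULL,
    type TEXT NOT NULL, created_at INTEGER NOT NULL,
    PRIMARY KEY (id1, id2, type)
);
""")
conn.executemany(
    "INSERT INTO objects VALUES (?, ?, ?)",
    [(1, "user", "alice"), (2, "user", "bob"),
     (3, "user", "carol"), (10, "post", "launch day")],
)
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?, ?)",
    [(1, 10, "like", 100), (2, 10, "like", 105),
     (3, 10, "like", 103), (1, 2, "follow", 90)],
)

def get_object(conn, object_id):
    """Object-centred read: fetch one entity by primary key."""
    return conn.execute(
        "SELECT id, type, data FROM objects WHERE id = ?", (object_id,)
    ).fetchone()

def recent_edges(conn, target_id, edge_type, limit):
    """Edge-centred read: newest `limit` edges of one type pointing at a target."""
    return conn.execute(
        "SELECT id1, created_at FROM edges "
        "WHERE id2 = ? AND type = ? "
        "ORDER BY created_at DESC LIMIT ?",
        (target_id, edge_type, limit),
    ).fetchall()

user = get_object(conn, 1)                          # "give me this user"
last_two_likes = recent_edges(conn, 10, "like", 2)  # "last N likes on this post"
```

The object read touches exactly one row, while the edge read touches as many rows as the target has accumulated relations of that type, which is why the edge-centred path carries most of the read load.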
Canonical instance: TAOBench
TAOBench — the VLDB 2022 benchmark by Audrey Cheng and Meta engineers — uses exactly this two-table schema to synthesise Meta's TAO workload into a form any relational database can be benchmarked against. TAOBench ships two workload profiles (Workload A = application-transactional subset; Workload O = overall TAO profile), each with its own statistical distribution baked into the load phase.
The two-table shape is also the stress-point for viral-content
workloads: when a post goes viral, a single objects row becomes
a hot row (concepts/hot-row-problem), and all the edges rows
written against it create a thundering herd (concepts/thundering-herd) of writes
to the backing MySQL primary for that shard. Benchmarks that use
this schema are explicitly testing substrate behaviour under these
stressors, as distinct from shard-key-aligned workloads like
sysbench-tpcc that have no hot rows by construction.
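To make the hot-row effect concrete, the sketch below draws like-targets from a Zipf-like popularity distribution and counts how many edge writes land on each post. The distribution, the `skew` exponent, and the function name `simulate_like_targets` are assumptions for illustration; TAOBench's own workload profiles use distributions derived from Meta's TAO traces, which are not reproduced here.

```python
import random
from collections import Counter

def simulate_like_targets(num_posts, num_likes, skew=1.2, seed=7):
    """Count edge writes per post when popularity follows 1 / rank**skew."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    post_ids = range(1, num_posts + 1)
    weights = [1.0 / (rank ** skew) for rank in post_ids]
    targets = rng.choices(post_ids, weights=weights, k=num_likes)
    return Counter(targets)

TOTAL_LIKES = 50_000
writes_per_post = simulate_like_targets(num_posts=1000, num_likes=TOTAL_LIKES)

# The most popular post absorbs a disproportionate share of all edge writes,
# and every one of those writes references the same objects row.
hottest_post, hottest_writes = writes_per_post.most_common(1)[0]
hottest_share = hottest_writes / TOTAL_LIKES
```

Under these assumed parameters a single post out of a thousand receives a large fraction of all writes, which is the concentration a hot-row benchmark is designed to provoke.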
Contrast with other encodings
| Encoding | Shape | Primary use |
|---|---|---|
| Objects + edges (this concept) | 2 relational tables | Relational DB workloads; TAOBench; generic social-graph apps on SQL |
| Property graph | Typed nodes + typed edges + properties | Neo4j, native graph DBs; graph traversal queries |
| Triple store | (subject, predicate, object) triples | RDF / SPARQL / knowledge graphs |
| Adjacency list | List of neighbours per node | Graph algorithms; in-memory graphs |
| Sparse matrix | Rows × columns with sparse non-zero entries | Graph-ML; matrix-factorisation approaches |
| Meta TAO API | Objects + associations as first-class API primitives | Meta's production social graph — the source the objects+edges schema encodes into relational |
The objects + edges encoding is the lowest-common-denominator
relational shape — readable by any SQL engine, without graph-native
features — which is precisely what TAOBench needs to benchmark
any relational database.
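Because the relational shape is the lowest common denominator, the same edge rows can be folded into some of the other encodings in the table. A small sketch, assuming edge rows are `(id1, id2, type)` tuples (a hypothetical layout) and using the illustrative helpers `to_adjacency_list` and `to_triples`:

```python
from collections import defaultdict

# Hypothetical edge rows as (id1, id2, type) tuples read from the edges table.
edge_rows = [
    (1, 2, "friendship"),
    (1, 3, "like"),
    (2, 3, "like"),
]

def to_adjacency_list(rows):
    """Adjacency list: for each source node, its (edge type, neighbour) pairs."""
    adjacency = defaultdict(list)
    for id1, id2, edge_type in rows:
        adjacency[id1].append((edge_type, id2))
    return dict(adjacency)

def to_triples(rows):
    """Triple-store view: (subject, predicate, object) per edge row."""
    return [(id1, edge_type, id2) for id1, id2, edge_type in rows]

adjacency = to_adjacency_list(edge_rows)
triples = to_triples(edge_rows)
```

The conversions are mechanical in this direction; what the relational form lacks is not information but the traversal and query features that the graph-native systems build on top of their encodings.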
Operational consequences
- Every relation creates an edge row. High-activity relations (likes, views, reactions) drive write volume at the `edges` table.
- Hot `objects` rows track entity popularity. Viral posts, celebrity users, and trending pages all concentrate workload on single `objects` rows, which is the hot-row problem at the relational-DB level.
- `edges` tends to dwarf `objects`. N entities produce up to N² edges in theory; in practice, the skew is long-tailed (most entities have few relations; a few have many). This drives the shape of scaling decisions for social-graph databases: `edges` is where the data volume and the write hotspots live.
- Fan-out reads dominate. Fetching "N most recent likes on this post" is an `edges` range scan; the TAOBench bulk-reads phase is deliberately punishing on exactly this access pattern.
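One way to serve the fan-out read as an index range scan is a composite index leading with the target and relation type, followed by the timestamp. The sketch below checks this on SQLite with `EXPLAIN QUERY PLAN`; the column names and the index name `idx_edges_target_recent` are hypothetical, and a MySQL-backed deployment would verify the equivalent with its own `EXPLAIN`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE edges (
    id1 INTEGER NOT NULL, id2 INTEGER NOT NULL,
    type TEXT NOT NULL, created_at INTEGER NOT NULL,
    PRIMARY KEY (id1, id2, type)
);
-- Equality columns first, then the sort column, so "latest N edges of
-- this type on this target" walks one contiguous slice of the index.
CREATE INDEX idx_edges_target_recent ON edges (id2, type, created_at);
""")

plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id1, created_at FROM edges "
    "WHERE id2 = ? AND type = ? "
    "ORDER BY created_at DESC LIMIT ?",
    (10, "like", 20),
).fetchall()

# Each plan row is (id, parent, notused, detail); the detail text names
# the index the planner chose for the search.
plan_text = " | ".join(row[3] for row in plan_rows)
```

An index like this helps the read side only; the writes against a viral target still converge on the same key range, so it does not remove the hot-row pressure described above.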
Seen in
- sources/2026-04-21-planetscale-taobench-running-social-media-workloads-on-planetscale
— Van Dijk (PlanetScale, 2022-09-08) canonicalises the
`objects` + `edges` schema as the relational encoding of the social graph for TAOBench. The post explicitly frames the schema as a viral-content stressor: "Focusing the workload around these two simplified concepts allows the benchmark to simulate typical 'hot row' scenarios that can be particularly challenging for relational databases to handle."
Related
- systems/taobench — the canonical benchmark built on this schema.
- systems/meta-tao — the production system whose workload TAOBench encodes.
- concepts/hot-row-problem, concepts/thundering-herd — the stressors this schema exposes.
- concepts/benchmark-representativeness — the property TAOBench claims via this schema + workload profiles.