Social graph (objects and edges)

CONCEPT

Definition

The social graph as objects + edges is the canonical relational encoding of a social-media-style entity-and-relation model, where:

objects holds the entities — users, posts, pictures, pages, comments, videos, any first-class thing the product exposes.
edges holds the relations — likes, shares, friendships, follows, reactions — as a many-to-many junction table linking objects rows to other objects rows.
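A minimal sketch of the two-table shape, using Python's built-in sqlite3 module. The column names (id, otype, data on objects; id1, id2, etype, time, data on edges) are illustrative assumptions that loosely echo TAO's object/association fields. They are not TAOBench's published DDL.

```python
import sqlite3

def create_social_graph(conn: sqlite3.Connection) -> None:
    """Create a hypothetical two-table objects + edges schema."""
    conn.executescript(
        """
        -- Entities: one row per user, post, picture, page, ...
        CREATE TABLE objects (
            id    INTEGER PRIMARY KEY,
            otype TEXT NOT NULL,          -- entity kind, e.g. 'user', 'post'
            data  TEXT                    -- opaque attribute payload
        );
        -- Relations: many-to-many junction table from objects to objects.
        CREATE TABLE edges (
            id1   INTEGER NOT NULL REFERENCES objects(id),  -- source object
            id2   INTEGER NOT NULL REFERENCES objects(id),  -- target object
            etype TEXT    NOT NULL,       -- relation kind, e.g. 'likes'
            time  INTEGER NOT NULL,       -- creation timestamp
            data  TEXT,                   -- optional payload
            PRIMARY KEY (id1, etype, id2)
        );
        """
    )

conn = sqlite3.connect(":memory:")
create_social_graph(conn)
conn.executemany(
    "INSERT INTO objects (id, otype, data) VALUES (?, ?, ?)",
    [(1, "user", "alice"), (2, "user", "bob"), (10, "post", "hello world")],
)
conn.executemany(
    "INSERT INTO edges (id1, id2, etype, time, data) VALUES (?, ?, ?, ?, ?)",
    [
        (1, 2, "follows", 100, None),    # alice follows bob
        (2, 10, "likes", 105, None),     # bob likes the post
        (1, 10, "likes", 110, "heart"),  # alice likes it, with a reaction payload
    ],
)
conn.commit()

# Count relations per kind.
rows = conn.execute(
    "SELECT etype, COUNT(*) FROM edges GROUP BY etype ORDER BY etype"
).fetchall()
print(rows)  # [('follows', 1), ('likes', 2)]
```

Every relation, whatever its kind, lands in the same edges table and is distinguished only by its etype value.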

Canonical definition from : "The schema for TAOBench is straightforward: it consists of 2 tables, one called objects and one called edges, concepts that loosely translate to the social graph of entities (think 'users', 'posts', 'pictures', etc.) and to the various types of relationships they have with each other (think 'likes', 'shares', 'friendships', etc.). In simple relational database terms: The edges table can be viewed as a 'many-to-many' relationship table that links rows in objects to other rows in objects."

Why it matters

This encoding reduces the social graph to a shape that any relational database can serve, independent of graph-native APIs such as Meta's TAO. The two-table shape is load-bearing:

Entities become rows in objects. Every first-class concept the product surfaces gets one row per instance, with a typed attribute payload.
Relations become rows in edges. Every interaction (user-likes-post, user-follows-user, post-tagged-with-location) becomes one row linking two objects rows — typed by relation kind, optionally carrying a payload (timestamp, data blob, reaction flavour).
Access patterns follow the graph shape. Reads are object-centred ("give me this user") or edge-centred ("give me the last N likes on this post"), with the latter being the read-heavy axis.
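The two read shapes can be sketched against the same hypothetical schema, re-created here so the snippet stands alone. It shows an object-centred point lookup by primary key and an edge-centred "last N likes on this post" query. The helper names get_object and last_n_likes, and the index name edges_by_target, are illustrative and do not come from any real API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE objects (id INTEGER PRIMARY KEY, otype TEXT NOT NULL, data TEXT);
    CREATE TABLE edges (
        id1 INTEGER NOT NULL, id2 INTEGER NOT NULL, etype TEXT NOT NULL,
        time INTEGER NOT NULL, data TEXT,
        PRIMARY KEY (id1, etype, id2)
    );
    -- Supports "likes on this post, newest first" (looked up by target id2).
    CREATE INDEX edges_by_target ON edges (id2, etype, time);
    """
)
conn.executemany(
    "INSERT INTO objects VALUES (?, ?, ?)",
    [(10, "post", "hello world")] + [(u, "user", f"user{u}") for u in range(1, 6)],
)
# Users 1..5 like post 10 at increasing timestamps (101..105).
conn.executemany(
    "INSERT INTO edges VALUES (?, 10, 'likes', ?, NULL)",
    [(u, 100 + u) for u in range(1, 6)],
)

def get_object(conn, object_id):
    """Object-centred read: 'give me this user/post'."""
    return conn.execute(
        "SELECT id, otype, data FROM objects WHERE id = ?", (object_id,)
    ).fetchone()

def last_n_likes(conn, post_id, n):
    """Edge-centred read: ids of the N most recent likers of a post."""
    return [
        row[0]
        for row in conn.execute(
            "SELECT id1 FROM edges WHERE id2 = ? AND etype = 'likes' "
            "ORDER BY time DESC LIMIT ?",
            (post_id, n),
        )
    ]

print(get_object(conn, 10))       # (10, 'post', 'hello world')
print(last_n_likes(conn, 10, 3))  # [5, 4, 3]
```

The first query touches one row. The second walks a range of edges rows, and its cost grows with N and with how popular the post is.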

Canonical instance: TAOBench

TAOBench — the VLDB 2022 benchmark by Audrey Cheng and Meta engineers — uses exactly this two-table schema to synthesise Meta's TAO workload into a form any relational database can be benchmarked against. TAOBench ships two workload profiles (Workload A = application-transactional subset; Workload O = overall TAO profile), each with its own statistical distribution baked into the load phase.

The two-table shape is also the stress point for viral-content workloads: when a post goes viral, a single objects row becomes a hot row (concepts/hot-row-problem), and the edges rows written against it create a thundering herd of writes (concepts/thundering-herd) to the backing MySQL primary for that shard. Benchmarks that use this schema are explicitly testing substrate behaviour under these stressors, as distinct from shard-key-aligned workloads like sysbench-tpcc that have no hot rows by construction.
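To see why viral content concentrates writes, the sketch below draws like-edges against 1,000 posts using a Zipf-like popularity distribution. The exponent s = 1.2, the post count, and the write count are arbitrary assumptions chosen for illustration, not TAOBench's actual distribution parameters. The point is only that under heavy skew a single target row absorbs a large fraction of edge writes.

```python
import random
from collections import Counter

NUM_POSTS = 1_000   # assumed number of candidate posts
NUM_LIKES = 10_000  # assumed number of like-edge writes
ZIPF_S = 1.2        # assumed skew exponent; larger = more concentrated

rng = random.Random(42)  # fixed seed so runs are reproducible

# The post with popularity rank r gets weight 1 / r**s; rank 1 is the "viral" post.
post_ids = list(range(1, NUM_POSTS + 1))
weights = [1 / r**ZIPF_S for r in post_ids]

# Each draw stands for one edges row (user-likes-post) aimed at one objects row.
targets = rng.choices(post_ids, weights=weights, k=NUM_LIKES)
writes_per_post = Counter(targets)

hottest, hot_writes = writes_per_post.most_common(1)[0]
share = hot_writes / NUM_LIKES
uniform_writes = NUM_LIKES / NUM_POSTS  # what each post would get with no skew

print(f"post {hottest}: {hot_writes} like writes ({share:.0%} of all writes)")
print(f"uniform baseline: {uniform_writes:.0f} writes per post")
```

With these parameters the top-ranked post should receive a little over a fifth of all like writes, compared with a uniform baseline of 10 per post.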

Contrast with other encodings

Encoding	Shape	Primary use
Objects + edges (this concept)	2 relational tables	Relational DB workloads; TAOBench; generic social-graph apps on SQL
Property graph	Typed nodes + typed edges + properties	Neo4j, native graph DBs; graph traversal queries
Triple store	(subject, predicate, object) triples	RDF / SPARQL / knowledge graphs
Adjacency list	List of neighbours per node	Graph algorithms; in-memory graphs
Sparse matrix	Rows × columns with sparse non-zero entries	Graph-ML; matrix-factorisation approaches
Meta TAO API	Objects + associations as first-class API primitives	Meta's production social graph; the source workload that the objects + edges schema re-encodes relationally

The objects + edges encoding is the lowest-common-denominator relational shape: any SQL engine can serve it without graph-native features, which is precisely what TAOBench needs in order to benchmark arbitrary relational databases.
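Several of the encodings in the table above can be derived from the same edge data. As a hedged illustration, the sketch below takes rows shaped like a hypothetical edges table, (id1, etype, id2), and re-expresses them as an adjacency list and as (subject, predicate, object) triples. The sample rows and the "obj:" naming prefix are invented for the example.

```python
from collections import defaultdict

# Hypothetical edges rows: (source object id, relation kind, target object id).
edge_rows = [
    (1, "follows", 2),
    (2, "likes", 10),
    (1, "likes", 10),
    (3, "follows", 1),
]

def to_adjacency(rows):
    """Adjacency list: each source node maps to its (relation, neighbour) pairs."""
    adj = defaultdict(list)
    for src, rel, dst in rows:
        adj[src].append((rel, dst))
    return dict(adj)

def to_triples(rows):
    """Triple-store view: (subject, predicate, object) with string node names."""
    return [(f"obj:{src}", rel, f"obj:{dst}") for src, rel, dst in rows]

adjacency = to_adjacency(edge_rows)
print(adjacency[1])              # [('follows', 2), ('likes', 10)]
print(to_triples(edge_rows)[0])  # ('obj:1', 'follows', 'obj:2')
```

The conversion is a single pass over the rows. This is one reason the plain relational shape works as a common denominator: richer graph views can be built on top of it when needed.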

Operational consequences

Every relation creates an edge row. High-activity relations (likes, views, reactions) drive write volume at the edges table.
Hot objects rows track entity popularity. Viral posts, celebrity users, and trending pages all concentrate workload on single objects rows: the hot-row problem at the relational-database level.
edges tends to dwarf objects. With N entities there can in theory be on the order of N² edges per relation type; in practice the distribution is long-tailed (most entities have few relations, a few have very many). This shapes scaling decisions for social-graph databases: edges is where both the data volume and the write hotspots live.
Fan-out reads dominate. Fetching "N most recent likes on this post" is an edges range scan — the TAOBench bulk-reads phase is deliberately punishing on exactly this access pattern.
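The fan-out read in the last point is where index design matters. Below is a minimal sketch with Python's sqlite3 and the same hypothetical column names. An index on (id2, etype, time), with the made-up name edges_by_target, lets "N most recent likes on this post" be answered as a bounded index range scan rather than a full scan plus sort. EXPLAIN QUERY PLAN reports which path SQLite picks. Its exact wording varies across SQLite versions, so the sketch only checks for the index name and the SCAN keyword.

```python
import sqlite3

FANOUT_QUERY = (
    "SELECT id1 FROM edges WHERE id2 = ? AND etype = 'likes' "
    "ORDER BY time DESC LIMIT 10"
)

def plan_details(conn):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + FANOUT_QUERY, (10,)).fetchall()
    return [row[-1] for row in rows]  # detail is the last column in old and new layouts

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE edges (
        id1 INTEGER NOT NULL, id2 INTEGER NOT NULL, etype TEXT NOT NULL,
        time INTEGER NOT NULL, data TEXT,
        PRIMARY KEY (id1, etype, id2)
    )"""
)

# Without a target-side index the planner is expected to scan the whole table.
before = plan_details(conn)

# Hypothetical index: equality on (id2, etype), then walk time newest-first.
conn.execute("CREATE INDEX edges_by_target ON edges (id2, etype, time)")
after = plan_details(conn)

print(before)
print(after)
```

The primary key (id1, etype, id2) serves "what did this user like" lookups but cannot serve "who liked this post". That is why the target-side index is a separate design decision.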

Seen in

— Van Dijk (PlanetScale, 2022-09-08) canonicalises the objects + edges schema as the relational encoding of the social graph for TAOBench. The post explicitly frames the schema as a viral-content stressor: "Focusing the workload around these two simplified concepts allows the benchmark to simulate typical 'hot row' scenarios that can be particularly challenging for relational databases to handle."

systems/taobench — the canonical benchmark built on this schema.
systems/meta-tao — the production system whose workload TAOBench encodes.
concepts/hot-row-problem, concepts/thundering-herd — the stressors this schema exposes.
concepts/benchmark-representativeness — the property TAOBench claims via this schema + workload profiles.