
Logical vs physical sharding

Logical sharding and physical sharding are two different states in the horizontal-sharding rollout. Decoupling them — doing logical before physical, and being able to roll back each independently — is the central de-risking move in Figma's in-house horizontal-sharding effort (Source: sources/2026-04-21-figma-how-figmas-databases-team-lived-to-tell-the-scale).

Definitions

  • Logical sharding: the application layer issues reads and writes as if the table is horizontally sharded (sharded connection poolers, shard-key routing, sharded-query-subset semantics, per-shard query execution) — but the data is still physically on a single database host.
  • Physical sharding: the 1→N failover that actually moves data from one host to N hosts, bringing real shard infrastructure, real network partitions to worry about, and real replication lag to manage.
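A minimal sketch of what "logically sharded, physically on one host" means at the routing layer. All names here (NUM_SHARDS, LOGICAL_TOPOLOGY, shard_for) are hypothetical illustrations, not Figma's DBProxy internals; the point is that shard-key hashing and per-shard routing are fully live while every logical shard still resolves to the same physical host.

```python
import zlib

NUM_SHARDS = 4
HASH_SPACE = 2**32  # hash space partitioned into equal per-shard ranges

def shard_hash(shard_key: str) -> int:
    """Stable hash of the shard key into [0, HASH_SPACE)."""
    return zlib.crc32(shard_key.encode()) % HASH_SPACE

def shard_for(shard_key: str) -> int:
    """Map a shard key to a logical shard id via contiguous hash ranges."""
    return shard_hash(shard_key) * NUM_SHARDS // HASH_SPACE

# During logical sharding, every logical shard points at the same host.
LOGICAL_TOPOLOGY = {shard: "db-primary-1" for shard in range(NUM_SHARDS)}

def route(shard_key: str) -> tuple[int, str]:
    """Return (logical shard id, physical host) for a query's shard key."""
    shard = shard_for(shard_key)
    return shard, LOGICAL_TOPOLOGY[shard]
```

Physical sharding then changes only the values in the topology map; the routing logic above is already exercised under real traffic.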

Why decouple

| Risk axis | Logical rollout | Physical rollout |
| --- | --- | --- |
| Reliability / latency / consistency visible to users | Same behavior as post-shard (so you can observe real impact) | 1→N failover; new failure modes like partial-shard-op success |
| Rollback | Configuration change back to the main table, seconds | Possible, but requires complex cross-shard coordination |
| Percentage rollout | Feature-flag-gated in DBProxy, arbitrary percentage | Binary (a given table is either pre- or post-shard) |
| Data movement | None | Full replication copy to N hosts |
| Developer time to fix a bug | Ship a query-engine fix, re-enable | Potentially involves coordinating replication state |

Rolling out logical first means the riskiest behavioral changes (sharded query execution, scatter-gather plans, topology lookups, DBProxy query-engine bugs) get caught and fixed on an unsharded physical layout — so when the physical failover happens, it's running code that's already been burned in under real traffic.

Implementation — Postgres views as the logical-shard representation

Figma explored three representations:

  1. Separate Postgres databases per logical shard — required the physical data move up front ⇒ no decoupling, so rejected.
  2. Separate Postgres schemas per logical shard — same problem as (1) ⇒ rejected.
  3. Postgres views per logical shard (chosen): CREATE VIEW table_shard1 AS SELECT * FROM table WHERE hash(shard_key) >= min_range AND hash(shard_key) < max_range; for each shard. All reads and writes go through the views via per-shard connection poolers — which still point at the same unsharded physical instance. Behavior is sharded end-to-end; the data hasn't moved.
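A sketch of generating the per-shard view DDL from equal hash ranges, following the CREATE VIEW shape in option (3). Table and column names are illustrative, and the shard-numbering and hash-space constants are assumptions, not Figma's schema:

```python
HASH_SPACE = 2**32  # assumed size of the shard-key hash space

def view_ddl(table: str, num_shards: int) -> list[str]:
    """Emit one CREATE VIEW statement per logical shard, each selecting the
    rows whose hashed shard key falls in that shard's contiguous range."""
    stmts = []
    for shard in range(num_shards):
        lo = shard * HASH_SPACE // num_shards
        hi = (shard + 1) * HASH_SPACE // num_shards
        stmts.append(
            f"CREATE VIEW {table}_shard{shard} AS "
            f"SELECT * FROM {table} "
            f"WHERE hash(shard_key) >= {lo} AND hash(shard_key) < {hi};"
        )
    return stmts
```

Because the ranges partition the hash space exactly, every row is visible through precisely one view, which is what lets the per-shard poolers treat the single physical instance as if it were already split.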

Canonical pattern: patterns/sharded-views-over-unsharded-db. The main performance risk of views (plan changes under the query planner) was validated against a sanitized production query corpus plus a shadow-reads framework comparing view-based vs direct-table execution, which confirmed <10% worst-case overhead.
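The shadow-reads idea can be sketched as follows. The function name and result shape are hypothetical (not Figma's framework): run the same query through both execution paths, compare the rows, and record the view path's relative latency cost.

```python
import time

def shadow_read(query_direct, query_via_view) -> dict:
    """Execute a query via the direct-table path and the view-based path,
    returning whether the results match and the view path's relative overhead."""
    t0 = time.perf_counter()
    direct = query_direct()        # baseline: query the table directly
    t1 = time.perf_counter()
    via_view = query_via_view()    # candidate: same query through the shard view
    t2 = time.perf_counter()
    return {
        "match": direct == via_view,
        # relative latency cost of the view path vs the direct path
        "overhead": (t2 - t1) / max(t1 - t0, 1e-9) - 1.0,
    }
```

Aggregating "match" and "overhead" across a sanitized corpus of production queries is what supports a claim like "<10% worst-case overhead" before any traffic is cut over.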

Physical sharding comes after

Once the logical-sharded topology has been trusted in production (feature-flag rollout to 100%, enough time for cross-shard edge cases to surface), the physical failover is the only remaining risk. Much of the 1→N failover reuses the 1→1 failover machinery built for concepts/vertical-partitioning — so Figma's years of vertical-partitioning experience accelerated this step. First physical failover: 10 seconds of partial primary availability, no replica impact (Source: sources/2026-04-21-figma-how-figmas-databases-team-lived-to-tell-the-scale).
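Structurally, the physical failover reduces to swapping the shard→host map while the already-proven query semantics stay fixed. A minimal sketch with hypothetical host names:

```python
# Before failover: every logical shard resolves to the single original host.
pre_failover = {0: "db-primary-1", 1: "db-primary-1",
                2: "db-primary-1", 3: "db-primary-1"}

# After failover: logical shards are re-pointed at N physical hosts.
post_failover = {0: "db-shard-a", 1: "db-shard-a",
                 2: "db-shard-b", 3: "db-shard-b"}

def failover_1_to_n(topology: dict, assignment: dict) -> dict:
    """Return the new shard-to-host map; the query layer is untouched."""
    return {shard: assignment[shard] for shard in topology}
```

This is why the step reuses 1→1 failover machinery: from the query layer's perspective, each logical shard just fails over to a new host.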

Sibling pattern: patterns/shadow-migration

Both ideas run new semantics against old infrastructure to de-risk. Difference in orientation:

  • Shadow migration: old and new engines both execute; new engine's output is compared to old's.
  • Logical-vs-physical sharding: only the new engine executes, but on old infrastructure — the infrastructure is the shadow, not a parallel engine.
