
Logical vs physical sharding

Logical sharding and physical sharding are two different states in the horizontal-sharding rollout. Decoupling them — doing logical before physical, and being able to roll back each independently — is the central de-risking move in Figma's in-house horizontal-sharding effort (Source: sources/2026-04-21-figma-how-figmas-databases-team-lived-to-tell-the-scale).

Definitions

  • Logical sharding: the application layer issues reads and writes as if the table is horizontally sharded (sharded connection poolers, shard-key routing, sharded-query-subset semantics, per-shard query execution) — but the data is still physically on a single database host.
  • Physical sharding: the 1→N failover that actually moves data from one host to N hosts, bringing real shard infrastructure, real network partitions to worry about, and real replication lag to manage.
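A minimal sketch of what "logically sharded, physically on one host" means at the routing layer. All names here (NUM_SHARDS, LOGICAL_TOPOLOGY, shard_for) are hypothetical illustrations, not Figma's DBProxy internals; the point is that shard-key hashing and per-shard routing are fully live while every logical shard still resolves to the same physical host.

```python
import zlib

NUM_SHARDS = 4
HASH_SPACE = 2**32  # hash space partitioned into equal per-shard ranges

def shard_hash(shard_key: str) -> int:
    """Stable hash of the shard key into [0, HASH_SPACE)."""
    return zlib.crc32(shard_key.encode()) % HASH_SPACE

def shard_for(shard_key: str) -> int:
    """Map a shard key to a logical shard id via contiguous hash ranges."""
    return shard_hash(shard_key) * NUM_SHARDS // HASH_SPACE

# During logical sharding, every logical shard points at the same host.
LOGICAL_TOPOLOGY = {shard: "db-primary-1" for shard in range(NUM_SHARDS)}

def route(shard_key: str) -> tuple[int, str]:
    """Return (logical shard id, physical host) for a query's shard key."""
    shard = shard_for(shard_key)
    return shard, LOGICAL_TOPOLOGY[shard]
```

Physical sharding then changes only the values in the topology map; the routing logic above is already exercised under real traffic.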

Why decouple

| Risk axis | Logical rollout | Physical rollout |
| --- | --- | --- |
| Reliability / latency / consistency visible to users | Same behavior as post-shard (so you can observe real impact) | 1→N failover; new failure modes like partial-shard-op success |
| Rollback | Configuration change back to the main table, seconds | Possible, but requires complex cross-shard coordination |
| Percentage rollout | Feature-flag-gated in DBProxy, arbitrary percentage | Binary (a given table is either pre- or post-shard) |
| Data movement | None | Full replication copy to N hosts |
| Developer time to fix a bug | Ship a query-engine fix, re-enable | Potentially involves coordinating replication state |

Rolling out logical first means the riskiest behavioral changes (sharded query execution, scatter-gather plans, topology lookups, DBProxy query-engine bugs) get caught and fixed on an unsharded physical layout — so when the physical failover happens, it's running code that's already been burned in under real traffic.

Implementation — Postgres views as the logical-shard representation

Figma explored three representations:

  1. Separate Postgres databases per logical shard — required the physical data move up front ⇒ no decoupling, so rejected.
  2. Separate Postgres schemas per logical shard — same problem as (1) ⇒ rejected.
  3. Postgres views per logical shard (chosen): CREATE VIEW table_shard1 AS SELECT * FROM table WHERE hash(shard_key) >= min_range AND hash(shard_key) < max_range; for each shard. All reads and writes go through the views via per-shard connection poolers — which still point at the same unsharded physical instance. Behavior is sharded end-to-end; the data hasn't moved.
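A sketch of generating the per-shard view DDL from equal hash ranges, following the CREATE VIEW shape in option (3). Table and column names are illustrative, and the shard-numbering and hash-space constants are assumptions, not Figma's schema:

```python
HASH_SPACE = 2**32  # assumed size of the shard-key hash space

def view_ddl(table: str, num_shards: int) -> list[str]:
    """Emit one CREATE VIEW statement per logical shard, each selecting the
    rows whose hashed shard key falls in that shard's contiguous range."""
    stmts = []
    for shard in range(num_shards):
        lo = shard * HASH_SPACE // num_shards
        hi = (shard + 1) * HASH_SPACE // num_shards
        stmts.append(
            f"CREATE VIEW {table}_shard{shard} AS "
            f"SELECT * FROM {table} "
            f"WHERE hash(shard_key) >= {lo} AND hash(shard_key) < {hi};"
        )
    return stmts
```

Because the ranges partition the hash space exactly, every row is visible through precisely one view, which is what lets the per-shard poolers treat the single physical instance as if it were already split.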

Canonical pattern: patterns/sharded-views-over-unsharded-db. The main performance risk of views (plan changes under the query planner) was validated against a sanitized production query corpus plus a shadow-reads framework comparing view-based vs direct-table execution, which confirmed <10% worst-case overhead.
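The shadow-reads idea can be sketched as follows. The function name and result shape are hypothetical (not Figma's framework): run the same query through both execution paths, compare the rows, and record the view path's relative latency cost.

```python
import time

def shadow_read(query_direct, query_via_view) -> dict:
    """Execute a query via the direct-table path and the view-based path,
    returning whether the results match and the view path's relative overhead."""
    t0 = time.perf_counter()
    direct = query_direct()        # baseline: query the table directly
    t1 = time.perf_counter()
    via_view = query_via_view()    # candidate: same query through the shard view
    t2 = time.perf_counter()
    return {
        "match": direct == via_view,
        # relative latency cost of the view path vs the direct path
        "overhead": (t2 - t1) / max(t1 - t0, 1e-9) - 1.0,
    }
```

Aggregating "match" and "overhead" across a sanitized corpus of production queries is what supports a claim like "<10% worst-case overhead" before any traffic is cut over.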

Physical sharding comes after

Once the logical-sharded topology has been trusted in production (feature-flag rollout to 100%, enough time for cross-shard edge cases to surface), the physical failover is the only remaining risk. Much of the 1→N failover reuses the 1→1 failover machinery built for concepts/vertical-partitioning — so Figma's years of vertical-partitioning experience accelerated this step. First physical failover: 10 seconds of partial primary availability, no replica impact (Source: sources/2026-04-21-figma-how-figmas-databases-team-lived-to-tell-the-scale).
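Structurally, the physical failover reduces to swapping the shard→host map while the already-proven query semantics stay fixed. A minimal sketch with hypothetical host names:

```python
# Before failover: every logical shard resolves to the single original host.
pre_failover = {0: "db-primary-1", 1: "db-primary-1",
                2: "db-primary-1", 3: "db-primary-1"}

# After failover: logical shards are re-pointed at N physical hosts.
post_failover = {0: "db-shard-a", 1: "db-shard-a",
                 2: "db-shard-b", 3: "db-shard-b"}

def failover_1_to_n(topology: dict, assignment: dict) -> dict:
    """Return the new shard-to-host map; the query layer is untouched."""
    return {shard: assignment[shard] for shard in topology}
```

This is why the step reuses 1→1 failover machinery: from the query layer's perspective, each logical shard just fails over to a new host.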

Sibling pattern: patterns/shadow-migration

Both ideas run new semantics against old infrastructure to de-risk. Difference in orientation:

  • Shadow migration: old and new engines both execute; new engine's output is compared to old's.
  • Logical-vs-physical sharding: only the new engine executes, but on old infrastructure — the infrastructure is the shadow, not a parallel engine.
