DBProxy (Figma)¶
DBProxy is Figma's in-house Go service interposed between the application layer and PGBouncer to enable horizontal sharding of RDS Postgres. It parses SQL from the application, plans shard routing, and dispatches the resulting queries to the appropriate physical Postgres shard (or fans out to many shards for scatter-gather). Introduced late 2022; shipped the first horizontally sharded table in September 2023, with roughly 10 seconds of partial availability during the physical failover (Source: sources/2026-04-21-figma-how-figmas-databases-team-lived-to-tell-the-scale).
Pre-DBProxy, Figma apps talked directly to PGBouncer. Horizontal sharding required sophisticated query parsing / planning / execution that PGBouncer doesn't do — DBProxy is the new layer that does it.
Components¶
Query engine¶
Three-stage pipeline:
- Query parser — reads SQL from the application, emits an Abstract Syntax Tree.
- Logical planner — walks the AST, extracts the query type (insert / update / select / …) and the logical shard IDs the query touches, based on shard-key predicates.
- Physical planner — maps logical shard IDs to physical databases via the topology library, and rewrites queries to execute on the appropriate physical shard.
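The three stages can be sketched as a toy Go pipeline. This is a hypothetical reconstruction — Figma has not published DBProxy's code; `planLogical`, `planPhysical`, the regex-based "parser", and the 4-shard topology are all stand-ins:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// LogicalPlan is what the logical planner emits: the query type plus the
// logical shard IDs the query touches (derived from shard-key predicates).
type LogicalPlan struct {
	QueryType string
	ShardIDs  []int // empty => no shard-key predicate => scatter-gather
}

// PhysicalPlan maps each targeted logical shard to a physical database.
type PhysicalPlan struct {
	Targets map[int]string // logical shard ID -> physical DB
}

// shardKeyPred stands in for real AST analysis: it spots a
// "user_id = N" equality predicate in the SQL text.
var shardKeyPred = regexp.MustCompile(`user_id\s*=\s*(\d+)`)

const numLogicalShards = 4 // toy value

// planLogical is a toy logical planner over raw SQL. The real planner
// walks the AST produced by the query parser.
func planLogical(sql string) LogicalPlan {
	plan := LogicalPlan{QueryType: "select"}
	if m := shardKeyPred.FindStringSubmatch(sql); m != nil {
		key, _ := strconv.Atoi(m[1])
		plan.ShardIDs = []int{key % numLogicalShards}
	}
	return plan
}

// planPhysical resolves logical shard IDs to physical DBs via a topology
// mapping; with no shard IDs it fans out to every shard (scatter-gather).
func planPhysical(lp LogicalPlan, topo map[int]string) PhysicalPlan {
	pp := PhysicalPlan{Targets: map[int]string{}}
	shards := lp.ShardIDs
	if len(shards) == 0 { // no shard-key predicate: hit every shard
		for id := range topo {
			shards = append(shards, id)
		}
	}
	for _, id := range shards {
		pp.Targets[id] = topo[id]
	}
	return pp
}

func main() {
	topo := map[int]string{0: "db-a", 1: "db-a", 2: "db-b", 3: "db-b"}
	lp := planLogical("SELECT * FROM files WHERE user_id = 7")
	fmt.Println(lp.ShardIDs, planPhysical(lp, topo).Targets) // [3] map[3:db-b]
}
```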
Scatter-gather execution¶
Queries whose predicates don't contain the shard key (or that need data from multiple shards) execute as scatter-gather: fan out to every physical shard, then aggregate the results. The complex cases (cross-colo joins, complex aggregations, nested SQL) are expensive to implement and cap scalability, so DBProxy deliberately restricts the supported sharded-query language to the subset that shadow-readiness analysis identified as the 90% common case, avoiding worst-case engine complexity:
- All range scans and point queries allowed.
- Joins only when joining two tables in the same colo on the shard key.
Queries outside the subset must be rewritten in product code.
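The fan-out/aggregate shape can be sketched in Go — a minimal illustration of the scatter-gather pattern, not Figma's executor; `queryShard` is a stub standing in for a real per-shard dispatch:

```go
package main

import (
	"fmt"
	"sync"
)

// Row stands in for one result row, tagged with the shard it came from.
type Row struct {
	Shard int
	Data  string
}

// queryShard simulates executing the rewritten query on one physical shard.
func queryShard(shard int, sql string) []Row {
	return []Row{{Shard: shard, Data: sql}}
}

// scatterGather fans the query out to every shard concurrently and
// aggregates the per-shard results into one slice.
func scatterGather(shards []int, sql string) []Row {
	var (
		mu  sync.Mutex
		wg  sync.WaitGroup
		out []Row
	)
	for _, s := range shards {
		wg.Add(1)
		go func(s int) {
			defer wg.Done()
			rows := queryShard(s, sql)
			mu.Lock()
			out = append(out, rows...)
			mu.Unlock()
		}(s)
	}
	wg.Wait()
	return out
}

func main() {
	rows := scatterGather([]int{0, 1, 2, 3}, "SELECT count(*) FROM files")
	fmt.Println(len(rows)) // 4: one result set per shard
}
```

Merging and deduplication of the aggregated rows is exactly the part the post leaves undescribed (see "Not covered in the post").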
Topology library¶
Encapsulates Figma's horizontal-sharding topology metadata:
- table → shard key mapping.
- logical shard ID → (logical shard set, physical database) mapping.
Properties:
- Real-time updates in < 1 second — topology changes during shard splits, and DBProxy must reflect them fast to avoid routing to the wrong DB.
- Backwards-compatible updates only — so updates are never on the critical path for site traffic.
- Same logical topology across environments — non-prod reuses prod's logical topology against fewer physical DBs (cost + complexity savings without per-env code divergence).
- Invariants enforced at the topology level (e.g. every shard ID mapped to exactly one physical DB).
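A minimal sketch of the two mappings and the exactly-one-physical-DB invariant — all names hypothetical, since the post doesn't describe the library's API:

```go
package main

import "fmt"

// Topology is a toy model of the sharding metadata the library serves:
// which column each table is sharded on, and where each logical shard lives.
type Topology struct {
	ShardKeys map[string]string // table -> shard key column
	Placement map[int]string    // logical shard ID -> physical DB
	NumShards int
}

// Validate enforces the topology-level invariant that every logical
// shard ID maps to exactly one physical database. (A Go map already
// guarantees at most one entry per ID, so only presence is checked.)
func (t Topology) Validate() error {
	for id := 0; id < t.NumShards; id++ {
		if db, ok := t.Placement[id]; !ok || db == "" {
			return fmt.Errorf("shard %d has no physical database", id)
		}
	}
	return nil
}

func main() {
	topo := Topology{
		ShardKeys: map[string]string{"files": "user_id"},
		Placement: map[int]string{0: "db-a", 1: "db-b"},
		NumShards: 2,
	}
	fmt.Println(topo.Validate()) // <nil>

	topo.NumShards = 3 // shard 2 unplaced -> invariant violated
	fmt.Println(topo.Validate() != nil)
}
```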
Operational features¶
- Load shedding on overloaded paths (especially scatter-gather).
- Request hedging for tail-latency mitigation against slow shards.
- Transaction support — scoped to single-shard transactions (Figma deliberately does not support atomic cross-shard transactions; cross-shard failures are handled in product logic, e.g. "moving a team between orgs" is resilient to partial-commit failures).
- Observability — per-query routing decisions, per-shard timing, scatter-gather fan-out tracking.
Relationship to the rest of the stack¶
Application
│
▼
DBProxy ← query parse + logical plan + physical plan + exec
│
▼
PGBouncer ← connection pooling (unchanged; per-shard poolers)
│
▼
RDS Postgres ← per-shard physical instance (or a sharded view on one instance during logical-shard rollout)
Pre-physical-shard rollout¶
During the logical-sharding-only phase, DBProxy routes through per-shard connection poolers that still point at one unsharded Postgres instance — each "shard" is actually a Postgres view over the shard-key-hashed subset of data. DBProxy's behavior is identical to the post-physical-shard world; a feature flag in the query engine gates whether a table's traffic goes through sharded-views routing or falls back to the single unsharded table. Rollback = seconds.
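The flag gate reduces to a per-table branch — a hypothetical sketch (`routeTable` and the flag map are invented here), whose point is that rollback is just flipping the flag back:

```go
package main

import "fmt"

// routeTable picks the routing mode for a table based on a feature flag.
// Flipping the flag back to false is the seconds-long rollback path.
func routeTable(table string, shardedViews map[string]bool) string {
	if shardedViews[table] {
		return "sharded-views" // per-shard views over the hashed subset
	}
	return "unsharded" // fall back to the single unsharded table
}

func main() {
	flags := map[string]bool{"files": true}
	fmt.Println(routeTable("files", flags))    // sharded-views
	fmt.Println(routeTable("comments", flags)) // unsharded
}
```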
What it isn't¶
- Not a full Postgres-compatible query engine. Figma explicitly traded SQL compatibility for implementation simplicity; if DBProxy supported every SQL feature, it "would have begun to look a lot like the Postgres database query engine."
- Not a distributed transaction coordinator. No 2PC, no distributed-SQL consistency guarantees across shards.
- Not an ORM. Applications still construct SQL; DBProxy sees the SQL text and routes it.
- Not open-sourced (as of the blog post date).
Not covered in the post¶
- Exact SQL-subset grammar (informal description only).
- Topology-update protocol (how < 1s propagation is achieved).
- Hedging / load-shedding algorithms.
- Per-shard plan aggregation for scatter-gather (how results are merged / deduped).
- Cost / CPU / p99 numbers.
- Shard-count figures.
Seen in¶
- sources/2026-04-21-figma-how-figmas-databases-team-lived-to-tell-the-scale — DBProxy's three-stage query engine, topology library, scatter-gather execution, feature-flagged logical-shard rollout, and integration with sharded views for pre-physical-shard de-risking.
- sources/2026-04-21-figma-keeping-it-100x-with-real-time-data-at-scale — DBProxy is the DB-topology-aware layer between applications and PGBouncer; LiveGraph's invalidator tier is sharded the same way as the physical DBs and tails the WAL replication stream per physical shard, while LiveGraph's edge + cache tiers stay DB-topology-agnostic. DBProxy-era horizontal sharding was one of the forcing functions for LiveGraph's rebuild (the old LiveGraph assumed a single globally-ordered replication stream — a reasonable assumption in a one-primary-Postgres world, broken by vertical partitioning, unworkable under horizontal sharding).