
Figma — How Figma's Databases Team Lived to Tell the Scale

Summary

Figma's Databases team retrospective on scaling RDS Postgres ~100× since 2020. The 2020 baseline was a single Postgres on AWS's largest physical instance; by end of 2022 they had layered caching + read replicas + a dozen vertically partitioned databases (groups of related tables — "Figma files", "Organizations" — split into their own vertical partitions). In late 2022 they started a multi-year horizontal sharding effort on top of that vertically-partitioned RDS Postgres substrate. They shipped their first horizontally-sharded table in September 2023 with only ~10 seconds of partial availability on primaries and no availability impact on replicas. The architectural center is a DBProxy Go service between the application and PGBouncer (query parser + logical planner + physical planner + execution), colos (groups of related tables sharing a shard key and physical layout that still support cross-table joins + full transactions under a single shard-key value), separation of logical sharding from physical sharding (with Postgres views as the logical-shard representation, allowing percentage-rollout of sharded read/write semantics against the still-unsharded database before the risky failover), shadow application readiness against live production traffic to choose a sharded-query subset covering ~90% of queries without worst-case query-engine complexity, and full (not filtered) logical replication during reshards. Canonical "do horizontal sharding on Postgres in-house without migrating off RDS" post with the why-not-CockroachDB/TiDB/Spanner/Vitess/NoSQL reasoning made explicit.

Key takeaways

  1. Vertical partitioning buys runway cheaply; horizontal sharding is the order-of-magnitude-more-complex next step. "Vertical partitioning was a relatively easy and very impactful scaling lever that bought us significant runway quickly. It was also a stepping stone on the path to horizontal sharding." (concepts/vertical-partitioning, concepts/horizontal-sharding). The smallest unit of vertical partitioning is a single table; once individual tables exceed Postgres's practical ceilings (several TB / billions of rows / reliability-impacting vacuum durations / pushing RDS IOPS limits), vertical partitioning can't help and horizontal sharding becomes the required lever.

  2. Bottleneck spectrum beyond CPU. Early scaling had focused on Postgres CPU utilization; as the fleet grew larger and more heterogeneous, the team monitored CPU, IO, table size, and rows written and combined historical data + load-testing to quantify per-shard runway. Two specific ceilings named: (a) reliability impact during Postgres vacuums at several-TB / billions-of-rows table size — vacuums are essential background ops that keep Postgres from running out of transaction IDs; at the observed size vacuums caused user-visible impact (Source: sources/2026-04-21-figma-how-figmas-databases-team-lived-to-tell-the-scale); (b) maximum RDS IOPS on highest-write tables growing fast enough to soon exceed Amazon RDS's per-instance ceiling.
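The runway-quantification idea — track several metrics per shard and let the first one to hit its ceiling set the runway — can be sketched as a simple linear projection. All numbers and names here are hypothetical; the post gives no formula:

```python
# Hypothetical sketch: estimate per-shard runway by projecting each tracked
# metric (CPU, IOPS, table size, rows written) toward its ceiling.

def runway_days(current: float, daily_growth: float, ceiling: float) -> float:
    """Days until `current` reaches `ceiling` at linear `daily_growth`."""
    if daily_growth <= 0:
        return float("inf")
    return (ceiling - current) / daily_growth

def shard_runway(metrics: dict) -> float:
    """Overall runway is set by whichever metric hits its ceiling first."""
    return min(runway_days(m["current"], m["daily_growth"], m["ceiling"])
               for m in metrics.values())

metrics = {
    "iops":     {"current": 50_000, "daily_growth": 400, "ceiling": 80_000},
    "table_gb": {"current": 2_000,  "daily_growth": 5,   "ceiling": 4_000},
}
print(shard_runway(metrics))  # IOPS hits its ceiling first: 75.0 days
```

In practice Figma combined historical growth with load-testing rather than a naive linear fit, but the "min over per-metric runways" framing is the same.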

  3. Why in-house sharding on RDS Postgres — not CockroachDB / TiDB / Spanner / Vitess / NoSQL. Each alternative would have required a full data migration + rebuilding operational expertise that had been tuned for years on RDS Postgres, under "only months of runway remaining" aggressive-growth pressure. NoSQL rejected on the relational-model premise ("almost our entire backend application" would have been rewritten). In-house generic horizontal sharding was rejected as re-implementing a database; instead they took advantage of being able to tailor to Figma's data model: drop atomic cross-shard transactions, pick a colocation strategy that minimized application changes, support a subset of Postgres compatible with the majority of product logic, and maintain backwards compatibility between sharded and unsharded Postgres so rollback remained possible. Canonical "known-low-risk over easier-higher-variance" trade-off (patterns/scoped-migration-with-fast-follows).

  4. Seven explicit design goals guided the effort — all of them sharpen existing wiki patterns:

  5. Minimize developer impact (most app code shouldn't have to refactor).
  6. Scale out transparently (future shard-outs touch no app code).
  7. Skip expensive backfills (given table sizes + Postgres throughput, backfills would take months).
  8. Make incremental progress (de-risk major production changes).
  9. Avoid one-way migrations (physical shard ops must be reversible).
  10. Maintain strong data consistency (no double-writes; near-zero-downtime scale-out).
  11. Play to existing strengths (leverage RDS Postgres expertise; roll out incrementally on fastest-growing tables).

  12. Colos = the developer-facing abstraction for horizontally sharded tables. Figma has no single natural sharding key across all tables (a composite key would have required adding a column + expensive backfill + refactoring on every table). Instead they picked a small set of sharding keys — UserID, FileID, OrgID — which together cover almost every table. Tables sharing a shard key are grouped into a colocation ("colo"): they share the same physical sharding layout and support cross-table joins + full transactions when restricted to a single shard-key value (patterns/colocation-sharding). Most application code already interacted with the database under a single shard-key value (most queries are scoped to one user / one file / one org), so most product code needed no refactor. Joins are only allowed when both tables are in the same colo and the join is on the sharding key. Canonical "narrow the horizontal-sharding surface area by discovering the data model's latent partitioning" idea.
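The colo join rule can be sketched as a validity check. Colo names, member tables, and keys below are hypothetical, not Figma's actual colo inventory (the post doesn't enumerate it):

```python
# Hypothetical sketch of the colo rule: a join is routable only if both
# tables live in the same colocation AND the join column is its shard key.
COLOS = {  # colo name -> (shard key, member tables); illustrative only
    "file": ("file_id", {"files", "file_comments"}),
    "org":  ("org_id",  {"orgs", "org_members"}),
}

def join_allowed(table_a: str, table_b: str, join_column: str) -> bool:
    for shard_key, tables in COLOS.values():
        if table_a in tables and table_b in tables:
            return join_column == shard_key
    return False  # tables are in different colos (or unknown)

assert join_allowed("files", "file_comments", "file_id")       # same colo, on key
assert not join_allowed("files", "org_members", "file_id")     # cross-colo
assert not join_allowed("files", "file_comments", "user_id")   # wrong column
```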

  13. Hash-of-shard-key routing to avoid hotspots from auto-increment / Snowflake IDs. The chosen sharding keys (UserID, FileID, OrgID) are auto-incrementing or Snowflake-timestamp-prefixed — both would concentrate recent data on one shard. Migrating to randomized IDs would have required a data migration the team wanted to avoid. They route on hash(shard_key) with a sufficiently-random hash function for uniform distribution; downside is less-efficient range scans on shard keys (sequential keys hash to different shards), accepted because range-scan is uncommon in their codebase (concepts/shard-key).
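A minimal sketch of hash-range routing, assuming a range-partitioned 64-bit hash space (shard count and hash function are illustrative; the post names neither):

```python
import hashlib

NUM_SHARDS = 16          # illustrative; the post gives no shard count
HASH_SPACE = 2 ** 64

def logical_shard(shard_key: int) -> int:
    """Map hash(shard_key) into one of NUM_SHARDS contiguous hash ranges.
    A sufficiently-random hash spreads sequential (auto-increment /
    Snowflake) IDs uniformly instead of piling recent rows onto one shard."""
    digest = hashlib.blake2b(str(shard_key).encode(), digest_size=8).digest()
    h = int.from_bytes(digest, "big")      # uniform in [0, 2**64)
    return h * NUM_SHARDS // HASH_SPACE    # range-partition the hash space

# Sequential IDs scatter across shards (the hotspot fix)...
print(sorted({logical_shard(i) for i in range(1, 9)}))
# ...while any given key always routes to the same shard.
assert logical_shard(42) == logical_shard(42)
```

The accepted downside shows up directly here: a range scan over consecutive shard-key values fans out across shards, because adjacent keys hash far apart.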

  14. Logical sharding decoupled from physical sharding, connected by Postgres views. A new structural idea that makes horizontal sharding incrementally rollout-able: (concepts/logical-vs-physical-sharding, patterns/sharded-views-over-unsharded-db)

  15. Logical sharding = the application serves reads and writes as if the table is horizontally sharded, but data is still physically on one Postgres instance.
  16. Implementation is per-shard Postgres views: CREATE VIEW table_shard1 AS SELECT * FROM table WHERE hash(shard_key) >= min_range AND hash(shard_key) < max_range for each logical shard; all reads and writes go through these views via separate sharded connection poolers pointing at the same physical instance.
  17. This gave a safe percentage-based rollout of sharded semantics (reliability, latency, consistency all behave as post-shard) that rolls back to the main table in seconds via feature flags if anything goes wrong.
  18. Physical sharding = the subsequent 1→N failover that splits data across multiple Postgres instances. Much riskier; rollback is possible but requires complex coordination. Performed only once logical sharding is fully trusted.
  19. Alternative representations explored (separate Postgres databases / Postgres schemas) were rejected because logical sharding in those representations would require the same physical data changes as the physical shard split itself — no decoupling.
  20. Views' risks (perf overhead, query-planner plan changes) were quantified with a collected sanitized production query corpus + shadow reads framework — confirmed < 10% worst-case view overhead.
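The per-shard view representation above can be sketched as a DDL generator. Table/column names are illustrative, and `hash(...)` stands in for whatever SQL-side hash function the sharded layer actually uses:

```python
# Hypothetical sketch: emit one CREATE VIEW per logical shard, each covering
# a contiguous range of the hashed shard-key space on the same physical table.
HASH_SPACE = 2 ** 64

def shard_view_ddl(table: str, shard_key: str, num_shards: int):
    span = HASH_SPACE // num_shards
    for i in range(num_shards):
        lo = i * span
        hi = HASH_SPACE if i == num_shards - 1 else (i + 1) * span
        yield (f"CREATE VIEW {table}_shard{i} AS SELECT * FROM {table} "
               f"WHERE hash({shard_key}) >= {lo} AND hash({shard_key}) < {hi}")

for stmt in shard_view_ddl("figma_files", "file_id", 4):
    print(stmt)
```

Because every view reads the same underlying table, "rolling back" a logical shard is just flipping traffic from the views back to the main table — the seconds-fast feature-flag escape hatch the post describes.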

  21. DBProxy + query engine is the in-house Postgres-compatible router. Initially Figma apps talked directly to PGBouncer. Horizontal sharding needed query parsing / planning / execution, so they inserted DBProxy (Go service, new component — systems/dbproxy-figma) between app and PGBouncer. Components: query parser (SQL → AST), logical planner (AST → query type + logical shard IDs), physical planner (logical shards → physical DBs, rewrites queries to execute on the appropriate physical shard), plus load-shedding + request hedging + transaction support + topology management + observability. Single-shard queries extract the shard key and push down to one Postgres; cross-shard queries become scatter-gather. Scatter-gather is both expensive to implement in complex cases (aggregations / cross-colo joins / nested SQL) and a scale cap (each scatter-gather query touches every shard, imposing the same load as an unsharded database), so the query-language subset is deliberately restricted.
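The single-shard vs scatter-gather split at the heart of the planner can be sketched as follows. The real service is a Go query engine operating on a parsed AST; this hypothetical version routes from extracted filters:

```python
# Minimal sketch of DBProxy's logical-planning decision: a query that
# filters on the shard key pushes down to exactly one shard; anything
# else fans out scatter-gather to every shard.
import zlib
from dataclasses import dataclass

NUM_SHARDS = 4

@dataclass
class Plan:
    kind: str            # "single_shard" | "scatter_gather"
    shard_ids: list

def plan(filters: dict, shard_key: str) -> Plan:
    if shard_key in filters:
        # Stable hash of the shard-key value picks the one target shard.
        h = zlib.crc32(str(filters[shard_key]).encode())
        return Plan("single_shard", [h % NUM_SHARDS])
    return Plan("scatter_gather", list(range(NUM_SHARDS)))

assert plan({"file_id": 7}, "file_id").kind == "single_shard"
assert len(plan({"name": "x"}, "file_id").shard_ids) == NUM_SHARDS
```

The scale-cap argument falls out of the second branch: a scatter-gather plan's shard list is the whole fleet, so its cost never shrinks as you add shards.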

  22. "Shadow application readiness" framework picks the right sharded-query subset. Users define candidate sharding schemes for tables, run the logical-planning phase against live production traffic without affecting it, and log queries + query plans to Snowflake. Offline analysis identifies which queries route cleanly to a single shard, which need scatter-gather, and which are worst-case. Result: Figma picked a query language that supports the 90% most common queries but avoids worst-case query-engine complexity — all range scans + point queries allowed; joins only when joining two tables in the same colo on the sharding key (patterns/shadow-application-readiness). Canonical "use live production traffic as the test oracle for a proposed API restriction" pattern, applied to SQL-subset selection.
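The offline-analysis step — bucketing logged shadow plans to measure what fraction of live traffic a candidate scheme serves from a single shard — can be sketched like this (field names are hypothetical; the real pipeline logs to Snowflake):

```python
# Hypothetical sketch: summarize logged shadow query plans into the
# single-shard / scatter-gather / unsupported split that drives the choice
# of sharded-query subset.
from collections import Counter

def classify(logged_plans: list) -> dict:
    """Return each plan kind's share of total shadowed traffic."""
    counts = Counter(p["kind"] for p in logged_plans)
    total = sum(counts.values())
    return {kind: n / total for kind, n in counts.items()}

plans = [{"kind": "single_shard"}] * 9 + [{"kind": "scatter_gather"}]
print(classify(plans))  # {'single_shard': 0.9, 'scatter_gather': 0.1}
```

A 90%-single-shard result like the toy one above is exactly the signal Figma used to draw the line: support the common 90%, exclude the worst-case tail from the query language.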

  23. Topology library separates logical from physical topology. DBProxy needs to map table → shard key, logical shard id → (logical + physical db). Vertical partitioning had used a hard-coded config file; horizontal sharding required dynamic updates in under a second as shard splits complete. Because every topology change is backwards-compatible, updates are never on the critical path. Same substrate enables non-prod environments with identical logical topology but fewer physical DBs — cost + complexity reduction without divergence from prod behavior — and enforces invariants (every shard ID mapped to exactly one physical DB) critical for correctness during shard splits.
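The core topology invariant — every logical shard maps to exactly one physical database, even mid-split — and the non-prod reuse trick can be sketched as follows (all names hypothetical):

```python
# Hypothetical sketch of the topology invariant check: a topology is valid
# only if every logical shard ID is mapped, and a dict mapping guarantees
# each maps to exactly one physical DB.
def validate_topology(logical_to_physical: dict, logical_shard_ids) -> bool:
    missing = [s for s in logical_shard_ids if s not in logical_to_physical]
    if missing:
        raise ValueError(f"unmapped logical shards: {missing}")
    return True

# Same logical topology, different physical footprint: prod spreads four
# logical shards over two physical DBs; staging packs them onto one.
prod    = {0: "pg-a", 1: "pg-a", 2: "pg-b", 3: "pg-b"}
staging = {0: "pg-x", 1: "pg-x", 2: "pg-x", 3: "pg-x"}
assert validate_topology(prod, range(4))
assert validate_topology(staging, range(4))
```

A shard split is then just a backwards-compatible rewrite of this mapping (e.g. moving shards 2-3 from "pg-b" to a new "pg-c"), which is why topology updates never sit on the critical path.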

  24. Physical failover is 1→N (not 1→1) with partial-success resilience. The actual physical shard operation reuses much of the vertical-partitioning 1→1 failover logic but must handle new failure modes where the sharding op succeeds on a subset of destination DBs. Many of the highest-risk pieces had already been de-risked by vertical partitioning, accelerating the first physical shard op.

  25. Full (not filtered) logical replication during resharding. Instead of implementing "filtered logical replication" (where only each shard's subset of data is copied to the corresponding destination), Figma copies the entire dataset to each new shard, then restricts reads/writes to each shard's subset of the data. Trade-off is extra transient storage during the reshard in exchange for simpler replication mechanics and a known-good copy everywhere.

  26. Open constraints inherited from horizontal sharding — enumerated explicitly:

    • Certain SQL queries become inefficient / impossible to support.
    • App code must include shard-key info to route queries efficiently.
    • Schema changes must coordinate across all shards; foreign keys + globally unique indexes can no longer be enforced by Postgres — currently only unique indexes including the sharding key are supported.
    • Transactions can span multiple shards → partial commit failures are possible (the blog's illustrative example: "moving a team between two organizations, only to find half their data was missing"); product logic must be resilient to them.
    • Atomic cross-shard transactions intentionally not supported (work around cross-shard transaction failures at the product layer).
  27. Future scope named. Horizontally-sharded schema updates, globally-unique ID generation for sharded primary keys, atomic cross-shard transactions for business-critical use cases, distributed globally-unique indexes, an ORM seamlessly compatible with horizontal sharding, fully-automated one-click reshard operations. And the meta-move: once runway is bought, re-evaluate whether to stay on in-house RDS-based horizontal sharding or switch to a NewSQL / managed solution — an explicit acknowledgement that the choice was shaped by the 18-month timeline pressure, not by a long-term preference.

Architectural numbers

  • ~100× database stack growth since 2020.
  • 12+ vertically-partitioned databases by end of 2022 (concepts/vertical-partitioning steady-state pre-horizontal).
  • Tables at several-TB scale / billions of rows as the vertical-partitioning ceiling.
  • 9 months to shard their first table end-to-end.
  • September 2023 — first horizontally-sharded table in production.
  • 10 seconds of partial availability on database primaries during the first physical failover.
  • 0 availability impact on replicas during the first physical failover.
  • < 10% worst-case performance overhead of Postgres views vs non-view queries (shadow-reads-validated).
  • 90% of queries covered by the deliberately-restricted sharded-query language (scatter-gather worst-case cases explicitly excluded from the subset).

Caveats / what's not in the post

  • No latency distributions (p50/p99), no throughput numbers, no cost figures, no actual shard count or per-shard capacity.
  • No detail on the hash function used for shard-key routing.
  • No detail on how scatter-gather queries aggregate / deduplicate results, no detail on request-hedging or load-shedding algorithms.
  • No failure-mode description for partial commit failures (what happens product-side; what recovery looks like).
  • No detail on the DBProxy topology-library update protocol (how clients observe updates in < 1s).
  • The "ten seconds of partial availability" is a raw number; no per-request-class breakdown.
  • Alternative NewSQL systems (CockroachDB / TiDB / Spanner / Vitess) evaluated but no performance / cost comparison numbers.
  • The three sharding keys (UserID, FileID, OrgID) are named but the full list of colos and their keys is not enumerated.
  • Future-scope items are listed but no target timeline for any of them.
