SYSTEM Cited by 5 sources
AWS RDS¶
Amazon RDS (Relational Database Service) is AWS's managed relational-database offering, covering MySQL, Postgres, MariaDB, SQL Server, and Oracle. RDS takes over backups, patching, failover, and automated minor-version upgrades; capacity growth is primarily via instance-class resizing + storage autoscaling + read replicas.
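The two capacity levers above (instance-class resizing and storage autoscaling) map onto a single RDS API call. A minimal sketch using the real boto3 `modify_db_instance` operation — the instance identifier, class, and sizes are illustrative assumptions, not from any source here:

```python
def scale_up_request(instance_id: str, new_class: str, max_storage_gb: int) -> dict:
    """Build kwargs for rds.modify_db_instance (hypothetical values).

    DBInstanceClass is the vertical-scaling lever; MaxAllocatedStorage
    is the storage-autoscaling ceiling RDS grows disk toward on its own.
    """
    return {
        "DBInstanceIdentifier": instance_id,
        "DBInstanceClass": new_class,           # e.g. one instance size up
        "MaxAllocatedStorage": max_storage_gb,  # storage-autoscaling ceiling
        "ApplyImmediately": False,              # defer to the maintenance window
    }

# Usage (needs AWS credentials; modify_db_instance is the real RDS API):
#   import boto3
#   boto3.client("rds").modify_db_instance(
#       **scale_up_request("payments-db", "db.r6g.4xlarge", 2048))
```

Note that `ApplyImmediately=False` trades speed for safety: the class change waits for the maintenance window instead of triggering an immediate failover-style restart.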
Pattern of appearance¶
RDS comes up in scaling stories as the operationally-sane default OLTP store — and, eventually, the vertical-scale ceiling.
- Initial choice is usually fine for 1–2 years of product traffic.
- Doubling strategy: as storage fills, the instance class goes up; cost and blast radius go up with it.
- At several TB, version upgrades have to be "close to zero downtime", which comes to dominate the DBRE on-call budget.
- Shared-instance risk: many features on one RDS means downtime in one kills all of them, forcing a database split before real sharding.
Canva's Creators-payment pipeline walked this arc: MySQL RDS, instance size doubling every 8–10 months, free storage falling ~500 GB (≈50%) in 6 months, DB split done to reduce shared-instance blast radius, then ultimately the move off RDS for the aggregation workload and onto Snowflake. RDS survived in the architecture as the serving tier for Snowflake-computed aggregates, with a rate-limited ingester — and showed up again as the tuning constraint (RDS CPU spikes when the warehouse unload was too fast). See patterns/warehouse-unload-bridge. (Source: sources/2024-04-29-canva-scaling-to-count-billions)
Key operational realities¶
- Vertical-first scaling. Storage autoscale helps the "free disk" metric; it doesn't help the CPU or I/O ceiling of a hot table.
- Zero-downtime upgrades at TB scale are specialist work.
- Read-replica lag matters once you're serving off replicas.
- Ingest throughput sensitivity. RDS CPU can spike on sustained write bursts; ingestion-from-warehouse workflows need explicit rate-limit tuning — Canva documented this directly.
Seen in¶
- sources/2024-04-29-canva-scaling-to-count-billions — MySQL RDS as v1 counting store hitting vertical-scale wall; later as serving RDS for Snowflake unload with CPU-spike tuning.
- sources/2025-05-03-aws-postgresql-transaction-visibility-read-replicas — AWS's response to Jepsen's 2025-04-29 Multi-AZ Postgres analysis, confirming the reported transaction-visibility anomaly but clarifying it is inherent to community Postgres (discussed on pgsql-hackers since 2013), not RDS-specific. RDS for Postgres Multi-AZ cluster configurations inherit Postgres's ProcArray-based visibility model, in which the order transactions become visible (removal from ProcArray) can diverge from the order they become durable (WAL commit-record write); this admits the Long Fork anomaly (two readers on primary + replica observing concurrent non-conflicting transactions in different orders — a violation of concepts/snapshot-isolation's atomic-visibility property). Single-AZ deployments are unaffected (no cross-node divergence path). The sibling AWS offerings that sidestep the anomaly are systems/aurora-limitless and systems/aurora-dsql, which replace ProcArray-based visibility with time-based MVCC via Postgres-extension surgery (see patterns/postgres-extension-over-fork). AWS's PostgreSQL Contributors Team (formed 2022) is co-developing the proposed upstream CSN fix.
- sources/2026-02-05-aws-convera-verified-permissions-fine-grained-authorization — RDS as the tenant-isolated data store at the bottom of Convera's multi-tenant authorization chain: "Amazon RDS is configured to accept only requests with specific tenant context and returns data specific to the requested tenant_id." This RDS-side enforcement is the last line of defense under zero-trust re-verification: even if the authorizer and backend checks both fail, the database refuses cross-tenant reads. RDS also serves as the user-roles store that Convera's pre-token hook queries at login time to enrich the Cognito JWT with role claims.
- sources/2026-04-21-figma-how-figmas-databases-team-lived-to-tell-the-scale — Figma's 2020–2022 scaling story on RDS Postgres: ~100× database-stack growth, from AWS's largest instance in 2020 to a dozen vertically-partitioned RDS Postgres instances by end of 2022. Named three RDS-instance ceilings that triggered the subsequent horizontal-sharding effort: (1) vacuum reliability impact on TB-scale tables, (2) maximum IOPS supported by Amazon RDS on high-write tables growing fast enough to soon exceed the per-instance cap, (3) CPU on the hottest partitions. Canonical instance of the "RDS-as-vertical-scale-ceiling" story — with the wrinkle that Figma chose to keep RDS Postgres as the substrate even through horizontal sharding, building systems/dbproxy-figma on top rather than migrating to NewSQL (CockroachDB / TiDB / Spanner / Vitess) or NoSQL. First horizontally-sharded table shipped September 2023 with 10 s of partial primary availability.
- sources/2026-03-31-aws-streamlining-access-to-dr-capabilities — RDS as the canonical data-tier DR example. Figure 1 shows the multi-destination fanout for RDS: automated backups + manual snapshots + cross-Region snapshot copy + cross-account snapshot copy + AWS Backup vault copies + read replicas — a single RDS instance can be protected along any of these axes independently. The post also names RDS as the canonical DR config-translation case: a restored RDS instance has a new endpoint, and applications must be rebound to it (the post's named mechanism: a Route 53 private hosted zone CNAME mapping old-endpoint → new-endpoint in the recovered VPC).
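The Multi-AZ visibility anomaly in the Jepsen bullet above can be made concrete with a toy model: durability order (WAL) and visibility order (ProcArray removal) are tracked separately, a primary reader snapshots by visibility order, and a replica reader sees a WAL-replay prefix. This is a deliberately simplified illustration, not RDS or Postgres internals:

```python
# Toy Long Fork model: two concurrent, non-conflicting transactions,
# T1 (writes x) and T2 (writes y), whose durability and visibility
# orders diverge on the primary.
wal_order = ["T1", "T2"]         # order commit records reach the WAL
visibility_order = ["T2", "T1"]  # order txns leave the primary's ProcArray

writes = {"T1": {"x": 1}, "T2": {"y": 1}}

def primary_read(visible_prefix: int) -> dict:
    """Snapshot on the primary: applies txns in visibility order."""
    state = {"x": 0, "y": 0}
    for txn in visibility_order[:visible_prefix]:
        state.update(writes[txn])
    return state

def replica_read(replayed_prefix: int) -> dict:
    """Snapshot on a replica: applies txns in WAL-replay order."""
    state = {"x": 0, "y": 0}
    for txn in wal_order[:replayed_prefix]:
        state.update(writes[txn])
    return state

# Reader A on the primary, after only T2 has become visible,
# sees T2-before-T1; reader B on a replica that has replayed only
# T1's commit record sees T1-before-T2. The two readers order the
# same non-conflicting transactions oppositely — the Long Fork
# violation of snapshot isolation's atomic visibility.
a = primary_read(1)  # {"x": 0, "y": 1}
b = replica_read(1)  # {"x": 1, "y": 0}
```

Once both transactions are fully visible and replayed, the two nodes converge; the anomaly exists only in the window where the two orders expose different prefixes, which is why single-AZ deployments (one node, one order) are unaffected.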
Related¶
- systems/mysql, systems/postgresql
- systems/aurora-dsql, systems/aurora-limitless — sibling Aurora offerings that sidestep Postgres's Multi-AZ Long Fork anomaly via time-based MVCC.
- concepts/oltp-vs-olap
- concepts/long-fork-anomaly, concepts/snapshot-isolation, concepts/visibility-order-vs-commit-order, concepts/commit-sequence-number — the RDS-Postgres Multi-AZ consistency-model cluster.
- patterns/warehouse-unload-bridge