PATTERN Cited by 1 source
Polymorphic usage tables for multi-tenant scale¶
Pattern¶
When a multi-tenant SaaS needs to store per-tenant
metadata or usage data for many entity types (fields,
options, roles, schemes, work types, …), use a small fixed
set of generic "polymorphic" tables keyed by something
like (tenant_id, entity_type, entity_id, …) rather than
one dedicated table per entity type. The decision avoids
the combinatorial explosion of database objects that
per-entity-type-table designs cause when the tenant count is
large.
The 2026-05-14 Atlassian post is the first canonical wiki home for this pattern. Verbatim:
"Naively adding one dedicated table per entity doesn't scale in Jira's multi-tenant architecture: a single new table becomes millions of tables across production, plus indexes and internal database objects. So we used two layers: ... For persistent pre-computation, we store data in the Jira relational database using a small set of generic tables (polymorphic 'usage' tables) instead of one table per entity type."
(Source: sources/2026-05-14-atlassian-optimisation-tools-for-jira-reducing-configuration-bloat)
Problem¶
A naïve schema decision "one table per entity type" multi- plies into a problem at multi-tenant scale:
tables_added = entity_types × tenants (per-tenant DB design)
total_db_objects ≈ tables_added × (1 + N indexes + M internals)
For Atlassian-scale Jira, each new dedicated table per entity type "becomes millions of tables across production, plus indexes and internal database objects."
The cost shape:
- Postgres metadata RAM — each backend caches catalog entries for tables it touches; see concepts/catalog-bloat-multi-tenant for the Postgres-altitude cousin failure.
- Operational overhead — schema migrations, monitoring, backup verification all per-table.
- Internal DB-object pressure — indexes, constraints, triggers, sequences multiply alongside the tables.
A second naïve solution — use a wide JSON column on a single table — sacrifices index efficiency and schema validation; this pattern occupies the principled middle ground.
Solution¶
Use a small set of generic tables with shape like:
generic_usage_table
tenant_id bigint
entity_type enum-or-string -- 'field' | 'option' | 'role' | ...
entity_id bigint
scope_id bigint -- e.g. space, scheme, ...
usage_count bigint
last_used_at timestamp
computed_at timestamp
...
PRIMARY KEY (tenant_id, entity_type, entity_id, scope_id)
(schematic, not Atlassian-disclosed)
Several discrete tables can co-exist where the column shapes diverge enough to warrant separation, but the count is bounded by what columns matter rather than how many entity types exist.
What "small set" means¶
The 2026-05-14 post says "a small set of generic tables" — not a single mega-table. The number is bounded by the shape of the queries the data must support:
- Usage counts — one table.
- Last-used timestamps — possibly the same table.
- Per-entity associations (which scheme contains which field) — possibly a different shape, separate table.
The discipline: avoid one-table-per-entity-type, not "merge everything into one table no matter what."
When to use¶
Use polymorphic usage tables when:
- Tenant count × entity-type count is large enough that per-entity-type tables would exceed sensible operational bounds (Atlassian-scale: millions of tables).
- Query patterns are predictable — usage queries
are generally "how many
(entity_type, entity_id)rows for this(tenant_id, scope_id)", not arbitrary schema-walking. - Per-entity-type column variance is bounded — the data per entity type fits the shared column shape without requiring per-row JSON columns or side-table joins that defeat the point.
- The dataset is read-heavy and refreshed in batch — the typical SaaS-precomputation profile.
When not to use¶
Avoid polymorphic usage tables when:
- Tenant count is bounded (single-digit or low hundreds) — per-entity-type tables work fine, schema stays clean.
- Per-entity-type queries diverge significantly — if you need entity-specific indexes, EAV or per-table evolution is preferred.
- Per-entity-type schema must evolve independently — shared columns force lockstep schema changes; if entity schemas need to diverge over time, separation is worth the cost.
- The data is OLTP write-heavy — polymorphic tables can
experience hotspots on the
(tenant_id, entity_type)prefix; OLTP write hotspots are typically better served by per-entity-type tables that can scale-out independently.
Trade-offs¶
| Property | Polymorphic usage tables | Per-entity-type tables |
|---|---|---|
| Total table count | Bounded (small, fixed) | Linear in entity-types × tenants |
| Catalog / metadata RAM | Low | High at scale |
| Operational overhead | Low (few tables to migrate / monitor) | High at scale |
| Per-entity-type indexes | Composite index on (tenant, entity_type, ...) |
Per-table tailored |
| Schema evolution per entity | Locked to shared shape | Independent |
| Query complexity | Always filter on entity_type |
Direct table reference |
| Hot-spot risk | Higher (concentrated in fewer tables) | Lower (spread across many tables) |
Implementation discipline¶
- Pre-determine the column shape so per-entity-type variance fits without expansion. JSON columns for "extra attributes" are an escape hatch; if used, accept they reduce indexing power.
- Index the composite key carefully. The leading
column should support the most common query predicate;
for usage reports that's typically
tenant_id(or a combined(tenant_id, scope_id)). - Consider per-tenant partitioning if the table grows
large; partition on
tenant_idrather thanentity_type. - Monitor row-count growth — at "~1M records per entity type, per tenant" with multiple entity types and many tenants, the total row count can become very large; retention and archival policies become first-class concerns.
Generalises beyond usage data¶
The same pattern applies whenever a multi-tenant SaaS needs to store per-tenant metadata for many heterogeneous entity types:
- Audit logs — a single
audit_logtable with(tenant_id, action, entity_type, entity_id, …)rather than per-entity-type audit tables. - Permission / ACL stores — a single permission table
with polymorphic
(subject_type, subject_id, resource_type, resource_id, action)rather than per-resource permission tables. - Notification subscriptions — a single subscription
table with
(user_id, source_type, source_id, channel)rather than per-source-type subscription tables. - Generic key-value tags on heterogeneous entities.
The discriminator: when the unifying key shape is consistent across the heterogeneous entity types, polymorphic tables collapse table-count cost dramatically without giving up indexability.
Seen in¶
- sources/2026-05-14-atlassian-optimisation-tools-for-jira-reducing-configuration-bloat (2026-05-14, Atlassian) — first canonical wiki home. The Pre-computation Framework uses polymorphic usage tables for persistent pre-computed usage data, paired with Memcached for short-lived intra-job state — see patterns/tiered-state-management-memcache-plus-db. The decision is justified by the Postgres multi-tenant catalog explosion that per-entity-type tables would cause: "a single new table becomes millions of tables across production, plus indexes and internal database objects."
Related¶
- systems/postgresql
- systems/atlassian-precomputation-framework
- systems/atlassian-jira-optimisation-tools
- concepts/polymorphic-usage-tables-multi-tenant — the underlying concept.
- concepts/catalog-bloat-multi-tenant — the Postgres-altitude failure mode this pattern avoids.
- concepts/tenant-isolation
- patterns/tiered-state-management-memcache-plus-db — the storage architecture the polymorphic tables are part of.