PATTERN Cited by 1 source

Polymorphic usage tables for multi-tenant scale¶

Pattern¶

When a multi-tenant SaaS needs to store per-tenant metadata or usage data for many entity types (fields, options, roles, schemes, work types, …), use a small fixed set of generic "polymorphic" tables keyed by something like (tenant_id, entity_type, entity_id, …) rather than one dedicated table per entity type. The decision avoids the combinatorial explosion of database objects that per-entity-type-table designs cause when the tenant count is large.

The 2026-05-14 Atlassian post is the first canonical wiki home for this pattern. Verbatim:

"Naively adding one dedicated table per entity doesn't scale in Jira's multi-tenant architecture: a single new table becomes millions of tables across production, plus indexes and internal database objects. So we used two layers: ... For persistent pre-computation, we store data in the Jira relational database using a small set of generic tables (polymorphic 'usage' tables) instead of one table per entity type."

(Source: sources/2026-05-14-atlassian-optimisation-tools-for-jira-reducing-configuration-bloat)

Problem¶

A naïve schema decision "one table per entity type" multi- plies into a problem at multi-tenant scale:

tables_added = entity_types × tenants  (per-tenant DB design)
total_db_objects ≈ tables_added × (1 + N indexes + M internals)

For Atlassian-scale Jira, each new dedicated table per entity type "becomes millions of tables across production, plus indexes and internal database objects."

The cost shape:

Postgres metadata RAM — each backend caches catalog entries for tables it touches; see concepts/catalog-bloat-multi-tenant for the Postgres-altitude cousin failure.
Operational overhead — schema migrations, monitoring, backup verification all per-table.
Internal DB-object pressure — indexes, constraints, triggers, sequences multiply alongside the tables.

A second naïve solution — use a wide JSON column on a single table — sacrifices index efficiency and schema validation; this pattern occupies the principled middle ground.

Solution¶

Use a small set of generic tables with shape like:

generic_usage_table
  tenant_id        bigint
  entity_type      enum-or-string  -- 'field' | 'option' | 'role' | ...
  entity_id        bigint
  scope_id         bigint          -- e.g. space, scheme, ...
  usage_count      bigint
  last_used_at     timestamp
  computed_at      timestamp
  ...
PRIMARY KEY (tenant_id, entity_type, entity_id, scope_id)

(schematic, not Atlassian-disclosed)

Several discrete tables can co-exist where the column shapes diverge enough to warrant separation, but the count is bounded by what columns matter rather than how many entity types exist.

What "small set" means¶

The 2026-05-14 post says "a small set of generic tables" — not a single mega-table. The number is bounded by the shape of the queries the data must support:

Usage counts — one table.
Last-used timestamps — possibly the same table.
Per-entity associations (which scheme contains which field) — possibly a different shape, separate table.

The discipline: avoid one-table-per-entity-type, not "merge everything into one table no matter what."

When to use¶

Use polymorphic usage tables when:

Tenant count × entity-type count is large enough that per-entity-type tables would exceed sensible operational bounds (Atlassian-scale: millions of tables).
Query patterns are predictable — usage queries are generally "how many (entity_type, entity_id) rows for this (tenant_id, scope_id)", not arbitrary schema-walking.
Per-entity-type column variance is bounded — the data per entity type fits the shared column shape without requiring per-row JSON columns or side-table joins that defeat the point.
The dataset is read-heavy and refreshed in batch — the typical SaaS-precomputation profile.

When not to use¶

Avoid polymorphic usage tables when:

Tenant count is bounded (single-digit or low hundreds) — per-entity-type tables work fine, schema stays clean.
Per-entity-type queries diverge significantly — if you need entity-specific indexes, EAV or per-table evolution is preferred.
Per-entity-type schema must evolve independently — shared columns force lockstep schema changes; if entity schemas need to diverge over time, separation is worth the cost.
The data is OLTP write-heavy — polymorphic tables can experience hotspots on the (tenant_id, entity_type) prefix; OLTP write hotspots are typically better served by per-entity-type tables that can scale-out independently.

Trade-offs¶

Property	Polymorphic usage tables	Per-entity-type tables
Total table count	Bounded (small, fixed)	Linear in entity-types × tenants
Catalog / metadata RAM	Low	High at scale
Operational overhead	Low (few tables to migrate / monitor)	High at scale
Per-entity-type indexes	Composite index on `(tenant, entity_type, ...)`	Per-table tailored
Schema evolution per entity	Locked to shared shape	Independent
Query complexity	Always filter on `entity_type`	Direct table reference
Hot-spot risk	Higher (concentrated in fewer tables)	Lower (spread across many tables)

Implementation discipline¶

Pre-determine the column shape so per-entity-type variance fits without expansion. JSON columns for "extra attributes" are an escape hatch; if used, accept they reduce indexing power.
Index the composite key carefully. The leading column should support the most common query predicate; for usage reports that's typically tenant_id (or a combined (tenant_id, scope_id)).
Consider per-tenant partitioning if the table grows large; partition on tenant_id rather than entity_type.
Monitor row-count growth — at "~1M records per entity type, per tenant" with multiple entity types and many tenants, the total row count can become very large; retention and archival policies become first-class concerns.

Generalises beyond usage data¶

The same pattern applies whenever a multi-tenant SaaS needs to store per-tenant metadata for many heterogeneous entity types:

Audit logs — a single audit_log table with (tenant_id, action, entity_type, entity_id, …) rather than per-entity-type audit tables.
Permission / ACL stores — a single permission table with polymorphic (subject_type, subject_id, resource_type, resource_id, action) rather than per-resource permission tables.
Notification subscriptions — a single subscription table with (user_id, source_type, source_id, channel) rather than per-source-type subscription tables.
Generic key-value tags on heterogeneous entities.

The discriminator: when the unifying key shape is consistent across the heterogeneous entity types, polymorphic tables collapse table-count cost dramatically without giving up indexability.

Seen in¶

sources/2026-05-14-atlassian-optimisation-tools-for-jira-reducing-configuration-bloat (2026-05-14, Atlassian) — first canonical wiki home. The Pre-computation Framework uses polymorphic usage tables for persistent pre-computed usage data, paired with Memcached for short-lived intra-job state — see patterns/tiered-state-management-memcache-plus-db. The decision is justified by the Postgres multi-tenant catalog explosion that per-entity-type tables would cause: "a single new table becomes millions of tables across production, plus indexes and internal database objects."

systems/postgresql
systems/atlassian-precomputation-framework
systems/atlassian-jira-optimisation-tools
concepts/polymorphic-usage-tables-multi-tenant — the underlying concept.
concepts/catalog-bloat-multi-tenant — the Postgres-altitude failure mode this pattern avoids.
concepts/tenant-isolation
patterns/tiered-state-management-memcache-plus-db — the storage architecture the polymorphic tables are part of.