PATTERN Cited by 1 source

Namespace-backed storage routing

Namespace-backed storage routing is the pattern of using a logical namespace identifier as the central configuration unit that maps a workload to its physical storage — one or more concrete backends, consistency settings, latency targets, cache tiers, and retention policies. Callers pick a namespace; the platform decides where and how the data lands.

Canonical instance: Netflix's KV Data Abstraction Layer, whose per-namespace persistence_configuration chooses backends, regions, consistency scope, and cache tiers.

The shape

namespace: ngsegment
persistence_configuration:
  - id: PRIMARY_STORAGE
    physical_storage:
      type: CASSANDRA
      cluster: cassandra_kv_ngsegment
      dataset: ngsegment
      table:   ngsegment
      regions: [us-east-1]
      config:
        consistency_scope:  LOCAL
        consistency_target: READ_YOUR_WRITES
  - id: CACHE
    physical_storage:
      type: CACHE
      cluster: evcache_kv_ngsegment
    config:
      default_cache_ttl: 180s

Two physical backends under one logical namespace:

  • PRIMARY_STORAGE — a Cassandra cluster named cassandra_kv_ngsegment in us-east-1, with local-scope READ_YOUR_WRITES (see concepts/tunable-consistency).
  • CACHE — an EVCache cluster with a 180-second default TTL.

Callers see "namespace ngsegment"; they do not encode Cassandra or EVCache in their code. (Source: sources/2024-09-19-netflix-netflixs-key-value-data-abstraction-layer)

What the namespace owns

  • Which engine(s) back the data — single engine or multi-tier (primary + cache + chunk store).
  • Region placement — which regions hold replicas.
  • Consistency knobs — scope (local / quorum / all) and target (read-your-writes / eventual / strong).
  • Latency / SLO targets — for signaling to clients and for SLO-aware early response.
  • TTL / retention defaults — both cache-tier TTLs and primary-tier retention.
  • Compression / chunking thresholds (if per-namespace tunable).
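A sketch of how a platform might parse a namespace's persistence_configuration into typed routing entries. Field names mirror the YAML above; the dataclass shapes are assumptions for illustration, not the DAL's actual internal types.

```python
from dataclasses import dataclass, field

@dataclass
class PhysicalStorage:
    type: str                    # e.g. CASSANDRA, CACHE
    cluster: str
    regions: list[str] = field(default_factory=list)

@dataclass
class StorageEntry:
    id: str                      # e.g. PRIMARY_STORAGE, CACHE
    physical_storage: PhysicalStorage
    config: dict = field(default_factory=dict)  # consistency, TTLs, ...

def parse_namespace(raw: dict) -> dict[str, StorageEntry]:
    """Index the persistence_configuration list by entry id."""
    entries: dict[str, StorageEntry] = {}
    for item in raw["persistence_configuration"]:
        ps = item["physical_storage"]
        entries[item["id"]] = StorageEntry(
            id=item["id"],
            physical_storage=PhysicalStorage(
                type=ps["type"],
                cluster=ps["cluster"],
                regions=ps.get("regions", []),
            ),
            config=item.get("config", {}),
        )
    return entries

raw = {
    "namespace": "ngsegment",
    "persistence_configuration": [
        {"id": "PRIMARY_STORAGE",
         "physical_storage": {"type": "CASSANDRA",
                              "cluster": "cassandra_kv_ngsegment",
                              "regions": ["us-east-1"]},
         "config": {"consistency_scope": "LOCAL",
                    "consistency_target": "READ_YOUR_WRITES"}},
        {"id": "CACHE",
         "physical_storage": {"type": "CACHE",
                              "cluster": "evcache_kv_ngsegment"},
         "config": {"default_cache_ttl": "180s"}},
    ],
}
entries = parse_namespace(raw)
```

Indexing by `id` gives the router a natural lookup: reads consult `entries["CACHE"]` first, writes go to `entries["PRIMARY_STORAGE"]`.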

The two wins

  1. Developers describe their problem, not the solution. "This flexibility allows our Data Platform to route different use cases to the most suitable storage system based on performance, durability, and consistency needs. Developers just provide their data problem rather than a database solution!" The platform team retains ownership of which engine is appropriate — and can change that call later without touching caller code.
  2. Backend migrations become platform-side. A namespace whose primary is Cassandra today could be re-backed by a different engine tomorrow with only namespace-config + data-migration work — no caller change.
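Win 2 can be sketched as a config flip. Everything here is hypothetical (the target cluster name `rds_kv_ngsegment` is invented), but it shows why re-backing a namespace leaves caller code untouched: the caller only ever names the namespace.

```python
# Two stand-in backends; the second is the migration target.
backends = {"cassandra_kv_ngsegment": {}, "rds_kv_ngsegment": {}}
namespace_config = {"ngsegment": "cassandra_kv_ngsegment"}

def put(namespace: str, key: str, value: str) -> None:
    backends[namespace_config[namespace]][key] = value

put("ngsegment", "k1", "v1")          # lands in the Cassandra stand-in

# Platform-side migration: copy the data, then flip the namespace config.
backends["rds_kv_ngsegment"].update(backends["cassandra_kv_ngsegment"])
namespace_config["ngsegment"] = "rds_kv_ngsegment"

put("ngsegment", "k2", "v2")          # identical caller code, new backend
```

The caller's `put("ngsegment", ...)` line never changes; only the config and the (real-world, nontrivial) data copy do.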

Relation to concepts/database-agnostic-abstraction

Database-agnostic abstraction is the property; namespace-backed storage routing is the mechanism through which the property is realised in practice. The namespace carries the configuration that makes the abstraction concrete at runtime.

Trade-offs

  • Namespace proliferation. Every distinct access pattern wants its own namespace, so organizations accumulate many of them; tooling for discoverability and ownership is necessary.
  • Schema per namespace. The DAL's uniform data model (e.g. two-level map) reduces this, but per-namespace item shape / index needs still matter.
  • Migration-in-place is still work. The namespace config changes are easy; re-platforming the underlying data is not. The namespace abstraction contains the blast radius, it doesn't eliminate the cost.
  • Multi-backend namespaces need consistency bridging. A primary + cache namespace has to reason about cache invalidation, fill semantics, and failure-mode reads-from-primary. The DAL can centralize this, but the complexity is real.
  • Cross-cutting policy changes. Edits to a namespace's config ripple: bumping one namespace's SLO target can affect many client teams using it, so governance of who can edit a namespace matters.
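The consistency-bridging bullet above can be sketched as a read-through, invalidate-on-write wrapper. This is a deliberate simplification of what a DAL might centralize per namespace; real fill, invalidation, and failure-mode semantics are more involved.

```python
import time

class CachedNamespace:
    """Read-through cache over a primary store, invalidating on write."""

    def __init__(self, primary: dict, ttl_seconds: float = 180.0):
        self.primary = primary
        self.ttl = ttl_seconds
        self.cache: dict = {}            # key -> (value, expiry)

    def put(self, key, value):
        self.primary[key] = value
        self.cache.pop(key, None)        # invalidate any stale cache entry

    def get(self, key):
        hit = self.cache.get(key)
        if hit is not None and hit[1] > time.monotonic():
            return hit[0]                # fresh cache hit
        value = self.primary.get(key)    # miss or expired: read primary
        if value is not None:
            # Fill the cache with the namespace's default TTL.
            self.cache[key] = (value, time.monotonic() + self.ttl)
        return value

ns = CachedNamespace(primary={}, ttl_seconds=180.0)
ns.put("user:42", "segment-a")   # write primary, invalidate cache
ns.get("user:42")                # read primary, fill cache
```

Even this toy version has to decide invalidation order, fill policy, and what a miss means; centralizing those decisions per namespace is exactly the complexity the bullet points at.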
