PATTERN

Cross-Cloud Replica Cache

Pattern: the canonical copy of a dataset lives in one cloud; a scheduled incremental-replication job materialises a local replica in each consumer cloud's object store; consumers read the local replica instead of live-querying across the cloud boundary.

The producer still exposes exactly one logical share; the replica is a consumer-side optimisation of that share, not a second data product. Most consumers hit the replica; latency-sensitive consumers can still read the live share directly.
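
A minimal sketch of the Sync Job, assuming a Databricks-style Spark environment where Delta deep clone is incremental on re-run (only files added or changed since the previous clone are copied); the catalog, table, and path names below are placeholders, not part of the pattern.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("cross-cloud-replica-sync").getOrCreate()

    CANONICAL = "sales_share.prod.orders"        # canonical table, exposed via Delta Sharing (cloud A)
    REPLICA   = "replica_catalog.prod.orders"    # local replica in cloud B's object store
    REPLICA_LOCATION = "abfss://replicas@consumerlake.dfs.core.windows.net/orders"

    # Re-running a deep clone against an existing target copies only the files
    # added or changed since the last sync, which is what keeps per-window
    # egress bounded instead of re-shipping the whole dataset every run.
    spark.sql(f"""
        CREATE OR REPLACE TABLE {REPLICA}
        DEEP CLONE {CANONICAL}
        LOCATION '{REPLICA_LOCATION}'
    """)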

Shape

Canonical store (cloud A)
    │  open sharing protocol (Delta Sharing, or equivalent)
    ├──► Latency-sensitive consumers: direct live read (fresh, costs per read)
    └──► Incremental Sync Job (Delta Deep Clone, or equivalent)
        Local replica (cloud B, object store)
             └──► Latency-tolerant consumers: local read, bounded staleness, cheap
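
A sketch of the two read paths, assuming a Spark reader for the local replica and the open-source delta-sharing Python connector for the live path; the profile path and table names are illustrative.

    import delta_sharing                      # open Delta Sharing Python connector
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Latency-tolerant consumer: read the local replica at inside-cloud
    # throughput and cost, accepting bounded staleness.
    orders_local = spark.read.table("replica_catalog.prod.orders")

    # Latency-sensitive consumer: live read of the same logical share across
    # the cloud boundary: fresh, but pays egress on every read.
    orders_live = delta_sharing.load_as_pandas(
        "/etc/delta-sharing/profile.share#sales_share.prod.orders"
    )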

Preconditions

  • Incremental replication primitive. Full re-replication each window collapses the economics; the protocol / storage layer must support "just the delta since last sync" (Delta Deep Clone does; snapshot-based open table formats generally do).
  • Staleness tolerance. A meaningful consumer cohort must tolerate N-hour-to-N-day-old data; if every consumer needs live freshness, the replica tier has no readers.
  • Observable bytes transferred. You need to be able to attribute egress bytes to Sync Jobs so that the chargeback / cost-tracking side of the pattern works (see patterns/chargeback-cost-attribution); a sketch of pulling these numbers out of the replica's history follows this list.
  • A way to propagate deletes. Right-to-be-forgotten / GDPR / retention-policy deletes on the canonical side must show up on every replica within some SLO. Mercedes-Benz uses Delta Lake VACUUM on the replica tables for exactly this (also shown in the sketch below).
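
One way to cover the last two preconditions inside the Sync Job itself, assuming Delta's DESCRIBE HISTORY exposes CLONE operation metrics (the exact metric keys vary by runtime) and VACUUM is what physically drops files the canonical side has deleted; the names and retention window are illustrative.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    REPLICA = "replica_catalog.prod.orders"

    # 1. Observable bytes transferred: the most recent CLONE entry in the
    #    replica's history records how much data this sync actually copied,
    #    which is the number to feed into chargeback / cost attribution.
    last_sync = (
        spark.sql(f"DESCRIBE HISTORY {REPLICA}")
             .where(F.col("operation") == "CLONE")
             .orderBy(F.col("version").desc())
             .limit(1)
             .collect()[0]
    )
    copied_bytes = last_sync["operationMetrics"].get("copiedFilesSize", "0")
    print(f"egress bytes to attribute to this sync window: {copied_bytes}")

    # 2. Delete propagation: once a canonical-side delete has flowed through
    #    a clone, VACUUM removes the no-longer-referenced files so the deleted
    #    rows stop being physically present (and time-travelable) on the replica.
    spark.sql(f"VACUUM {REPLICA} RETAIN 168 HOURS")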

Knobs

  • Sync cadence. The dial between freshness and egress cost; one cadence per data-product × consumer-cohort is normal (a configuration sketch follows this list). Mercedes-Benz moved from weekly to every-second-day without changing the protocol.
  • Storage locality. The replica lives in the consumer cloud's native object store (ADLS, S3, GCS), so local engines read at inside-cloud throughput / cost.
  • Schema / format translation. If the canonical format and the consumer-expected format differ, the sharing boundary (not the replication boundary) is the natural place to translate — Mercedes-Benz had Iceberg on the producer and Delta on the consumer, translated at Unity Catalog's federation boundary.
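
An illustrative sketch of the cadence knob as configuration rather than code change; the products, cohorts, and hour values are invented, the point being that each data-product × consumer-cohort pair gets its own freshness/egress dial.

    # Cadence per (data-product, consumer-cohort): when a cohort needs fresher
    # data, only this table changes, not the sync mechanism or the protocol.
    SYNC_CADENCE_HOURS: dict[tuple[str, str], int] = {
        ("orders",    "bi-analytics"): 48,   # every second day (Mercedes-Benz moved from weekly to 48h)
        ("orders",    "ml-training"):  168,  # weekly is enough for batch retraining
        ("telemetry", "dashboards"):   24,
    }

    def due_for_sync(product: str, cohort: str, hours_since_last_sync: float) -> bool:
        """True when the replica for this (product, cohort) has exceeded its
        agreed staleness budget and the scheduler should run the Sync Job."""
        return hours_since_last_sync >= SYNC_CADENCE_HOURS[(product, cohort)]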

Why it's not just "caching"

  • The replica is authoritative for its readers. Inside the bounded staleness window it is the data, not a cache miss away from reality. Treating it as a cache (and thinking about invalidation) gets the semantics wrong.
  • Compaction / schema evolution / GC are first-class. Unlike a read-through cache, the replica is a real table that grows; it needs the same lifecycle operations as the canonical.
  • It has a producer-side cost. The Sync Job is the producer's compute; the egress is the producer's chargeback. A cache is usually a consumer-side asset; a replica is architecturally two-sided.

Relation to other patterns

  • patterns/async-projected-read-model (Canva's print-routing case) is the within-one-cloud cousin: rebuild a read-optimised projection from the source of truth asynchronously. This pattern is the cross-cloud/egress-motivated specialisation — the projection is the canonical data, just materialised in another cloud.
  • patterns/chargeback-cost-attribution pairs well here: the pattern only stays healthy when someone is held accountable for the replication bytes.

Seen in

  • Mercedes-Benz — cross-cloud Delta Sharing: Iceberg canonical on the producer side, Delta deep-clone replicas in the consumer clouds, sync cadence tightened from weekly to every second day.