Cross-Cloud Replica Cache¶
Pattern: the canonical copy of a dataset lives in one cloud; a scheduled incremental-replication job materialises a local replica in each consumer cloud's object store; consumers read the local replica instead of live-querying across the cloud boundary.
The producer still exposes exactly one logical share; the replica is a consumer-side optimisation of that share, not a second data product. Most consumers hit the replica; latency-sensitive consumers can still read the live share directly.
Shape¶
```
Canonical store (cloud A)
  │
  │  open sharing protocol (Delta Sharing, or equivalent)
  │
  ├──► Latency-sensitive consumers: direct live read (fresh, costs per read)
  │
  └──► Incremental Sync Job (Delta Deep Clone, or equivalent)
         │
         ▼
       Local replica (cloud B, object store)
         │
         └──► Latency-tolerant consumers: local read, bounded staleness, cheap
```
Preconditions¶
- Incremental replication primitive. Full re-replication every window collapses the economics; the protocol / storage layer must support "just the delta since last sync" (Delta Deep Clone does; snapshot-based open table formats generally do).
- Staleness tolerance. A meaningful consumer cohort must be OK with N-hour-to-N-day data; if every consumer needs live-freshness, the replica tier has no readers.
- Observable bytes transferred. You need to be able to attribute egress-bytes to Sync Jobs so that the chargeback / cost-tracking side of the pattern works (see patterns/chargeback-cost-attribution).
- A way to propagate deletes. Right-to-be-forgotten / GDPR / retention-policy deletes on the canonical side must show up on every replica within some SLO. Mercedes-Benz runs Delta Lake `VACUUM` on the replica tables for exactly this.
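The incremental-primitive, observable-bytes, and delete-propagation preconditions can be sketched together. A minimal, hypothetical model in plain Python (not a real Delta Deep Clone client): a sync is planned from two file manifests, copies only the delta, and surfaces the egress bytes for chargeback.

```python
# Hypothetical sketch of incremental-sync planning: copy only files the
# replica is missing, drop files the canonical has deleted (this is how
# canonical-side deletes propagate), and report bytes moved for chargeback.

def plan_sync(source_manifest: dict[str, int],
              replica_manifest: dict[str, int]) -> tuple[set[str], set[str], int]:
    """Return (files_to_copy, files_to_delete, egress_bytes).

    Manifests map file path -> size in bytes, as a table-format
    snapshot would expose them.
    """
    to_copy = set(source_manifest) - set(replica_manifest)
    to_delete = set(replica_manifest) - set(source_manifest)  # delete propagation
    egress_bytes = sum(source_manifest[f] for f in to_copy)   # chargeback input
    return to_copy, to_delete, egress_bytes

# First sync moves everything; later windows move only the delta.
src = {"part-0": 100, "part-1": 200}
copy1, _, bytes1 = plan_sync(src, {})                  # full copy: 300 bytes
replica = {f: src[f] for f in copy1}
src = {"part-1": 200, "part-2": 50}                    # part-0 deleted, part-2 added
copy2, drop2, bytes2 = plan_sync(src, replica)         # only part-2 moves: 50 bytes
```

If full re-replication were the only primitive, `bytes2` would be the whole table again every window, which is exactly the economics collapse the first precondition warns about.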
Knobs¶
- Sync cadence. The dial between freshness and egress cost; one cadence per data-product × consumer-cohort is normal. Mercedes-Benz moved from weekly to every-second-day without changing the protocol.
- Storage locality. The replica lives in the consumer cloud's native object store (ADLS, S3, GCS), so local engines read at inside-cloud throughput / cost.
- Schema / format translation. If the canonical format and the consumer-expected format differ, the sharing boundary (not the replication boundary) is the natural place to translate — Mercedes-Benz had Iceberg on the producer and Delta on the consumer, translated at Unity Catalog's federation boundary.
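One way to hold the cadence dial, sketched with hypothetical product and cohort names: the worst-case staleness a replica reader can observe equals the sync interval, so the interval for each data-product × consumer-cohort must sit at or below that cohort's staleness tolerance, or the cohort reads the live share instead.

```python
# Hypothetical cadence table: one dial per data-product x consumer-cohort.
# Worst-case staleness observed at the replica equals the sync interval.
from datetime import timedelta

SYNC_CADENCE = {
    ("vehicle-telemetry", "analytics"): timedelta(days=2),  # "every second day"
    ("vehicle-telemetry", "reporting"): timedelta(weeks=1),
}

def cadence_ok(product: str, cohort: str, tolerance: timedelta) -> bool:
    """A cohort is served by the replica only if its staleness tolerance
    covers the sync interval; otherwise it should read the live share."""
    return SYNC_CADENCE[(product, cohort)] <= tolerance
```

Tightening a cadence (weekly → every second day, as Mercedes-Benz did) moves cohorts from the live-read arm to the replica arm without touching the sharing protocol, at the price of more frequent replication egress.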
Why it's not just "caching"¶
- The replica is authoritative for its readers. Inside the bounded staleness window it is the data, not a cache miss away from reality. Treating it as a cache (and thinking about invalidation) gets the semantics wrong.
- Compaction / schema evolution / GC are first-class. Unlike a read-through cache, the replica is a real table that grows; it needs the same lifecycle operations as the canonical.
- It has a producer-side cost. The Sync Job is the producer's compute; the egress is the producer's chargeback. A cache is usually a consumer-side asset; a replica is architecturally two-sided.
Relation to other patterns¶
- patterns/async-projected-read-model (Canva's print-routing case) is the within-one-cloud cousin: rebuild a read-optimised projection from the source of truth asynchronously. This pattern is the cross-cloud/egress-motivated specialisation — the projection is the canonical data, just materialised in another cloud.
- patterns/chargeback-cost-attribution pairs well here: the pattern only stays healthy when someone is held accountable for the replication bytes.
Seen in¶
- sources/2026-04-20-databricks-mercedes-benz-cross-cloud-data-mesh — Mercedes-Benz's headline use of the pattern: AWS-side Iceberg canonical → Azure-side Delta replicas via Delta Sharing + Delta Deep Clone, with a reported 66% egress-cost reduction.