CONCEPT Cited by 1 source

Cross Cluster Replication (CCR)

Cross Cluster Replication (CCR) is Elasticsearch's primitive for replicating index data between otherwise-independent Elasticsearch clusters. One cluster is the leader (read/write); one or more clusters are followers (read-only, continuously pulling from the leader). Replication is one-way and operates at Lucene-segment granularity: the follower receives only data that has already been durably persisted to disk on the leader.

The shape is distinct from ES's intra-cluster replication (where primary and replica shards of the same index live in one cluster): CCR is inter-cluster replication, a leader/follower pattern at the cluster level.

Wire model: leader / follower / auto-follow

  • Leader index — the source of writes; a normal index on the leader cluster.
  • Follower index — an index on a follower cluster configured to follow the leader. Reads allowed; writes go to the leader.
  • Auto-follow policy — pattern-matching rule ("follow all indexes whose names match app-* on cluster primary") that installs follower indexes automatically as new leader indexes are created.
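The three roles above map onto two CCR REST calls. The sketch below only builds the request tuples and sends nothing; the helper names, the `primary` remote-cluster alias, and the `app-*` pattern are illustrative assumptions, while the paths and body fields follow the Elasticsearch CCR API as commonly documented (verify against the version in use).

```python
def follow_request(follower_index, remote_cluster, leader_index):
    """Build (method, path, body) to create one follower index explicitly."""
    body = {"remote_cluster": remote_cluster, "leader_index": leader_index}
    return ("PUT", f"/{follower_index}/_ccr/follow", body)


def auto_follow_request(policy_name, remote_cluster, patterns,
                        follow_pattern="{{leader_index}}"):
    """Build (method, path, body) to install an auto-follow policy.

    follow_pattern is the follower naming template; "{{leader_index}}"
    means the follower reuses the leader index's name.
    """
    body = {
        "remote_cluster": remote_cluster,
        "leader_index_patterns": list(patterns),
        "follow_index_pattern": follow_pattern,
    }
    return ("PUT", f"/_ccr/auto_follow/{policy_name}", body)


# Hypothetical usage: follow one index by hand, then install the policy.
explicit = follow_request("app-orders", "primary", "app-orders")
policy = auto_follow_request("app-policy", "primary", ["app-*"])
```

Both requests are issued against the follower cluster, which must already have the leader registered as a remote cluster under the alias used here.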

Auto-follow's new-only gap

The auto-follow policy only matches indexes created after the policy is installed. Pre-existing leader indexes are not retroactively followed. Any system applying CCR to a long-lived deployment therefore needs a bootstrap step that enumerates pre-existing indexes and explicitly attaches followers before relying on the auto-follow policy (see patterns/bootstrap-then-auto-follow). The same gap affects any policy-based replication or attachment primitive that applies only to newly created objects.
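A minimal sketch of the bootstrap step, assuming followers reuse the leader's index name and that the index listings have already been fetched from each cluster. `bootstrap_plan` is a hypothetical helper, and `fnmatchcase` only approximates Elasticsearch's wildcard matching (it also accepts `?` and `[...]`).

```python
from fnmatch import fnmatchcase


def bootstrap_plan(leader_indexes, follower_indexes, patterns):
    """Return the pre-existing leader indexes that still need an explicit follower.

    An index qualifies if it matches one of the auto-follow patterns and no
    follower of the same name exists yet on the follower cluster.
    """
    existing = set(follower_indexes)
    return sorted(
        name
        for name in leader_indexes
        if name not in existing
        and any(fnmatchcase(name, pattern) for pattern in patterns)
    )


# Hypothetical listing: two matching indexes predate the policy, one is already followed.
todo = bootstrap_plan(
    leader_indexes=["app-orders", "app-users", "logs-2024"],
    follower_indexes=["app-orders"],
    patterns=["app-*"],
)
```

Each name in `todo` would then get an explicit follow request. Running the plan again after those requests succeed returns an empty list, so the bootstrap can be retried safely.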

What CCR handles vs what it doesn't

CCR replicates documents. It does not handle:

  • Failover orchestration — promoting a follower to leader after leader loss, re-pointing clients, coordinating config changes.
  • Index deletion coordination — a leader-side delete must be echoed on the follower (or the follower will recreate the deleted index from the leader's history).
  • Upgrade ordering — ES-version compatibility constraints between leader and follower during rolling upgrades.
  • Multi-leader — CCR is one-way leader→follower; bidirectional replication requires per-index-role separation and external arbitration.

These lifecycle responsibilities fall to whoever owns the application layer. GitHub's GHES 3.19.1 rewrite is an explicit instance of this: "Elasticsearch only handles the document replication, and we're responsible for the rest of the index's lifecycle." (Source: sources/2026-03-03-github-how-we-rebuilt-the-search-architecture-for-high-availability)
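As an illustration of the failover item, converting a follower into a regular writable index is itself a multi-call sequence that the owning system has to drive. The sketch below lists the commonly documented Elasticsearch calls (pause following, close, unfollow, reopen) as data. The helper name is hypothetical, and this is not presented as the procedure GHES uses.

```python
def promote_follower_steps(follower_index):
    """Return the ordered (method, path) calls that detach a follower index
    from its leader so that it accepts writes as a regular index."""
    return [
        ("POST", f"/{follower_index}/_ccr/pause_follow"),  # stop pulling from the leader
        ("POST", f"/{follower_index}/_close"),             # unfollow requires a closed index
        ("POST", f"/{follower_index}/_ccr/unfollow"),      # drop the follower metadata
        ("POST", f"/{follower_index}/_open"),              # reopen as a normal index
    ]


steps = promote_follower_steps("app-orders")
```

Everything around this sequence, such as deciding when to fail over, re-pointing clients, and re-establishing replication in the opposite direction, remains the application layer's job.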

Why the segment-level grain matters

Because CCR replicates at the Lucene-segment level, and segments are immutable and durable, the follower cluster is never ahead of the leader's durable state, and the replication stream is safe to replay. This is the same "replicate durable state, not in-flight operations" property that makes streaming logical replication safe in Postgres, block-level async replication safe in EBS snapshots, and WAL-based physical replication safe in distributed SQL systems.

CCR is structurally an instance of change data capture (CDC), a stream of durable storage changes, at the Lucene-segment grain rather than the row grain.
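The two properties claimed here (the follower is never ahead of the leader's durable state, and the stream is replay-safe) can be shown with a toy model. This is an illustrative sketch only: the `Leader` and `Follower` classes are hypothetical and do not describe Elasticsearch internals.

```python
from dataclasses import dataclass, field


@dataclass
class Leader:
    log: list = field(default_factory=list)  # every accepted change, in order
    durable: int = 0                         # number of changes persisted so far

    def write(self, change):
        self.log.append(change)              # accepted, but not yet durable

    def persist(self):
        self.durable = len(self.log)         # everything so far becomes durable

    def read_durable(self, from_pos):
        """Expose only durable changes, each tagged with its log position."""
        return list(enumerate(self.log[from_pos:self.durable], start=from_pos))


@dataclass
class Follower:
    applied: dict = field(default_factory=dict)  # position -> change

    def pull(self, leader, from_pos=None):
        start = len(self.applied) if from_pos is None else from_pos
        for pos, change in leader.read_durable(start):
            self.applied[pos] = change       # keyed by position, so replays are no-ops


leader, follower = Leader(), Follower()
leader.write("a")
leader.write("b")
leader.persist()
leader.write("c")                            # in flight: invisible to the follower
follower.pull(leader)
follower.pull(leader, from_pos=0)            # replay from the start changes nothing
```

Because the follower reads only up to the leader's durable position and applies each change under a stable key, a retried or restarted pull cannot diverge from the leader or duplicate data.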

Sibling replication primitives

  • CDC — row-level analog.
  • patterns/block-level-continuous-replication — analog at the block-storage layer (AWS Elastic DR, Arpio).
  • In-cluster primary-replica-shard replication — ES's other replication mechanism, intra-cluster, finer-grained, but doesn't give you cluster-level leader/follower semantics.

Canonical production shape (on the wiki)

The GHES HA-search rewrite shipped in GHES 3.19.1 is the wiki's canonical CCR production instance: collapse a multi-node ES cluster into N independent single-node clusters aligned to the app-layer primary/replica topology, then join them with CCR. See patterns/single-node-cluster-per-app-replica.

Seen in
