
GitHub Enterprise Server

GitHub Enterprise Server (GHES) is GitHub's self-hosted distribution: the customer runs the appliance in their own datacenter or cloud, operates the HA pair themselves, and upgrades on their own cadence. It is distinct from GitHub Enterprise Cloud (GHEC), the SaaS offering that GitHub operates; the managed github.com page covers GHEC.

Deployment topology

A typical HA GHES install is two appliance nodes:

  • Primary node — receives all writes, all user traffic.
  • Replica node — stays in sync, takes over on failover. Designed as read-only in steady state.

This leader/follower invariant runs deep: every GHES subsystem is aware of it, and anything that writes to shared state must live on the primary side.

Search substrate: 2026 CCR-based rewrite

Pre-rewrite topology (problem)

Historically, GHES ran one Elasticsearch cluster spanning the primary and replica GHES nodes. This was a forced design choice: classic Elasticsearch had no leader/follower replication between separate clusters, so the only way to get search data from primary to replica was to let ES itself form a cross-node cluster with nodes on both GHES hosts.

This misaligned the storage topology with the application topology. Two failure modes followed:

  • Index maintenance footguns: running the wrong upgrade or maintenance sequence could leave search indexes damaged and in need of repair, or locked during upgrades.
  • Mutual-blockage deadlock: ES was free to rebalance a primary shard onto the replica GHES node. If the replica was then taken down for maintenance, the replica waited for ES-cluster health before starting up, while ES couldn't become healthy until the replica rejoined.

Multi-release mitigations (health-check gates, drift-correction processes, an abandoned in-house "search mirroring" DB-replication effort) did not fix the root cause — they could not make ES behave like a leader/follower system when the cluster spanned both appliance nodes.

Post-rewrite topology (solution)

GHES 3.19.1 (opt-in) ships a rewrite: collapse to one single-node Elasticsearch cluster per GHES node and link the two with Elasticsearch Cross-Cluster Replication (CCR). The primary's ES cluster is the CCR leader and the replica's is the CCR follower; CCR replicates at the Lucene segment level, i.e. data that has already been durably persisted at the leader. See concepts/cross-cluster-replication and patterns/single-node-cluster-per-app-replica.
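
To make the wiring concrete, here is a minimal sketch of the CCR link against Elasticsearch's REST API, in Python. The hostnames primary-es / replica-es, the remote-cluster alias leader, and the index name code-search-1 are illustrative assumptions; the source does not disclose GHES's actual configuration.

    # Sketch: link the replica node's single-node ES cluster to the
    # primary's as a CCR follower. Hosts, alias, and index name are
    # assumptions for illustration, not GHES's real configuration.
    import requests

    REPLICA_ES = "http://replica-es:9200"  # follower cluster (replica GHES node)

    # 1. Register the primary's cluster as a named remote on the follower.
    #    CCR pulls changes over the transport port (9300 by default).
    requests.put(
        f"{REPLICA_ES}/_cluster/settings",
        json={"persistent": {"cluster": {"remote": {
            "leader": {"seeds": ["primary-es:9300"]}
        }}}},
    ).raise_for_status()

    # 2. Create a follower index that trails the leader index at the
    #    Lucene segment level.
    requests.put(
        f"{REPLICA_ES}/code-search-1/_ccr/follow",
        json={"remote_cluster": "leader", "leader_index": "code-search-1"},
    ).raise_for_status()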

The rewrite is a canonical in-wiki instance of concepts/primary-replica-topology-alignment — the storage layer's replication direction now matches the application layer's write-ownership direction, and the failure mode is impossible by construction (ES can't move a primary shard to the follower cluster — there's no cross-cluster rebalancing in CCR).

Lifecycle workflows GitHub owns on top of CCR

Elasticsearch only handles document replication over CCR. Everything else is GitHub's responsibility:

  • Bootstrap: CCR's auto-follow API only covers indexes created after the policy exists. GHES has a long-lived set of pre-existing indexes, so the rewrite adds an imperative bootstrap step that enumerates current indexes, attaches follower contracts, and then installs the auto-follow policy for future indexes (sketched after this list). See patterns/bootstrap-then-auto-follow.
  • Failover workflow — moving the CCR leader role from a failed primary to the promoted replica (see the promotion sketch below).
  • Index deletion workflow — coordinating deletion across leader and follower so CCR doesn't recreate the index after the leader deletes it.
  • Upgrade workflow — ordering ES upgrades and CCR version compatibility across rolling and non-rolling upgrade paths.
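
A sketch of the bootstrap-then-auto-follow sequence, under the same illustrative assumptions (hosts, the leader remote alias, policy name) as the previous sketch:

    # Sketch: follow pre-existing indexes first, then install auto-follow
    # for future ones. Enumeration and naming are illustrative.
    import requests

    LEADER_ES = "http://primary-es:9200"
    REPLICA_ES = "http://replica-es:9200"

    # 1. Enumerate indexes that already exist on the leader; an
    #    auto-follow policy alone would never pick these up, because it
    #    only reacts to index creation.
    existing = requests.get(
        f"{LEADER_ES}/_cat/indices?h=index&format=json"
    ).json()

    # 2. Attach a follower contract to each pre-existing index
    #    (skipping ES-internal dot-prefixed indexes).
    for row in existing:
        name = row["index"]
        if name.startswith("."):
            continue
        requests.put(
            f"{REPLICA_ES}/{name}/_ccr/follow",
            json={"remote_cluster": "leader", "leader_index": name},
        ).raise_for_status()

    # 3. Only then install the auto-follow policy, so indexes created
    #    from now on are followed automatically.
    requests.put(
        f"{REPLICA_ES}/_ccr/auto_follow/ghes-search",
        json={"remote_cluster": "leader", "leader_index_patterns": ["*"]},
    ).raise_for_status()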

In short: Elasticsearch handles only the document-replication leg; the full index lifecycle is GitHub-authored code around CCR.
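
For the failover leg, Elasticsearch's documented follower-promotion sequence (pause, close, unfollow, reopen) is the core primitive. A sketch, with the host and index name again illustrative assumptions:

    # Sketch: promote a follower index into a normal writable index
    # after the primary is lost, using ES's documented unfollow sequence.
    import requests

    REPLICA_ES = "http://replica-es:9200"

    def promote_follower(index: str) -> None:
        # Stop replicating from the (failed) leader cluster.
        requests.post(f"{REPLICA_ES}/{index}/_ccr/pause_follow").raise_for_status()
        # Unfollow requires the index to be closed.
        requests.post(f"{REPLICA_ES}/{index}/_close").raise_for_status()
        # Sever the CCR contract; the index becomes a regular index.
        requests.post(f"{REPLICA_ES}/{index}/_ccr/unfollow").raise_for_status()
        # Reopen for reads and writes on the promoted node.
        requests.post(f"{REPLICA_ES}/{index}/_open").raise_for_status()

    promote_follower("code-search-1")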

Enabling CCR mode

  1. Customer contacts GitHub Support, who provisions the required license.
  2. Run ghe-config app.elasticsearch.ccr true.
  3. Run config-apply or upgrade the HA cluster to 3.19.1.
  4. On restart, ES consolidates all data onto the primary, breaks cross-node clustering, and restarts replication via CCR.
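
One way to sanity-check that replication is flowing after the switch is to read the follower cluster's CCR stats. A minimal sketch, assuming the follower's ES HTTP port is reachable at an illustrative hostname:

    # Sketch: confirm follower indexes are actively replicating.
    import requests

    stats = requests.get("http://replica-es:9200/_ccr/stats").json()
    for idx in stats.get("follow_stats", {}).get("indices", []):
        print(idx["index"], "following shards:", len(idx["shards"]))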

The migration duration scales with instance size (no numbers disclosed). Default-on rollout is planned over the next two years. (Source: sources/2026-03-03-github-how-we-rebuilt-the-search-architecture-for-high-availability)

Relationship to github.com / GHEC

GHES ships the same application as github.com but in an appliance-deployable, HA-pair topology. github.com operates a much larger search infrastructure (not a two-node HA pair) — the rewrite described here is a GHES-specific topology choice; it does not imply any change to the github.com / GHEC search stack.

Stub caveats

  • This page is stubbed around the 2026-03 CCR-search rewrite. Other GHES surfaces (HA for MySQL / Redis / Git storage, backup tools, ephemeral-node deployments) are not yet documented here.
  • GHES's version-support / deprecation policy, customer-operable knobs, and appliance-image packaging are out of scope for this stub.
  • Licensing specifics of the CCR-enabled ES distribution shipped with GHES 3.19.1 are not documented in the source.
