CONCEPT Cited by 1 source
Isolation as fault-tolerance principle¶
Definition¶
Isolation as a reliability principle means systems are made from parts that are as physically and logically independent as possible, such that failures in one part do not cascade into failures in an independent part, and parts in the critical path have as few dependencies as possible.
Max Englander's canonical framing (sources/2026-04-21-planetscale-the-principles-of-extreme-fault-tolerance):
"Systems are made from parts that are as physically and logically independent as possible. Failures in one part do not cascade into failures in an independent part. Parts in the critical path have as few dependencies as possible."
And on redundancy, which Englander names as the principle that co-applies with isolation:
"Each part is copied multiple times, so if one part fails, its copies continue doing its work. Copies of each part are themselves isolated from each other."
The final sentence of that quote is load-bearing: redundancy without isolation between copies is not fault tolerance. If the copies share a failure domain (same rack, same AZ, same power distribution, same library dependency, same deployment pipeline), they fail together.
Three axes of isolation¶
Physical isolation¶
Copies placed across distinct physical failure domains — different machines, racks, AZs, regions, power feeds, network links. Canonical application: primary + ≥2 replicas across 3 AZs. Extends to different cloud providers for the strongest isolation (PlanetScale runs on both AWS and GCP).
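A placement check along these lines can make shared physical failure domains visible. This is a minimal sketch, not PlanetScale's tooling: the `Copy` record, the `shared_domains` helper, and the region, AZ, and host names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of a part, tagged with the (hypothetical) failure domains it sits in."""
    name: str
    region: str
    az: str
    host: str


def shared_domains(copies, level):
    """Return {domain: [copy names]} for each domain at `level` that holds more than one copy."""
    groups = {}
    for c in copies:
        groups.setdefault(getattr(c, level), []).append(c.name)
    return {domain: names for domain, names in groups.items() if len(names) > 1}


# Illustrative placement: both replicas landed in the same AZ by mistake.
cluster = [
    Copy("primary", "us-east", "az-a", "host-1"),
    Copy("replica-1", "us-east", "az-b", "host-2"),
    Copy("replica-2", "us-east", "az-b", "host-3"),
]

print(shared_domains(cluster, "az"))    # {'az-b': ['replica-1', 'replica-2']}
print(shared_domains(cluster, "host"))  # {}
```

An empty result at a given level means no two copies share a failure domain at that level. Here the hosts are distinct but the two replicas share an AZ, so a single AZ outage would take out both.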
Logical isolation¶
Copies don't share a software-level failure domain — no shared memory, no shared process, no shared library version that could cascade a bug across copies. The counter-examples are load-bearing:
- Two replicas on the same MySQL binary version share the "MySQL bug in that version" failure domain. Mitigated by progressive-delivery per database (concepts/progressive-delivery-per-database).
- Two processes on the same VM share the "VM crashes" failure domain. Mitigated by physical isolation across VMs.
- Two systems depending on the same configuration service share the "config-service outage" failure domain. Mitigated by having the data plane cache last-known-good control-plane state.
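The last mitigation in the list above can be sketched as a small wrapper that serves the most recent successful response when the configuration service is unreachable. The names `ConfigUnavailable`, `LastKnownGoodCache`, and `fetch_routes` are hypothetical, and the in-memory `state` dictionary stands in for a real service.

```python
class ConfigUnavailable(Exception):
    """Raised by the (hypothetical) config-service client when the service is down."""


class LastKnownGoodCache:
    """Wraps a fetch function and serves the last successful result on failure."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._last_good = None
        self._has_value = False

    def get(self):
        try:
            value = self._fetch()
        except ConfigUnavailable:
            if not self._has_value:
                raise  # nothing cached yet, so the outage cannot be masked
            return self._last_good
        self._last_good = value
        self._has_value = True
        return value


# Stand-in for a config service that can be switched off.
state = {"up": True, "routes": {"db1": "az-a"}}


def fetch_routes():
    if not state["up"]:
        raise ConfigUnavailable("config service down")
    return dict(state["routes"])


cache = LastKnownGoodCache(fetch_routes)
before_outage = cache.get()   # fetched live and remembered
state["up"] = False           # simulate a config-service outage
during_outage = cache.get()   # served from the last-known-good copy
print(before_outage == during_outage)  # True
```

The trade-off in this sketch is staleness: during an outage the caller keeps working, but on whatever state was last fetched successfully.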
Dependency minimisation in the critical path¶
The critical path has as few dependencies as possible. Each dependency is a potential failure mode — adding a dependency to the critical path subtracts availability from it. PlanetScale's framing of the data plane:
"The most critical plane, with fewer dependencies than the control plane. Does not depend on the control plane."
This replaces the naive "more-critical = more-redundant" intuition with "more-critical = fewer-dependencies". Redundancy only helps for failure modes where the redundant copies aren't also taking the same dependency; shedding a dependency is strictly more reliable than making it redundant.
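The arithmetic behind this claim can be illustrated with a simple model. It assumes failures are independent and that every dependency in the critical path must be up for a request to succeed. The availability figures are invented for illustration and the function names are hypothetical.

```python
from math import prod


def serial_availability(availabilities):
    """Availability of a path where every listed component must be up (independence assumed)."""
    return prod(availabilities)


def redundant_availability(a, copies):
    """Availability of `copies` independent copies where at least one must be up."""
    return 1 - (1 - a) ** copies


part = 0.999  # assumed availability of the critical part itself
dep = 0.99    # assumed availability of one dependency

with_dep = serial_availability([part, dep])
with_redundant_dep = serial_availability([part, redundant_availability(dep, 2)])
without_dep = serial_availability([part])

# Making the dependency redundant helps, but shedding it helps more.
print(with_dep < with_redundant_dep < without_dep)  # True
```

Under these assumptions the three paths come out at roughly 0.98901, 0.9989, and 0.999. A redundant dependency still contributes a factor below 1, whereas a shed dependency contributes nothing to the product.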
Composition with redundancy¶
Isolation + redundancy together produce fault tolerance:
- Redundancy alone: N copies of a part, all sharing a single failure domain. The N copies fail together; no benefit.
- Isolation alone: one part with few dependencies. If the part itself fails (without redundant copies), the system fails.
- Both: N copies across N distinct failure domains, each with minimal dependencies. A failure in one copy's failure domain, or in one of its dependencies, does not affect the other copies.
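The three cases can be compared with a small failure-injection sketch. Each copy is mapped to the set of failure domains it depends on (an AZ, a shared service, and so on), and a copy survives only if none of its domains has failed. The `survivors` helper and all domain names are hypothetical.

```python
def survivors(placement, failed_domains):
    """Return the names of copies whose failure domains are all still healthy.

    placement: {copy_name: set of failure domains the copy depends on}
    """
    return sorted(
        name for name, domains in placement.items()
        if not (domains & failed_domains)
    )


redundancy_alone = {
    "copy-1": {"az-a"},
    "copy-2": {"az-a"},
    "copy-3": {"az-a"},
}
isolation_alone = {
    "copy-1": {"az-a"},
}
both = {
    "copy-1": {"az-a"},
    "copy-2": {"az-b"},
    "copy-3": {"az-c"},
}

failed = {"az-a"}
print(survivors(redundancy_alone, failed))  # []
print(survivors(isolation_alone, failed))   # []
print(survivors(both, failed))              # ['copy-2', 'copy-3']
```

Adding a shared dependency (for example the same hypothetical `"config-service"` entry in every copy's set) to the `both` placement would bring back the first case for that failure mode, which is why dependencies are modelled as failure domains here.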
Englander's formalism: "Each part is copied multiple times, so if one part fails, its copies continue doing its work. Copies of each part are themselves isolated from each other."
Applied to PlanetScale¶
Englander's essay maps the principles onto PlanetScale's architecture:
- Control plane / data plane split — the data plane (query serving) has "extremely few dependencies" so its criticality is matched by dependency minimisation. The control plane (billing, DB creation, metadata) is allowed more dependencies (including a PlanetScale database for its own metadata — a deliberate circular dependency safe only because the data plane survives control-plane failure). Canonicalised as concepts/control-plane-data-plane-separation.
- Regional + zonal redundancy of both planes — not just the data plane. Both planes are multi-AZ.
- Database clusters = primary + ≥2 replicas across 3 AZs: the concrete embodiment of the multi-AZ Vitess cluster pattern.
- Static stability (concepts/static-stability) as the behaviour isolation + redundancy enables: "survive the isolation-broken-part by continuing on the other copies with their last-known-good state".
Seen in¶
- PlanetScale, Max Englander, The principles of extreme fault tolerance, 2025-07-03 — canonical verbatim framing as the first of three principles (alongside static stability and redundancy).
Related¶
- concepts/static-stability — principle #3 of the trio
- concepts/blast-radius — isolation's operational measure
- concepts/control-plane-data-plane-separation — architectural shape exploiting isolation
- concepts/slow-is-failure — why a slow dependency counts as a failure domain too
- patterns/multi-az-vitess-cluster — physical isolation applied at cluster altitude
- patterns/shared-nothing-storage-topology — physical isolation applied at storage altitude