CONCEPT
Service coupling¶
Service coupling is the degree to which services depend on each other's implementation, availability, and behavior. Tight coupling produces short, legible call chains with strong global invariants, at the cost of large blast radii and brittle evolution; loose coupling produces resilient, independent evolution, at the cost of explicit contract governance (concepts/schema-registry) and observability that spans multiple systems.
Axes of coupling¶
- Temporal: synchronous call-and-wait (tightest) vs asynchronous publish-and-forget (loosest).
- Spatial: hardcoded endpoint references vs discovery-based vs content-based routing (through an event bus).
- Informational: shared in-memory object / shared database table (tightest) vs shared data format with a versioned contract (loosest).
- Control: caller decides what consumer does (tight) vs consumer decides how to react to events (loose).
- Deployment: coupled release cycles / linked versioning (tight) vs independent deploy cadence (loose).
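The temporal and control axes can be sketched side by side. This is a minimal, hypothetical illustration (the `InMemoryBus`, topics, and handlers are invented for the example; an in-process bus is not truly temporally decoupled the way EventBridge or Kafka would be, but it shows the control inversion):

```python
from collections import defaultdict

class InMemoryBus:
    """Toy stand-in for an event bus; real systems would use a managed
    broker (e.g. EventBridge, Kafka) for actual temporal decoupling."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self.subscribers[topic]:
            handler(event)  # each consumer decides how to react (loose control)

# Producer publishes a fact and moves on; it never names its consumers,
# so it cannot block on them and cannot dictate what they do.
def place_order(bus, order_id):
    bus.publish("orders", {"type": "OrderPlaced", "id": order_id})

bus = InMemoryBus()
shipments = []
bus.subscribe("orders", lambda e: shipments.append(e["id"]))  # consumer's choice
place_order(bus, "o-1")
print(shipments)  # ['o-1']
```

Contrast with the tight end of both axes: `shipping_client.create_shipment(order)` inside the order path is a synchronous call-and-wait where the caller both blocks on the consumer and prescribes its behavior.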
Cascade failure — the canonical tight-coupling failure mode¶
Synchronous, tightly-coupled service graphs propagate downstream degradation upstream, compounding it at each hop:
- Downstream service S_n gets slow (or degraded by a noisy neighbor).
- Upstream callers see latency rise, apply timeouts, then retry.
- Retries amplify load on the already-degraded S_n.
- S_n's latency rises further; retries fail; more retries.
- The calling service's request-handler pool fills with in-flight calls to S_n, blocking unrelated requests.
- Callers of the calling service now see timeouts, begin retrying...
- The result is a fleet-wide deadlock.
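The amplification in the steps above can be quantified with a back-of-the-envelope model (the function and its parameters are illustrative, not from the source): each layer makes up to R extra attempts, so expected attempts per request at one layer is the geometric sum 1 + p + ... + p^R, and this multiplier compounds across layers.

```python
def retry_amplification(layers: int, retries: int, failure_rate: float) -> float:
    """Expected request multiplier for one user request traversing a
    synchronous chain of `layers` services, where each layer makes up to
    `retries` extra attempts and each attempt fails with `failure_rate`."""
    if failure_rate >= 1.0:
        attempts = retries + 1.0  # total outage: every retry is spent
    else:
        # expected attempts = 1 + p + p^2 + ... + p^retries (geometric series)
        attempts = (1 - failure_rate ** (retries + 1)) / (1 - failure_rate)
    return attempts ** layers  # the multiplier compounds per layer

# During a total outage of S_n, 3 layers x 2 retries each = 27x load:
print(retry_amplification(layers=3, retries=2, failure_rate=1.0))  # 27.0
```

This is why retries flip from recovery mechanism to load amplifier exactly when the downstream is least able to absorb them.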
This is what Amazon Key's pre-migration architecture exhibited: "an issue in Service-A triggered a cascade of failures across many upstream services, with increased timeouts leading to retry attempts and ultimately resulting in service deadlocks"; a single-device-vendor issue scoped to one delivery operation caused fleet-wide degradation. (Source: sources/2026-02-04-aws-amazon-key-eventbridge-event-driven-architecture)
Why "just adding SNS/SQS pairs" isn't enough¶
Amazon Key had already tried scattering SNS/SQS pairs between services to introduce async-decoupling points, but each was implemented on an ad-hoc basis: no shared abstraction, no standardization, redundant maintenance. The decoupling failed not because pub/sub is wrong but because each pair was a bespoke integration without the governance surface (concepts/schema-registry), subscriber scaffolding (patterns/reusable-subscriber-constructs), and rule-based routing that make EDA work at scale. Loose coupling is not just "less synchronous"; it is "less synchronous, plus a shared contract, a shared substrate, and per-consumer isolation".
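What "subscriber scaffolding" buys can be sketched as a shared construct that every consumer reuses. This is a hypothetical sketch (the `make_subscriber` factory, retry policy, and `dead_letter` stand-in are invented for illustration), contrasting with each team hand-rolling its own bespoke SNS/SQS glue:

```python
dead_lettered = []

def dead_letter(event):
    """Stand-in for a per-consumer dead-letter queue."""
    dead_lettered.append(event)

def make_subscriber(handler, *, max_retries=2):
    """Hypothetical shared subscriber construct: every consumer gets the
    same retry + dead-letter scaffolding from one abstraction, so the
    isolation behavior is standardized rather than re-implemented per pair."""
    def subscriber(event):
        for _ in range(max_retries + 1):
            try:
                handler(event)
                return True   # handled; failure never propagates upstream
            except Exception:
                continue      # retry stays local to this consumer
        dead_letter(event)    # exhausted: park the event, don't block the bus
        return False
    return subscriber

# A consumer that fails once, then succeeds: local retries absorb the blip.
calls = {"n": 0}
def flaky(event):
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("transient")

subscribe_flaky = make_subscriber(flaky)
print(subscribe_flaky({"id": "e-1"}))  # True
print(dead_lettered)                   # []
```

The design point is that retries and dead-lettering happen on the consumer side of the bus, so one consumer's failure mode stays its own instead of becoming the producer's.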
Related¶
- concepts/event-driven-architecture — the architectural answer to the tight-coupling failure mode; inverts the call direction (producer → bus; consumer ← bus) so failures don't cascade.
- concepts/noisy-neighbor — the origin of the degradation that cascades under tight coupling; making one tenant's bad behavior another tenant's tail-latency problem is the forcing function for loose coupling at the data-plane level.
- concepts/tail-latency-at-scale — why the probability of at least one slow downstream approaches 1 under fanout, turning retries into a first-class source of correlated failure rather than a recovery mechanism.
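The fanout claim in the last bullet follows from the standard independence calculation (the numbers below are illustrative, assuming independent per-call tail probabilities):

```python
def p_any_slow(p_slow: float, fanout: int) -> float:
    """Probability that at least one of `fanout` independent downstream
    calls lands in the slow tail, given per-call tail probability p_slow."""
    return 1 - (1 - p_slow) ** fanout

# Even a 1% per-call tail becomes a ~63% per-request tail at fanout 100:
print(round(p_any_slow(0.01, 100), 2))  # 0.63
```

Once most requests touch the tail, timeouts fire routinely and retries become the norm rather than the exception, which is exactly the precondition for the cascade described above.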