PATTERN Cited by 1 source

Peer debugging (scaling the senior engineer)

Intent

When a single senior engineer becomes the bottleneck on every hard problem in an org, move to a format that pools their systems knowledge with that of others: group debugging sessions with shared terminals and shared hypotheses. The org then scales its deep problem-solving capacity instead of funneling every hard problem through one person.

Problem it solves

From Marc Olson's EBS retrospective (the "Reflecting on scaling performance" section):

I really enjoy going super deep into problems and attacking them until they're complete, but there was a pivotal moment when a colleague that I trusted pointed out that I was becoming a performance bottleneck for our organization. As an engineer who had grown to be an expert in the system, but also who cared really, really deeply about all aspects of EBS, I found myself on every escalation and also wanting to review every commit and every proposed design change. If we were going to be successful, then I had to learn how to scale myself — I wasn't going to solve this with just ownership and bias for action.

Mechanism (the session format Olson describes)

A handful of engineers in a room, with code and a few terminals projected on a wall.
Shared systems knowledge composes: each engineer brings partial context, and the group assembles the full picture live.
Hypothesis-driven: "Uhhhh, there's no way that's right!" moments surface issues that no individual was going to catch alone (in the post, the group found a locking/jitter bug in critical-data-structure updates this way).

Outcomes

Latent bugs get found that a single reviewer would miss.
Context is transferred: the session doubles as implicit training for the less-senior attendees, with no separate "mentorship" process.
The senior engineer's time is amplified: one session resolves an issue that would otherwise have required many 1-on-1 debugging sessions.

Leadership shift it encodes

I realized that empowering people, giving them the ability to safely experiment, can often lead to results that are even better than what was expected. I've spent a large portion of my career since then focusing on ways to remove roadblocks, but leave the guardrails in place, pushing engineers out of their comfort zone.

The management principle: remove roadblocks, keep guardrails.

Seen in

sources/2024-08-22-allthingsdistributed-continuous-reinvention-block-storage-at-aws — Olson's first-person account of realizing he had become a single point of failure and of the peer-debugging format as a response.