PATTERN
Invest widely then double down on impact¶
Problem¶
A reliability program (or any program with a trailing metric) faces two unknowns simultaneously at kickoff:
- Which sources of impact matter most? Incidents come from multiple causes (deploys, capacity, dependency failure, config changes, hardware). Pre-program analysis can't predict with confidence which categories will yield the biggest wins from investment.
- Which mitigation approaches will work? For any given cause, several mitigations are plausible (automated rollback, better monitoring, staged rollout, phased cohorts, circuit breaking, graceful degradation). Pre-program analysis can't predict with confidence which mitigations will land vs flop.
The product of these two unknowns is a high-dimensional investment space with limited pre-program signal. A commit-everything-to-one-bet strategy risks picking the wrong bet; a spread-everything-thin strategy risks under-investing where it counts.
Solution¶
A staged investment strategy with five disciplined axes:
- Invest widely initially and bias for action. Start projects across multiple causes and multiple mitigation approaches simultaneously. Don't converge prematurely on a single bet.
- Focus on areas of known pain first. Use existing incident/operational data to rank starting areas. Highest-pain areas get first investment; they're where the signal will be cleanest.
- Invest further in projects or patterns based on results. As each project's data arrives (over the program's 3-6 month trailing-metric lag — see concepts/trailing-metric-patience), double down on what works; copy successful patterns to new substrates.
- Curtail investment in the least impactful areas. When a project's results land below expected impact, explicitly reduce further investment there. This is not failure — it's the data signal that guides the next round.
- Set a flexible shorter-term roadmap which may change based on results. The roadmap is a living artifact, not a multi-year-plan-locked-in-January commitment.
The pattern is, canonically, portfolio discipline under high initial uncertainty, rather than commit-to-a-single-long-term-bet.
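The five axes amount to a simple rebalancing loop: seed everything, wait for trailing-metric results, then shift budget toward measured winners while curtailing (not killing) the rest. A minimal sketch of that loop follows; the project names, impact figures, and thresholds are illustrative assumptions, not Slack's actual portfolio:

```python
from dataclasses import dataclass

@dataclass
class Project:
    name: str            # hypothetical project, not from the source
    budget: float        # investment this round (e.g. engineer-quarters)
    impact: float = 0.0  # measured reduction in customer impact hours

def rebalance(projects, total_budget, curtail_floor=0.1):
    """One review cycle: rank by measured impact per unit budget,
    double down on winners, curtail (but keep a floor under) the rest."""
    ranked = sorted(projects, key=lambda p: p.impact / p.budget, reverse=True)
    half = len(ranked) // 2
    winners, laggards = ranked[:half], ranked[half:]
    # Curtailed projects keep a token budget; the result is data, not failure.
    for p in laggards:
        p.budget = total_budget * curtail_floor / max(len(laggards), 1)
    remaining = total_budget - sum(p.budget for p in laggards)
    # Winners split the remaining budget proportionally to measured impact.
    total_impact = sum(p.impact for p in winners) or 1.0
    for p in winners:
        p.budget = remaining * p.impact / total_impact
    return projects

# Round 1: invest widely, equal budgets across causes/mitigations.
portfolio = [Project("auto-rollback", 1.0), Project("staged-rollout", 1.0),
             Project("circuit-breaking", 1.0), Project("config-guards", 1.0)]
# The trailing metric arrives months later; plug in measured results.
for p, measured in zip(portfolio, [9.0, 4.0, 1.0, 0.5]):
    p.impact = measured
rebalance(portfolio, total_budget=4.0)
```

The `curtail_floor` keeps curtailment from becoming cancellation: low-impact areas retain a small stake so the next round of signal still arrives.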
Canonical disclosure¶
Slack's 2025-10-07 Deploy Safety retrospective canonicalises the pattern verbatim (Source: sources/2025-10-07-slack-deploy-safety-reducing-customer-impact-from-change):
*"This influenced our investment strategy to be:
- Invest widely initially and bias for action
- Focus on areas of known pain first
- Invest further in projects or patterns based on results
- Curtail investment in the least impactful areas
- Set a flexible shorter-term roadmap which may change based on results."*
Explicit framing on project failure:
"It's very important to note that projects that didn't have the desired impact are not failures, they're a critical input to our success through guiding investment and understanding which areas are of greater value. Not all projects will be as impactful, and this is by design."
Why each axis is load-bearing¶
Invest widely initially¶
- Single-bet strategies fail when you pick wrong.
- Information value of "we tried X and it didn't work" is high early in the program and low late.
- Running multiple projects in parallel extracts more information per quarter than serial experiments.
Bias for action¶
- Trailing-metric programs need multiple quarters of signal to know anything. Waiting for "perfect" information means losing months of feedback loop.
- "What should we invest in?" is itself a trailing-metric question; acting based on "what seems promising" and letting the data decide is cheaper than trying to decide pre-investment.
Focus on known-pain areas first¶
- Signal-to-noise is best where the impact was largest.
- Return on a successful mitigation is highest where the incident count was highest.
- Political capital is easiest to get for projects that mitigate recent outages teams remember.
Invest further based on results¶
- Doubling down on a working pattern (Slack: "Webapp backend metrics-based deploy + automatic rollback") across substrates is the leverage point (see patterns/centralised-deployment-orchestration-across-systems).
- Not every pattern is portable; some wins are substrate-specific. The pattern-portability question is itself a program-metric question.
Curtail unsuccessful investment explicitly¶
- The program's budget is finite. Curtailing low-impact projects frees budget to flow to higher-impact areas.
- "Project ended early" must be safe organisationally — teams shouldn't be punished for acknowledging a project didn't land.
Flexible roadmap¶
- A locked-in roadmap forces continued investment in areas the data doesn't support.
- A too-agile roadmap loses long-term investment continuity.
- The balance is a regular review cadence (Slack: "Executive reviews every 4-6 weeks") with willingness to adjust.
Concrete Slack example¶
Slack's Webapp backend investment sequence (2023-2024, verbatim from the retrospective):
- Q1: Engineer automatic metric monitoring.
- Q2: Confirm customer-impact alignment via automatic alerts and manual rollback actions. (Pre-automation regime.)
- Q3-Q4: Invest in automatic deployments and rollback.
- Q4+: Prove success with many automatic rollbacks keeping customer impact below 10 minutes. (This is where the doubling-down decision was made.)
- Q4+: Further investment to monitor additional metrics and invest in manual rollback optimisations.
- Q4+: Invest in a manual Frontend rollback capability. (Pattern copy to a new substrate.)
- Q4+: Aligned further investment toward the centralised deployment orchestration system. (Pattern generalisation across substrates.)
The shape is: seed Webapp backend → prove the pattern → copy/generalise to other substrates.
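The decision at the heart of the proven pattern (metrics-based deploy plus automatic rollback) can be sketched as a canary-vs-baseline comparison. This is a minimal illustration under assumed metric names and thresholds, not Slack's implementation:

```python
def should_rollback(baseline, canary, max_ratio=1.5, min_errors=10):
    """Compare a canary deploy's error metrics against the pre-deploy
    baseline; trigger automatic rollback on a significant regression."""
    if canary["errors"] < min_errors:
        return False  # too few errors to judge; keep watching
    baseline_rate = baseline["errors"] / max(baseline["requests"], 1)
    canary_rate = canary["errors"] / max(canary["requests"], 1)
    return canary_rate > baseline_rate * max_ratio

# A canary error rate well above baseline triggers rollback automatically,
# keeping customer impact short without waiting for a human decision.
baseline = {"requests": 100_000, "errors": 50}   # 0.05% error rate
canary = {"requests": 10_000, "errors": 40}      # 0.4% error rate
should_rollback(baseline, canary)  # True: canary exceeds 1.5x baseline rate
```

The `min_errors` guard is what makes the check copyable across substrates with different traffic volumes: low-traffic services don't roll back on noise.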
Slack also discloses the inverse: "There have been too many projects to list, some more successful (e.g., faster Mobile App issue detection) and others where the impact hasn't been as noticeable." — explicit acknowledgement that some projects landed below expectation and were de-emphasised.
Relationship to wiki primitives¶
- concepts/trailing-metric-patience — the discipline this pattern is premised on; without patience, curtailment happens before the signal arrives.
- concepts/customer-impact-hours-metric — the program metric that the per-project decisions feed into.
- patterns/customer-driven-prioritization — a sibling pattern at the product-investment altitude; shares the "known-pain-first" discipline.
- patterns/ab-test-rollout — sibling at smaller-grained-decision altitude; same structural shape (try, measure, double down or curtail).
- patterns/centralised-deployment-orchestration-across-systems — the canonical "successful pattern copied across substrates" output of this investment strategy in Slack's program.
Operational cadence¶
Slack discloses two cadences:
- Executive reviews every 4-6 weeks — alignment + support + roadmap adjustment.
- High-level priority in company/engineering goals (OKR, V2MOM) — program priority maintained at company level, which protects the long-term investment against short-term pressure to deprioritise.
Exec sponsorship at the level of SVP + VP (Slack names three exec sponsors: SVP Milena Talavera, SVP Peter Secor, VP Cisco Vila) is load-bearing on this strategy — without sustained exec alignment, the "flexible roadmap" axis devolves into "whatever's urgent this week."
Caveats¶
- Requires a long-enough runway. Programs that need to deliver in one quarter cannot run this strategy; the feedback loop is too slow.
- Requires psychological safety. Teams must feel safe ending projects early. An org that treats project cancellation as team failure breaks the "curtail based on results" axis.
- Requires disciplined results measurement. "Results" must be defined objectively; subjective evaluation leads to confirmation-biased doubling-down.
- Portfolio-strategy risks include underbudgeting the winners. If doubling-down triggers are too cautious, the program under-invests in what works.
- Not everyone agrees early what counts as "impact". Alignment on the project → program metric translation must be made explicit (Slack uses the three-layer chain from concepts/customer-impact-hours-metric).
Seen in¶
- sources/2025-10-07-slack-deploy-safety-reducing-customer-impact-from-change — canonical five-axis investment strategy for the 18-month Deploy Safety Program; explicit framing that below-expectation projects are "not failures" but "critical input"; doubled-down on Webapp-backend automatic-rollback pattern by copying to Webapp frontend + generalising to centralised orchestration.