PATTERN Cited by 1 source
Build-time tech-debt detection
Definition
Run static-analysis rules in every CI build of every repo in the fleet, treating rule violations not (only) as build-failure gates but as measurable, queryable, prioritizable signals about technical debt across the codebase. The dashboard of rule violations becomes the canonical map of what's broken, where, and how badly.
The wiki's first canonical instance is Netflix's Nebula ArchRules deployment — 358 rules × 5,000 repos × ~1M issues — "allow[ing] us to quickly gain insight into our large fleet of microservices, and identify the areas carrying the most critical technical debt."
When to use
- You have enough scale that ad-hoc tech-debt tracking (Jira tickets, retrospective lists) misses too much.
- The tech debt has machine-checkable signatures — deprecated-API usage, security-CVE callsites, prohibited-library imports, naming-convention violations.
- You have CI infrastructure that can run static analysis on every build.
- You have dashboard infrastructure to aggregate violations across repos.
The pattern
Rules emit measurable signals
Each rule produces structured violation data:
- Rule identifier
- Severity (Low / Medium / High)
- Repo + class + method + line
- Plain-English description of the violation
- Pointer to the relevant code
"Note that failure details feature a detailed plain English description, along with a pointer to the exact line of code in violation." — sources/2026-05-08-netflix-scaling-archunit-with-nebula-archrules
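A minimal sketch of such a violation record, assuming violations are shipped to a central store as JSON. The class, field names, and the `source_url` pointer are illustrative assumptions, not the Nebula ArchRules report format.

```python
from dataclasses import dataclass, asdict
from enum import IntEnum
import json


class Severity(IntEnum):
    """Priority tiers as listed above (Low / Medium / High)."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class Violation:
    rule_id: str          # rule identifier
    severity: Severity    # Low / Medium / High
    repo: str             # repo the build ran in
    class_name: str       # offending class
    method: str           # offending method
    line: int             # exact line of code in violation
    description: str      # plain-English description of the violation
    source_url: str       # pointer to the code (hypothetical URL scheme)

    def to_json(self) -> str:
        """Serialize for upload; severity is emitted by name, e.g. "HIGH"."""
        record = asdict(self)
        record["severity"] = self.severity.name
        return json.dumps(record, sort_keys=True)
```

Keeping the record flat (one row per violation) is what makes the per-rule, per-repo, and per-team queries below simple group-bys.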
Build outcomes are tiered
- High-priority rules → build fails (in Netflix's case, configurable per-repo via failure-threshold).
- Medium / Low rules → reported but don't fail builds.
"Other customizations include disabling running rules on certain source sets and configuring the failure threshold (i.e., high priority failures will cause the build to fail)." — sources/2026-05-08-netflix-scaling-archunit-with-nebula-archrules
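The tiering can be sketched as a single split: violations at or above a threshold fail the build, and everything else is kept for reporting rather than discarded. The `failure_threshold` parameter here is a hypothetical stand-in for the per-repo setting the quote describes, not the actual Nebula ArchRules configuration property.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class Violation:
    rule_id: str
    severity: Severity


def build_outcome(violations, failure_threshold=Severity.HIGH):
    """Return (build_passed, failing, report_only).

    Violations at or above `failure_threshold` fail the build; the rest are
    still returned so they can be sent to the dashboard.
    """
    failing = [v for v in violations if v.severity >= failure_threshold]
    report_only = [v for v in violations if v.severity < failure_threshold]
    return len(failing) == 0, failing, report_only
```

A stricter repo could pass `failure_threshold=Severity.MEDIUM`; either way, no violation is dropped, it only moves between the blocking and the reported tier.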
Dashboard aggregates the signal
Per-CI-build violation data flows to a central dashboard. The dashboard answers:
- Per-rule: how many repos violate this rule? How many total violations? What's the trend?
- Per-repo: what rules does this repo violate?
- Per-team: which rules do a team's repos violate, once per-repo signals are rolled up the ownership hierarchy?
- Per-rule-priority: how many High-priority issues fleet-wide?
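The four dashboard questions map onto plain group-by queries over the flat violation rows. A minimal in-memory sketch, assuming severities are stored as strings and that a repo-to-team ownership map (`owner_of`, hypothetical) is available; a real deployment would run these against a database instead.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Violation:
    rule_id: str
    severity: str   # "Low" / "Medium" / "High"
    repo: str


def per_rule(violations):
    """Rule -> (total violations, number of distinct repos affected)."""
    totals = Counter(v.rule_id for v in violations)
    repos = defaultdict(set)
    for v in violations:
        repos[v.rule_id].add(v.repo)
    return {rule: (totals[rule], len(repos[rule])) for rule in totals}


def per_repo(violations):
    """Repo -> set of rules it violates."""
    result = defaultdict(set)
    for v in violations:
        result[v.repo].add(v.rule_id)
    return dict(result)


def per_team(violations, owner_of):
    """Team -> violation count; repos missing from `owner_of` count as "unowned"."""
    return dict(Counter(owner_of.get(v.repo, "unowned") for v in violations))


def per_severity(violations):
    """Severity -> fleet-wide violation count."""
    return dict(Counter(v.severity for v in violations))
```

The per-rule trend question is not covered here; it would need the `per_rule` output stored per build date and compared across snapshots.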
Operators read the dashboard, prioritize cleanup
"Being able to run these rules on this scale allows us to quickly gain insight into our large fleet of microservices, and identify the areas carrying the most critical technical debt. This makes it easier to focus and prioritize our efforts." — sources/2026-05-08-netflix-scaling-archunit-with-nebula-archrules
The dashboard is the prioritization input: rules with the most violations + the most affected repos + the highest severity get attention first.
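One way to turn those three signals into a cleanup order is a lexicographic sort: severity first, then breadth (affected repos), then volume (total violations). This ordering is an assumption for illustration; a weighted score is an equally plausible choice, and the source does not say which Netflix uses.

```python
from dataclasses import dataclass

SEVERITY_RANK = {"Low": 1, "Medium": 2, "High": 3}


@dataclass(frozen=True)
class RuleSummary:
    rule_id: str
    severity: str           # "Low" / "Medium" / "High"
    total_violations: int
    affected_repos: int


def prioritize(summaries):
    """Order rules for cleanup: highest severity first, then the rules touching
    the most repos, then the rules with the most violations."""
    return sorted(
        summaries,
        key=lambda s: (SEVERITY_RANK[s.severity], s.affected_repos, s.total_violations),
        reverse=True,
    )
```

Putting breadth ahead of volume means a rule hit once in 800 repos outranks one hit 9,000 times in a single repo, which favours fleet-wide fixes such as a library upgrade.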
Operational shape
Per the Netflix instance:
| Metric | Value |
|---|---|
| Total rules | 358 |
| Repos enforcing | 5,000+ |
| Total issues | ~1,000,000 |
| High-priority issues | ~1,000 (~0.1%) |
| Avg issues per repo | ~200 |
| Build-failing | ~0.1% of issues (high-priority) |
| Reportable | ~99.9% of issues |
The 0.1% build-fail / 99.9% report split is the load-bearing ratio: build failure is reserved for the most urgent issues, so that engineers don't develop fatigue from constantly failing builds. The bulk of issues are measured, not enforced: visible on the dashboard but not blocking work.
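The derived rows of the table follow from the three headline numbers. A quick check, treating the ~1,000 high-priority issues as the build-failing set as the table does (per-repo thresholds could shift the real number):

```python
# Approximate figures from the Netflix table above.
total_issues = 1_000_000
high_priority = 1_000
repos = 5_000

build_failing_share = high_priority / total_issues   # share that can fail a build
report_only_share = 1 - build_failing_share          # share measured, not enforced
avg_issues_per_repo = total_issues / repos

print(f"{build_failing_share:.1%} fail builds, {report_only_share:.1%} report only")
# → 0.1% fail builds, 99.9% report only
print(f"~{avg_issues_per_repo:.0f} issues per repo")
# → ~200 issues per repo
```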
The forcing function for adoption
Build-time tech-debt detection only works if engineers can't ignore the dashboard. Netflix's framing implies the dashboard is consulted by:
- Library authors — checking who depends on their deprecated APIs (patterns/static-analysis-as-cross-repo-impact-discovery).
- Platform / Infra teams — measuring fleet-wide debt trends.
- Per-team owners — checking which rules their repos violate.
Without at least one of these forcing functions, the dashboard becomes write-only: violation data flows in, but nobody acts on it.
Distinct from CI-as-gate
| Aspect | CI-as-gate | Build-time tech-debt detection |
|---|---|---|
| Goal | Block bad changes | Measure and prioritize tech debt |
| Failure mode | Build fails | Dashboard updates |
| Severity | Binary (pass/fail) | Tiered (by priority) |
| Visibility | Per-PR | Fleet-wide |
| Time horizon | Per-commit | Trend over weeks/months |
The two coexist — high-priority rules use the CI-as-gate model; the bulk of rules use the dashboard model.
Distinct from runtime APM
| Aspect | Runtime APM | Build-time tech-debt detection |
|---|---|---|
| What's measured | Latency, errors, throughput | Code-level signals |
| Cost | Always-on instrumentation | Build-time only |
| Visibility into unexercised paths | None | Full |
| Connection to code | Indirect (via traces) | Direct (rule + line) |
APM tells you what's slow / failing in production; build-time tech-debt detection tells you what's structurally wrong in the codebase. Both feed prioritization but at different layers.
Adjacent patterns
- patterns/centralized-fleet-wide-rule-catalog — the rule-distribution pattern for which this pattern is the use case.
- patterns/bundled-rules-auto-scoped-to-library-consumers — the substrate that makes scoping rules to specific libraries practical.
- patterns/static-analysis-as-cross-repo-impact-discovery — the API-surface-discovery use case this pattern enables.
- patterns/api-stability-annotations — the lifecycle-marking discipline build-time tech-debt detection enforces.
Hard problems
- Severity calibration. Tagging a rule High vs Medium is a value judgment; over-tagging High erodes the signal, while under-tagging hides urgent debt.
- False-positive tax. A noisy rule produces too many violations to be actionable. Every rule needs precision-tuning; some rules can never be precise enough.
- Dashboard fatigue. 1M issues is too many to look at; operators rely on aggregations, but aggregations can hide individual urgent issues.
- Rule authoring overhead. Each new rule is a research + design + testing investment. Catalogs grow slowly.
- Cross-language gaps. Bytecode-based rules cover JVM languages uniformly; for Python/Go/Rust services, a different stack is needed.
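The false-positive tax can at least be measured. A minimal sketch, assuming a hypothetical process in which sampled violations are hand-triaged as true or false positives; rules whose sampled precision falls below a chosen bar become candidates for tuning or retirement. The thresholds (`min_precision`, `min_samples`) are illustrative assumptions.

```python
from collections import defaultdict


def noisy_rules(triaged, min_precision=0.8, min_samples=20):
    """Return {rule_id: precision} for rules below `min_precision`.

    `triaged` is an iterable of (rule_id, is_true_positive) pairs from a
    hypothetical manual triage of sampled violations. Rules with fewer than
    `min_samples` triaged violations are skipped, since their precision
    estimate is too unreliable to act on.
    """
    true_positives = defaultdict(int)
    samples = defaultdict(int)
    for rule_id, is_true_positive in triaged:
        samples[rule_id] += 1
        true_positives[rule_id] += int(is_true_positive)

    flagged = {}
    for rule_id, n in samples.items():
        if n < min_samples:
            continue
        precision = true_positives[rule_id] / n
        if precision < min_precision:
            flagged[rule_id] = precision
    return flagged
```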
Seen in
- sources/2026-05-08-netflix-scaling-archunit-with-nebula-archrules — first canonical wiki naming. Netflix's 358-rule × 5,000-repo × ~1M-issue deployment is the canonical instance, framed verbatim as "identify the areas carrying the most critical technical debt" via dashboard-driven prioritization.