
PATTERN Cited by 1 source

Build-time tech-debt detection

Definition

Run static-analysis rules in every CI build of every repo in the fleet, treating rule violations not (only) as build-failure gates but as measurable, queryable, prioritizable signals about technical debt across the codebase. The dashboard of rule violations becomes the canonical map of what's broken, where, and how badly.

The wiki's first canonical instance is Netflix's Nebula ArchRules deployment (358 rules, 5,000+ repos, ~1M issues), "allow[ing] us to quickly gain insight into our large fleet of microservices, and identify the areas carrying the most critical technical debt."

When to use

  • You have enough scale that ad-hoc tech-debt tracking (Jira tickets, retrospective lists) misses too much.
  • The tech debt has machine-checkable signatures — deprecated-API usage, security-CVE callsites, prohibited-library imports, naming-convention violations.
  • You have CI infrastructure that can run static analysis on every build.
  • You have dashboard infrastructure to aggregate violations across repos.

The pattern

Rules emit measurable signals

Each rule produces structured violation data:

  • Rule identifier
  • Severity (Low / Medium / High)
  • Repo + class + method + line
  • Plain-English description of the violation
  • Pointer to the relevant code

"Note that failure details feature a detailed plain English description, along with a pointer to the exact line of code in violation." (Source: sources/2026-05-08-netflix-scaling-archunit-with-nebula-archrules)

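The fields listed above can be sketched as a single record type. This is a minimal illustration, not the schema Nebula ArchRules actually emits: the class name, field names, and JSON shape are all assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class Severity(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class Violation:
    """One rule violation as it might be reported by a build (hypothetical schema)."""
    rule_id: str
    severity: Severity
    repo: str
    class_name: str
    method: str
    line: int
    description: str  # plain-English explanation of what is wrong

    def location(self) -> str:
        # Pointer to the offending code: repo, class, method, and line.
        return f"{self.repo}:{self.class_name}#{self.method}:{self.line}"

    def to_json(self) -> str:
        # Serialize for shipping to a central store; the enum becomes its string value.
        record = asdict(self)
        record["severity"] = self.severity.value
        return json.dumps(record, sort_keys=True)


# Illustrative values only; none of these names come from the source.
example = Violation(
    rule_id="no-deprecated-api",
    severity=Severity.HIGH,
    repo="svc-a",
    class_name="com.example.OrderService",
    method="submit",
    line=42,
    description="Call to a deprecated API; migrate to the replacement.",
)
```

Keeping the record flat and serializable is what makes the violations queryable later, rather than leaving them buried in build logs.
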
Build outcomes are tiered

  • High-priority rules → build fails (in Netflix's case, configurable per-repo via failure-threshold).
  • Medium / Low rules → reported but don't fail builds.

"Other customizations include disabling running rules on certain source sets and configuring the failure threshold (i.e., high priority failures will cause the build to fail)." (Source: sources/2026-05-08-netflix-scaling-archunit-with-nebula-archrules)

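The tiering above can be sketched as a threshold check. The source mentions a configurable failure threshold but does not show its syntax, so `Priority`, `build_should_fail`, and the `failure_threshold` parameter are hypothetical names for illustration only.

```python
from enum import IntEnum
from typing import Iterable


class Priority(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def build_should_fail(
    violation_priorities: Iterable[Priority],
    failure_threshold: Priority = Priority.HIGH,
) -> bool:
    """Fail the build if any violation is at or above the threshold.

    Violations below the threshold are still reported; they just do not
    block the build.
    """
    return any(p >= failure_threshold for p in violation_priorities)


# Only Low/Medium findings: reported, build passes.
passes = build_should_fail([Priority.LOW, Priority.MEDIUM])
# A single High finding fails the build under the default threshold.
fails = build_should_fail([Priority.LOW, Priority.HIGH])
# A repo that opts into a stricter threshold fails on Medium as well.
strict = build_should_fail([Priority.MEDIUM], failure_threshold=Priority.MEDIUM)
```
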
Dashboard aggregates the signal

Per-CI-build violation data flows to a central dashboard. The dashboard answers:

  • Per-rule: how many repos violate this rule? How many total violations? What's the trend?
  • Per-repo: what rules does this repo violate?
  • Per-team: aggregating per-repo signals up the ownership hierarchy.
  • Per-rule-priority: how many High-priority issues fleet-wide?
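
A minimal sketch of the aggregations behind those questions, assuming each violation arrives as a flat record. The field names (`rule`, `repo`, `team`, `priority`) and the sample data are hypothetical; trend queries are left out because they would also need a timestamp per build.

```python
from collections import Counter, defaultdict

# Hypothetical sample records; field names are illustrative only.
SAMPLE = [
    {"rule": "no-deprecated-api", "repo": "svc-a", "team": "payments", "priority": "high"},
    {"rule": "no-deprecated-api", "repo": "svc-a", "team": "payments", "priority": "high"},
    {"rule": "no-deprecated-api", "repo": "svc-b", "team": "search", "priority": "high"},
    {"rule": "naming-convention", "repo": "svc-b", "team": "search", "priority": "low"},
]


def aggregate(violations):
    """Roll per-build violation records up into dashboard-style views."""
    total_per_rule = Counter()
    repos_per_rule = defaultdict(set)
    rules_per_repo = defaultdict(set)
    total_per_team = Counter()
    total_per_priority = Counter()
    for v in violations:
        total_per_rule[v["rule"]] += 1
        repos_per_rule[v["rule"]].add(v["repo"])
        rules_per_repo[v["repo"]].add(v["rule"])
        total_per_team[v["team"]] += 1
        total_per_priority[v["priority"]] += 1
    return {
        "total_per_rule": dict(total_per_rule),
        "repo_count_per_rule": {r: len(s) for r, s in repos_per_rule.items()},
        "rules_per_repo": {r: sorted(s) for r, s in rules_per_repo.items()},
        "total_per_team": dict(total_per_team),
        "total_per_priority": dict(total_per_priority),
    }


summary = aggregate(SAMPLE)
```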

Operators read the dashboard, prioritize cleanup

"Being able to run these rules on this scale allows us to quickly gain insight into our large fleet of microservices, and identify the areas carrying the most critical technical debt. This makes it easier to focus and prioritize our efforts." (Source: sources/2026-05-08-netflix-scaling-archunit-with-nebula-archrules)

The dashboard is the prioritization input: rules with the most violations + the most affected repos + the highest severity get attention first.
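
One way to turn those three inputs into an ordering is a lexicographic sort. The source does not say how the signals are combined, so the ordering here (severity first, then affected repos, then total violations) is an assumption, as are the `RuleSummary` type and the sample figures.

```python
from dataclasses import dataclass

SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3}


@dataclass(frozen=True)
class RuleSummary:
    rule_id: str
    severity: str          # "low", "medium", or "high"
    affected_repos: int
    total_violations: int


def prioritize(summaries):
    """Order rules for cleanup: highest severity, then widest spread, then volume."""
    return sorted(
        summaries,
        key=lambda s: (SEVERITY_RANK[s.severity], s.affected_repos, s.total_violations),
        reverse=True,
    )


# Illustrative numbers only.
ranked = prioritize([
    RuleSummary("naming-convention", "low", 900, 40_000),
    RuleSummary("vulnerable-callsite", "high", 12, 30),
    RuleSummary("deprecated-api", "high", 150, 600),
])
order = [s.rule_id for s in ranked]
```

With this ordering a High rule always outranks a Low one regardless of volume; a weighted score would be the alternative if volume should be able to outweigh severity.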

Operational shape

Per the Netflix instance:

Metric               | Value
Total rules          | 358
Repos enforcing      | 5,000+
Total issues         | ~1,000,000
High-priority issues | ~1,000 (~0.1%)
Avg issues per repo  | ~200
Build-failing        | ~0.1% of issues (high-priority)
Reportable           | ~99.9% of issues

The 0.1%-build-fail / 99.9%-report split is the load-bearing ratio: build-failure is reserved for the most urgent issues, so that engineers don't get fatigued. The bulk of issues are measured, not enforced — visible on the dashboard but not blocking work.
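
The ratios follow directly from the approximate figures in the table. A quick arithmetic check, treating the rounded values (1,000,000 issues, 1,000 high-priority issues, 5,000 repos) as exact for illustration:

```python
# Rounded figures from the table above; the real counts are approximate.
total_issues = 1_000_000
high_priority_issues = 1_000
repos = 5_000

build_failing_share = high_priority_issues / total_issues  # share of issues that fail builds
reportable_share = 1 - build_failing_share                 # share that is reported only
avg_issues_per_repo = total_issues / repos

print(f"build-failing share: {build_failing_share:.1%}")
print(f"reportable share:    {reportable_share:.1%}")
print(f"avg issues per repo: {avg_issues_per_repo:.0f}")
```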

The forcing function for adoption

Build-time tech-debt detection only works if engineers can't ignore the dashboard. Netflix's framing implies the dashboard is consulted by:

Without one of these forcing functions, the dashboard becomes write-only.

Distinct from CI-as-gate

Aspect       | CI-as-gate         | Build-time tech-debt detection
Goal         | Block bad changes  | Measure and prioritize tech debt
Failure mode | Build fails        | Dashboard updates
Severity     | Binary (pass/fail) | Tiered (priority-by-priority)
Visibility   | Per-PR             | Fleet-wide
Time horizon | Per-commit         | Trend over weeks/months

The two coexist — high-priority rules use the CI-as-gate model; the bulk of rules use the dashboard model.

Distinct from runtime APM

Aspect                            | Runtime APM                 | Build-time tech-debt detection
What's measured                   | Latency, errors, throughput | Code-level signals
Cost                              | Always-on instrumentation   | Build-time only
Visibility into unexercised paths | None                        | Full
Connection to code                | Indirect (via traces)       | Direct (rule + line)

APM tells you what's slow / failing in production; build-time tech-debt detection tells you what's structurally wrong in the codebase. Both feed prioritization but at different layers.

Adjacent patterns

Hard problems

  • Severity calibration. Tagging a rule High vs Medium is a value judgment; over-tagging High erodes signal, under-tagging hides urgent debt.
  • False-positive tax. A noisy rule produces too many violations to be actionable. Every rule needs precision-tuning; some rules can never be precise enough.
  • Dashboard fatigue. 1M issues is too many to look at; operators rely on aggregations, but aggregations can hide individual urgent issues.
  • Rule authoring overhead. Each new rule is a research + design + testing investment. Catalogs grow slowly.
  • Cross-language gaps. Bytecode-based rules cover JVM languages uniformly; for Python/Go/Rust services, a different stack is needed.

Seen in
