PATTERN
Delete inner parallelization inside outer orchestrator¶
Context¶
A tool or component has its own worker pool / thread pool / multiprocessing logic because no outer orchestrator existed when it was written. Parallelizing work inside the tool improved performance. This was a reasonable local decision.
Later, an outer orchestrator arrives — Bazel, Kubernetes, Ray, Temporal, a pipeline runner — that can parallelize at a coarser granularity (across many instances of the tool). The tool now runs inside the outer orchestrator's scheduling domain.
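The before-state can be sketched as a tool that owns its own full-size pool. A minimal sketch with hypothetical names; a thread pool stands in for the real tool's worker processes:

```python
from concurrent.futures import ThreadPoolExecutor
import os

def compile_bundle(bundle: str) -> str:
    return f"built:{bundle}"  # stand-in for real compilation work

def build_all_bundles(bundles: list[str]) -> list[str]:
    # The tool sizes its own pool to every visible core -- a sensible
    # local decision while nothing else is scheduling work around it.
    workers = os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compile_bundle, bundles))
```

The `os.cpu_count()` sizing is exactly the assumption that stops being true once an outer scheduler places siblings on the same machine.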
Problem¶
The inner parallelization now creates three compounding issues:
- Resource contention. Both the outer orchestrator and the inner pool try to use the same CPU/RAM budget, and each is oblivious to the other's scheduling decisions. Thrashing ensues.
- Forfeited coarser granularity. The inner pool parallelizes within one invocation of the tool. The outer orchestrator can parallelize across many invocations across many machines. The inner lever is strictly weaker than the outer lever it blocks.
- Redundant scheduling. The outer orchestrator already knows which work is independent and can run in parallel (it has the full graph). The inner pool is doing the same work with less information.
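The contention bullet can be made concrete with back-of-envelope arithmetic (the numbers here are hypothetical):

```python
def total_demanded_workers(outer_invocations: int, cores_per_machine: int) -> int:
    # Each concurrently scheduled invocation sizes its inner pool to all
    # the cores it can see, as if it were alone on the machine; neither
    # layer knows about the other, so worker counts multiply instead of
    # sharing one core budget.
    return outer_invocations * cores_per_machine

# 16 concurrent invocations on a 16-core worker together demand
# 16 * 16 = 256 workers for 16 cores: 16x oversubscription.
oversubscription = total_demanded_workers(16, 16) // 16
```

The same multiplication appears whenever two schedulers each assume exclusive ownership of the machine.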
Slack's articulation:
We're parallelizing the work across processes, which is good — unless we could be parallelizing across machines, machines with lots more CPU cores. If we have those resources available, we cannot use them here. And we're actually making Bazel less effective at its core function of parallelizing independent build steps: Bazel and the script's worker processes are contending for the same set of resources. The script might even be parallelizing work that Bazel already knows it does not need!
— Slack, Build better software to build software better
This is a concrete instance of a concepts/layering-violation: the tool's business logic has fused with orchestration and parallel execution.
Solution¶
Delete the inner parallelization code. Shrink the tool's API to one-unit-in / one-unit-out. Make it boringly sequential. Let the outer orchestrator parallelize by launching N instances of the simpler tool concurrently.
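A minimal sketch of the post-deletion shape, with hypothetical names standing in for the real TypeScript and CSS compilation:

```python
def compile_typescript(sources: list[str]) -> str:
    return "js:" + "+".join(sources)   # stand-in for tsc output

def compile_css(sources: list[str]) -> str:
    return "css:" + "+".join(sources)  # stand-in for compiled CSS

def build_one_bundle(ts_sources: list[str], css_sources: list[str]) -> dict:
    # One bundle in, one artifact out, no worker pool.  The outer
    # orchestrator gets its parallelism by launching many of these
    # processes concurrently, one per bundle.
    return {
        "js": compile_typescript(ts_sources),   # independent steps,
        "css": compile_css(css_sources),        # run sequentially
    }
```

There is nothing to tune and nothing to contend over; all concurrency decisions move to the layer that can see the whole graph.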
For Slack's frontend bundler:
- Before: one call; many TS sources in, many bundles out; an internal worker pool parallelizing bundle compilation.
- After: one call per bundle; TS and CSS processed independently but sequentially within that call; Bazel parallelizes across bundles by launching many bundler processes concurrently across workers.
From the post:
To be more effective, we really just needed to do less. We deleted a lot of code. The new version of the frontend builder was much, much simpler. It didn't parallelize. It had a much smaller "API" interface. It took in one set of source files, and built one output bundle, with TypeScript and CSS processed independently.
Outcomes¶
Slack's reported gains from this specific change:
- Higher cache hit rate. Each bundle is cached independently, on its direct inputs only. Single-file changes don't invalidate unrelated bundles.
- Reduction in full-rebuild time. Given enough workers, Bazel could run all bundle builds plus all CSS compilation steps concurrently — whereas the old inner pool could only parallelize within one invocation.
- Single-concern maintainability. The builder now has one job: build a bundle. No contention-debugging, no worker-pool sizing, no synchronisation bugs.
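The cache-granularity point can be sketched as a key derived from a bundle's direct inputs only (a hypothetical helper, not Bazel's actual key scheme):

```python
import hashlib

def cache_key(bundle_name: str, direct_inputs: dict[str, bytes]) -> str:
    # Hash only this bundle's own inputs: edits to files that feed
    # other bundles cannot change this key, so their cached outputs
    # survive a single-file change elsewhere.
    h = hashlib.sha256(bundle_name.encode())
    for path in sorted(direct_inputs):  # sorted for a stable digest
        h.update(path.encode())
        h.update(direct_inputs[path])
    return h.hexdigest()
```

With the old many-bundles-in-one-call shape, the cache key would have had to cover every bundle's inputs at once, so any single-file change invalidated everything.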
Applicability checklist¶
Apply this pattern when all of the following are true:
- An outer orchestrator exists that can parallelize at the granularity you care about.
- The outer orchestrator has enough information (via the declared graph / manifest / pipeline) to know what's independent.
- The inner parallelization's unit of work is smaller than or equal to what the outer orchestrator can distribute.
- Resource budgets (CPU, memory, I/O) are shared between the two.
If the outer orchestrator can't parallelize at the needed granularity (e.g. it only runs one process per pipeline step), keep the inner parallelization.
Related patterns at other altitudes¶
- Kubernetes + in-app thread pool. A microservice with an internal worker pool that duplicates what the Kubernetes HorizontalPodAutoscaler could do at the replica level.
- Temporal workflow with in-activity goroutines. An activity that fans out work internally, when Temporal could fan out across many activities.
- Test runner with in-test parallelism. A test that spawns threads when the test runner could parallelize at the test level.
The principle in each case: let the outermost layer that knows what's independent do the parallelization; delete the inner layer's attempt.
Related¶
- concepts/layering-violation — the structural diagnosis.
- concepts/separation-of-concerns — the parent principle.
- concepts/build-graph — the DAG the outer orchestrator uses.
- concepts/cache-granularity — the companion lever (smaller units cache better and parallelise better).
- systems/bazel — the canonical outer orchestrator where this pattern applies at build-system altitude.
Seen in¶
- sources/2025-11-06-slack-build-better-software-to-build-software-better — Slack deletes its frontend bundler's in-process worker pool so Bazel can parallelise at the bundle-action granularity across machines; contributes to the 25-min → 10-min frontend speed-up after the earlier backend-frontend decoupling.