PATTERN Cited by 2 sources

Language rewrite for concurrency

Definition

Language rewrite for concurrency is the architectural pattern of rewriting a performance-critical layer in a language whose concurrency model matches the workload, once the current language's concurrency primitives have become the bottleneck — while keeping orchestration, definitions, or control logic in the original language.

Not a whole-system rewrite. Layer-targeted. The cost is scoped to the slice that's actually capped, while the broader codebase retains the ergonomic / ecosystem advantages of the starting language.

When the pattern fires

  • Python → Go for mixed CPU + I/O request-serving. Trigger: GIL contention on CPU-bound work under high concurrency, with multi-processing adding coordination overhead rather than fixing it.
  • JVM → Rust for latency-sensitive data-plane work. Trigger: JVM GC pauses tail into the seconds under fanout, and the tail-at-scale math (P(some host mid-pause | fanout=N) → 1) makes percentile SLOs unachievable.
  • Node.js → Go / Rust for CPU-bound request handling where the event loop becomes the bottleneck (not directly in this wiki but the shape is identical).

What the rewrite delivers is not raw speed — it's a concurrency model (goroutines, async/await + zero-cost abstractions, actors) that fits the workload without paying the original language's concurrency-primitive cost.
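
A minimal sketch of the Python-side trigger: CPU-bound work (here, repeated JSON parsing as a stand-in for a serving hotspot; the payload and function names are illustrative, not any vendor's actual code) gains no parallelism from threads under CPython's GIL, which is what pushes teams toward multi-processing and then toward a rewrite.

```python
import json
import time
from concurrent.futures import ThreadPoolExecutor

# CPU-bound stand-in for a JSON-parsing hotspot. Under CPython's GIL,
# only one thread executes bytecode at a time, so adding threads does
# not parallelise this work — it only adds scheduling overhead.
PAYLOAD = json.dumps({"features": list(range(2000))})

def parse_many(n: int) -> int:
    total = 0
    for _ in range(n):
        total += len(json.loads(PAYLOAD)["features"])
    return total

N, WORKERS = 200, 4

t0 = time.perf_counter()
sequential = sum(parse_many(N // WORKERS) for _ in range(WORKERS))
t_seq = time.perf_counter() - t0

t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    threaded = sum(pool.map(parse_many, [N // WORKERS] * WORKERS))
t_thr = time.perf_counter() - t0

# Same answer either way; the threaded wall-clock time will not be
# ~4x faster, unlike an I/O-bound workload under the same pool.
assert sequential == threaded
print(f"sequential: {t_seq:.3f}s, {WORKERS} threads: {t_thr:.3f}s")
```

The same pool *would* show near-linear speedup on I/O-bound work, which is why the trigger is specifically mixed CPU + I/O under high concurrency.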

Instance 1: Feast-Python → Go at Dropbox Dash (2025)

Feast's Python SDK was the initial feature-serving layer for Dash's feature store. Profiling under high concurrency showed CPU-bound JSON parsing + the GIL as the bottleneck. Multi-process workers improved latency "temporarily" but added coordination overhead.

Go rewrite delivers:

  • Thousands of req/s per service.
  • ~5–10ms overhead on top of the ~20ms Dynovault client latency.
  • p95 ~25–35ms end-to-end.

Layer scope: just the serving HTTP/RPC endpoint. Feast (Python) still owns definitions + orchestration; Spark (PySpark) still owns feature computation.

(Source: sources/2025-12-18-dropbox-feature-store-powering-real-time-ai-dash)

Instance 2: JVM → Rust at Aurora DSQL (2025)

Aurora DSQL started with a Kotlin (JVM) codebase and rewrote the data plane (and eventually the control plane too) in Rust. The forcing function was tail-at-scale: a 40-host simulation with 1s GC stalls delivered ~6K TPS and 10s tail against targets of ~1M TPS and 1s tail.
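
The tail-at-scale arithmetic behind that forcing function is worth making concrete. Assuming each host spends some fraction p of wall-clock time mid-GC-pause (p = 0.01 below is an illustrative number, not taken from the post), a request fanning out to N hosts hits at least one paused host with probability 1 − (1 − p)^N:

```python
# If each host is mid-pause a fraction p of the time, a fanout request
# stalls whenever ANY of its N hosts is paused:
#   P(some host mid-pause | fanout=N) = 1 - (1 - p)**N
def p_hit_pause(p: float, n: int) -> float:
    return 1 - (1 - p) ** n

# Illustrative: 1s of stall per 100s of runtime, fanout of 40 hosts.
p, n = 0.01, 40
print(f"P(some host mid-pause | fanout={n}) = {p_hit_pause(p, n):.2f}")
```

Even a 1% per-host pause fraction stalls roughly a third of 40-host fanouts, and the probability tends to 1 as N grows, which is why per-host GC tuning can't rescue the percentile SLO.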

Pilot-component migration: start with the Adjudicator (the simplest data-plane component, and one that already had Rust client libraries); the JVM devs rewrote it and got 30K TPS from the first cut vs 2–3K TPS from the years-tuned Kotlin version (~10×). Then the rest of the data plane. Then, after the initial "Kotlin control plane / Rust data plane" split began to drift, the control plane too.

(Source: sources/2025-05-27-allthingsdistributed-aurora-dsql-rust-journey)

Common sub-pattern: profile → language → pilot

Both instances share a three-step shape:

  1. Profile to localise the cap. Don't reach for a rewrite before the bottleneck is pinpointed. GIL contention on JSON parsing (Dropbox); GC-pause fanout amplification (DSQL).
  2. Choose the replacement language by concurrency model fit. Not by raw speed. Go's goroutines for mixed CPU + I/O; Rust's ownership + no-GC for latency-predictable data-plane work.
  3. Pilot on a component that has ecosystem tailwind (existing libraries, easy contracts). Dropbox picks the serving layer because Feast's Python/Go boundary is an HTTP API. DSQL picks the Adjudicator because Rust journal-client libraries already existed.
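
Step 1 is the one teams most often skip. A minimal sketch of what "profile to localise the cap" looks like on the Python side, using the stdlib `cProfile`/`pstats` (the handler and payload names are hypothetical stand-ins, not any vendor's code):

```python
import cProfile
import io
import json
import pstats

# Hypothetical serving hotspot: a handler whose cost is dominated by
# JSON decoding. Profiling by cumulative time should surface it before
# anyone reaches for a rewrite.
PAYLOAD = json.dumps({"features": list(range(5000))})

def parse_request() -> dict:
    return json.loads(PAYLOAD)

def handle_requests(n: int) -> None:
    for _ in range(n):
        parse_request()

profiler = cProfile.Profile()
profiler.enable()
handle_requests(500)
profiler.disable()

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(10)
report = out.getvalue()
print(report)
```

If a report like this shows the CPU pinned in a hotspot that the language's concurrency primitives can't spread across cores, step 2 (choose by concurrency-model fit) applies; if it shows time waiting on I/O, it doesn't.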

What doesn't survive the rewrite

Both posts are explicit that the rewrite isn't a universal improvement:

  • You lose the original language's ecosystem / developer ergonomics / hiring funnel in the rewritten layer.
  • You introduce a cross-language boundary that has to be maintained (serialisation format, deployment topology).
  • If the rewrite is partial, you now have two codebases with shared domain knowledge that can drift (DSQL explicitly retracted the Kotlin-control-plane / Rust-data-plane split to full-system Rust for this reason).
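
The boundary-maintenance cost is concrete in the Dropbox shape, where the Python/Go seam is an HTTP API. A minimal sketch of what the Python side now has to keep stable (all field and type names below are hypothetical, not Dropbox's actual schema):

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical wire contract across a Python-orchestration -> Go-serving
# boundary. Both codebases must agree on this serialisation format, and
# any change to it has to be coordinated across two languages.
@dataclass
class FeatureRequest:
    entity_id: str
    feature_names: list

def encode(req: FeatureRequest) -> str:
    # sort_keys keeps the wire format deterministic for both sides.
    return json.dumps(asdict(req), sort_keys=True)

def decode(raw: str) -> FeatureRequest:
    return FeatureRequest(**json.loads(raw))

wire = encode(FeatureRequest("user:42", ["ctr_7d", "recency"]))
assert decode(wire) == FeatureRequest("user:42", ["ctr_7d", "recency"])
print(wire)
```

Every such contract is a new surface that can drift between the two codebases, which is the same force that pushed DSQL off its split-language design.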

When NOT to apply

  • The current bottleneck isn't actually the language's concurrency model. (If CPU is idle and latency is high, it's an I/O or downstream-dependency problem, not a GIL / GC problem.)
  • The traffic shape won't grow into the current cap. (If you're at 1/10 of the cap and demand is flat, rewrite cost dominates.)
  • The team can't carry two languages operationally (on-call coverage, tooling, build systems).
