PATTERN Cited by 2 sources

Language rewrite for concurrency

Definition

Language rewrite for concurrency is the architectural pattern of rewriting a performance-critical layer in a language whose concurrency model matches the workload, once the current language's concurrency primitives have become the bottleneck — while keeping orchestration, definitions, or control logic in the original language.

Not a whole-system rewrite. Layer-targeted. The cost is scoped to the slice that's actually capped, while the broader codebase retains the ergonomic / ecosystem advantages of the starting language.

When the pattern fires

  • Python → Go for mixed CPU + I/O request-serving. Trigger: GIL contention on CPU-bound work under high concurrency, with multi-processing adding coordination overhead rather than fixing it.
  • JVM → Rust for latency-sensitive data-plane work. Trigger: JVM GC pauses tail into the seconds under fanout, and the tail-at-scale math (P(some host mid-pause | fanout=N) → 1) makes percentile SLOs unachievable.
  • Node.js → Go / Rust for CPU-bound request handling where the event loop becomes the bottleneck (not directly in this wiki but the shape is identical).

What the rewrite delivers is not raw speed — it's a concurrency model (goroutines, async/await + zero-cost abstractions, actors) that fits the workload without paying the original language's concurrency-primitive cost.
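
A minimal sketch of the Python-side trigger: CPU-bound work (here, repeated JSON parsing as a stand-in for a serving hotspot; the payload and function names are illustrative, not any vendor's actual code) gains no parallelism from threads under CPython's GIL, which is what pushes teams toward multi-processing and then toward a rewrite.

```python
import json
import time
from concurrent.futures import ThreadPoolExecutor

# CPU-bound stand-in for a JSON-parsing hotspot. Under CPython's GIL,
# only one thread executes bytecode at a time, so adding threads does
# not parallelise this work — it only adds scheduling overhead.
PAYLOAD = json.dumps({"features": list(range(2000))})

def parse_many(n: int) -> int:
    total = 0
    for _ in range(n):
        total += len(json.loads(PAYLOAD)["features"])
    return total

N, WORKERS = 200, 4

t0 = time.perf_counter()
sequential = sum(parse_many(N // WORKERS) for _ in range(WORKERS))
t_seq = time.perf_counter() - t0

t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    threaded = sum(pool.map(parse_many, [N // WORKERS] * WORKERS))
t_thr = time.perf_counter() - t0

# Same answer either way; the threaded wall-clock time will not be
# ~4x faster, unlike an I/O-bound workload under the same pool.
assert sequential == threaded
print(f"sequential: {t_seq:.3f}s, {WORKERS} threads: {t_thr:.3f}s")
```

The same pool *would* show near-linear speedup on I/O-bound work, which is why the trigger is specifically mixed CPU + I/O under high concurrency.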

Instance 1: Feast-Python → Go at Dropbox Dash (2025)

Feast's Python SDK was the initial feature-serving layer for Dash's feature store. Profiling under high concurrency showed CPU-bound JSON parsing + the GIL as the bottleneck. Multi-process workers improved latency "temporarily" but added coordination overhead.

Go rewrite delivers:

  • Thousands of req/s per service.
  • ~5–10ms overhead on top of the ~20ms Dynovault client latency.
  • p95 ~25–35ms end-to-end.

Layer scope: just the serving HTTP/RPC endpoint. Feast (Python) still owns definitions + orchestration; Spark (PySpark) still owns feature computation.

(Source: sources/2025-12-18-dropbox-feature-store-powering-real-time-ai-dash)

Instance 2: JVM → Rust at Aurora DSQL (2025)

Aurora DSQL started with a Kotlin (JVM) codebase and rewrote the data plane (and eventually the control plane too) in Rust. The forcing function was tail-at-scale: a 40-host simulation with 1s GC stalls delivered ~6K TPS and 10s tail against targets of ~1M TPS and 1s tail.
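
The tail-at-scale arithmetic behind that forcing function is worth making concrete. Assuming each host spends some fraction p of wall-clock time mid-GC-pause (p = 0.01 below is an illustrative number, not taken from the post), a request fanning out to N hosts hits at least one paused host with probability 1 − (1 − p)^N:

```python
# If each host is mid-pause a fraction p of the time, a fanout request
# stalls whenever ANY of its N hosts is paused:
#   P(some host mid-pause | fanout=N) = 1 - (1 - p)**N
def p_hit_pause(p: float, n: int) -> float:
    return 1 - (1 - p) ** n

# Illustrative: 1s of stall per 100s of runtime, fanout of 40 hosts.
p, n = 0.01, 40
print(f"P(some host mid-pause | fanout={n}) = {p_hit_pause(p, n):.2f}")
```

Even a 1% per-host pause fraction stalls roughly a third of 40-host fanouts, and the probability tends to 1 as N grows, which is why per-host GC tuning can't rescue the percentile SLO.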

Pilot-component migration: start with the Adjudicator (the simplest data-plane component, and one that already had Rust client libraries); the JVM devs rewrote it and got 30K TPS from the first cut vs 2–3K TPS from the years-tuned Kotlin version (~10×). Then the rest of the data plane. Then, after the initial "Kotlin control plane / Rust data plane" split began to drift, the control plane too.

(Source: sources/2025-05-27-allthingsdistributed-aurora-dsql-rust-journey)

Common sub-pattern: profile → language → pilot

Both instances share a three-step shape:

  1. Profile to localise the cap. Don't reach for a rewrite before the bottleneck is pinpointed. GIL contention on JSON parsing (Dropbox); GC-pause fanout amplification (DSQL).
  2. Choose the replacement language by concurrency model fit. Not by raw speed. Go's goroutines for mixed CPU + I/O; Rust's ownership + no-GC for latency-predictable data-plane work.
  3. Pilot on a component that has ecosystem tailwind (existing libraries, easy contracts). Dropbox picks the serving layer because Feast's Python/Go boundary is an HTTP API. DSQL picks the Adjudicator because Rust journal-client libraries already existed.
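
Step 1 is the one teams most often skip. A minimal sketch of what "profile to localise the cap" looks like on the Python side, using the stdlib `cProfile`/`pstats` (the handler and payload names are hypothetical stand-ins, not any vendor's code):

```python
import cProfile
import io
import json
import pstats

# Hypothetical serving hotspot: a handler whose cost is dominated by
# JSON decoding. Profiling by cumulative time should surface it before
# anyone reaches for a rewrite.
PAYLOAD = json.dumps({"features": list(range(5000))})

def parse_request() -> dict:
    return json.loads(PAYLOAD)

def handle_requests(n: int) -> None:
    for _ in range(n):
        parse_request()

profiler = cProfile.Profile()
profiler.enable()
handle_requests(500)
profiler.disable()

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(10)
report = out.getvalue()
print(report)
```

If a report like this shows the CPU pinned in a hotspot that the language's concurrency primitives can't spread across cores, step 2 (choose by concurrency-model fit) applies; if it shows time waiting on I/O, it doesn't.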

What doesn't survive the rewrite

Both posts are explicit that the rewrite isn't a universal improvement:

  • You lose the original language's ecosystem / developer ergonomics / hiring funnel in the rewritten layer.
  • You introduce a cross-language boundary that has to be maintained (serialisation format, deployment topology).
  • If the rewrite is partial, you now have two codebases with shared domain knowledge that can drift (DSQL explicitly retracted the Kotlin-control-plane / Rust-data-plane split to full-system Rust for this reason).
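
The boundary-maintenance cost is concrete in the Dropbox shape, where the Python/Go seam is an HTTP API. A minimal sketch of what the Python side now has to keep stable (all field and type names below are hypothetical, not Dropbox's actual schema):

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical wire contract across a Python-orchestration -> Go-serving
# boundary. Both codebases must agree on this serialisation format, and
# any change to it has to be coordinated across two languages.
@dataclass
class FeatureRequest:
    entity_id: str
    feature_names: list

def encode(req: FeatureRequest) -> str:
    # sort_keys keeps the wire format deterministic for both sides.
    return json.dumps(asdict(req), sort_keys=True)

def decode(raw: str) -> FeatureRequest:
    return FeatureRequest(**json.loads(raw))

wire = encode(FeatureRequest("user:42", ["ctr_7d", "recency"]))
assert decode(wire) == FeatureRequest("user:42", ["ctr_7d", "recency"])
print(wire)
```

Every such contract is a new surface that can drift between the two codebases, which is the same force that pushed DSQL off its split-language design.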

When NOT to apply

  • The current bottleneck isn't actually the language's concurrency model. (If CPU is idle and latency is high, it's an I/O or downstream-dependency problem, not a GIL / GC problem.)
  • The traffic shape won't grow into the current cap. (If you're at 1/10 of the cap and demand is flat, rewrite cost dominates.)
  • The team can't carry two languages operationally (on-call coverage, tooling, build systems).
