PATTERN Cited by 1 source
Runtime capability dispatch — pure-Java SIMD¶
Problem¶
A service wants the performance of SIMD-accelerated math kernels but can't assume the SIMD capability is available at deploy time:
- The JDK Vector API is still an incubating feature — enabling it requires the runtime flag --add-modules=jdk.incubator.vector. If a container, staging environment, or operator change drops the flag, the code must still run correctly.
- Different hosts in the fleet may have different CPU capabilities (AVX2 vs AVX-512 vs NEON vs no SIMD). The kernel shouldn't crash or silently misbehave on the less-capable ones.
- Long-tail JVM versions in production may lack the Vector API entirely.
The service owner wants: "opt-in to the Vector API for maximum performance, but remain safe and predictable without it."
Solution¶
Detect the SIMD capability at class load, bind the implementation via a factory, and ship a high-quality scalar fallback.
```java
interface MatMul {
    void compute(double[] A, double[] B, double[] C,
                 int M, int N, int D);
}

class MatMulFactory {
    static final MatMul INSTANCE = create();

    private static MatMul create() {
        try {
            // Probe for the incubator module; throws if --add-modules is absent
            Class.forName("jdk.incubator.vector.DoubleVector");
            return new VectorApiMatMul();
        } catch (ClassNotFoundException | NoClassDefFoundError e) {
            return new ScalarMatMul();
        }
    }
}
```
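The factory above leaves the fallback implementation open. A minimal runnable sketch of what a `ScalarMatMul` can look like, assuming a row-major layout where A is M×D, B is D×N, and C is M×N (the source doesn't pin the layout down); the interface is repeated so the sketch compiles standalone:

```java
interface MatMul {
    void compute(double[] A, double[] B, double[] C, int M, int N, int D);
}

// Hypothetical fallback: a plain triple loop, correct on any JVM with no flags.
class ScalarMatMul implements MatMul {
    public void compute(double[] A, double[] B, double[] C, int M, int N, int D) {
        for (int m = 0; m < M; m++) {
            for (int n = 0; n < N; n++) {
                double acc = 0.0;
                for (int d = 0; d < D; d++) {
                    acc += A[m * D + d] * B[d * N + n]; // row-major indexing
                }
                C[m * N + n] = acc;
            }
        }
    }
}

class Demo {
    public static void main(String[] args) {
        MatMul mm = new ScalarMatMul();   // what the factory returns without the Vector API
        double[] A = {1, 2, 3, 4};        // 2×2
        double[] B = {5, 6, 7, 8};        // 2×2
        double[] C = new double[4];
        mm.compute(A, B, C, 2, 2, 2);
        System.out.println(java.util.Arrays.toString(C)); // [19.0, 22.0, 43.0, 50.0]
    }
}
```

Callers dispatch through `MatMulFactory.INSTANCE` only, so neither they nor their tests care which implementation was bound.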
Three deployment properties this enables¶
- Correctness in every environment. The scalar fallback is the functional contract; the Vector API path is an opportunistic speedup. Operator error with JVM flags can't break the service.
- Drop-in upgrade path. When the Vector API graduates out of incubation, the factory swaps to the stable-API class without the service-owning team touching anything.
- Per-host adaptation. The Vector API's own DoubleVector.SPECIES_PREFERRED picks the widest lane width available on the host (4 doubles on AVX2, 8 on AVX-512, 2 on 128-bit NEON) — no platform-specific code in the service.
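For reference, a sketch of what the species-adaptive kernel path can look like — illustrative, not Netflix's actual code, and because the module is incubating it only compiles and runs with --add-modules jdk.incubator.vector:

```java
import jdk.incubator.vector.DoubleVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Requires: --add-modules jdk.incubator.vector (incubating API)
final class VectorDot {
    // SPECIES_PREFERRED resolves to the widest shape the host CPU supports
    private static final VectorSpecies<Double> SPECIES = DoubleVector.SPECIES_PREFERRED;

    static double dot(double[] a, double[] b) {
        DoubleVector acc = DoubleVector.zero(SPECIES);
        int i = 0;
        int bound = SPECIES.loopBound(a.length);   // largest lane-count multiple <= length
        for (; i < bound; i += SPECIES.length()) {
            DoubleVector va = DoubleVector.fromArray(SPECIES, a, i);
            DoubleVector vb = DoubleVector.fromArray(SPECIES, b, i);
            acc = va.fma(vb, acc);                 // fused multiply-add accumulate
        }
        double sum = acc.reduceLanes(VectorOperators.ADD);
        for (; i < a.length; i++) sum += a[i] * b[i];  // scalar tail
        return sum;
    }
}
```

The same source runs 4 lanes wide on AVX2 and 8 on AVX-512 without recompilation; only the constant behind SPECIES changes.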
Scalar fallback quality matters¶
The fallback isn't just "the naive nested loop." Netflix's scalar path is a hand-optimised loop-unrolled dot product inspired by Lucene's VectorUtilDefaultProvider. Two reasons to invest:
- In failure modes where the Vector API isn't available, the service should still perform well — not fall off a cliff.
- The scalar path is the correctness oracle during development; keeping it well-tuned catches bugs where the vector path miscomputes tail elements or boundary cases.
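To make "hand-optimised loop-unrolled" concrete, here is a sketch of a 4-way unrolled scalar dot product in the spirit of Lucene's default (non-vectorized) provider; the unroll factor and structure are illustrative, not Netflix's or Lucene's actual code:

```java
final class ScalarDot {
    static double dot(double[] a, double[] b) {
        // Independent accumulators give the JIT more instruction-level
        // parallelism than a single running sum.
        double acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
        int i = 0;
        int upper = a.length & ~3;                 // largest multiple of 4 <= length
        for (; i < upper; i += 4) {
            acc0 += a[i]     * b[i];
            acc1 += a[i + 1] * b[i + 1];
            acc2 += a[i + 2] * b[i + 2];
            acc3 += a[i + 3] * b[i + 3];
        }
        double sum = acc0 + acc1 + acc2 + acc3;
        for (; i < a.length; i++) sum += a[i] * b[i];  // tail elements
        return sum;
    }
}
```

The tail loop also doubles as a test target: the vector path's loopBound/tail handling can be checked element-for-element against this scalar oracle.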
Why this beats JNI-based alternatives¶
Alternative capability-dispatch strategies exist outside pure Java — native kernels compiled per-platform behind a JNI bridge, or jextract-generated FFI bindings. They satisfy the "safe fallback" property less well than pure-Java dispatch:
- JNI adds transition overhead on every call.
- Native kernels need per-platform builds + shipping, plus a fallback anyway for unsupported platforms.
- The fallback itself is a different language/toolchain than the hot path, fragmenting the build + testing surface.
Pure-Java SIMD + scalar fallback keeps the entire pipeline in one language with one build.
Caveats¶
- Class-load dispatch is one-shot. The factory decides once per process; dynamic CPU feature additions (hotplug, VM migration) are not accommodated. For server workloads this is almost always fine.
- Version pinning the detection class. Using Class.forName on a concrete incubator class couples the detection to the current class name; when the API graduates, the detector must update.
- Benchmark discipline. Scalar and vector kernels must be benchmarked with the same memory-layout assumptions (patterns/flat-buffer-threadlocal-reuse) — otherwise dispatch can inadvertently mask a regression on the scalar path.
Seen in¶
- sources/2026-03-03-netflix-optimizing-recommendation-systems-with-jdks-vector-api — Netflix Ranker binds MatMulFactory at class load: if jdk.incubator.vector is present, it uses a Vector API matmul with fma() accumulators; otherwise it falls back to a Lucene-inspired scalar loop-unrolled dot product. Single-video requests continue on the per-item implementation unchanged. Netflix frames the fallback as a production safety property, not just a performance detail.
Related¶
- systems/jdk-vector-api — the incubating SIMD substrate.
- systems/lucene — Netflix's scalar-fallback inspiration.
- patterns/batched-matmul-for-pairwise-similarity — the algorithmic shape this pattern's kernel implements.
- patterns/flat-buffer-threadlocal-reuse — co-deployed memory layout for the vector kernel to pay off.
- concepts/jni-transition-overhead — the JNI-based alternative this pattern deliberately avoids.