CONCEPT Cited by 1 source

Vectorized query engine¶

Definition¶

A vectorized query engine is a query-execution architecture that operates on batches of rows at a time using SIMD / cache-friendly column-major loops, instead of one row at a time with per-row method dispatch (the row-at-a-time Volcano model). Vectorisation collapses the per-row overhead — function call dispatch, type coercion, branch prediction misses — to a per-batch cost amortised across thousands of rows.

Volcano vs vectorised — the model¶

Volcano (row-at-a-time)¶

for each row in source:
    for each operator in pipeline:
        operator.process(row)

Each operator's process() is a virtual method call per row. Branch prediction is poor (operators see one row at a time, no context). The CPU pipeline stalls on indirect branches.

Vectorised (batch-of-rows)¶

for each batch in source:                # batch size ~1024 rows
    for each operator in pipeline:
        operator.process_batch(batch)    # tight inner loop on column slabs

Each process_batch is a tight loop over a column-major slab of values. SIMD instructions (AVX2 / AVX-512 / NEON) operate on multiple values per CPU cycle. Branch prediction is good (the inner loop is predictable). The CPU pipeline runs hot.

Why it matters¶

2-10× single-thread throughput on common analytical workloads vs Volcano-shape engines.
Cache-coherent. Column-major batches are contiguous in memory; L1 / L2 caches are used effectively.
Compiler-friendly. Loop bodies are simple enough that modern compilers auto-vectorise them; explicit SIMD intrinsics are an optimisation, not a requirement.
Composes with columnar storage. Parquet / Delta / Iceberg files are already column-major; vectorised execution reads native column slabs without row-major materialisation.

Canonical instances on the wiki¶

systems/photon — Databricks' C++-native vectorised engine for Apache Spark. The VF Match FDR pipeline disclosed Photon collapsing a Spark-on-Splink partition-skew straggler from 30 minutes to ~2 minutes — 15× improvement at the worst-case partition.
systems/duckdb — single-machine vectorised analytical database; a reference implementation of the vectorised engine shape (originally Hyper / MonetDB lineage).
Velox — Meta's open-source vectorised execution library; used as the Photon-equivalent in Presto.
ClickHouse — vectorised columnar OLAP database from the ground up.

Why ER / record-linkage workloads benefit specifically¶

The VF Match canonicalisation shows vectorised execution helping a non-OLAP workload (entity resolution via Splink pairwise comparison). Two structural reasons:

Pairwise string / numeric comparisons are branch-predictable, SIMD-friendly inner loops when operating on column-major batches of candidate pairs.
Match-weight aggregation (Fellegi-Sunter: W = Σ_f w_f) is a column-major reduce — sum per-column log-Bayes-factor across rows. A textbook SIMD reduce.

The 30 min → 2 min observation is a step-change because eliminating JVM per-row dispatch overhead on the hot partition was the dominant cost. Vectorisation doesn't change the partition's size; it changes the per-record work cost so that even a skewed-fat partition finishes in tractable time.

What vectorised engines don't fix¶

Skew itself. Vectorised execution makes each partition faster, but does not redistribute data. A partition with 35× the median work still has 35× the work — just at a faster constant.
UDF performance. Python / arbitrary-language UDFs typically fall back to row-at-a-time execution, breaking the vectorised pipeline. Common engineering smell: a Photon-eligible plan silently switches to JVM mode at a Python UDF.
I/O-bound workloads. When the bottleneck is reading from object storage, vectorisation barely helps.
Small-data workloads. The batch overhead is not free; for small queries (<1k rows) Volcano is comparable.

Failure modes¶

Hidden Volcano fallback. Operators not yet ported to the vectorised path silently fall back; the user expects vectorised speed and gets Volcano speed. Mitigation: explainplan inspection.
Memory shape divergence. Vectorised batches have different memory characteristics; very wide schemas with many sparse columns can hit memory limits where Volcano didn't.
Cost / billing surprise. Photon-enabled clusters bill at a higher DBU rate; for non-Photon-amenable workloads the higher rate is paid without speedup.

Seen in¶

sources/2026-05-20-databricks-virtue-foundation-medical-volunteers-72-countries — canonical wiki source. VF Match FDR canonicalises vectorised execution at non-OLAP altitude (entity-resolution pairwise comparison on Splink), with a quantified 15× improvement on worst-case partition latency from enabling Photon.

systems/photon — Databricks' instance.
systems/duckdb — single-machine reference instance.
systems/apache-spark — the host engine Photon vectorises.
concepts/columnar-storage — the in-memory representation vectorised execution exploits.
concepts/curse-of-the-last-reducer — the failure mode vectorised execution canonically remediates by collapsing per-record work cost.
concepts/partition-skew-data-skew — adjacent failure mode at the data-distribution altitude.