PATTERN Cited by 1 source
Intra-node parallelism via input/output scaling¶
Pattern¶
For data-pipeline runtimes that expose a broker-style
input/output multiplexing primitive — Redpanda Connect,
Benthos, NiFi, and similar — the decisive throughput-
scaling dimension is often not adding more nodes but running
more parallel input/output pipelines within a single node
to fully saturate the per-node CPU, network, and storage.
"We found that what mattered most was batch settings and increasing parallelism as much as possible. Whether that be through partition count,
snowflake_streamingoptions, or — as it turned out — unlocking massive throughput by scaling the number of inputs and outputs within a single node." (Source: sources/2025-10-02-redpanda-real-time-analytics-redpanda-snowflake-streaming)
Why it works¶
A single input/output pipeline in Connect-style runtimes
typically saturates a few cores and a fraction of the NIC.
The machine has more of both. Rather than horizontally scaling
the node count (adding more machines, with their orchestration
and network-fanout costs), the broker primitive lets the
operator spin up N parallel pipelines inside the Connect
process. Each reads from its own share of the source
partitions and writes to its own share of destination
channels.
Canonical form in Redpanda Connect:

```yaml
input:
  broker:
    inputs:
      - kafka_franz: { seed_brokers: [...], topics: [...] }
      - kafka_franz: { seed_brokers: [...], topics: [...] }
      # ... N of these
output:
  broker:
    outputs:
      - snowflake_streaming: { channel_prefix: prefix-1, ... }
      - snowflake_streaming: { channel_prefix: prefix-2, ... }
      # ... N of these
```
Each (input, output) pair runs its own goroutine pool,
batching policy, and channel pool; they share the same
process's TCP stack, memory, and kernel scheduler but not
application-level synchronisation.
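The shared-process, unshared-state arrangement can be sketched as plain goroutines: each pipeline drains its own partition share (modelled here as one channel) and flushes fixed-size batches into its own result slot, with no locks between pipelines. The batch size and the flush recorder are illustrative stand-ins, not the real Connect API:

```go
package main

import (
	"fmt"
	"sync"
)

// pipeline consumes its own share of the source and flushes
// fixed-size batches; the append to *flushed stands in for a sink
// write. Nothing is shared with sibling pipelines.
func pipeline(in <-chan string, batchSize int, flushed *[]int) {
	batch := make([]string, 0, batchSize)
	for msg := range in {
		batch = append(batch, msg)
		if len(batch) == batchSize {
			*flushed = append(*flushed, len(batch)) // one "sink write"
			batch = batch[:0]
		}
	}
	if len(batch) > 0 {
		*flushed = append(*flushed, len(batch)) // final partial batch
	}
}

func main() {
	const fanout = 3 // N parallel (input, output) pairs in one process
	var wg sync.WaitGroup
	flushes := make([][]int, fanout)
	for p := 0; p < fanout; p++ {
		in := make(chan string, 8)
		for m := 0; m < 5; m++ {
			in <- fmt.Sprintf("p%d-m%d", p, m)
		}
		close(in)
		wg.Add(1)
		// Each pipeline gets its own goroutine and its own result
		// slot, so no application-level locks are shared.
		go func(in <-chan string, out *[]int) {
			defer wg.Done()
			pipeline(in, 2, out)
		}(in, &flushes[p])
	}
	wg.Wait()
	fmt.Println(flushes) // prints [[2 2 1] [2 2 1] [2 2 1]]
}
```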
The scaling-dimension hierarchy¶
The Redpanda benchmark names three composable scaling dimensions in order of effectiveness:
- Partition count — upstream. The source-topic partition count is the coarsest parallelism unit; each partition can be consumed by exactly one consumer in a consumer group.
- Destination-side options — e.g., Snowpipe channel count via `channel_prefix` × `max_in_flight` for the Snowflake sink.
- Intra-node input/output fan-out — the `broker` primitive spawning N pipelines per node.
Dimension 3 is the one the benchmark singles out as the "turned out" finding — the surprise scaling lever. Dimensions 1 and 2 were already known; dimension 3, the number of pipelines per Connect node, moved the winning configuration from "saturated at a ceiling" to "14.5 GB/s, 45% above documented per-table ceiling".
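The quoted numbers pin down some useful arithmetic. A quick sketch deriving the per-node share and the documented ceiling implied by "45% above" — both are derived values, not figures stated in the post:

```go
package main

import "fmt"

func main() {
	const aggregateGBps = 14.5 // sustained aggregate reported by the benchmark
	const nodes = 12.0         // Connect nodes in the benchmark
	const aboveCeiling = 0.45  // "45% above documented per-table ceiling"

	perNode := aggregateGBps / nodes              // each node's share of the aggregate
	impliedCeiling := aggregateGBps / (1 + aboveCeiling) // ceiling the 45% is measured against
	fmt.Printf("per-node share: %.2f GB/s\n", perNode)
	fmt.Printf("implied documented ceiling: %.1f GB/s\n", impliedCeiling)
}
```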
When intra-node scaling beats adding nodes¶
- Per-node resource headroom. A Connect node running a single pipeline on a 48-core machine uses a small fraction of cores. More nodes just distributes the same under-utilisation across more machines.
- Cost economics. Fewer larger nodes with intra-node parallelism is typically cheaper than more smaller nodes at the same aggregate throughput — fewer instances to pay fixed per-instance costs for (monitoring sidecars, logging, metadata overhead).
- Network locality. Intra-process pipelines share a single TCP socket pool to the upstream broker and downstream sink. Node-count scaling spawns N × pools, amplifying broker-side connection overhead.
When to prefer node-count scaling instead¶
- Fault-isolation. Process-level crash takes down all intra-node pipelines at once; node-level diversity reduces blast radius.
- Upgrade rollouts. Rolling upgrades are safer at node granularity; restarting a node that carries many pipelines interrupts all of them at once.
- Resource quotas. Some orchestrators enforce per-container CPU/memory caps that limit how much intra-node parallelism one node can usefully run.
- Heterogeneous workloads. Mixing hot and cold pipelines on the same node risks noisy-neighbour contention.
The production sweet spot is usually both dimensions tuned: run enough nodes for fault-isolation and upgrade-safety, then maximise intra-node parallelism on each node to hit the per-node resource ceiling.
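One way to operationalise that sweet spot is a crude sizing heuristic: pick the node count for blast radius, then derive per-node fan-out from core headroom and the destination's channel budget. A sketch — `coresPerPipeline` and `channelBudget` are assumed tuning inputs, not anything the benchmark publishes:

```go
package main

import "fmt"

// pipelinesPerNode is an illustrative heuristic: saturate the node by
// dividing its cores by the cores one pipeline typically uses, capped
// by the destination channels budgeted to this node.
func pipelinesPerNode(cores, coresPerPipeline, channelBudget int) int {
	n := cores / coresPerPipeline
	if n > channelBudget {
		n = channelBudget // don't outrun the destination's channel ceiling
	}
	if n < 1 {
		n = 1 // always run at least one pipeline
	}
	return n
}

func main() {
	// 48 cores (as in the "48-core machine" above), assuming ~4 cores
	// per pipeline and 16 destination channels budgeted per node.
	fmt.Println(pipelinesPerNode(48, 4, 16)) // prints 12
}
```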
Instance in the Redpanda → Snowflake benchmark¶
- 12 Connect nodes on AWS EC2 `m7gd.12xlarge`
- Intra-node `broker` input/output fan-out — factor not disclosed in the post, but named as the decisive scaling dimension
- Sustained 14.5 GB/s aggregate to a single Snowflake table
- Hit Snowpipe-API-level errors ("the Snowpipe API screaming at us") on tests that pushed channel count too aggressively, indicating intra-node parallelism was scaled until it saturated the destination's per-table channel ceiling
Seen in¶
- sources/2025-10-02-redpanda-real-time-analytics-redpanda-snowflake-streaming — canonical wiki statement of the pattern from Redpanda's 14.5 GB/s Snowflake benchmark. Intra-node parallelism via `broker` input/output scaling named explicitly as the decisive ("turned out") scaling dimension that moved throughput above documented per-table ceilings.
Related¶
- systems/redpanda-connect — exposes the `broker` primitive on both inputs and outputs; the canonical runtime for this pattern.
- systems/redpanda — the upstream streaming broker; partition count is the coarse-grained parallelism dimension this pattern composes with.
- concepts/snowpipe-streaming-channel — the destination-side parallelism unit that intra-node Connect pipelines feed.
- concepts/batching-latency-tradeoff — batch settings per pipeline compose with intra-node fan-out.
- patterns/count-over-bytesize-batch-trigger — the per-pipeline batch-trigger choice that composes with intra-node parallelism.