

Intra-node parallelism via input/output scaling

Pattern

For data-pipeline runtimes that expose a broker-style input/output multiplexing primitive — Redpanda Connect, Benthos, NiFi, and similar — the decisive throughput-scaling dimension is often not adding more nodes but running more parallel input/output pipelines within a single node, fully saturating each node's CPU, network, and storage.

"We found that what mattered most was batch settings and increasing parallelism as much as possible. Whether that be through partition count, snowflake_streaming options, or — as it turned out — unlocking massive throughput by scaling the number of inputs and outputs within a single node." (Source: sources/2025-10-02-redpanda-real-time-analytics-redpanda-snowflake-streaming)

Why it works

A single input/output pipeline in Connect-style runtimes typically saturates a few cores and a fraction of the NIC. The machine has more of both. Rather than scaling horizontally by node count (adding machines, with their orchestration and network-fanout costs), the broker primitive lets the operator spin up N parallel pipelines inside the Connect process. Each reads from its own share of the source partitions and writes to its own share of destination channels.

Canonical form in Redpanda Connect:

input:
  broker:
    inputs:
      - kafka_franz: { seed_brokers: [...], topics: [...] }
      - kafka_franz: { seed_brokers: [...], topics: [...] }
      # ... N of these

output:
  broker:
    outputs:
      - snowflake_streaming: { channel_prefix: prefix-1, ... }
      - snowflake_streaming: { channel_prefix: prefix-2, ... }
      # ... N of these

Each (input, output) pair runs its own goroutine pool, batching policy, and channel pool; they share the same process's TCP stack, memory, and kernel scheduler but not application-level synchronisation.
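Hand-listing N near-identical entries gets unwieldy as N grows. The Benthos-lineage broker also exposes a copies field that replicates a child input N times, a pattern field that controls how messages are spread across child outputs, and a per-output batching block so each pipeline flushes independently. A hedged sketch — field names are from the Benthos/Redpanda Connect broker and batching configuration, but all values (broker addresses, topic, prefixes, batch sizes) are illustrative, not from the benchmark:

input:
  broker:
    copies: 8                # replicate this child input 8x in one process
    inputs:
      - kafka_franz:
          seed_brokers: [ "redpanda-0:9092" ]
          topics: [ "events" ]
          consumer_group: "connect-bench"   # all copies join the same group

output:
  broker:
    pattern: round_robin     # spread messages across the child outputs
    outputs:
      - snowflake_streaming:
          channel_prefix: prefix-1
          batching:          # per-output policy: pipelines flush independently
            count: 10000
            period: 1s
      - snowflake_streaming:
          channel_prefix: prefix-2
          batching:
            count: 10000
            period: 1s

Note that copies clones a child verbatim, so outputs that need a distinct channel_prefix per pipeline (as the Snowflake sink does here) still have to be listed explicitly.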

The scaling-dimension hierarchy

The Redpanda benchmark names three composable scaling dimensions in order of effectiveness:

  1. Partition count — upstream. The source-topic partition count is the coarsest parallelism unit; each partition can be consumed by exactly one consumer in a consumer group.
  2. Destination-side options — e.g., Snowpipe channel count via channel_prefix × max_in_flight for the Snowflake sink.
  3. Intra-node input/output fan-out — the broker primitive spawning N pipelines per node.
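Dimension 1 caps the product of the other two on the consume side: because each partition is owned by exactly one consumer in a group, any consumers beyond the partition count sit idle. A minimal sanity-check sketch (an illustrative helper, not part of any Redpanda tooling):

```python
def consume_parallelism(partitions: int, nodes: int, inputs_per_node: int) -> dict:
    """Effective consume-side parallelism for a consumer group.

    Each partition is consumed by exactly one consumer in the group, so
    effective parallelism is min(partitions, total consumers); the rest idle.
    """
    consumers = nodes * inputs_per_node
    return {
        "consumers": consumers,
        "effective": min(partitions, consumers),
        "idle": max(0, consumers - partitions),
    }

# 96 partitions, 12 nodes, 8 broker inputs each: fully utilised, none idle.
print(consume_parallelism(96, 12, 8))   # {'consumers': 96, 'effective': 96, 'idle': 0}
# Doubling inputs per node past the partition count just parks consumers.
print(consume_parallelism(96, 12, 16))  # {'consumers': 192, 'effective': 96, 'idle': 96}
```

This is why the upstream partition count is listed first: it bounds how far dimensions 2 and 3 can usefully go.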

Dimension 3 is the one the benchmark flags as the surprise ("as it turned out") finding. Dimensions 1 and 2 were already well known; it was the number of pipelines per Connect node that moved the winning configuration from a saturated ceiling to 14.5 GB/s, 45% above the documented per-table ceiling.

When intra-node scaling beats adding nodes

  • Per-node resource headroom. A Connect node running a single pipeline on a 48-core machine uses a small fraction of its cores. More nodes just distribute the same under-utilisation across more machines.
  • Cost economics. Fewer larger nodes with intra-node parallelism is typically cheaper than more smaller nodes at the same aggregate throughput — fewer instances to pay fixed per-instance costs for (monitoring sidecars, logging, metadata overhead).
  • Network locality. Intra-process pipelines share a single TCP socket pool to the upstream broker and downstream sink. Node-count scaling spawns N × pools, amplifying broker-side connection overhead.

When to prefer node-count scaling instead

  • Fault-isolation. A process-level crash takes down all intra-node pipelines at once; spreading pipelines across nodes reduces the blast radius.
  • Upgrade rollouts. Rolling upgrades are safer when each restart removes only a small share of capacity; restarting a node that carries many pipelines halts all of them at once.
  • Resource quotas. Some orchestrators enforce per-container CPU/memory caps that limit how much intra-node parallelism one node can usefully run.
  • Heterogeneous workloads. Mixing hot and cold pipelines on the same node risks noisy-neighbour contention.

The production sweet spot is usually both dimensions tuned: run enough nodes for fault-isolation and upgrade-safety, then maximise intra-node parallelism on each node to hit the per-node resource ceiling.
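That trade can be made concrete: fix a node-count floor for blast radius and rollout safety, then add intra-node pipelines to cover the throughput target, spilling into extra nodes only once a node's pipeline budget is exhausted. A toy sizing helper — all numbers and the per-pipeline rate are illustrative assumptions, not figures from the benchmark:

```python
import math

def plan(target_gb_s: float, per_pipeline_gb_s: float,
         min_nodes: int, max_pipelines_per_node: int) -> tuple[int, int]:
    """Return (nodes, pipelines_per_node) covering the target throughput.

    Honour the fault-isolation floor first, then pack pipelines onto the
    existing nodes; only add nodes once the per-node pipeline budget is full.
    """
    pipelines = math.ceil(target_gb_s / per_pipeline_gb_s)
    nodes = max(min_nodes, math.ceil(pipelines / max_pipelines_per_node))
    return nodes, math.ceil(pipelines / nodes)

# 14.5 GB/s target, assumed 0.2 GB/s per pipeline, >= 12 nodes, <= 8 pipelines/node
print(plan(14.5, 0.2, 12, 8))   # (12, 7): the node floor dominates
# Halve the assumed per-pipeline rate and the pipeline budget forces more nodes
print(plan(14.5, 0.1, 12, 8))   # (19, 8)
```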

Instance in the Redpanda → Snowflake benchmark

  • 12 Connect nodes on AWS EC2 m7gd.12xlarge
  • Intra-node broker input/output fan-out — factor not disclosed in the post, but named as the decisive scaling dimension
  • Sustained 14.5 GB/s aggregate to a single Snowflake table (≈1.2 GB/s per node)
  • Hit Snowpipe-API-level errors ("the Snowpipe API screaming at us") on tests that pushed channel count too aggressively, indicating intra-node parallelism was scaled until it saturated the destination's per-table channel ceiling
