Redpanda — Real-time analytics at scale: Redpanda and Snowflake Streaming¶
Summary¶
Unsigned Redpanda benchmark post disclosing a large-scale performance
test of a Redpanda → Snowflake streaming pipeline built with
Redpanda Connect's
snowflake_streaming output connector. A 9-node Redpanda
Enterprise cluster (AWS EC2 m7gd.16xlarge) fed 12 Redpanda
Connect nodes (m7gd.12xlarge), landing 3.8 billion 1 KB
AVRO-encoded messages as rows in a single Snowflake table. Peak
sustained throughput: 14.5 GB/s with P50 ≈ 2.18 s and P99
≈ 7.49 s end-to-end latency. The 14.5 GB/s result exceeds the
Snowflake-documented single-table ceiling of 10 GB/s by 45%.
Disaggregated latency attribution: 86% of the 7.49 s P99 budget
is in the Snowflake upload-then-register-then-commit step, not in
the Redpanda read / transport / decode path — the Snowpipe
Streaming commit protocol is the dominant contributor. Four
tuning findings the post canonicalises for the wiki: (1) AVRO
over JSON gives ~20% throughput uplift via smaller payload after
compression; (2) batching by count beats batching by byte_size
because byte_size enforcement requires per-message size
calculation overhead; (3) build_paralellism should be tuned
close to the machine core count with a small reserve — on
48-core instances the test used 40; (4) Snowpipe Streaming
channels are the parallelism unit — controlled by
channel_prefix × max_in_flight — with a hard ceiling of
10,000 channels per table. The post also names
intra-node input/output scaling (running many parallel
Connect pipelines within a single Connect process) as the
decisive throughput multiplier, rather than adding more Connect
nodes. Control-group result (Redpanda → drop sink) capped
at 15.1 GB/s / P99 8.38 ms — setting an upper bound on what the
pipeline could achieve without Snowflake in the loop.
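
The four tuning knobs named above might appear together in a Connect output stanza roughly like this. A hedged sketch, not the benchmark's actual `pipeline.yaml` (which is linked but not quoted in the post): field names follow the post's terminology, only `build_paralellism: 40` is a disclosed value, every other value is illustrative, and connection fields are omitted.

```yaml
output:
  snowflake_streaming:
    # account, user, table, credentials, etc. omitted
    channel_prefix: bench     # channel namespace; channels scale as channel_prefix x max_in_flight
    max_in_flight: 64         # illustrative; the winning value is not disclosed
    build_paralellism: 40     # post's spelling; 48 cores minus an 8-core reserve
    batching:
      count: 10000            # count-based trigger (illustrative value)
```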
Key takeaways¶
- 14.5 GB/s into a single Snowflake table beats the documented 10 GB/s per-table ceiling by 45%. Snowflake's public "Snowpipe Streaming high-throughput limitations" doc caps best single-table performance at 10 GB/s aggregate. The benchmark's winning configuration sustained 14.5 GB/s by scaling channel count via `channel_prefix` × `max_in_flight` and scaling intra-node parallelism via `broker` inputs/outputs on each Connect node. (Source: sources/2025-10-02-redpanda-real-time-analytics-redpanda-snowflake-streaming)
- 86% of the end-to-end P99 latency budget lives in the Snowflake upload/register/commit steps. Of the 7.49 s P99 end-to-end latency, ~6.44 s is attributable to the Snowpipe-Streaming commit path; the Redpanda → Connect → Snowflake-upload path contributes only ~1.05 s. This is the analytical-sink-as-bottleneck signature — the streaming broker is not the dominant latency contributor in a real-time-analytics pipeline landing in Snowflake. Transport was over the public internet; AWS PrivateLink would reduce latency further. (Source: sources/2025-10-02-redpanda-real-time-analytics-redpanda-snowflake-streaming; see concepts/snowpipe-streaming-channel)
- AVRO produces ~20% higher throughput than JSON for the same logical 1 KB payload. The 3.8B-message dataset was randomised per record to defeat compression. AVRO's binary encoding produces a smaller on-wire representation than JSON's text encoding at equal information content; the broker throughput delta is the wire-payload-size delta post-codec. Canonical wiki instance of the binary-over-text format choice for streaming-broker throughput. (Source: sources/2025-10-02-redpanda-real-time-analytics-redpanda-snowflake-streaming; patterns/binary-format-for-broker-throughput)
- Count-based batching outperforms byte-size-based batching because `byte_size` enforcement requires a per-message size calculation whereas `count` is a simple integer compare. The finding is specifically about trigger-evaluation CPU cost: on a hot produce path, evaluating "have I reached N bytes?" requires the producer to know the serialized size of every incoming message before the batch-close decision, whereas "have I reached N messages?" is an increment-and-compare. Canonicalised as patterns/count-over-bytesize-batch-trigger. (Source: sources/2025-10-02-redpanda-real-time-analytics-redpanda-snowflake-streaming)
- `build_paralellism` should be tuned to (available_cores − reserved_cores). Snowpipe-Streaming build steps (serialisation + commit preparation) are a latency bottleneck in the `snowflake_streaming` connector. The connector exposes `build_paralellism` (note the spelling in the post) to run the build in parallel threads. The benchmark's 48-core Connect nodes set `build_paralellism: 40`, reserving 8 cores for "other processes". Named as the concrete latency-reducing knob in the Snowpipe-Streaming commit path. Canonicalised as concepts/build-parallelism-for-ingest-serialization. (Source: sources/2025-10-02-redpanda-real-time-analytics-redpanda-snowflake-streaming)
- Snowpipe-Streaming channels max out at 10,000 per table. Channels are the parallelism unit of Snowpipe Streaming; multiple channels write to the same table concurrently. Redpanda Connect exposes two knobs in combination: `channel_prefix` (namespace) × `max_in_flight` (channels per prefix). Hitting the 10,000-channel ceiling surfaces as "the Snowpipe API screaming at us" — an explicit failure mode the benchmark encountered on aggressive channel-scaling tests. Canonicalised as concepts/snowpipe-streaming-channel. (Source: sources/2025-10-02-redpanda-real-time-analytics-redpanda-snowflake-streaming)
- Intra-node input/output scaling — running many parallel pipelines within one Connect process — unlocked the headline throughput, not adding more Connect nodes. Redpanda Connect's `broker` input/output primitive parallelises inputs and outputs within a single node to "fully use the system's resources and boost performance." The post is explicit that this — rather than partition count alone — was the decisive scaling dimension: "unlocking massive throughput by scaling the number of inputs and outputs within a single node." Canonicalised as patterns/intra-node-parallelism-via-input-output-scaling. (Source: sources/2025-10-02-redpanda-real-time-analytics-redpanda-snowflake-streaming)
- Control-group (Redpanda → `drop`) ceiling: 15.1 GB/s at P99 8.38 ms in 5 minutes total. Decoding and reading all 3.8B messages to a null sink took 5 minutes; adding Snowflake cost 1 minute of wall-clock overhead (6 minutes total) while consuming nearly all of the P99 budget (8.38 ms → 7.49 s). Useful as an architectural upper bound: the Redpanda + Connect substrate alone sustains 15.1 GB/s at single-digit-millisecond P99, so effectively the entire 7.49 s end-to-end P99 was spent in the Snowflake commit path. (Source: sources/2025-10-02-redpanda-real-time-analytics-redpanda-snowflake-streaming)
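The headline ratios in the takeaways can be cross-checked with a few lines of arithmetic (figures as disclosed in the post):

```python
# Cross-check of the disclosed headline figures.
per_table_ceiling_gbps = 10.0   # Snowflake-documented single-table ceiling
winning_gbps = 14.5             # sustained winning-config throughput

overage = winning_gbps / per_table_ceiling_gbps - 1.0
print(f"overage vs documented ceiling: {overage:.0%}")        # 45%

p99_total_s = 7.49              # end-to-end P99
p99_snowflake_s = 6.44          # Snowpipe Streaming upload/register/commit
print(f"Snowflake share of P99: {p99_snowflake_s / p99_total_s:.0%}")            # 86%
print(f"broker/transport/decode share: {p99_total_s - p99_snowflake_s:.2f} s")   # 1.05 s
```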
Operational numbers¶
| Dimension | Value |
|---|---|
| Redpanda cluster | 9 × AWS EC2 m7gd.16xlarge |
| Redpanda Connect cluster | 12 × AWS EC2 m7gd.12xlarge |
| Topic partition count | 1,200 |
| Messages streamed | 3.8 billion |
| Payload size (exact) | 1,000 bytes (1 KB) |
| Payload encoding | AVRO (randomised) |
| Control-group (drop sink) throughput | 15.1 GB/s |
| Control-group P99 latency | 8.38 ms |
| Control-group run time | 5 minutes total |
| Winning-config throughput | 14.5 GB/s |
| Winning-config P50 latency | 2.18 s |
| Winning-config P99 latency | 7.49 s |
| Winning-config run time | ~6 minutes total |
| Snowflake per-table documented ceiling | 10 GB/s aggregate |
| Overage vs Snowflake doc | +45% |
| Snowflake fraction of P99 | 86% (~6.44 s of 7.49 s) |
| build_paralellism setting | 40 (on 48-core machines) |
| Snowpipe channel ceiling | 10,000 per table |
| Connection | Public internet (no PrivateLink) |
Systems extracted¶
- systems/redpanda — 9-node Enterprise cluster on AWS EC2 `m7gd.16xlarge`; 1,200-partition topic; deployed via official Terraform + Ansible automation.
- systems/redpanda-connect — 12-node cluster on `m7gd.12xlarge`; `kafka_franz` input → `snowflake_streaming` output per node; `broker` primitive parallelises inputs/outputs within each node.
- systems/snowflake — destination for the pipeline; single target table. Snowpipe Streaming API is the ingest surface; channels are the per-table parallelism unit.
- systems/aws-ec2 — substrate; both clusters on AWS EC2.
- systems/prometheus — pipeline metrics collection (`redpanda-connect/components/metrics`).
- systems/grafana — metrics visualisation during the benchmark.
Concepts extracted¶
- concepts/snowpipe-streaming-channel — new concept. The per-table parallelism unit of the Snowpipe Streaming API; max 10,000 per table; controlled via `channel_prefix` × `max_in_flight` in the Redpanda Connect `snowflake_streaming` connector. Channels are the mechanism for exceeding the 10 GB/s documented per-table throughput ceiling.
- concepts/build-parallelism-for-ingest-serialization — new concept. Parallelism knob for the serialisation-and-commit-preparation step in analytical-store streaming ingest APIs. On Snowpipe Streaming's path, the `build_paralellism` knob in the `snowflake_streaming` connector tunes how many threads prepare batches for the Snowpipe commit call. Rule of thumb: set to (available_cores − small_reserve).
- concepts/batching-latency-tradeoff — reaffirmed: count-based vs byte-size-based batching triggers have different CPU overhead profiles; the count-based trigger is cheaper on the hot produce path.
- concepts/compression-codec-tradeoff — extended: binary encoding (AVRO) delivers ~20% higher broker throughput than text encoding (JSON) on randomised 1 KB payloads where codec compression ratios are comparable.
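The binary-over-text size delta behind the compression-codec-tradeoff entry can be illustrated with stdlib tools; here `struct` stands in for AVRO's binary encoding, and the record is a made-up example, not the benchmark's payload:

```python
import json
import struct

# The same logical record, text-encoded vs binary-encoded. struct is a
# stand-in for AVRO's binary encoding (illustrative record, not the
# benchmark's randomised 1 KB payload).
record = {"ts": 1_733_000_000, "id": 42, "value": 3.14159}

text = json.dumps(record).encode("utf-8")  # field names and digits travel as text
binary = struct.pack("<qqd", record["ts"], record["id"], record["value"])  # fixed-width fields, schema implied

print(len(text), len(binary))  # binary is markedly smaller at equal information content
assert len(binary) < len(text)
```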
Patterns extracted¶
- patterns/count-over-bytesize-batch-trigger — new pattern. Prefer count-based batch-close triggers over byte-size-based triggers when the trigger evaluation is on the hot produce path and payload size is not yet computed. Byte-size enforcement requires per-message serialized-size knowledge; count enforcement is an integer increment. Canonical wiki instance: the 14.5 GB/s Snowpipe-Streaming benchmark.
- patterns/intra-node-parallelism-via-input-output-scaling — new pattern. For data-pipeline platforms with a `broker`-style input/output multiplexing primitive (Redpanda Connect, Benthos), the decisive throughput-scaling dimension is often not more nodes but more inputs/outputs per node, to fully saturate the per-node CPU, network, and storage. Canonical instance: 12 Connect nodes × N-way intra-node parallelism delivered 14.5 GB/s to a single-table Snowflake sink.
- patterns/binary-format-for-broker-throughput — new pattern. Choose binary encoding (AVRO, Protobuf) over text encoding (JSON) when raw broker throughput is the constraint. ~20% uplift at 1 KB payload size with randomised content; the delta is dominated by wire-payload size post-codec. Compositional with patterns/client-side-compression-over-broker-compression.
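A minimal sketch of the trigger-cost asymmetry named by patterns/count-over-bytesize-batch-trigger. The function names, batcher shape, and JSON serializer here are hypothetical, not the `snowflake_streaming` connector's internals:

```python
import json

def should_close_by_count(count: int, limit: int) -> bool:
    """Count trigger: one integer compare per message."""
    return count >= limit

def should_close_by_byte_size(batch_bytes: int, msg: dict, limit: int) -> tuple[bool, int]:
    """Byte-size trigger: every incoming message must be sized (serialized) first."""
    size = len(json.dumps(msg).encode("utf-8"))  # per-message work on the hot path
    total = batch_bytes + size
    return total >= limit, total

# Usage: the count trigger never touches the payload.
batch: list[dict] = []
for i in range(5):
    batch.append({"i": i})
    if should_close_by_count(len(batch), 3):
        batch.clear()
```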
Caveats¶
- Vendor-voice benchmark. Redpanda-authored, unsigned, with Snowflake as the analytical sink that Redpanda's `snowflake_streaming` connector targets. No independent reproduction disclosed at publication time.
- Randomised 1 KB payload. The benchmark explicitly randomised message content "to mitigate the effects of compression." Real-world payloads compress; the disclosed 14.5 GB/s throughput is on post-compression wire bytes. Production workloads with highly compressible JSON payloads may see different ratios.
- Public-internet transport. Redpanda Connect and Snowflake were on the public internet; the post notes AWS PrivateLink would reduce latency further, but the quantum of the reduction is not disclosed.
- Intra-node parallelism knob values not disclosed. The post names the `broker` primitive as the decisive scaling dimension but does not disclose the specific `broker` input/output fan-out factors used at the winning configuration. The GitHub `pipeline.yaml` is linked but not quoted in the post.
- Single-table destination. All 3.8B messages landed in one Snowflake table. The Snowpipe-Streaming 10,000-channels-per-table ceiling is absolute; the channel count behind the 14.5 GB/s run is not disclosed, so how much headroom remained under the ceiling cannot be derived from the post.
- Latency decomposition depth. The 86%-in-Snowflake attribution is disclosed as a headline but not broken out per Snowpipe-Streaming phase (upload vs register vs commit). The underlying Snowpipe internals that make this the dominant latency are not walked through in the post.
- Cost economics absent. Zero disclosure of: AWS compute cost for 21 nodes × run duration, Snowflake warehouse-credit cost for the ingest-facing warehouse, or cost-per-GB-ingested as a derived metric.
- Production incidents absent. Benchmark-run only; the post does not report any production deployment of this architecture at this scale, nor sustained-over-time behaviour, nor operator experience under failure.
- `build_paralellism` spelling. The post uses the spelling `build_paralellism` (an apparent transposition of `parallelism`) verbatim — this appears to be the actual Redpanda Connect connector parameter name at the time of writing, not a typo in this note. Preserved verbatim for accurate config reference.
Cross-source continuity¶
- Substrate follow-on to [[sources/2025-06-24-redpanda-why-streaming-is-the-backbone-for-ai-native-data-platforms|2025-06-24 AI-native-data-platforms]]. That post named Snowpipe-Streaming-as-analytical-landing-surface alongside Iceberg; this post discloses the throughput and latency profile of that landing surface at scale.
- Companion to [[sources/2024-11-19-redpanda-batch-tuning-in-redpanda-for-optimized-performance-part-1|2024-11-19 batch-tuning-part-1]] + [[sources/2024-11-26-redpanda-batch-tuning-in-redpanda-to-optimize-performance-part-2|2024-11-26 batch-tuning-part-2]]. The batch-tuning pair is producer→broker batching; this post is broker→analytical-sink batching. Both reaffirm concepts/batching-latency-tradeoff at different altitudes. The count-vs-byte-size trigger finding extends the pair by naming trigger-evaluation CPU cost as a previously undisclosed axis.
- Companion to [[sources/2025-04-23-redpanda-need-for-speed-9-tips-to-supercharge-redpanda|2025-04-23 9-tips]]. The 9-tips post is a Redpanda-broker tuning checklist; this post is a Redpanda-Connect-sink tuning disclosure. Both name AVRO-over-JSON as a throughput uplift; this post quantifies it at ~20%.
Source¶
- Original: https://www.redpanda.com/blog/real-time-analytics-snowflake-streaming
- Raw markdown: raw/redpanda/2025-10-02-real-time-analytics-at-scale-redpanda-and-snowflake-streamin-b639c064.md
- Example pipeline YAML: github.com/redpanda-data-blog/real-time-analytics-snowflake-streaming
Related¶
- systems/redpanda
- systems/redpanda-connect
- systems/snowflake
- systems/aws-ec2
- systems/prometheus
- systems/grafana
- concepts/batching-latency-tradeoff
- concepts/snowpipe-streaming-channel
- concepts/build-parallelism-for-ingest-serialization
- concepts/compression-codec-tradeoff
- patterns/count-over-bytesize-batch-trigger
- patterns/intra-node-parallelism-via-input-output-scaling
- patterns/binary-format-for-broker-throughput
- companies/redpanda