REDPANDA 2025-10-02

Redpanda — Real-time analytics at scale: Redpanda and Snowflake Streaming

Summary

Unsigned Redpanda benchmark post disclosing a large-scale performance test of a Redpanda → Snowflake streaming pipeline built with Redpanda Connect's snowflake_streaming output connector. A 9-node Redpanda Enterprise cluster (AWS EC2 m7gd.16xlarge) fed 12 Redpanda Connect nodes (m7gd.12xlarge), landing 3.8 billion 1 KB AVRO-encoded messages as rows in a single Snowflake table. Peak sustained throughput: 14.5 GB/s with P50 ≈ 2.18 s and P99 ≈ 7.49 s end-to-end latency. The 14.5 GB/s result exceeds the Snowflake-documented single-table ceiling of 10 GB/s by 45%.

Disaggregated latency attribution: 86% of the 7.49 s P99 budget sits in the Snowflake upload-then-register-then-commit step, not in the Redpanda read / transport / decode path — the Snowpipe Streaming commit protocol is the dominant contributor.

Four tuning findings canonicalised for the wiki: (1) AVRO over JSON gives ~20% throughput uplift via smaller payload after compression; (2) batching by count beats batching by byte_size because byte_size enforcement requires per-message size calculation overhead; (3) build_paralellism should be tuned close to the machine core count with a small reserve — on 48-core instances the test used 40; (4) Snowpipe Streaming channels are the parallelism unit — controlled by channel_prefix × max_in_flight — with a hard ceiling of 10,000 channels per table. The post also names intra-node input/output scaling (running many parallel Connect pipelines within a single Connect process), rather than adding more Connect nodes, as the decisive throughput multiplier.

Control-group result (Redpanda → drop sink) capped at 15.1 GB/s / P99 8.38 ms — setting an upper bound on what the pipeline could achieve without Snowflake in the loop.

Key takeaways

  1. 14.5 GB/s into a single Snowflake table beats the documented 10 GB/s per-table ceiling by 45%. Snowflake's public "Snowpipe Streaming high-throughput limitations" doc caps best single-table performance at 10 GB/s aggregate. The benchmark's winning configuration sustained 14.5 GB/s by scaling channel count via channel_prefix × max_in_flight and scaling intra-node parallelism via broker inputs/outputs on each Connect node. (Source: sources/2025-10-02-redpanda-real-time-analytics-redpanda-snowflake-streaming)

  2. 86% of the end-to-end P99 latency budget lives in the Snowflake upload/register/commit steps. Of the 7.49 s P99 end-to-end latency, ~6.44 s is attributable to the Snowpipe-Streaming commit path. The Redpanda → Connect → Snowflake-upload path contributes only ~1.05 s. This is the analytical-sink-as-bottleneck signature — the streaming broker is not the dominant latency contributor in a real-time-analytics pipeline landing in Snowflake. The test ran over public-internet transport; AWS PrivateLink would reduce latency further. (Source: sources/2025-10-02-redpanda-real-time-analytics-redpanda-snowflake-streaming; see concepts/snowpipe-streaming-channel)

  3. AVRO produces ~20% higher throughput than JSON for the same logical 1 KB payload. The 3.8B-message dataset was randomised per-record to defeat compression. AVRO's binary encoding produces smaller on-wire representation than JSON's text encoding at equal information content; the broker throughput delta is the wire-payload-size delta post-codec. Canonical wiki instance of binary-over-text format choice for streaming-broker throughput. (Source: sources/2025-10-02-redpanda-real-time-analytics-redpanda-snowflake-streaming; patterns/binary-format-for-broker-throughput)

  4. Count-based batching outperforms byte-size-based batching because byte_size enforcement requires per-message size calculation whereas count is a simple integer compare. The finding is specifically about trigger-evaluation CPU cost: on a hot produce path, evaluating "have I reached N bytes?" requires the producer to have serialized size information for every incoming message before the batch-close decision, whereas "have I reached N messages?" is an increment-and-compare (the batching block in the config sketch after this list shows a count-based trigger). Canonicalised as patterns/count-over-bytesize-batch-trigger. (Source: sources/2025-10-02-redpanda-real-time-analytics-redpanda-snowflake-streaming)

  5. build_paralellism should be tuned to (available_cores − reserved_cores). Snowpipe-Streaming build steps (serialisation + commit preparation) are a latency bottleneck in the snowflake_streaming connector. The connector exposes build_paralellism (note the spelling in the post) to run build in parallel threads. The benchmark's 48-core Connect nodes set build_paralellism: 40, reserving 8 cores for "other processes". Named as the concrete latency-reducing knob in the Snowpipe-Streaming commit path. Canonicalised as concepts/build-parallelism-for-ingest-serialization. (Source: sources/2025-10-02-redpanda-real-time-analytics-redpanda-snowflake-streaming)

  6. Snowpipe-Streaming channels max at 10,000 per table. Channels are the parallelism unit of Snowpipe Streaming; multiple channels write to the same table concurrently. Redpanda Connect exposes two knobs in combination: channel_prefix (namespace) × max_in_flight (channels per prefix). Hitting the 10,000-channel ceiling surfaces as "the Snowpipe API screaming at us" — an explicit failure mode the benchmark encountered on aggressive channel-scaling tests. Canonicalised as concepts/snowpipe-streaming-channel. (Source: sources/2025-10-02-redpanda-real-time-analytics-redpanda-snowflake-streaming)

  7. Intra-node input/output scaling — running many parallel pipelines within one Connect process — unlocked the headline throughput, not adding more Connect nodes. Redpanda Connect's broker input/output primitive parallelises inputs and outputs within a single node to "fully use the system's resources and boost performance." The post is explicit that this — rather than partition count alone — was the decisive scaling dimension: "unlocking massive throughput by scaling the number of inputs and outputs within a single node." Canonicalised as patterns/intra-node-parallelism-via-input-output-scaling. (Source: sources/2025-10-02-redpanda-real-time-analytics-redpanda-snowflake-streaming)

  8. Control-group (Redpanda → drop) ceiling: 15.1 GB/s at P99 8.38 ms in 5 minutes total. Decoding and reading all 3.8B messages to a null sink took 5 minutes; Snowflake added 1 minute of wall-clock overhead (6 minutes total) while costing most of the P99 budget (8.38 ms → 7.49 s). Useful as an architectural upper bound: the Redpanda + Connect substrate alone sustains 15.1 GB/s at a P99 of 8.38 ms, roughly 900× below the end-to-end 7.49 s, so virtually the entire P99 budget of the end-to-end test went into the Snowflake commit path. (Source: sources/2025-10-02-redpanda-real-time-analytics-redpanda-snowflake-streaming)
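
To ground takeaways 1, 4, 5, 6 and 7, below is a minimal per-node Redpanda Connect pipeline sketch. The benchmark's actual pipeline.yaml is linked from the post but not quoted in it, so none of the values below are the winning configuration: only build_paralellism: 40, the count-based batching trigger, and the channel_prefix × max_in_flight channel mechanism come from the post. Broker fan-out factors, batch sizes, addresses and names are illustrative placeholders; Snowflake connection/auth fields are omitted; and field names not quoted in the post should be read as assumptions about the stock kafka_franz, broker, and snowflake_streaming components.

```yaml
# Sketch of one Connect node's pipeline (values illustrative, not the benchmark's).
input:
  broker:                                    # intra-node input scaling (takeaway 7)
    copies: 4                                # placeholder fan-out; the post's factor is undisclosed
    inputs:
      - kafka_franz:
          seed_brokers: ["redpanda-0:9092"]  # placeholder broker address
          topics: ["benchmark-topic"]        # placeholder; the test topic had 1,200 partitions
          consumer_group: "snowflake-ingest" # placeholder

output:
  broker:                                    # intra-node output scaling (takeaway 7)
    copies: 4                                # placeholder fan-out
    outputs:
      - snowflake_streaming:
          # Snowflake connection/auth fields omitted.
          channel_prefix: "connect-node-0"   # channels per table = distinct prefixes x max_in_flight
          max_in_flight: 64                  # placeholder; keep total channels under 10,000 per table
          build_paralellism: 40              # post's value on 48-core nodes (8 cores reserved)
          batching:
            count: 10000                     # count-based close trigger (takeaway 4); value illustrative
            period: 1s                       # safety-valve flush; illustrative
```

Total Snowpipe channels scale as the number of distinct channel_prefix values multiplied by max_in_flight; that product must stay under the 10,000-channels-per-table ceiling from takeaway 6.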

Operational numbers

Dimension Value
Redpanda cluster 9 × AWS EC2 m7gd.16xlarge
Redpanda Connect cluster 12 × AWS EC2 m7gd.12xlarge
Topic partition count 1,200
Messages streamed 3.8 billion
Payload size (exact) 1,000 bytes (1 KB)
Payload encoding AVRO (randomised)
Control-group (drop sink) throughput 15.1 GB/s
Control-group P99 latency 8.38 ms
Control-group run time 5 minutes total
Winning-config throughput 14.5 GB/s
Winning-config P50 latency 2.18 s
Winning-config P99 latency 7.49 s
Winning-config run time ~6 minutes total
Snowflake per-table documented ceiling 10 GB/s aggregate
Overage vs Snowflake doc +45%
Snowflake fraction of P99 86% (~6.44 s of 7.49 s)
build_paralellism setting 40 (on 48-core machines)
Snowpipe channel ceiling 10,000 per table
Connection Public internet (no PrivateLink)
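
For reference, the two derived rows above follow directly from the disclosed numbers (a worked check, not additional data from the post):

$$\frac{14.5\ \text{GB/s}}{10\ \text{GB/s}} = 1.45 \;\Rightarrow\; +45\%, \qquad 0.86 \times 7.49\ \text{s} \approx 6.44\ \text{s}$$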

Systems extracted

  • systems/redpanda — 9-node Enterprise cluster on AWS EC2 m7gd.16xlarge; 1,200-partition topic; deployed via official Terraform + Ansible automation.
  • systems/redpanda-connect — 12-node cluster on m7gd.12xlarge; kafka_franz input → snowflake_streaming output per node; broker primitive parallelises inputs/outputs within each node.
  • systems/snowflake — destination for the pipeline; single target table. Snowpipe Streaming API is the ingest surface; channels are the per-table parallelism unit.
  • systems/aws-ec2 — substrate; both clusters on AWS EC2.
  • systems/prometheus — pipeline metrics collection (redpanda-connect/components/metrics).
  • systems/grafana — metrics visualisation during the benchmark.
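
A minimal sketch of the metrics wiring behind the Prometheus and Grafana entries above, assuming Redpanda Connect's standard Prometheus metrics component (the post does not quote its metrics configuration):

```yaml
# Expose Connect pipeline metrics in Prometheus exposition format; Grafana reads from Prometheus.
metrics:
  prometheus: {}   # scraped from Connect's HTTP endpoint (typically /metrics)
```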

Concepts extracted

  • concepts/snowpipe-streaming-channel — new concept. The per-table parallelism unit of the Snowpipe Streaming API; max 10,000 per table; controlled via channel_prefix × max_in_flight in the Redpanda Connect snowflake_streaming connector. Channels are the mechanism for exceeding the 10 GB/s documented per-table throughput ceiling.
  • concepts/build-parallelism-for-ingest-serialization — new concept. Parallelism knob for the serialisation-and-commit-preparation step in analytical-store streaming ingest APIs. On Snowpipe Streaming's path, the build_paralellism knob in the snowflake_streaming connector tunes how many threads prepare batches for the Snowpipe commit call. Rule-of-thumb: set to (available_cores − small_reserve).
  • concepts/batching-latency-tradeoff — reaffirmed: count-based vs byte-size-based batching triggers have different CPU overhead profiles; the count-based trigger is cheaper on the hot produce path.
  • concepts/compression-codec-tradeoff — extended: binary encoding (AVRO) delivers ~20% higher broker throughput than text encoding (JSON) on randomised 1 KB payloads where codec compression ratios are comparable.

Patterns extracted

  • patterns/count-over-bytesize-batch-trigger — new pattern. Prefer count-based batch-close triggers over byte-size-based triggers when the trigger evaluation is on the hot produce path and payload size is not yet computed. Byte-size enforcement requires per-message serialized-size knowledge; count enforcement is an integer increment. Canonical wiki instance: the 14.5 GB/s Snowpipe-Streaming benchmark.
  • patterns/intra-node-parallelism-via-input-output-scaling — new pattern. For data-pipeline platforms with a broker-style input/output multiplexing primitive (Redpanda Connect, Benthos), the decisive throughput-scaling dimension is often not more nodes but more inputs/outputs per node to fully saturate the per-node CPU, network, and storage. Canonical instance: 12 Connect nodes × N-way intra-node parallelism delivered 14.5 GB/s to a single-table Snowflake sink.
  • patterns/binary-format-for-broker-throughput — new pattern. Choose binary encoding (AVRO, Protobuf) over text encoding (JSON) when raw broker throughput is the constraint. ~20% uplift at 1 KB payload size with randomised content; the delta is dominated by wire-payload size post-codec. Compositional with patterns/client-side-compression-over-broker-compression.

Caveats

  • Vendor-voice benchmark. Redpanda-authored, unsigned, with Snowflake as the analytical sink that Redpanda's snowflake_streaming connector targets. No independent reproduction disclosed at publication time.
  • Randomised 1 KB payload. The benchmark explicitly randomised message content "to mitigate the effects of compression." Real-world payloads compress; the disclosed 14.5 GB/s throughput is on post-compression wire bytes. Production workloads with highly-compressible JSON payloads may see different ratios.
  • Public-internet transport. Redpanda Connect and Snowflake were on the public internet; the post notes AWS PrivateLink would reduce latency further, but the quantum of the reduction is not disclosed.
  • Intra-node parallelism knob-values not disclosed. The post names the broker primitive as the decisive scaling dimension but does not disclose the specific broker input/output fan-out factors used at the winning configuration. The GitHub pipeline.yaml is linked but not quoted in the post.
  • Single-table destination. All 3.8B messages landed in one Snowflake table. The Snowpipe-Streaming 10,000-channels-per-table ceiling is absolute; aggressive channel-scaling tests did hit it, but the channel count at the winning 14.5 GB/s configuration is not disclosed, so its headroom against the ceiling is unknown.
  • Latency decomposition depth. The 86%-in-Snowflake attribution is disclosed as a headline but not broken out per Snowpipe-Streaming phase (upload vs register vs commit). The Snowpipe internals that make this step the dominant latency contributor are not walked through in the post.
  • Cost economics absent. Zero disclosure of: AWS compute cost for 21 nodes × run duration, Snowflake warehouse-credit cost for the ingest-facing warehouse, or cost-per-GB-ingested as a derived metric.
  • Production experience absent. Benchmark-run-only; the post does not report any production deployment of this architecture at this scale, nor sustained-over-time behaviour, nor operator-experience-under-failure.
  • build_paralellism spelling. The post uses the misspelling build_paralellism (the double l transposed) verbatim — this appears to be the actual Redpanda Connect connector parameter name at the time of writing, not a typo. Preserved verbatim for accurate config reference.

Cross-source continuity

  • Substrate follow-on to [[sources/2025-06-24-redpanda-why-streaming-is-the-backbone-for-ai-native-data-platforms|2025-06-24 AI-native-data-platforms]]. That post named Snowpipe-Streaming-as-analytical-landing-surface alongside Iceberg; this post discloses the throughput and latency profile of that landing surface at scale.
  • Companion to [[sources/2024-11-19-redpanda-batch-tuning-in-redpanda-for-optimized-performance-part-1|2024-11-19 batch-tuning-part-1]] + [[sources/2024-11-26-redpanda-batch-tuning-in-redpanda-to-optimize-performance-part-2|2024-11-26 batch-tuning-part-2]]. The batch-tuning pair is producer→broker batching; this post is broker→analytical-sink batching. Both reaffirm concepts/batching-latency-tradeoff at different altitudes. The count-vs-byte-size trigger finding extends the pair by naming trigger-evaluation CPU cost as a previously undisclosed axis.
  • Companion to [[sources/2025-04-23-redpanda-need-for-speed-9-tips-to-supercharge-redpanda|2025-04-23 9-tips]]. The 9-tips post is a Redpanda-broker tuning checklist; this post is a Redpanda-Connect-sink tuning disclosure. Both name AVRO-over-JSON as a throughput uplift; this post quantifies it at ~20%.

Source

sources/2025-10-02-redpanda-real-time-analytics-redpanda-snowflake-streaming