
Binary format for broker throughput

Pattern

When raw streaming-broker throughput is the constraint, prefer binary encodings (AVRO, Protobuf) over text encodings (JSON) for on-wire payloads. The wire-size reduction translates directly to broker-throughput uplift because the broker's per-record cost is partly fixed and partly proportional to bytes-on-wire; smaller records move the variable component down.

Canonical production quantification: ~20% throughput uplift from AVRO vs JSON at 1 KB payload, randomised content (designed to neutralise compression effects). Disclosed in Redpanda's 14.5 GB/s Redpanda → Snowflake benchmark.

"Using a binary format like AVRO showed ~20% throughput improvement over a textual format like JSON." (Source: sources/2025-10-02-redpanda-real-time-analytics-redpanda-snowflake-streaming)
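The structural-overhead gap can be seen without any broker at all. A minimal sketch, using Python's stdlib `struct` as a stand-in for a schema-based binary encoding (the record and its fields are illustrative, not from the benchmark):

```python
import json
import struct

# Hypothetical sensor record; field names are illustrative.
record = {"device_id": 1042, "temp_c": 21.5, "ts": 1727860000}

# JSON repeats field names, quotes, and delimiters in every record.
json_bytes = json.dumps(record).encode("utf-8")

# A schema-based binary encoding ships no inline field names: the layout
# (int32, float64, int64 here) lives in the schema, distributed out-of-band,
# so each record on the wire is just the packed values.
binary_bytes = struct.pack("<idq", record["device_id"], record["temp_c"], record["ts"])

print(len(json_bytes), len(binary_bytes))  # binary is 20 bytes; JSON is far larger
```

Real AVRO adds varint encoding and schema-registry framing, but the shape of the saving is the same: the per-record field-name overhead moves out of every record and into the schema.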

Why it works

Two layered effects compound:

  1. Smaller uncompressed wire size. JSON has structural overhead — field names repeated per record, whitespace, delimiter characters — that binary encodings lack. AVRO records carry no inline field names; schema is shipped separately via a schema registry.
  2. Higher effective compression ratio. Even when compression is enabled, binary encodings typically compress to smaller final sizes than the equivalent text payload. On randomised content (as in the benchmark — designed to defeat compression effects), the uncompressed-size delta dominates.

The benchmark's randomised-content methodology isolates the encoding-format effect from compression-effectiveness variation that would otherwise confound the comparison on real-world payloads.
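The neutralisation effect is easy to reproduce with stdlib `zlib`: repetitive structured payloads compress dramatically (masking format overhead), while random bytes do not. A small sketch with illustrative payloads:

```python
import json
import os
import zlib

# Structured, repetitive JSON compresses very well, which would mask
# the encoding-format effect in a throughput comparison...
repetitive = json.dumps([{"status": "ok", "code": 200}] * 100).encode("utf-8")

# ...while randomised content (as in the benchmark) defeats compression,
# so the uncompressed wire-size delta is what the broker actually sees.
random_payload = os.urandom(1024)

print(len(zlib.compress(repetitive)) / len(repetitive))          # well below 1.0
print(len(zlib.compress(random_payload)) / len(random_payload))  # ~1.0 (incompressible)
```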

Composes with client-side compression

The 20% uplift is a standalone effect of the encoding format. Client-side compression — where the producer compresses the batch before wire transit, so the broker sees already-compressed bytes — composes with it and delivers further throughput gains.

In practice, production Kafka/Redpanda deployments typically pick both: binary encoding (AVRO or Protobuf) and client-side compression (lz4, zstd).
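A sketch of what "both" looks like on the producer side. The keys follow librdkafka naming (as used by the confluent-kafka Python client); the broker address is a placeholder:

```python
# Producer settings combining both levers: client-side compression here,
# binary encoding in the serializer applied before produce().
producer_conf = {
    "bootstrap.servers": "redpanda:9092",  # placeholder address
    "compression.type": "zstd",            # producer compresses batches before wire transit
    "linger.ms": 5,                        # small batching window gives the codec more to work with
}

# The binary encoding itself is the serializer's job: e.g. an Avro serializer
# that registers the schema with a registry out-of-band and writes only the
# packed values per record.
```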

Trade-offs

  • Schema registry dependency. AVRO records don't carry field names inline, so the consumer needs the schema to decode. This requires a schema registry or equivalent out-of-band schema-distribution mechanism.
  • Debuggability. A JSON record is human-readable on the console; an AVRO record is not, so debug-time instrumentation (decoder tooling, schema lookups) is required.
  • Schema-evolution discipline. Binary formats have stricter rules for schema changes (AVRO: name-preserving field adds with defaults; Protobuf: tag-number preservation). Text formats are more permissive, but schema drift fails silently at read time instead of at the schema boundary.
  • Codec compatibility with destination. Not all sinks natively consume the broker's binary format — e.g., the Snowpipe-Streaming destination may decode AVRO in the ingest connector rather than pass-through. The throughput gain is on the producer → broker → consumer hops; destination-side decoding is additional work.
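The schema-evolution rule for AVRO in the list above can be made concrete. A sketch of an evolved schema (record and field names are illustrative), expressed as the Python dict form of an Avro schema document:

```python
# Avro-style schema evolution: a field added in v2 must carry a default
# so records written under the v1 schema still decode under v2.
schema_v2 = {
    "type": "record",
    "name": "SensorReading",
    "fields": [
        {"name": "device_id", "type": "int"},
        {"name": "temp_c", "type": "double"},
        # Added in v2: the default makes the change backward-compatible.
        {"name": "unit", "type": "string", "default": "celsius"},
    ],
}
```

Dropping the `default` would make v1 records undecodable under v2 — the kind of silent break the discipline exists to prevent.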

The AVRO-over-JSON finding was one of four tuning insights from the 14.5 GB/s Redpanda → Snowflake run.

Seen in
