PATTERN · Cited by 1 source

Async-buffered Kafka produce

Shape

Writes from the hot request path land in an in-process buffer (a Go channel, a JVM blocking queue, etc.) and return immediately. A background worker (goroutine, thread, async task) drains the buffer and flushes to Kafka on whichever comes first of:

  • A time-based flush interval (Hotstar: 500 ms).
  • A size-based batch ceiling (Hotstar: 20,000 messages).
HTTP handler ──► chan msg ──► [goroutine producer] ──► Kafka broker
  (returns 200                  │ flush every
   immediately)                 │   500 ms
                                │ OR every 20K msgs
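
A minimal Go sketch of this shape. The produce() stand-in for the real Kafka client call, the message type, and the channel capacity are illustrative assumptions, not details from the source:

package main

import (
    "fmt"
    "time"
)

const (
    flushInterval = 500 * time.Millisecond // Hotstar's time-based flush
    batchCeiling  = 20_000                 // Hotstar's size-based ceiling
)

// produce stands in for the real Kafka client call (Sarama,
// confluent-kafka-go, ...); it is an assumption, not from the source.
func produce(batch []string) {
    fmt.Printf("flushed %d messages\n", len(batch))
}

// drain is the background worker: it flushes on whichever comes first,
// the ticker firing or the batch hitting the ceiling.
func drain(msgs <-chan string, done chan<- struct{}) {
    ticker := time.NewTicker(flushInterval)
    defer ticker.Stop()
    batch := make([]string, 0, batchCeiling)
    flush := func() {
        if len(batch) > 0 {
            produce(batch)
            batch = batch[:0]
        }
    }
    for {
        select {
        case m, ok := <-msgs:
            if !ok { // channel closed: final flush, then exit
                flush()
                close(done)
                return
            }
            batch = append(batch, m)
            if len(batch) >= batchCeiling {
                flush()
            }
        case <-ticker.C:
            flush()
        }
    }
}

func main() {
    msgs := make(chan string, 100_000) // hot path writes here and returns
    done := make(chan struct{})
    go drain(msgs, done)
    for i := 0; i < 5; i++ {
        msgs <- fmt.Sprintf("emoji-%d", i)
    }
    close(msgs)
    <-done
}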

Why it's a pattern

Kafka's synchronous acks=all write costs a network round-trip to the broker leader plus acknowledgements from every in-sync replica. For latency-sensitive ingest paths (single-digit-ms user-facing budgets), this is too slow at the tail. Pushing the broker write off-path is the standard latency fix, but the knobs have to be named explicitly: flushing too late loses data on a crash, and flushing too eagerly defeats batching and burns broker round-trips.

The Hotstar numbers (500 ms, 20,000 messages) are a reusable template: small enough that a crash loses less than half a second of submissions, large enough that the Kafka producer gets real batches on the wire.

Library support

The pattern is mostly built into mature Kafka client libraries:

  • confluent-kafka-go (librdkafka wrapper) — linger.ms + batch.num.messages + internal producer queue.
  • Sarama (Shopify) — Producer.Flush.Frequency + Producer.Flush.MaxMessages.
  • JVM KafkaProducer — linger.ms + batch.size. The producer queue lives inside the client; the application still chooses whether to block on the returned future via send().get().

The pattern question "do I wait for the returned future?" is equivalent to "do I take the sync or async fork?" — the library already implements the buffer; the application decides semantics.
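
For example, a Sarama sketch wiring the two knobs named above into an async producer; the broker address, topic, and payload are assumptions for illustration:

package main

import (
    "log"
    "time"

    "github.com/Shopify/sarama"
)

func main() {
    cfg := sarama.NewConfig()
    cfg.Producer.Flush.Frequency = 500 * time.Millisecond // time-based flush
    cfg.Producer.Flush.MaxMessages = 20_000               // per-flush ceiling

    // Broker address is an assumption for illustration.
    producer, err := sarama.NewAsyncProducer([]string{"localhost:9092"}, cfg)
    if err != nil {
        log.Fatal(err)
    }
    defer producer.Close() // flushes anything still queued

    // The async fork: errors surface on a channel instead of a
    // blocking call, so drain them in the background.
    go func() {
        for err := range producer.Errors() {
            log.Printf("produce failed: %v", err) // count this in real life
        }
    }()

    // Hand the message to the client's internal queue and return.
    producer.Input() <- &sarama.ProducerMessage{
        Topic: "emoji",
        Value: sarama.StringEncoder("heart"),
    }
}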

Constants

Generalised from Hotstar + other public references:

Knob                              Typical                                       Sets
linger.ms / flush interval        10 ms – 1 s (Hotstar: 500 ms)                 Worst-case latency at low traffic
batch.size / max msgs per flush   16 KB – 20 K msgs (Hotstar: 20,000)           Per-request amortization; memory ceiling
Buffer capacity                   Absorb N × flush-interval of peak traffic     Overflow behaviour (drop vs block)
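
A worked example for the buffer-capacity row, under an assumed 100 K msg/s peak and N = 2 flush intervals of headroom (both numbers hypothetical):

package main

import "fmt"

// Hypothetical sizing arithmetic; the peak rate and headroom factor
// are assumptions, not from the source.
const (
    peakRatePerSec = 100_000 // msgs/s at peak (assumed)
    intervalSec    = 0.5     // Hotstar's 500 ms flush interval
    headroomN      = 2       // absorb N flush intervals of peak traffic
)

const bufferCap = int(peakRatePerSec * intervalSec * headroomN)

func main() {
    fmt.Println(bufferCap) // 100000 channel slots
}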

Failure modes

  • At-most-once: on a crash you lose whatever is sitting in the buffer, at most one flush interval's worth of records, capped by the batch ceiling. Document this explicitly; for vote-counting or billing use a sync path or a disk-backed WAL instead.
  • Producer backpressure: if the broker falls behind, the in-process buffer fills. Choose: drop (at-most-once, with an explicit metric), block (back-pressures the HTTP handler and defeats the purpose), or spill to disk (upgrades to at-least-once).
  • Silent drops on channel overflow: a send to a full Go channel blocks rather than failing; the sender only learns the buffer is full if the send is a select with a default case. Instrument a dropped-message counter (see the sketch after this list).
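
A minimal sketch of that instrumented non-blocking send; the counter and function names are illustrative:

package main

import (
    "fmt"
    "sync/atomic"
)

// dropped would feed a metrics system in a real service (assumed).
var dropped atomic.Int64

// enqueue never blocks the hot path: if the buffer is full, it drops
// the message, bumps the counter, and reports failure to the caller.
func enqueue(buf chan<- string, msg string) bool {
    select {
    case buf <- msg:
        return true
    default: // buffer full: explicit, counted at-most-once drop
        dropped.Add(1)
        return false
    }
}

func main() {
    buf := make(chan string, 1)
    fmt.Println(enqueue(buf, "a")) // true
    fmt.Println(enqueue(buf, "b")) // false: buffer full, dropped
    fmt.Println(dropped.Load())    // 1
}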

Seen in

  • sources/2024-03-26-highscalability-capturing-a-billion-emojions — Hotstar emoji-swarm ingest. Goroutine + channel + Confluent / Sarama client, 500 ms flush interval, 20,000 message ceiling. Accepted at-most-once semantics explicitly ("we need very low latency and data loss in rare scenarios is not a big concern"). Canonical wiki instance with named constants.