PATTERN · Cited by 1 source

Async-buffered Kafka produce

Shape

Writes from the hot request path land in an in-process buffer (a Go channel, a JVM blocking queue, etc.) and return immediately. A background worker (goroutine, thread, async task) drains the buffer and flushes to Kafka on whichever comes first of:

  • A time-based flush interval (Hotstar: 500 ms).
  • A size-based batch ceiling (Hotstar: 20,000 messages).
HTTP handler ──► chan msg ──► [goroutine producer] ──► Kafka broker
  (returns 200                  │ flush every
   immediately)                 │   500 ms
                                │ OR every 20K msgs
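
A minimal Go sketch of this shape. The produce() stand-in for the real Kafka client call, the message type, and the channel capacity are illustrative assumptions, not details from the source:

package main

import (
    "fmt"
    "time"
)

const (
    flushInterval = 500 * time.Millisecond // Hotstar's time-based flush
    batchCeiling  = 20_000                 // Hotstar's size-based ceiling
)

// produce stands in for the real Kafka client call (Sarama,
// confluent-kafka-go, ...); it is an assumption, not from the source.
func produce(batch []string) {
    fmt.Printf("flushed %d messages\n", len(batch))
}

// drain is the background worker: it flushes on whichever comes first,
// the ticker firing or the batch hitting the ceiling.
func drain(msgs <-chan string, done chan<- struct{}) {
    ticker := time.NewTicker(flushInterval)
    defer ticker.Stop()
    batch := make([]string, 0, batchCeiling)
    flush := func() {
        if len(batch) > 0 {
            produce(batch)
            batch = batch[:0]
        }
    }
    for {
        select {
        case m, ok := <-msgs:
            if !ok { // channel closed: final flush, then exit
                flush()
                close(done)
                return
            }
            batch = append(batch, m)
            if len(batch) >= batchCeiling {
                flush()
            }
        case <-ticker.C:
            flush()
        }
    }
}

func main() {
    msgs := make(chan string, 100_000) // hot path writes here and returns
    done := make(chan struct{})
    go drain(msgs, done)
    for i := 0; i < 5; i++ {
        msgs <- fmt.Sprintf("emoji-%d", i)
    }
    close(msgs)
    <-done
}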

Why it's a pattern

Kafka's synchronous acks=all write costs a network round-trip to the broker leader plus acknowledgements from every in-sync replica. For latency-sensitive ingest paths (single-digit-ms user-facing budgets), this is too slow at the tail. Pushing the broker write off-path is the standard latency fix, but the knobs have to be named explicitly: flushing too late loses data on a crash, and flushing too eagerly defeats batching and burns broker round-trips.

The Hotstar numbers (500 ms, 20,000 messages) are a reusable template: small enough that a crash loses less than half a second of submissions, large enough that the Kafka producer gets real batches on the wire.

Library support

The pattern is mostly built into mature Kafka client libraries:

  • confluent-kafka-go (librdkafka wrapper) — linger.ms + batch.num.messages + internal producer queue.
  • Sarama (Shopify) — Producer.Flush.Frequency + Producer.Flush.MaxMessages.
  • JVM KafkaProducer — linger.ms + batch.size. The producer queue lives inside the client; the application still chooses whether to block on the returned future via send().get().

The pattern question "do I wait for the returned future?" is equivalent to "do I take the sync or async fork?" — the library already implements the buffer; the application decides semantics.
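
For example, a Sarama sketch wiring the two knobs named above into an async producer; the broker address, topic, and payload are assumptions for illustration:

package main

import (
    "log"
    "time"

    "github.com/Shopify/sarama"
)

func main() {
    cfg := sarama.NewConfig()
    cfg.Producer.Flush.Frequency = 500 * time.Millisecond // time-based flush
    cfg.Producer.Flush.MaxMessages = 20_000               // per-flush ceiling

    // Broker address is an assumption for illustration.
    producer, err := sarama.NewAsyncProducer([]string{"localhost:9092"}, cfg)
    if err != nil {
        log.Fatal(err)
    }
    defer producer.Close() // flushes anything still queued

    // The async fork: errors surface on a channel instead of a
    // blocking call, so drain them in the background.
    go func() {
        for err := range producer.Errors() {
            log.Printf("produce failed: %v", err) // count this in real life
        }
    }()

    // Hand the message to the client's internal queue and return.
    producer.Input() <- &sarama.ProducerMessage{
        Topic: "emoji",
        Value: sarama.StringEncoder("heart"),
    }
}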

Constants

Generalised from Hotstar + other public references:

Knob                              Typical                                       Sets
linger.ms / flush interval        10 ms – 1 s (Hotstar: 500 ms)                 Worst-case latency at low traffic
batch.size / max msgs per flush   16 KB – 20 K msgs (Hotstar: 20,000)           Per-request amortization; memory ceiling
Buffer capacity                   Absorb N × flush-interval of peak traffic     Overflow behaviour (drop vs block)
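
A worked example for the buffer-capacity row, under an assumed 100 K msg/s peak and N = 2 flush intervals of headroom (both numbers hypothetical):

package main

import "fmt"

// Hypothetical sizing arithmetic; the peak rate and headroom factor
// are assumptions, not from the source.
const (
    peakRatePerSec = 100_000 // msgs/s at peak (assumed)
    intervalSec    = 0.5     // Hotstar's 500 ms flush interval
    headroomN      = 2       // absorb N flush intervals of peak traffic
)

const bufferCap = int(peakRatePerSec * intervalSec * headroomN)

func main() {
    fmt.Println(bufferCap) // 100000 channel slots
}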

Failure modes

  • At-most-once: on a crash you lose whatever is sitting in the buffer, at most one flush interval's worth of records, capped by the batch ceiling. Document this explicitly; for vote-counting or billing use a sync path or a disk-backed WAL instead.
  • Producer backpressure: if the broker falls behind, the in-process buffer fills. Choose: drop (at-most-once, with an explicit metric), block (back-pressures the HTTP handler and defeats the purpose), or spill to disk (upgrades to at-least-once).
  • Silent drops on channel overflow: a send to a full Go channel blocks rather than failing; the sender only learns the buffer is full if the send is a select with a default case. Instrument a dropped-message counter (see the sketch after this list).
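
A minimal sketch of that instrumented non-blocking send; the counter and function names are illustrative:

package main

import (
    "fmt"
    "sync/atomic"
)

// dropped would feed a metrics system in a real service (assumed).
var dropped atomic.Int64

// enqueue never blocks the hot path: if the buffer is full, it drops
// the message, bumps the counter, and reports failure to the caller.
func enqueue(buf chan<- string, msg string) bool {
    select {
    case buf <- msg:
        return true
    default: // buffer full: explicit, counted at-most-once drop
        dropped.Add(1)
        return false
    }
}

func main() {
    buf := make(chan string, 1)
    fmt.Println(enqueue(buf, "a")) // true
    fmt.Println(enqueue(buf, "b")) // false: buffer full, dropped
    fmt.Println(dropped.Load())    // 1
}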

Seen in

  • sources/2024-03-26-highscalability-capturing-a-billion-emojions — Hotstar emoji-swarm ingest. Goroutine + channel + Confluent / Sarama client, 500 ms flush interval, 20,000 message ceiling. Accepted at-most-once semantics explicitly ("we need very low latency and data loss in rare scenarios is not a big concern"). Canonical wiki instance with named constants.