PATTERN Cited by 1 source

Emoji-swarm real-time aggregation

Shape

Four-stage pipeline that turns a firehose of per-user clicks into a single low-frequency broadcast mood signal:

clients ──HTTP──► Go ingest ──chan──► [async flush] ──► Kafka (raw)
                                          │
                                          ▼
                                  Spark Streaming
                            (N-second micro-batch
                             aggregate: count/type)
                                          │
                                          ▼
                                  Kafka (aggregates)
                                          │
                                          ▼
                     consumer ──► normalise ──► top-N
                                          │
                                          ▼
                          WebSocket / PubSub fanout
                                          │
                                          ▼
                           every client animates

Why a pattern, not a one-off

Hotstar originally built this for emoji swarms during cricket matches. They later reframed it as "process quantifiable user responses in near real-time" and reused the same infrastructure for Voting (Bigg Boss, Dance Plus), with Polls and Trivia slotting in on the same platform. The abstraction that survives is the pipeline shape, not the UI widget.

"Emojis and Voting have a common problem statement — Process quantifiable user responses in near real-time." (Source: sources/2024-03-26-highscalability-capturing-a-billion-emojions)

Stage-by-stage roles

  1. Stateless HTTP ingest — Go service, async Kafka producer (see patterns/async-buffered-kafka-produce). Acknowledges the client before the broker write completes, trading at-most-once semantics for low ingest latency.

  2. Durable message bus — Kafka (under Hotstar's Knol platform). Raw submissions partitioned for parallel consumption; retention long enough to absorb a consumer pause.

  3. Micro-batch aggregator — Spark Streaming (or Flink / Kafka Streams) computes per-type counts over a fixed window. Window length matches the downstream animation cadence (Hotstar: 2 s); any shorter is wasted freshness, any longer is perceptibly laggy. Output goes to a second Kafka topic.

  4. Top-N + fanout — consumer reads aggregates, trims to the top-N most-popular types, pushes to a WebSocket-class fanout tier (Hotstar PubSub, up to 50M concurrent sockets). Trimming at the consumer bounds per-socket payload regardless of long-tail shape.
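The stage-4 trim is the one step small enough to show whole. A minimal sketch of the consumer-side cut, in Go to match the ingest tier — the function name and types are illustrative, not Hotstar's actual code:

```go
package main

import (
	"fmt"
	"sort"
)

// topN keeps only the n most-popular emoji types from one aggregation
// window, bounding the payload every socket receives no matter how long
// the tail of submitted types grows.
func topN(counts map[string]int, n int) []string {
	types := make([]string, 0, len(counts))
	for t := range counts {
		types = append(types, t)
	}
	// Sort by count descending; ties broken lexically for stable output.
	sort.Slice(types, func(i, j int) bool {
		if counts[types[i]] != counts[types[j]] {
			return counts[types[i]] > counts[types[j]]
		}
		return types[i] < types[j]
	})
	if len(types) > n {
		types = types[:n]
	}
	return types
}

func main() {
	window := map[string]int{"🔥": 9120, "😂": 4409, "👏": 3001, "😮": 120, "🙈": 4}
	fmt.Println(topN(window, 3)) // the long tail (😮, 🙈) never reaches a socket
}
```

Because the trim runs once per window in the consumer, its cost is O(types log types) per window — not per socket — which is what makes the constant per-socket payload essentially free.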

Design invariants

  • Ingest cadence is decoupled from delivery cadence. Each hop has its own flush / batch / fanout rhythm. A 100 ms Spark GC pause doesn't stall HTTP ingest because HTTP never waited on Spark synchronously.
  • UX cadence drives the aggregation window. If the animation refreshes every 2 s, the aggregate window is 2 s — not 200 ms, not 10 s. Don't over-spec freshness the UI can't display.
  • Top-N is a presentation AND a cost decision. Trimming at the consumer before fanout caps per-socket bandwidth at a constant regardless of how long the tail of submitted types grows.
  • Raw topic for analytics, aggregate topic for animation. Which emojis were submitted vs which were shown are different questions. Keep both topics.

Generalisations this pattern covers

Any "many-to-one condensation, broadcast to many" real-time interaction:

  • Emoji swarms (Hotstar, Twitch chat "hype trains").
  • Live voting (reality TV, sports audience polls).
  • Reaction counts on live streams.
  • Real-time polling in webinars.
  • Sentiment heatmaps over live-broadcast chat.

The aggregation function changes (count per type, count per candidate, mean sentiment). The pipeline shape doesn't.
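That claim — swap the fold, keep the shape — reduces to a single function type at stage 3. A hedged sketch (illustrative types; no source describes Hotstar's internal API):

```go
package main

import "fmt"

// Aggregator folds one window of raw submissions into the condensed value
// that gets broadcast. Swapping this function is the only change between
// emoji swarms, voting, and sentiment heatmaps.
type Aggregator func(window []string) map[string]float64

// countPerType backs emoji swarms and live voting alike.
func countPerType(window []string) map[string]float64 {
	out := map[string]float64{}
	for _, v := range window {
		out[v]++
	}
	return out
}

// meanSentiment backs a heatmap: submissions scored against a fixed lexicon.
func meanSentiment(scores map[string]float64) Aggregator {
	return func(window []string) map[string]float64 {
		sum, n := 0.0, 0.0
		for _, w := range window {
			if s, ok := scores[w]; ok {
				sum, n = sum+s, n+1
			}
		}
		if n == 0 {
			return map[string]float64{"mean": 0}
		}
		return map[string]float64{"mean": sum / n}
	}
}

func main() {
	window := []string{"🔥", "🔥", "😂"}
	fmt.Println(countPerType(window)) // per-type counts, as for emoji swarms
	agg := meanSentiment(map[string]float64{"🔥": 1, "😂": 0.5})
	fmt.Println(agg(window)) // same window, heatmap-style mean instead
}
```

Everything upstream (ingest, raw topic) and downstream (top-N, fanout) is indifferent to which Aggregator runs, which is exactly why the platform could absorb Voting, Polls, and Trivia without re-architecting.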

Caveats

  • At-most-once ingest is fine for emojis, unacceptable for billing, borderline for votes. Hotstar doesn't disclose whether Voting uses the same async path.
  • Micro-batch latency floor is the window length. Sub-window signal shifts cannot be observed by clients.
  • Top-N hides the long tail. Operators who need "what did users actually submit?" must consume the raw topic separately.
  • Spark vs Flink choice is workload-dependent. Hotstar picked Spark on "better community support" in 2019; modern Flink has parity + stronger event-time semantics. The pattern does not depend on the specific processor.

Seen in

  • sources/2024-03-26-highscalability-capturing-a-billion-emojions — Hotstar Emojis + Voting. ~5 B emojis / 55.83 M users (ICC Cricket World Cup 2019), >6.5 B lifetime, ~3 B votes on the same platform. Canonical wiki instance of the pattern, with all four stages documented + named constants (500 ms / 20 K / 2 s / 50 M).