PATTERN Cited by 1 source
Emoji-swarm real-time aggregation
Shape
Four-stage pipeline that turns a firehose of per-user clicks into a single low-frequency broadcast mood signal:
    clients ──HTTP──► Go ingest ──chan──► [async flush] ──► Kafka (raw)
                                                                │
                                                                ▼
                                                         Spark Streaming
                                                      (N-second micro-batch
                                                       aggregate: count/type)
                                                                │
                                                                ▼
                                                        Kafka (aggregates)
                                                                │
                                                                ▼
                                                     consumer ──► normalise
                                                              ──► top-N
                                                                │
                                                                ▼
                                                    WebSocket / PubSub fanout
                                                                │
                                                                ▼
                                                      every client animates
Why a pattern, not a one-off
Hotstar originally built this for emoji swarms during cricket matches. They later reframed it as "process quantifiable user responses in near real-time" and reused the same infrastructure for Voting (Bigg Boss, Dance Plus), with Polls and Trivia slotting in on the same platform. The abstraction that survives is the pipeline shape, not the UI widget.
"Emojis and Voting have a common problem statement — Process quantifiable user responses in near real-time." (Source: sources/2024-03-26-highscalability-capturing-a-billion-emojions)
Stage-by-stage roles
- Stateless HTTP ingest: Go service with an async Kafka producer (see patterns/async-buffered-kafka-produce). It acknowledges the client before the broker write completes, buying low latency at the cost of at-most-once delivery: anything still buffered when the process dies is lost.
- Durable message bus: Kafka (under Hotstar's Knol platform). Raw submissions are partitioned for parallel consumption; retention is long enough to absorb a consumer pause.
- Micro-batch aggregator: Spark Streaming (or Flink / Kafka Streams) computes per-type counts over a fixed window. Window length matches the downstream animation cadence (Hotstar: 2 s); any shorter is wasted freshness, any longer is perceptibly laggy. Output goes to a second Kafka topic.
- Top-N + fanout: a consumer reads the aggregates, trims to the top-N most popular types, and pushes them to a WebSocket-class fanout tier (Hotstar PubSub, up to 50M concurrent sockets). Trimming at the consumer bounds per-socket payload regardless of the shape of the long tail.
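The ingest stage's ack-before-write behaviour can be sketched as follows. This is a minimal Python illustration, not Hotstar's Go implementation: `AsyncBufferedProducer`, `send_batch`, and the parameter names are hypothetical, and the defaults borrow the 500 ms / 20 K figures on the assumption that they describe the ingest flush interval and message cap.

```python
import queue
import threading
from typing import Callable, List


class AsyncBufferedProducer:
    """Ack-before-write buffer: submit() returns at once, a worker flushes batches."""

    def __init__(self, send_batch: Callable[[List[str]], None],
                 max_buffered: int = 20_000, flush_interval_s: float = 0.5):
        # send_batch is a hypothetical stand-in for a real Kafka producer call.
        self._send_batch = send_batch
        self._buf: "queue.Queue[str]" = queue.Queue(maxsize=max_buffered)
        self._flush_interval_s = flush_interval_s
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._worker.start()

    def submit(self, event: str) -> bool:
        """Called from the HTTP handler; never waits on the broker."""
        try:
            self._buf.put_nowait(event)
            return True   # the client is acked here, before any broker write
        except queue.Full:
            return False  # buffer full: the event is dropped (at-most-once)

    def flush(self) -> int:
        """Drain whatever is buffered right now and send it as one batch."""
        batch: List[str] = []
        while True:
            try:
                batch.append(self._buf.get_nowait())
            except queue.Empty:
                break
        if batch:
            self._send_batch(batch)
        return len(batch)

    def _run(self) -> None:
        # wait() returns False on timeout (keep flushing) and True once stopped.
        while not self._stop.wait(self._flush_interval_s):
            self.flush()

    def stop(self) -> None:
        self._stop.set()
        if self._worker.is_alive():
            self._worker.join()
        self.flush()  # final drain on a clean shutdown only
```

Events sitting in the queue when the process crashes never reach the sink, which is exactly the at-most-once trade described above. A real handler would also need a policy for the `False` return (drop silently, or report back-pressure to the client).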
Design invariants
- Ingest cadence is decoupled from delivery cadence. Each hop has its own flush / batch / fanout rhythm. A 100 ms Spark GC pause doesn't stall HTTP ingest because HTTP never waited on Spark synchronously.
- UX cadence drives the aggregation window. If the animation refreshes every 2 s, the aggregate window is 2 s — not 200 ms, not 10 s. Don't over-spec freshness the UI can't display.
- Top-N is a presentation AND a cost decision. Trimming at the consumer before fanout caps per-socket bandwidth at a constant regardless of how long the tail of submitted types grows.
- Raw topic for analytics, aggregate topic for animation. Which emojis were submitted vs which were shown are different questions. Keep both topics.
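To make the window and top-N invariants concrete, here is a small in-memory sketch of what the aggregator and consumer compute. It assumes "normalise" means expressing each count as a share of the window total, which the source does not spell out; `window_counts` and `top_n_normalised` are illustrative names, and a production system would do this in Spark Streaming rather than plain Python.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, emoji type)


def window_counts(events: Iterable[Event], window_s: float = 2.0) -> Dict[int, Counter]:
    """Bucket events into fixed (tumbling) windows and count per type."""
    windows: Dict[int, Counter] = {}
    for ts, kind in events:
        idx = int(ts // window_s)  # window index; 2 s default mirrors the UI cadence
        windows.setdefault(idx, Counter())[kind] += 1
    return windows


def top_n_normalised(counts: Counter, n: int = 3) -> List[Tuple[str, float]]:
    """Normalise against the full window total, then keep the n largest types."""
    total = sum(counts.values())
    if total == 0:
        return []
    # Sort by descending count, breaking ties by name for deterministic output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(kind, count / total) for kind, count in ranked[:n]]


events = [(0.1, "fire"), (0.5, "fire"), (1.9, "heart"),
          (2.0, "clap"), (3.5, "clap"), (3.9, "fire")]
per_window = window_counts(events)
payloads = {idx: top_n_normalised(c, n=2) for idx, c in per_window.items()}
```

Whatever the number of distinct types submitted, each payload carries at most `n` entries, so the per-socket message size stays constant. The untrimmed `per_window` counters correspond to what an analytics consumer would still be able to read from the raw or aggregate topics.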
Generalisations this pattern covers
Any "many-to-one condensation, broadcast to many" real-time interaction:
- Emoji swarms (Hotstar, Twitch chat "hype trains").
- Live voting (reality TV, sports audience polls).
- Reaction counts on live streams.
- Real-time polling in webinars.
- Sentiment heatmaps over live-broadcast chat.
The aggregation function changes (count per type, count per candidate, mean sentiment). The pipeline shape doesn't.
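One way to picture "the function changes, the shape doesn't" is to pass the reducer in as a parameter. The sketch below is illustrative only: `aggregate_window` is a hypothetical helper, and the sample keys and sentiment scores are made up for the example.

```python
from collections import defaultdict
from statistics import fmean
from typing import Callable, Dict, Hashable, Iterable, List, Tuple, TypeVar

V = TypeVar("V")
R = TypeVar("R")


def aggregate_window(
    responses: Iterable[Tuple[Hashable, V]],
    reduce_fn: Callable[[List[V]], R],
) -> Dict[Hashable, R]:
    """Group one window's responses by key, then apply the use-case reducer."""
    groups: Dict[Hashable, List[V]] = defaultdict(list)
    for key, value in responses:
        groups[key].append(value)
    return {key: reduce_fn(values) for key, values in groups.items()}


# Same grouping step, three different reducers.
emoji_counts = aggregate_window([("fire", 1), ("fire", 1), ("heart", 1)], len)
vote_counts = aggregate_window([("alice", 1), ("bob", 1), ("alice", 1)], len)
mean_sentiment = aggregate_window(
    [("chat", 0.5), ("chat", -0.25), ("chat", 0.5)], fmean
)
```

Ingest, bus, windowing, and fanout are untouched by the choice of `reduce_fn`; only the value carried in the aggregate topic changes.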
Caveats
- At-most-once ingest is fine for emojis, unacceptable for billing, borderline for votes. Hotstar doesn't disclose whether Voting uses the same async path.
- Micro-batch latency floor is the window length. Sub-window signal shifts cannot be observed by clients.
- Top-N hides the long tail. Operators who need "what did users actually submit?" must consume the raw topic separately.
- Spark vs Flink choice is workload-dependent. Hotstar picked Spark in 2019, citing "better community support"; modern Flink is at parity and offers stronger event-time semantics. The pattern does not depend on the specific processor.
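The latency-floor caveat can be turned into a rough budget. The sketch below is back-of-envelope arithmetic, not a measured figure: it assumes a click can wait one full ingest flush interval and then one full aggregation window, and the 0.5 s flush interval is an assumption taken from the 500 ms constant listed for the Hotstar instance.

```python
def worst_case_staleness_s(flush_interval_s: float, window_s: float,
                           downstream_s: float = 0.0) -> float:
    """Upper bound on the age of the oldest click reflected in a broadcast.

    Assumes a click arrives just after an ingest flush and just after a
    window opens, so it waits out both in full. downstream_s lumps together
    broker, consumer, and fanout transit time (unknown here, default 0).
    """
    return flush_interval_s + window_s + downstream_s


# Assumed 0.5 s flush interval plus the documented 2 s window.
budget = worst_case_staleness_s(flush_interval_s=0.5, window_s=2.0)
```

Under these assumptions the oldest click in a broadcast is about 2.5 s old before transit time is added. Shrinking the flush interval trims that budget only slightly, because the window term dominates.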
Seen in
- sources/2024-03-26-highscalability-capturing-a-billion-emojions: Hotstar Emojis + Voting. ~5 B emojis / 55.83 M users (ICC Cricket World Cup 2019), >6.5 B lifetime, ~3 B votes on the same platform. Canonical wiki instance of the pattern, with all four stages documented and named constants (500 ms ingest flush interval / 20 K max messages per Kafka request / 2 s aggregation window / 50 M concurrent sockets).
Related
- systems/hotstar-emojis — the canonical production instance.
- systems/kafka — the bus between every stage.
- systems/spark-streaming — the micro-batch aggregator.
- systems/hotstar-pubsub — the fanout tier.
- concepts/micro-batching — the aggregator's processing model.
- concepts/async-write-buffer — the ingest-stage primitive.
- concepts/fanout-and-cycle — the fanout-tier shape.
- patterns/async-buffered-kafka-produce — the ingest-stage implementation pattern.