High Scalability — Capturing a billion emo(j)i-ons¶
Summary¶
Dedeepya Bonthu's repost (from her Medium original) describes how
Hotstar (India's live-sports OTT platform,
later merged into JioCinema) built the in-house real-time emoji
swarm feature that powered its Social Feed during the ICC Cricket
World Cup 2019 — ~5 billion emojis from 55.83 million users during
that tournament, >6.5 billion lifetime at the time of writing. The
system replaces a third-party vendor that was too slow, too unstable,
and too expensive to sustain at cricket-final concurrency. The final
architecture is a canonical instance of three-stage real-time
aggregation: (1) Golang HTTP ingest writes user-submitted emoji
events to a local in-process channel, with a background
goroutine-backed producer
flushing to Kafka every 500 ms or every 20,000 messages — whichever
comes first; (2) a Spark Streaming job
consumes the Kafka topic and computes
2-second micro-batched aggregates (emoji counts per interval), then
writes the aggregate back to a second Kafka topic; (3) a Python Kafka
consumer normalises, picks the top-N most popular emojis, and pushes
them to Hotstar PubSub (the in-house
50M-concurrent-WebSocket fanout service) which animates the swarm on
clients. Hotstar explicitly accepted data loss in rare scenarios as
the price of low-latency asynchronous ingest (though they report
observing none). The same system was later generalised to power
Voting for Bigg Boss (Telugu/Tamil/Malayalam) and Dance Plus
(≥3 billion votes to date), reframed as "process quantifiable user
responses in near real-time."
Key takeaways¶
-
Problem shape: quantifiable user responses, condensed into a swarm, delivered near-real-time to every viewer. Cricket fans watching at home want the equivalent of placard-waving at the stadium. Collect raw emoji submissions from millions of concurrent users, condense them into a single system-wide "mood" signal, and push the changing mood back to every client fast enough that it still feels reactive to the last delivery / ball / wicket. The hard part is the concurrency: cricket-final scale is billions of submissions over a single match window. "Collecting these user-generated signals in real-time, condensing these opinions to an emoji swarm that shows the mood of the audience and displaying the changing moods in real-time is challenging when you plan to receive billions of such emoji submissions during a tournament." The same problem shape subsumes Voting — the only change is the condensation function (count per candidate instead of top-N emoji). (Source: article §intro + §Voting and more.)
-
Three design principles, each mapped to a concrete mechanism. Hotstar names three: scalability (horizontal scaling behind load balancers + autoscaling), decomposition (HTTP ingest, message queue, stream processor, delivery are separate components, each independently scalable), and asynchronous processing ("execution without blocking resources and thus supports higher concurrency"). The last is the central design lever — it's what lets the HTTP ingest layer acknowledge a submission in the time it takes to push to a local channel, not the time it takes for Kafka to fsync a record. Every subsequent choice (Go channels, periodic batch flush, Spark micro-batches, PubSub push) follows from this.
-
Kafka chosen over Flink/Storm/Kafka-Streams and over raw Kafka DIY, via an in-house "Knol" platform. Requirements ranked by Hotstar: "high throughput, availability, low latency and supports consumer groups." Kafka satisfies all four; managing Kafka in-house does not scale operationally at Hotstar's concurrency profile. They therefore consume Kafka through Knol — "an amazing data platform [...] built on top of Kafka" — which hides cluster-ops from product teams. This is the canonical wiki instance of the managed Kafka platform as an internal product pattern. (Source: How are client requests handled? + Knol citation.)
-
Write path: synchronous-ack vs async-buffered is the first decision. Hotstar explicitly enumerates the two options. Synchronous: wait for Kafka broker ack before returning 200; configure producer+server retries; "If your data is transactional or cannot suffer any loss, this approach is preferable." Asynchronous: write to an in-process buffer and return 200 immediately; a background flusher writes to Kafka; "the downside is that if not handled properly, this could result in data loss." For emoji "we need very low latency and data loss in rare scenarios is not a big concern (although we haven't seen any so far)" — so they chose async. This framing is the canonical real-world instance of the at-most-once ingest trade-off: you buy latency with recoverability. (Source: How do we write messages to the queue?)
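The latency asymmetry between the two options can be sketched in Go. This is a hypothetical illustration, not Hotstar's code: `broker` stands in for a Kafka client whose `Produce` blocks until the broker acks, and `handleAsync` models the chosen path, where the handler returns as soon as the message lands in an in-process channel.

```go
package main

import (
	"fmt"
	"time"
)

// broker stands in for a Kafka client: Produce blocks until the broker acks.
type broker struct{ ackDelay time.Duration }

func (b *broker) Produce(msg string) error {
	time.Sleep(b.ackDelay) // simulate network round-trip + broker ack
	return nil
}

// handleSync returns only after the broker acks: durable, but every HTTP
// request pays the broker round-trip.
func handleSync(b *broker, msg string) error { return b.Produce(msg) }

// handleAsync enqueues to an in-process channel and returns immediately:
// low latency, but messages still in the channel are lost on a crash
// (the at-most-once trade-off Hotstar accepted for emojis).
func handleAsync(buf chan<- string, msg string) bool {
	select {
	case buf <- msg:
		return true
	default:
		return false // buffer full: shed load instead of blocking the handler
	}
}

func main() {
	b := &broker{ackDelay: 5 * time.Millisecond}
	buf := make(chan string, 1024)

	start := time.Now()
	_ = handleSync(b, "🔥")
	syncLat := time.Since(start)

	start = time.Now()
	ok := handleAsync(buf, "🔥")
	asyncLat := time.Since(start)

	fmt.Printf("sync ack: %v, async enqueue: %v (accepted=%v)\n", syncLat, asyncLat, ok)
}
```

The `default` branch is one reasonable policy for a full buffer; blocking or resizing are alternatives the article does not discuss.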
-
Golang goroutines + channels as the async-write primitive — with exact production flush constants. "Goroutines are lightweight threads that can execute functions asynchronously. We just need to say go do_something(). We use Goroutines and Channels in Golang to write messages to Kafka. Messages to be produced are written to a Channel. A Producer runs in the background as a Goroutine and flushes the data periodically to Kafka." They use the Confluent or Sarama Kafka-client libraries, configured with flush interval = 500 ms and max messages per batch = 20,000; whichever of the two triggers fires first causes a flush. This is a precise, numbered version of the more generic async-buffered producer pattern — reusable as a template for any Kafka ingress where latency matters far more than tail durability.
-
Read path: Spark Streaming picked over Flink / Storm / Kafka Streams on two criteria. "After considering different streaming frameworks like Flink, Spark, Storm, Kafka Streams we decided to go with Spark. Spark has support for micro batching and aggregations which are essential for our use case and better community support compared to competitors like Flink." The reasoning is instructive: micro-batching is not a downside for this workload — it's the feature. The aggregation function is "count each emoji type over the last 2 seconds" — a natural fit for discrete batch windows, and the 2-second window is already the UX cadence at which the swarm updates. Continuous-event-processing (Flink-style) would add complexity without adding animation-relevant freshness. Production batch window: 2 seconds. Output: aggregate counts written to a second Kafka topic. (Source: How does the processing happen?)
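The aggregation the Spark job computes is simple enough to sketch language-agnostically. A minimal Go sketch of the same semantics, tumbling 2-second windows with a count per emoji, assuming processing-time bucketing and hypothetical names (the article shows no code):

```go
package main

import "fmt"

// event is one raw emoji submission with a millisecond timestamp.
type event struct {
	emoji string
	atMs  int64
}

// windowCounts buckets events into tumbling windows of windowMs and counts
// each emoji per window; the map key is the window's start time.
func windowCounts(events []event, windowMs int64) map[int64]map[string]int {
	out := make(map[int64]map[string]int)
	for _, e := range events {
		w := (e.atMs / windowMs) * windowMs
		if out[w] == nil {
			out[w] = make(map[string]int)
		}
		out[w][e.emoji]++
	}
	return out
}

func main() {
	events := []event{
		{"🔥", 100}, {"🔥", 1900}, {"😢", 500}, // all land in window [0, 2000)
		{"🔥", 2100}, // window [2000, 4000)
	}
	agg := windowCounts(events, 2000)
	fmt.Println(agg[0]["🔥"], agg[0]["😢"], agg[2000]["🔥"]) // 2 1 1
}
```

Each window's map is what gets written to the second Kafka topic; discrete windows like these are exactly what micro-batching gives you for free.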
-
Delivery path: Python Kafka consumer → normalise → top-N → PubSub WebSocket fanout. A Python Kafka consumer reads the aggregate topic at the Spark-configured cadence ("if it's set to 1 second, this consumer would receive a message per second"). It normalises the aggregate and pushes the top ("relatively more popular") emojis to PubSub — "a real-time messaging infrastructure built at Hotstar to deliver messages to users in our Social Feed" — which fans out to all connected clients over a WebSocket-class channel. PubSub is the 50M-concurrent-socket-connections service described in a separate Hotstar post linked from the article. Trimming to top-N at this stage is both a presentation decision (there are only so many emojis the client can animate meaningfully) and a fanout-cost decision — the per-socket payload is bounded to the top-N, not the long-tail count vector.
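The top-N trim at this stage is a few lines in any language. A hedged Go sketch (the production consumer is Python; `topN` and its tie-break rule are our assumptions) showing how one window's aggregate is bounded before fanout:

```go
package main

import (
	"fmt"
	"sort"
)

// topN keeps only the n most popular emojis from one window's aggregate,
// so the per-socket payload is bounded no matter how many distinct emojis
// were submitted in the window.
func topN(counts map[string]int, n int) []string {
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool {
		if counts[keys[i]] != counts[keys[j]] {
			return counts[keys[i]] > counts[keys[j]] // most popular first
		}
		return keys[i] < keys[j] // deterministic tie-break
	})
	if n > len(keys) {
		n = len(keys)
	}
	return keys[:n]
}

func main() {
	counts := map[string]int{"🔥": 9500, "😢": 1200, "🎉": 8700, "😡": 40}
	fmt.Println(topN(counts, 2)) // [🔥 🎉]
}
```

Everything below the cut is dropped at this layer, which is why analytics over the full emoji distribution must read the raw topic instead.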
-
Delivery cadence decouples from ingest cadence. Ingress: user clicks in fractions-of-a-second bursts. Producer flush: every 500 ms or 20,000 messages. Kafka → Spark: continuous stream read. Aggregation window: 2 s. PubSub push cadence: configured per Spark batch duration (1-2 s). Client animation: top-N emoji swarm per message. Each hop absorbs variance without tight coupling to the next — the classic decomposition pattern. A 100 ms Spark GC pause or a PubSub socket slow-drain doesn't stall HTTP ingest, because HTTP ingest never synchronously waited for any of them.
-
Operational numbers. ~5 billion emojis / 55.83 million users during ICC Cricket World Cup 2019 (single tournament). >6.5 billion emojis lifetime at publication time. ~3 billion votes processed on the same infrastructure (Voting feature). No specific tail-latency, HTTP throughput, or broker/consumer lag figures are disclosed — the post is an architecture narrative, not a performance report.
-
Generalisation: same platform for Emojis, Voting, Polls, Trivia. The article's framing abstraction at the end is key: "Emojis and Voting have a common problem statement — Process quantifiable user responses in near real-time." Voting (Dance Plus, Bigg Boss Telugu/Tamil/Malayalam) is literally the same pipeline with a different aggregation function (per-candidate count) and a different delivery endpoint (ballot result, not animated emoji). This reframing — from "the emoji feature" to "the real-time quantifiable response platform" — is what makes the in-house rebuild a capability rather than a single feature. Polls and Trivia slot in "out of the box" on the same infrastructure. This generalisation lesson (build the verb, not the noun) is the most transferable takeaway of the post.
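The "build the verb" reframing amounts to keeping the pipeline fixed and swapping only the condensation/presentation step. A minimal Go sketch under our own hypothetical names (the article states the abstraction, not an API):

```go
package main

import "fmt"

// condense is the shared core: tally quantifiable user responses in a window.
// The value is an emoji for the swarm, a candidate ID for Voting.
func condense(window []string) map[string]int {
	tally := make(map[string]int)
	for _, v := range window {
		tally[v]++
	}
	return tally
}

// winner is one presentation choice (Voting-style single result); the emoji
// swarm would instead take the top-N of the same tally.
func winner(tally map[string]int) string {
	best, bestN := "", -1
	for k, n := range tally {
		if n > bestN || (n == bestN && k < best) { // deterministic tie-break
			best, bestN = k, n
		}
	}
	return best
}

func main() {
	// Same condensation core, two features:
	emojis := []string{"🔥", "🔥", "😢"}
	votes := []string{"A", "B", "A", "A"}
	fmt.Println(winner(condense(emojis)), winner(condense(votes))) // 🔥 A
}
```

Polls and Trivia slot in the same way: new value vocabulary, new presentation, unchanged ingest/queue/aggregate/fanout plumbing.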
-
Caveats of the chosen architecture (implicit in the design).
- At-most-once ingest. If the goroutine or the process dies between channel-write and broker-flush, up to min(500 ms, 20K messages) worth of events are lost. Acceptable for emoji swarms, unacceptable for Voting — Hotstar doesn't disclose whether Voting uses the same async path or a synchronous one.
- Micro-batch latency floor. The 2-second Spark window is an end-to-end latency floor: no signal shift shorter than ~2-3 s can be observed at the client. For the swarm UX this is fine; for latency-sensitive interactions it isn't.
- Top-N truncation. Long-tail emojis are discarded at the delivery layer. Analytics (which emojis were submitted, not which were shown) must be computed off the raw Kafka topic, not off the PubSub feed.
- WebSocket-class fanout at 50M concurrency is a separate engineering problem — see the companion Building PubSub for 50M Concurrent Socket Connections post referenced in the article.
- Spark vs Flink choice is dated. Written in 2024 but describing a 2019-era decision; modern Flink has community parity and better watermark semantics for event-time windowing — the Spark-over-Flink argument ("better community support") would not pass the 2025 smell test. What remains valid is "micro-batching is a fit when your UX cadence is already discrete."
Architecture snapshot¶
client ──HTTP──▶ Go ingest svc ──chan──▶ in-process buffer
 │  (flush every 500 ms or every 20K msgs, whichever comes first)
 ▼
Knol / Kafka (raw emoji topic)
│
▼
Spark Streaming (2 s micro-batch aggregate)
│
▼
Kafka (aggregate emoji-counts topic)
│
▼
Python consumer → normalise → top-N
│
▼
Hotstar PubSub (WebSocket fanout, 50M concurrent)
│
▼
client animates emoji swarm
Operational numbers¶
- 5 B emojis / 55.83 M users — ICC Cricket World Cup 2019 (single tournament).
- >6.5 B emojis lifetime.
- ~3 B votes — same platform, generalised to Bigg Boss / Dance Plus Voting.
- flush.interval = 500 ms — Kafka producer client-side flush cadence.
- max.messages.per.flush = 20,000 — Kafka producer per-request batch ceiling.
- 2-second Spark Streaming micro-batch window.
- 1-second consumer cadence at one Spark configuration (tunable).
- 50M concurrent socket connections on the PubSub delivery tier (per companion post).
Caveats¶
- Tier-1 source (High Scalability republishing Medium), but the post is a mid-level architecture narrative, not an internals paper — no code, no cluster topology, no tail-latency data, no failure-mode chapter. Claims about per-component behaviour are at the "design-intent" level; the production-numbers list above reflects only what the article explicitly discloses.
- Author is a Hotstar engineer but the piece is marketing-adjacent ("Want to work with us? Check out [tech.hotstar.com]..."); treat operational numbers as directional marketing figures, not audited benchmarks.
- Tech-stack specifics are period-specific (pre-2024). Modern equivalents (Flink > Spark Streaming for sub-second event-time windows; Redpanda / Warpstream for Kafka-compatible cost; gRPC streaming or QUIC/WebTransport for client fanout in place of a custom WebSocket gateway) would all be reasonable alternatives today; the article does not re-evaluate.
- No mention of exactly-once. The async-buffered write path is at-most-once; no transactional producer, no idempotent write-path protocol. Fine for emojis, and Hotstar says "we haven't seen" loss — but claiming that at 5 B events is extrapolation, not measurement.
- "PubSub" is Hotstar's in-house product, not Google Pub/Sub. Don't confuse with Google Cloud Pub/Sub.
Source¶
- Original: https://highscalability.com/capturing-a-billion-emo-j-i-ons/
- Raw markdown:
raw/highscalability/2024-03-26-capturing-a-billion-emoji-ons-6d3afb5c.md
Related¶
- systems/hotstar-emojis — the system this post documents.
- systems/hotstar-knol — Hotstar's internal managed-Kafka data platform.
- systems/hotstar-pubsub — Hotstar's 50M-concurrent-socket real-time delivery tier.
- systems/spark-streaming — the micro-batch stream-processor chosen here over Flink/Storm/Kafka-Streams.
- systems/kafka — the message broker under Knol; Hotstar's ingest buffer.
- concepts/micro-batching — the "aggregate over short fixed windows" processing model.
- concepts/async-write-buffer — the local-channel-then-flush primitive.
- concepts/at-most-once-delivery — the ingest semantics this design pays for low latency.
- concepts/streaming-aggregation — the larger family the Spark stage sits in.
- concepts/fanout-and-cycle — the WebSocket delivery tier's shape.
- patterns/async-buffered-kafka-produce — the goroutine+channel producer pattern with the 500ms / 20K-msg knobs named.
- patterns/emoji-swarm-realtime-aggregation — the end-to-end "HTTP → queue → micro-batch aggregate → top-N → fanout" pipeline shape.
- companies/highscalability — source aggregator.