The Scenario
Picture this: It's the final over of a massive cricket match (think India vs. Pakistan). The tension is palpable, and millions of fans are glued to their screens. In the corner of your streaming application, a tiny number ticks upward: the live viewer count.
As a user, it's just a number. As an engineer, it's a terrifying distributed systems problem. Handling hundreds of thousands of view requests per second (RPS) without losing a single count or taking down your database is a massive challenge. Let's grab a coffee and talk about how to engineer a high-throughput view-counting service in Go that won't break a sweat at 100,000 RPS.
The Struggle: Why the “Simple” Way Fails
When you first build a counter, the naive approach is to lean on your database. You write an endpoint that catches a view event and runs a simple query:
    UPDATE counters SET count = count + 1 WHERE id = 'csk-mi';

In a local testing environment, this works beautifully. But in production? It falls apart fast.
Even at 1,000 RPS, every incoming request contends for the exact same row in Postgres. Each UPDATE takes a row-level lock and holds it until its transaction commits, so the writes line up one behind another. That single lock becomes the bottleneck: CPU spikes, disk I/O thrashes, latency climbs to multiple seconds, and eventually the database connections max out, bringing down the entire application.
To survive Cricket World Cup or IPL-scale traffic, we have to rethink the architecture entirely. We need to eliminate database contention by decoupling the write path.
The Solution: Three Layers of Decoupling
To solve this, we implement a robust pipeline using Go, Kafka, and Postgres. Here is how we break the problem down.
1. Fire and Forget (The Producer to Kafka)
First, our HTTP handlers must never block on I/O. When a user loads the match, the Go server catches the request, serializes the payload, and immediately drops it into an in-memory channel. Background goroutines quickly drain this queue, batching 10,000 messages at a time, and flush them to a Kafka topic partitioned 12 ways.
The client gets a 200 OK in microseconds. If there's a sudden traffic spike and our internal queue fills up, we immediately return a 500 error rather than leaking goroutines or silently dropping data.
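To make the fire-and-forget step concrete, here is a minimal sketch of a bounded queue with a non-blocking enqueue and a background batcher. It is an illustration under stated assumptions, not the production code: Event, Producer, NewProducer, TryEnqueue, and demo are hypothetical names, the flush callback stands in for whatever Kafka client actually writes the batch, and the tiny queue and batch sizes are chosen only to keep the demo readable.

```go
package main

import (
	"fmt"
	"time"
)

// Event is a hypothetical view-event payload.
type Event struct{ MatchID string }

// Producer buffers events in memory and hands them off in batches.
type Producer struct {
	queue chan Event
	done  chan struct{}
}

// NewProducer starts one background batcher. The flush callback is an
// assumption: it stands in for the real write to Kafka.
func NewProducer(queueSize, batchSize int, interval time.Duration, flush func([]Event)) *Producer {
	p := &Producer{queue: make(chan Event, queueSize), done: make(chan struct{})}
	go p.run(batchSize, interval, flush)
	return p
}

// TryEnqueue never blocks. It reports false when the queue is full, so the
// HTTP handler can answer with an error straight away.
func (p *Producer) TryEnqueue(e Event) bool {
	select {
	case p.queue <- e:
		return true
	default:
		return false
	}
}

// Close stops accepting events and waits for the final flush.
func (p *Producer) Close() {
	close(p.queue)
	<-p.done
}

func (p *Producer) run(batchSize int, interval time.Duration, flush func([]Event)) {
	defer close(p.done)
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	batch := make([]Event, 0, batchSize)
	emit := func() {
		if len(batch) > 0 {
			flush(batch)
			batch = make([]Event, 0, batchSize)
		}
	}
	for {
		select {
		case e, ok := <-p.queue:
			if !ok {
				emit() // queue closed: flush whatever is left
				return
			}
			batch = append(batch, e)
			if len(batch) == batchSize {
				emit()
			}
		case <-ticker.C:
			emit() // do not let a partial batch go stale
		}
	}
}

// demo enqueues five events and reports how many were accepted and flushed.
func demo() (accepted, flushed int) {
	p := NewProducer(8, 3, 50*time.Millisecond, func(b []Event) { flushed += len(b) })
	for i := 0; i < 5; i++ {
		if p.TryEnqueue(Event{MatchID: "csk-mi"}) {
			accepted++
		}
	}
	p.Close()
	return accepted, flushed
}

func main() {
	fmt.Println(demo())
}
```

The select with a default case is what keeps the handler from ever blocking: a full queue turns into an immediate false instead of a parked goroutine. The ticker is a design choice worth keeping even at high traffic, because it bounds how long a partial batch can sit in memory when the stream slows down.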
2. In-Memory Sharded Aggregation
Now we have our view events safely sitting in Kafka. Our consumer workers pick them up, but we still don't write directly to the database. Instead, we aggregate them in RAM.
To keep our own consumer goroutines from contending on a single lock, we split the in-memory counter into 64 distinct shards, keyed by a 32-bit FNV-1a hash of the match_id.
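Here is how lean that hot path looks in Go (this version assumes each shard embeds a sync.RWMutex):

    // internal/consumer/worker.go
    func (w *Worker) incrementView(matchID string) {
        s := w.getShard(matchID) // hash-based sharding

        // Fast path: once the counter exists, a shared read lock is enough.
        s.RLock()
        counter, ok := s.counts[matchID]
        s.RUnlock()

        if !ok {
            // Slow path: first event for this match. Re-check under the
            // exclusive lock in case another goroutine created it first.
            s.Lock()
            if counter, ok = s.counts[matchID]; !ok {
                counter = &atomic.Uint64{}
                s.counts[matchID] = counter
            }
            s.Unlock()
        }
        counter.Add(1) // atomic increment, outside any lock
    }

By leveraging atomic.Uint64, the increment itself happens outside any lock. The exclusive lock is taken only the first time a match is seen; every later event needs just a brief shared read lock for the map lookup, which many goroutines can hold at once.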
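For completeness, here is a rough, self-contained sketch of the pieces incrementView leans on but that are not shown above. Everything here beyond incrementView, getShard, and the counts map is an assumption for illustration: the shard layout (an embedded sync.RWMutex), NewWorker, shardIndex, drain, and demo are hypothetical names, and incrementView is repeated in compact form with a read-locked fast path only so the block runs on its own.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
	"sync/atomic"
)

const numShards = 64

// shard owns the counters for the match IDs that hash to it (assumed layout).
type shard struct {
	sync.RWMutex
	counts map[string]*atomic.Uint64
}

// Worker holds the sharded in-memory counters.
type Worker struct {
	shards [numShards]*shard
}

func NewWorker() *Worker {
	w := &Worker{}
	for i := range w.shards {
		w.shards[i] = &shard{counts: make(map[string]*atomic.Uint64)}
	}
	return w
}

// shardIndex hashes matchID with 32-bit FNV-1a and maps it onto a shard.
func shardIndex(matchID string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(matchID)) // Write on an FNV hash never returns an error
	return h.Sum32() % numShards
}

func (w *Worker) getShard(matchID string) *shard {
	return w.shards[shardIndex(matchID)]
}

// incrementView is the hot path: read-locked lookup, atomic add.
func (w *Worker) incrementView(matchID string) {
	s := w.getShard(matchID)
	s.RLock()
	counter, ok := s.counts[matchID]
	s.RUnlock()
	if !ok {
		s.Lock()
		if counter, ok = s.counts[matchID]; !ok {
			counter = &atomic.Uint64{}
			s.counts[matchID] = counter
		}
		s.Unlock()
	}
	counter.Add(1)
}

// drain returns the views counted since the previous call. Swap(0) reads and
// resets each counter in one atomic step, so no concurrent increment is lost.
func (w *Worker) drain() map[string]uint64 {
	out := make(map[string]uint64)
	for _, s := range w.shards {
		s.RLock()
		for id, c := range s.counts {
			if n := c.Swap(0); n > 0 {
				out[id] = n
			}
		}
		s.RUnlock()
	}
	return out
}

// demo hammers one match from several goroutines and drains the total.
func demo(goroutines, perGoroutine int) uint64 {
	w := NewWorker()
	var wg sync.WaitGroup
	for g := 0; g < goroutines; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perGoroutine; i++ {
				w.incrementView("csk-mi")
			}
		}()
	}
	wg.Wait()
	return w.drain()["csk-mi"]
}

func main() {
	fmt.Println(demo(8, 1000)) // 8 goroutines x 1000 views each
}
```

One design note on drain: resetting counters in place with Swap(0), rather than replacing the whole map, means a goroutine that already holds a counter pointer can still add to it safely. Its increment simply lands in the next flush.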
3. Slotted Postgres Tables (The Magic Trick)
Once a second, we flush our aggregated counts to Postgres using a bulk UPSERT. But remember our original problem? If 12 consumer replicas all try to update the csk-mi row at the same time, we hit the exact same row-level lock contention.
The solution is a slotted database design. We change the primary key from just match_id to a composite key: (match_id, slot).
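    CREATE TABLE match_view_counts (
        match_id   TEXT        NOT NULL,
        slot       INT         NOT NULL,
        view_count BIGINT      NOT NULL DEFAULT 0,
        updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
        PRIMARY KEY (match_id, slot)
    );

When our consumer flushes data, it picks a random slot number (0 to 63). Because our replicas spread their flushes across 64 different rows for the same match, concurrent writes no longer pile onto a single row. Two replicas will sometimes pick the same slot in the same second (with 12 replicas and 64 slots, some pair matches in roughly two out of three flush rounds), but their writes only contend if they also overlap in time, and even then it is two writers queuing briefly rather than all twelve. Hot-row lock contention is effectively gone.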
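As a hedged sketch of what that once-a-second flush could look like, the function below builds a single multi-row upsert against the match_view_counts table for one randomly chosen slot. The helper name buildUpsert, the two string constants, and the sample match IDs are assumptions for illustration; the statement relies on PostgreSQL's INSERT ... ON CONFLICT ... DO UPDATE with EXCLUDED, and actually running it is left to your driver (for example db.ExecContext(ctx, query, args...) with database/sql).

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
	"strings"
)

const numSlots = 64

const insertPrefix = "INSERT INTO match_view_counts (match_id, slot, view_count) VALUES "

// onConflict adds the flushed delta to the existing row instead of replacing it.
const onConflict = " ON CONFLICT (match_id, slot) DO UPDATE SET " +
	"view_count = match_view_counts.view_count + EXCLUDED.view_count, " +
	"updated_at = NOW()"

// buildUpsert renders one statement that adds every delta to its
// (match_id, slot) row, creating the row if it does not exist yet.
// It returns an empty query when there is nothing to flush.
func buildUpsert(deltas map[string]uint64, slot int) (string, []any) {
	if len(deltas) == 0 {
		return "", nil
	}
	ids := make([]string, 0, len(deltas))
	for id := range deltas {
		ids = append(ids, id)
	}
	// A stable order keeps the statement deterministic and makes every
	// replica touch rows in the same sequence.
	sort.Strings(ids)

	var sb strings.Builder
	sb.WriteString(insertPrefix)
	args := make([]any, 0, len(ids)*3)
	for i, id := range ids {
		if i > 0 {
			sb.WriteString(", ")
		}
		fmt.Fprintf(&sb, "($%d, $%d, $%d)", i*3+1, i*3+2, i*3+3)
		args = append(args, id, slot, int64(deltas[id]))
	}
	sb.WriteString(onConflict)
	return sb.String(), args
}

func main() {
	slot := rand.Intn(numSlots) // 0..63, picked fresh for every flush
	query, args := buildUpsert(map[string]uint64{"csk-mi": 4200, "rcb-kkr": 17}, slot)
	fmt.Println(query)
	fmt.Println(args...)
}
```

Because the statement adds EXCLUDED.view_count to the stored value, each flush should carry only the views counted since the previous flush, not a running total.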
When we need to display the final view count to a user, it's a lightweight SUM over those 64 slots:
    SELECT COALESCE(SUM(view_count), 0) FROM match_view_counts WHERE match_id = $1;

Protecting the Read Path
To tie it all together, we protect that SUM query with a multi-layered read cache. Edge requests hit NGINX (which caches for 1 minute), falling back to Redis (kept fresh by our consumer after every DB flush). If Redis misses, we use Go's singleflight package (golang.org/x/sync/singleflight) to ensure that even if 10,000 users ask for the count at the exact same millisecond, only one query actually hits Postgres.
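To show what that request coalescing does under the hood, here is a small standard-library-only stand-in. It is not the singleflight package itself: coalescer, call, waiting, and simulate are hypothetical names written for this illustration, and the fake load function takes the place of the Postgres SUM query. In real code you would call Do on a singleflight.Group from golang.org/x/sync/singleflight instead.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// call is one in-flight load that later callers can wait on.
type call struct {
	done    chan struct{}
	val     uint64
	err     error
	waiters int // callers that joined instead of loading
}

// coalescer deduplicates concurrent loads per key (illustrative stand-in
// for singleflight.Group, not the real package).
type coalescer struct {
	mu       sync.Mutex
	inflight map[string]*call
}

func newCoalescer() *coalescer {
	return &coalescer{inflight: make(map[string]*call)}
}

// Do runs load at most once per key at a time; callers that arrive while
// a load is in flight wait for it and share its result.
func (c *coalescer) Do(key string, load func() (uint64, error)) (uint64, error) {
	c.mu.Lock()
	if cl, ok := c.inflight[key]; ok {
		cl.waiters++
		c.mu.Unlock()
		<-cl.done // wait for the leader's result
		return cl.val, cl.err
	}
	cl := &call{done: make(chan struct{})}
	c.inflight[key] = cl
	c.mu.Unlock()

	cl.val, cl.err = load()

	c.mu.Lock()
	delete(c.inflight, key)
	c.mu.Unlock()
	close(cl.done) // publish the result to every waiter
	return cl.val, cl.err
}

// waiting reports how many callers are parked on key (used by simulate).
func (c *coalescer) waiting(key string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	if cl, ok := c.inflight[key]; ok {
		return cl.waiters
	}
	return 0
}

// simulate sends n concurrent readers at one key. It returns how many
// times the fake database query ran and the total each reader saw.
func simulate(n int) (int64, []uint64) {
	c := newCoalescer()
	var queries atomic.Int64
	release := make(chan struct{})
	load := func() (uint64, error) {
		queries.Add(1)
		<-release // hold the query open so the other readers pile up
		return 1234567, nil
	}
	results := make([]uint64, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i], _ = c.Do("csk-mi", load)
		}(i)
	}
	for c.waiting("csk-mi") < n-1 {
		time.Sleep(time.Millisecond)
	}
	close(release)
	wg.Wait()
	return queries.Load(), results
}

func main() {
	queries, results := simulate(100)
	fmt.Println("queries:", queries, "first result:", results[0])
}
```

The simulation deliberately holds the query open until every other reader has joined, which is the worst case the read path is designed for: one hundred readers, one trip to the database.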
Real-World Application
This pattern isn't just for cricket matches. The combination of Kafka buffering, in-memory atomic sharding, and slotted database rows is a foundational distributed systems pattern. You'll see this exact architecture used to build real-time voting systems (like Eurovision or American Idol), global “Like” counters on social media platforms, and high-velocity financial analytics engines.
Whenever you have massive write contention on a single entity, remember: decouple, batch, and shard.