There’s a surprising amount of engineering work that boils down to knowing how many times per second a user will tap their screen.
Let’s say you’re building a social media feed. A user scrolls through it. How many posts do they typically see in a single session? A reasonable estimate is 50 posts. Now, how often do they interact with a post (like, comment, share)? Let’s assume 10% of the time, so 5 interactions per session. If a user visits your app twice a day, that’s 100 posts viewed and 10 interactions per user per day.
This scales quickly. If you have 100 million daily active users (DAU), that’s 10 billion posts viewed and 1 billion interactions generated every single day. This isn’t just about database load; it’s about network bandwidth, caching strategies, and the sheer volume of data you’re processing.
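The chain of multiplications above is simple enough to sketch directly. The constants are this section’s estimates, not measurements:

```python
# Platform-wide totals from the per-user figures above.
# All constants are the article's assumptions, not measured values.
POSTS_PER_USER_DAY = 100          # 2 sessions/day from the estimate above
INTERACTIONS_PER_USER_DAY = 10    # 10% interaction rate
DAU = 100_000_000                 # daily active users

daily_views = DAU * POSTS_PER_USER_DAY                # 10 billion
daily_interactions = DAU * INTERACTIONS_PER_USER_DAY  # 1 billion

print(f"views/day: {daily_views:,}")
print(f"interactions/day: {daily_interactions:,}")
```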
To make this concrete, let’s consider a simplified backend service that handles these interactions. For our 100 million DAU scenario, we need to process 1 billion interaction events daily. That’s roughly 11,500 events per second (1,000,000,000 events / 24 hours / 60 minutes / 60 seconds).
If each interaction event is a small JSON payload, say 1KB, you’re looking at 11.5 MB/s of incoming data just for interactions. This might seem manageable, but remember this is an average. Peak loads can be 2-3x higher, so you need to provision for ~30 MB/s.
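The ingest math works out like this; the event size and peak factor are the assumed figures from above:

```python
# Average and peak ingest rates for interaction events.
# EVENT_SIZE_KB and PEAK_FACTOR are assumptions carried over from the text.
SECONDS_PER_DAY = 24 * 60 * 60        # 86,400
DAILY_INTERACTIONS = 1_000_000_000
EVENT_SIZE_KB = 1                     # assumed per-event JSON payload
PEAK_FACTOR = 2.5                     # assumed peak-to-average ratio

avg_events_per_sec = DAILY_INTERACTIONS / SECONDS_PER_DAY     # ~11,574
avg_mb_per_sec = avg_events_per_sec * EVENT_SIZE_KB / 1000    # ~11.6 MB/s
peak_mb_per_sec = avg_mb_per_sec * PEAK_FACTOR                # ~29 MB/s

print(f"{avg_events_per_sec:,.0f} events/s, peak ~{peak_mb_per_sec:.0f} MB/s")
```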
What about read traffic? If users view 10 billion posts a day, and each post fetch involves some data (let’s say 10KB per post, including metadata and content), that’s 100 TB of data served daily. Put another way: a single user viewing 100 posts pulls about 1 MB a day, and 1 MB across 100 million users is that same 100 TB. This is where caching becomes paramount. A well-tuned CDN or in-memory cache can reduce the load on your origin servers dramatically.
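A quick sketch of what a cache absorbs at these volumes, using an assumed hit rate for illustration:

```python
# Daily read volume and what a cache layer absorbs.
# CACHE_HIT_RATE is an assumed figure for illustration.
DAILY_VIEWS = 10_000_000_000
POST_SIZE_KB = 10
CACHE_HIT_RATE = 0.99

total_tb_per_day = DAILY_VIEWS * POST_SIZE_KB / 1e9     # KB -> TB (decimal)
origin_tb_per_day = total_tb_per_day * (1 - CACHE_HIT_RATE)

print(f"served: {total_tb_per_day:.0f} TB/day, from origin: {origin_tb_per_day:.0f} TB/day")
```

A 99% hit rate shrinks 100 TB of daily origin traffic to roughly 1 TB, which is why cache tuning dominates the cost conversation.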
Consider a simple API endpoint for fetching a post. A common pattern is GET /api/v1/posts/{post_id}. If your service is handling 10 billion views a day, and each view requires a database lookup (even with caching), you’re talking about an enormous number of read operations. If your cache hit rate is 99%, you still have 100 million database reads per day for posts. That’s about 1,150 reads per second. If each read takes 10ms, that’s 11.5 seconds of cumulative database time per wall-clock second, meaning you’d need at least 12 concurrent connections just to keep pace, and more once you provision for peak traffic.
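The concurrency arithmetic in that paragraph looks like this; the latency figure is the assumed 10ms from above:

```python
import math

# Origin database load after a 99% cache hit rate, and the concurrency
# needed to absorb it. READ_LATENCY_MS is the article's assumed figure.
DAILY_VIEWS = 10_000_000_000
CACHE_HIT_RATE = 0.99
READ_LATENCY_MS = 10
SECONDS_PER_DAY = 86_400

db_reads_per_day = DAILY_VIEWS * (1 - CACHE_HIT_RATE)          # 100 million
db_reads_per_sec = db_reads_per_day / SECONDS_PER_DAY          # ~1,157
busy_db_seconds = db_reads_per_sec * READ_LATENCY_MS / 1000    # ~11.6 s/s
connections = math.ceil(busy_db_seconds)                       # 12

print(f"{db_reads_per_sec:,.0f} reads/s -> at least {connections} concurrent connections")
```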
Let’s talk latency. For a user interacting with a post, they expect a quick confirmation. If your interaction service takes 50ms to process an event, and each user performs 10 interactions per day, that adds 500ms of processing time per user per day just for interactions. Over 100 million users, this is significant. If the backend processing time for a post view is 20ms, and a user views 100 posts a day, that’s 2 seconds of backend processing per user per day just for fetching posts. This is why optimizing database queries, using efficient serialization formats like Protocol Buffers, and employing asynchronous processing (like message queues) are critical.
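You can translate those per-request latencies into aggregate capacity, under the deliberately crude assumption that each request fully occupies one worker for its duration:

```python
# Aggregate backend compute implied by the per-request latencies above,
# assuming (crudely) that each request fully occupies one worker.
DAU = 100_000_000
INTERACTIONS_PER_USER_DAY = 10
VIEWS_PER_USER_DAY = 100
INTERACTION_MS = 50
VIEW_MS = 20
SECONDS_PER_DAY = 86_400

compute_sec_per_day = DAU * (INTERACTIONS_PER_USER_DAY * INTERACTION_MS +
                             VIEWS_PER_USER_DAY * VIEW_MS) / 1000
avg_busy_workers = compute_sec_per_day / SECONDS_PER_DAY   # ~2,900

print(f"~{avg_busy_workers:,.0f} fully-busy workers on average")
```

Roughly 2,900 continuously busy workers on average, before any peak factor, is the kind of number that makes shaving milliseconds off each request worthwhile.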
A crucial, often overlooked aspect of estimation is the "fan-out" problem. When a user posts an update, that update needs to be delivered to all their followers. If a user has 1 million followers, a single post can trigger 1 million individual "deliveries" or notifications. This is a write amplification problem. For a platform with 100 million users, if even 1% (1 million users) have over 1000 followers, a single popular post could generate millions of fan-out events. This is why many systems use a "push" model for typical users (pre-computing their followers’ feeds at write time) and a "pull" model for highly followed accounts (merging their posts into followers’ feeds at read time), avoiding the worst write amplification.
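A minimal sketch of that hybrid decision. The threshold name and value here are hypothetical; real systems tune the cutoff empirically:

```python
# Hybrid fan-out sketch: push (pre-compute feeds) for typical authors,
# pull (read-time merge) for highly followed accounts.
# FANOUT_THRESHOLD is a hypothetical cutoff for illustration.
FANOUT_THRESHOLD = 10_000

def deliveries_on_write(follower_count: int) -> int:
    """Feed writes triggered by one new post under the hybrid strategy."""
    if follower_count > FANOUT_THRESHOLD:
        return 0               # pull: followers merge this author's posts at read time
    return follower_count      # push: one feed insert per follower at write time

print(deliveries_on_write(300))        # 300
print(deliveries_on_write(1_000_000))  # 0
```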
When estimating storage, consider the lifecycle of data. A post might be stored for years. If 10% of your 100 million users create one post a day, that’s 10 million new posts daily; at 100KB each, that’s 1 TB of new data per day. Over a year, that’s 365 TB, plus metadata, user data, images, videos, etc. You need to factor in redundancy, backups, and potential data archiving or deletion policies.
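The storage growth sketch, with the posting fraction and a replication factor as stated assumptions:

```python
# Raw storage growth. POSTING_FRACTION and REPLICATION are assumptions,
# assuming 10% of users post once per day and 3 copies kept for redundancy.
DAU = 100_000_000
POSTING_FRACTION = 0.10
POST_SIZE_KB = 100
REPLICATION = 3

new_posts_per_day = DAU * POSTING_FRACTION               # 10 million
raw_tb_per_day = new_posts_per_day * POST_SIZE_KB / 1e9  # 1 TB/day raw
stored_tb_per_year = raw_tb_per_day * 365 * REPLICATION  # ~1.1 PB/year replicated

print(f"{raw_tb_per_day:.0f} TB/day raw, {stored_tb_per_year:,.0f} TB/year with replication")
```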
The next logical step is to consider the implications of these numbers on your infrastructure costs.