A news feed’s primary job isn’t to show you everything, but to show you something you’ll care about, right now, efficiently.
Here’s a simplified look at how a popular social media feed might work, focusing on the core mechanics of getting posts from your friends to your screen.
Imagine Alice, Bob, and Charlie are friends. Alice posts a photo.
{
"post_id": "p123",
"user_id": "alice",
"content": "Beautiful sunset!",
"timestamp": "2023-10-27T10:00:00Z"
}
Fanout: Getting Posts Out There
When Alice posts, we need to make sure Bob and Charlie (and anyone else who follows Alice) can see it. This is called fanout. There are two main strategies:
- Fanout on Write (Push Model): As soon as Alice posts, we immediately push that post to the inboxes of all her followers.
  - Pros: Reading is super fast because posts are already waiting.
  - Cons: If Alice has a million followers, the system has to write a million copies of her post (or a million references to it). This is computationally expensive, and "hot" users with massive fanout can swamp the write path.

  Let’s say Bob follows Alice. When Alice posts p123, we might write to Bob’s feed store:

  // Bob's feed store (simplified)
  {
    "feed_id": "bob_feed",
    "posts": [
      {
        "post_id": "p123",
        "user_id": "alice",
        "content": "Beautiful sunset!",
        "timestamp": "2023-10-27T10:00:00Z"
      },
      // ... other posts from people Bob follows
    ]
  }

- Fanout on Read (Pull Model): When Bob requests his feed, we fetch the recent posts from everyone he follows and assemble the feed on the fly.
  - Pros: Simple and cheap for the writer. Alice’s post is stored once no matter how many followers she has, which makes this a good fit for accounts with very large followings.
  - Cons: Reading can be slow, especially if Bob follows thousands of people, because the same work is repeated every time someone asks for their feed.

  When Bob requests his feed:
  - We query posts from alice.
  - We query posts from bob (if his own posts appear in his feed).
  - We query posts from charlie.
  - We merge and sort them by timestamp.
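To make the two strategies concrete, here is a minimal in-memory sketch. The dictionaries stand in for real storage, and every name in it (followers, following, inboxes, publish_with_push, read_feed_pull, and so on) is hypothetical rather than taken from any actual feed system.

```python
from collections import defaultdict

# Hypothetical follow graph: who follows an author, and whom a user follows.
followers = {"alice": ["bob", "charlie"]}
following = {"bob": ["alice"], "charlie": ["alice"]}

posts_by_author = defaultdict(list)  # one stored copy of each post
inboxes = defaultdict(list)          # precomputed per-user feeds (push model)


def save_post(post):
    """Store the single authoritative copy of a post."""
    posts_by_author[post["user_id"]].append(post)


def publish_with_push(post):
    """Fanout on write: one inbox write per follower, paid at post time."""
    save_post(post)
    for follower in followers.get(post["user_id"], []):
        inboxes[follower].append(post)


def read_feed_push(user):
    """Push model read: the inbox is already assembled, so just order it."""
    return sorted(inboxes[user], key=lambda p: p["timestamp"], reverse=True)


def read_feed_pull(user):
    """Fanout on read: one query per followed author, then merge and sort."""
    gathered = []
    for author in following.get(user, []):
        gathered.extend(posts_by_author[author])
    # ISO 8601 UTC strings in the same format sort correctly as plain text.
    return sorted(gathered, key=lambda p: p["timestamp"], reverse=True)


sunset = {
    "post_id": "p123",
    "user_id": "alice",
    "content": "Beautiful sunset!",
    "timestamp": "2023-10-27T10:00:00Z",
}
publish_with_push(sunset)

print([p["post_id"] for p in read_feed_push("bob")])  # ['p123']
print([p["post_id"] for p in read_feed_pull("bob")])  # ['p123']
```

Both reads return the same feed. The difference is where the loop runs: over Alice’s followers when she posts, or over the accounts Bob follows when he reads.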
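Many large systems use a hybrid approach. Posts from authors with a modest number of followers are fanned out on write, so most feeds stay fast to read. Posts from authors with huge followings are stored once and pulled in at read time, then merged with the precomputed feed, which avoids millions of writes per post.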
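A hybrid can be sketched in a few lines. This version assumes the split is made by follower count: authors under a cutoff are pushed, and authors over it are pulled at read time. The cutoff value, the "celebrity" account, the post ID c1, and all function names are invented for illustration.

```python
from collections import defaultdict

PUSH_FOLLOWER_LIMIT = 2  # hypothetical cutoff; a real system would tune this

followers = {
    "alice": ["bob", "charlie"],
    "celebrity": ["bob", "charlie", "dave"],
}
following = {"bob": ["alice", "celebrity"]}

posts_by_author = defaultdict(list)
inboxes = defaultdict(list)


def is_high_fanout(author):
    """Authors above the cutoff are too expensive to push."""
    return len(followers.get(author, [])) > PUSH_FOLLOWER_LIMIT


def publish(post):
    """Always store the post; push it only for low-fanout authors."""
    author = post["user_id"]
    posts_by_author[author].append(post)
    if not is_high_fanout(author):
        for follower in followers.get(author, []):
            inboxes[follower].append(post)


def read_feed(user):
    """Start from the pushed inbox, then pull in high-fanout authors."""
    merged = list(inboxes[user])
    for author in following.get(user, []):
        if is_high_fanout(author):
            merged.extend(posts_by_author[author])
    return sorted(merged, key=lambda p: p["timestamp"], reverse=True)


publish({"post_id": "p123", "user_id": "alice", "timestamp": "2023-10-27T10:00:00Z"})
publish({"post_id": "c1", "user_id": "celebrity", "timestamp": "2023-10-27T10:03:00Z"})

print([p["post_id"] for p in inboxes["bob"]])    # ['p123']
print([p["post_id"] for p in read_feed("bob")])  # ['c1', 'p123']
```

Only Alice’s post was written to Bob’s inbox. The celebrity’s post cost a single write and is merged in when Bob reads.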
Ranking: What Do I See First?
Just getting posts isn’t enough; they need to be in an order that makes sense. This is where ranking comes in. A simple chronological sort is the baseline, but modern feeds use complex algorithms.
Let’s say Alice posts again at 10:05:00Z, and Bob posts at 10:02:00Z.
If Bob follows Alice, and we’re using a simple chronological sort (newest first), his feed would look like:
- Alice’s new post (10:05:00Z)
- Bob’s post (10:02:00Z)
- Alice’s sunset post (10:00:00Z)
But what if Alice’s sunset post is collecting lots of likes and comments while Bob’s post is a mundane update? The algorithm might move the sunset post above Bob’s, even though it is older. Factors include:
- Engagement: Likes, comments, shares on the post.
- Relationship: How often Bob interacts with Alice.
- Recency: Newer posts often get a boost.
- Content Type: Videos or photos might be prioritized.
- User Preferences: Explicitly followed topics or users.
The ranking algorithm takes the posts from the fanout stage, assigns a score to each, then sorts them by score. The exact scoring function is usually proprietary and often learned from data, but a weighted sum of many signals is a useful mental model.
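As an illustration of the weighted-sum idea, the sketch below scores three posts using the signals listed above. The weights, the engagement numbers, the log and decay formulas, and the post ID a2 (standing in for Alice’s second post) are all made up. A real ranker would have far more signals and tuned or learned weights.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Candidate:
    post_id: str
    likes: int
    comments: int
    author_affinity: float  # 0..1: how often the viewer interacts with the author
    is_video: bool
    created_at: datetime


# Hypothetical weights, chosen only to make the example readable.
WEIGHTS = {"engagement": 1.0, "affinity": 2.0, "recency": 3.0, "video": 0.5}


def score(c, now):
    """Weighted sum of engagement, relationship, recency and content type."""
    age_hours = max((now - c.created_at).total_seconds() / 3600, 0.0)
    engagement = math.log1p(c.likes + 2 * c.comments)  # dampen huge counts
    recency = 1.0 / (1.0 + age_hours)                  # decays as the post ages
    return (
        WEIGHTS["engagement"] * engagement
        + WEIGHTS["affinity"] * c.author_affinity
        + WEIGHTS["recency"] * recency
        + WEIGHTS["video"] * (1.0 if c.is_video else 0.0)
    )


def rank(candidates, now):
    """Highest score first."""
    return sorted(candidates, key=lambda c: score(c, now), reverse=True)


def at(hour, minute):
    return datetime(2023, 10, 27, hour, minute, tzinfo=timezone.utc)


candidates = [
    Candidate("b1", likes=2, comments=0, author_affinity=0.5, is_video=False, created_at=at(10, 2)),
    Candidate("p123", likes=40, comments=5, author_affinity=0.8, is_video=False, created_at=at(10, 0)),
    Candidate("a2", likes=5000, comments=800, author_affinity=0.8, is_video=True, created_at=at(10, 5)),
]

print([c.post_id for c in rank(candidates, now=at(10, 10))])  # ['a2', 'p123', 'b1']
```

With these invented numbers the older sunset post (p123) outranks the newer low-engagement post (b1), which a purely chronological sort would never do.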
Pagination: Loading More
You don’t want to load thousands of posts at once: the response would be large, slow to render, and mostly never seen. Pagination is how we load posts in chunks.
When Bob requests his feed, we give him the first 20 posts. This is "page 1." To get the next set, he requests "page 2."
The simplest way is offset-based pagination:
- Page 1:
  SELECT * FROM feed WHERE feed_id = 'bob_feed' ORDER BY timestamp DESC LIMIT 20 OFFSET 0;
- Page 2:
  SELECT * FROM feed WHERE feed_id = 'bob_feed' ORDER BY timestamp DESC LIMIT 20 OFFSET 20;
The problem here is that if new posts are added or old ones deleted while the user is paginating, the results can be inconsistent (posts might be skipped or duplicated).
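The inconsistency is easy to reproduce with a plain list standing in for the sorted feed. In this sketch, page_by_offset is a hypothetical helper that mimics LIMIT/OFFSET, and the post IDs are invented.

```python
def page_by_offset(feed, limit, offset):
    """Mimic LIMIT/OFFSET over a feed that is already sorted newest first."""
    return feed[offset:offset + limit]


# Case 1: a new post arrives between page requests, so an item is repeated.
feed = ["p5", "p4", "p3", "p2", "p1"]
page1 = page_by_offset(feed, limit=2, offset=0)
feed.insert(0, "p6")  # new post lands at the top
page2 = page_by_offset(feed, limit=2, offset=2)
print(page1, page2)  # ['p5', 'p4'] ['p4', 'p3']  (p4 shown twice)

# Case 2: a post is deleted between page requests, so an item is skipped.
feed_b = ["p5", "p4", "p3", "p2", "p1"]
page1_b = page_by_offset(feed_b, limit=2, offset=0)
feed_b.remove("p5")  # deletion shifts everything up by one
page2_b = page_by_offset(feed_b, limit=2, offset=2)
print(page1_b, page2_b)  # ['p5', 'p4'] ['p2', 'p1']  (p3 never shown)
```

In both cases the offset counts positions, and the positions moved underneath the reader.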
A more robust method is cursor-based pagination. Instead of an offset, you take a value from the last item on the current page (like its timestamp or a dedicated cursor ID) and use it to fetch the next set.
If Bob sees Alice’s sunset post (p123) as the last item on page 1, page 2 fetches the posts that come after p123 in the sort order. With a newest-first feed, that means posts older than p123.
Example for page 2, assuming p123 has timestamp 2023-10-27T10:00:00Z:
  SELECT * FROM feed WHERE feed_id = 'bob_feed' AND timestamp < '2023-10-27T10:00:00Z' ORDER BY timestamp DESC LIMIT 20;
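Because the query is anchored to a value rather than a position, new posts arriving at the top of the feed don’t shift the page boundary, so nothing is repeated or skipped because of them. One caveat: if two posts share a timestamp, a timestamp-only cursor can drop one of them at a page boundary, so real cursors usually add a tie-breaker such as the post ID.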
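Here is a runnable sketch of cursor pagination using Python’s built-in sqlite3 module. It goes one step further than the query above by using the pair (timestamp, post_id) as the cursor, which is an added assumption, so that posts sharing a timestamp are handled. The table layout, the feed_id column name, fetch_page, and every post ID other than p123 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE feed (feed_id TEXT, post_id TEXT, timestamp TEXT)")
conn.executemany(
    "INSERT INTO feed VALUES (?, ?, ?)",
    [
        ("bob_feed", "p125", "2023-10-27T10:05:00Z"),
        ("bob_feed", "p124", "2023-10-27T10:02:00Z"),
        ("bob_feed", "p123", "2023-10-27T10:00:00Z"),
        ("bob_feed", "p122", "2023-10-27T10:00:00Z"),  # same timestamp as p123
        ("bob_feed", "p121", "2023-10-27T09:55:00Z"),
    ],
)


def fetch_page(feed_id, limit, cursor=None):
    """Return (post_ids, next_cursor); cursor is (timestamp, post_id) of the last item seen."""
    sql = "SELECT post_id, timestamp FROM feed WHERE feed_id = ?"
    params = [feed_id]
    if cursor is not None:
        ts, pid = cursor
        # Strictly "after the cursor" in (timestamp DESC, post_id DESC) order.
        sql += " AND (timestamp < ? OR (timestamp = ? AND post_id < ?))"
        params += [ts, ts, pid]
    sql += " ORDER BY timestamp DESC, post_id DESC LIMIT ?"
    params.append(limit)
    rows = conn.execute(sql, params).fetchall()
    next_cursor = (rows[-1][1], rows[-1][0]) if rows else None
    return [r[0] for r in rows], next_cursor


page1, cursor = fetch_page("bob_feed", limit=3)

# A new post arrives while Bob is still reading page 1.
conn.execute("INSERT INTO feed VALUES ('bob_feed', 'p126', '2023-10-27T10:06:00Z')")

page2, _ = fetch_page("bob_feed", limit=3, cursor=cursor)
print(page1)  # ['p125', 'p124', 'p123']
print(page2)  # ['p122', 'p121']
```

The new post p126 does not disturb page 2, and p122 is still returned even though it shares a timestamp with the last item of page 1. With a timestamp-only cursor, p122 would have been filtered out.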
The system makes a lot of trade-offs. For instance, a feed that prioritizes engagement over recency might show you a highly popular post from yesterday above a new, less popular one, which is a deliberate choice by the ranking algorithm to maximize user retention.