Facebook Messenger is designed to be a highly available, low-latency messaging system that can handle billions of messages daily. It achieves this by breaking down the problem into several key components: chat, delivery, and storage.

Let’s see it in action. Imagine Alice sends a message to Bob.

Alice (Client) -> Messenger Service (API Gateway) -> Chat Service

The Chat Service receives the message, generates a unique ID, and timestamps it. Then, it needs to get that message to Bob. This is where the Delivery Service comes in.
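The ID-and-timestamp step can be sketched with a Snowflake-style generator, which packs a millisecond timestamp, a worker ID, and a per-millisecond sequence into one integer so IDs are globally unique and roughly time-ordered. The field widths and class below are illustrative assumptions, not Messenger's actual layout:

```python
import time

class MessageIdGenerator:
    """Snowflake-style IDs: time-ordered, unique across Chat Service nodes."""

    def __init__(self, worker_id: int):
        self.worker_id = worker_id  # identifies this Chat Service node
        self.sequence = 0           # per-millisecond counter
        self.last_ms = -1

    def next_id(self) -> int:
        now_ms = int(time.time() * 1000)
        if now_ms == self.last_ms:
            self.sequence = (self.sequence + 1) & 0xFFF  # 12-bit sequence
        else:
            self.sequence = 0
            self.last_ms = now_ms
        # high bits: timestamp | middle: worker id | low 12 bits: sequence
        return (now_ms << 22) | (self.worker_id << 12) | self.sequence

gen = MessageIdGenerator(worker_id=7)
first, second = gen.next_id(), gen.next_id()
assert first < second  # IDs from one worker are strictly increasing
```

Because the timestamp occupies the high bits, sorting messages by ID also sorts them by creation time, which simplifies ordering downstream.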

Chat Service -> Delivery Service -> Bob (Client)

The Delivery Service is responsible for figuring out how to deliver the message. If Bob is online, it might push the message directly to his device via a persistent WebSocket connection. If Bob is offline, it queues the message for later delivery and might send a push notification to his phone.
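The online/offline branch described above can be sketched as a routing decision. The connection registry, offline queue, and notification hook here are illustrative stand-ins, not Messenger's real interfaces:

```python
from collections import defaultdict

class DeliveryService:
    def __init__(self):
        self.connections = {}                     # user_id -> live WebSocket
        self.offline_queues = defaultdict(list)   # user_id -> pending messages

    def deliver(self, recipient_id: str, message: dict) -> str:
        socket = self.connections.get(recipient_id)
        if socket is not None:
            # Recipient is online: push directly over the open connection.
            socket.send(message)
            return "pushed"
        # Recipient is offline: queue for later sync, then nudge the device.
        self.offline_queues[recipient_id].append(message)
        self.send_push_notification(recipient_id, message)
        return "queued"

    def send_push_notification(self, recipient_id: str, message: dict) -> None:
        pass  # stand-in for an APNs/FCM integration
```

In practice the "is online" check is itself distributed (the recipient may hold a connection on a different delivery node), but the branch structure stays the same.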

Finally, every message needs to be stored.

Chat Service -> Storage Service

The Storage Service persists the message for history, retrieval, and offline access. This is often a distributed database optimized for fast writes and reads.

The core problem Messenger solves is reliably delivering a message from sender to recipient in near real-time, even at massive scale, while also ensuring that message history is always available. It does this by decoupling the concerns of message creation, real-time delivery, and persistent storage into distinct services.

The Chat Service acts as the entry point for new messages. It’s responsible for basic message processing, like assigning a unique ID and timestamp. Its primary job is to hand off the message to the next stage.

The Delivery Service is the heart of real-time communication. It maintains connections with active users and manages message queues for offline users. It decides whether to push a message directly or rely on push notifications. This service needs to be incredibly fast and resilient.

The Storage Service ensures message durability. It uses distributed databases like Cassandra or a custom-built solution to store billions of messages. Data is sharded and replicated across many servers to guarantee availability and prevent data loss.
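The sharding-and-replication idea can be sketched as follows: hash the conversation ID so one chat's history lands on the same shard, then map each shard to a few nodes. The shard count, replication factor, and ring placement below are assumptions for illustration:

```python
import hashlib

NUM_SHARDS = 1024          # illustrative; real systems tune this
REPLICATION_FACTOR = 3     # each shard lives on three nodes
NODES = [f"storage-node-{i}" for i in range(12)]

def shard_for(conversation_id: str) -> int:
    """Hash the conversation ID so a chat's messages stay on one shard."""
    digest = hashlib.md5(conversation_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def replicas_for(shard: int) -> list[str]:
    """Place the shard on its home node plus the next two in the ring."""
    start = shard % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]

shard = shard_for("alice:bob")
print(replicas_for(shard))  # three distinct nodes hold this conversation
```

Sharding by conversation keeps reads of a chat's history on a single shard, while replication means any one node can fail without losing messages.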

Here’s a bit of config you might see in the Delivery Service for managing WebSocket connections. This isn’t the exact config, but it shows the principle:

{
  "websocket_connection_timeout": "30s",
  "max_concurrent_connections_per_user": 5,
  "heartbeat_interval": "15s",
  "message_batch_size": 100
}

This configuration dictates how long an idle connection stays open (websocket_connection_timeout), how many devices a user can be connected on simultaneously (max_concurrent_connections_per_user), how often the server checks that the client is still alive (heartbeat_interval), and how many messages are bundled together before being sent to a client (message_batch_size).
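The heartbeat and timeout values interact in a simple way: a client that misses heartbeats long enough to exceed the timeout gets disconnected. Here is a minimal sketch of that check, assuming a last-seen table keyed by user; real connection managers are event-driven rather than polled, and these names are illustrative:

```python
import time

HEARTBEAT_INTERVAL = 15   # seconds between expected client heartbeats
CONNECTION_TIMEOUT = 30   # seconds of silence before we drop the socket

def stale_connections(last_seen: dict[str, float], now: float) -> list[str]:
    """Return user_ids whose last heartbeat is older than the timeout."""
    return [uid for uid, ts in last_seen.items() if now - ts > CONNECTION_TIMEOUT]

now = time.time()
last_seen = {"alice": now - 5, "bob": now - 45}
print(stale_connections(last_seen, now))  # ['bob'] has gone quiet too long
```

Setting the timeout to twice the heartbeat interval, as in the config above, tolerates one missed heartbeat before giving up on the connection.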

The surprising thing is how much effort goes into not delivering a message immediately if the recipient isn’t online. Instead of just waiting, the system aggressively uses push notifications and background syncing to make it seem like the message arrived instantly when the user next opens the app, even if they were offline for hours.
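That catch-up step can be sketched as a sync on app open: the client reports the last message ID it has, and the server replays everything newer from the offline queue. This works because message IDs are time-ordered; the function and field names are illustrative:

```python
def sync_on_open(queued: list[dict], last_seen_id: int) -> list[dict]:
    """Replay queued messages newer than the client's last-seen ID, in order."""
    # IDs are time-ordered, so "newer" is a simple filter on the ID.
    return [m for m in queued if m["id"] > last_seen_id]

queue = [
    {"id": 101, "text": "hi"},
    {"id": 102, "text": "you there?"},
    {"id": 103, "text": "ping"},
]
print(sync_on_open(queue, last_seen_id=101))  # replays messages 102 and 103
```

From the user's perspective the messages "arrive" the instant the app opens, even though they were queued server-side for hours.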

The next concept to explore is how Messenger handles group chats, which introduces complexities in message fan-out and synchronization across multiple recipients.
