Slack’s real-time messaging is more than a single WebSocket connection per user. It is a system designed for massive concurrency and low latency, combining several connection types with routing that decides where each event needs to go.

Let’s see this in action. Imagine that Alice sends a message in a public channel. The resulting message event looks like this:

```json
{
  "type": "message",
  "channel": "C123ABC456",
  "text": "Hello team!",
  "user": "U789XYZ012",
  "ts": "1678886400.000123"
}
```

This message, originating from Alice’s client, doesn’t just hop onto a single server. It enters Slack’s backend, which then orchestrates its delivery to all other users subscribed to channel C123ABC456. For users currently online, this delivery happens via their persistent WebSocket connections. For those offline, the message is queued for delivery upon their next connection.
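A minimal sketch of the online/offline split described above, assuming an in-memory membership table and a callback that stands in for "write to this user's WebSocket". The `ChannelDelivery` class and its methods are hypothetical and illustrate only the routing decision, not Slack's implementation.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Set

Push = Callable[[dict], None]  # stand-in for "write to this user's WebSocket"


class ChannelDelivery:
    """Hypothetical router: push to online members, queue for offline ones."""

    def __init__(self) -> None:
        self.members: Dict[str, Set[str]] = defaultdict(set)       # channel ID -> user IDs
        self.online: Dict[str, Push] = {}                          # user ID -> live connection
        self.pending: Dict[str, Deque[dict]] = defaultdict(deque)  # user ID -> queued events

    def join(self, channel: str, user: str) -> None:
        self.members[channel].add(user)

    def connect(self, user: str, push: Push) -> None:
        self.online[user] = push
        # Flush anything queued while the user was offline, oldest first.
        queued = self.pending.pop(user, deque())
        while queued:
            push(queued.popleft())

    def disconnect(self, user: str) -> None:
        self.online.pop(user, None)

    def deliver(self, event: dict) -> None:
        for user in self.members[event["channel"]]:
            if user == event["user"]:
                continue  # this sketch delivers only to the *other* members
            push = self.online.get(user)
            if push is not None:
                push(event)                        # online: real-time push
            else:
                self.pending[user].append(event)  # offline: hold until reconnect
```

A production system would persist the offline queue (or let clients fetch history on reconnect) rather than keep it in process memory.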

The core problem Slack solves is delivering enormous message volumes to millions of concurrently connected users, with sub-second latency, across globally distributed infrastructure. Meeting that bar takes careful engineering of network protocols, distributed systems, and data handling.

Internally, Slack uses a hybrid approach. For active users, a primary WebSocket connection carries real-time events such as new messages, presence updates, and typing indicators. This connection is multiplexed: many logical streams of data share one physical connection. Because a single socket can be flooded with events or can drop, Slack also employs other mechanisms, including periodic polling for less time-critical updates and HTTP long-polling as a fallback. Which mechanism is used depends on network conditions, user activity, and the type of data being transmitted.
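To make "multiplexed" concrete, here is a minimal client-side sketch, assuming each WebSocket text frame carries one JSON event whose `type` field names its logical stream, as in the message event above. The `Multiplexer` class is hypothetical, and the `presence_change` and `user_typing` type names are used for illustration.

```python
import json
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class Multiplexer:
    """Splits one connection's JSON text frames into logical streams keyed by "type"."""

    def __init__(self) -> None:
        self.handlers: Dict[str, List[Handler]] = {}
        self.unhandled: List[dict] = []  # events that no stream has claimed

    def on(self, event_type: str, handler: Handler) -> None:
        self.handlers.setdefault(event_type, []).append(handler)

    def feed(self, raw_frame: str) -> None:
        event = json.loads(raw_frame)
        handlers = self.handlers.get(event.get("type"), [])
        if not handlers:
            self.unhandled.append(event)
        for handler in handlers:
            handler(event)
```

Every stream shares one socket and one parser; adding a new kind of real-time event means registering a handler, not opening another connection.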

The ts (timestamp) field is critical. It’s not just a simple Unix timestamp; it also serves as the message’s identifier, unique within its channel. The value is a string made of the seconds since the Unix epoch, a dot, and a six-digit fractional part with microsecond precision. Because ts values are unique within a channel, they give its messages a strict order, even in a distributed system where messages can arrive out of order due to network latency or processing delays. The backend uses ts to reconstruct the correct message sequence before sending it to clients.
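A minimal sketch of working with ts values, assuming the fractional part is always six digits as in the example. Parsing into an integer pair keeps values exact instead of relying on float rounding. The `TsAllocator` class is hypothetical: it shows one way to hand out unique, increasing values per channel, not Slack's actual generator.

```python
from typing import Dict, List, Tuple


def ts_key(ts: str) -> Tuple[int, int]:
    """Parse "seconds.micros" into integers so comparisons stay exact."""
    seconds, _, fraction = ts.partition(".")
    return int(seconds), int(fraction.ljust(6, "0"))


def order_messages(events: List[dict]) -> List[dict]:
    """Restore channel order for events that arrived out of sequence."""
    return sorted(events, key=lambda e: ts_key(e["ts"]))


class TsAllocator:
    """Hypothetical allocator: strictly increasing ts values per channel."""

    def __init__(self) -> None:
        self.last_us: Dict[str, int] = {}  # channel ID -> last issued value in microseconds

    def next_ts(self, channel: str, now_us: int) -> str:
        # Never reuse or go backwards within a channel, even if the clock does.
        candidate = max(now_us, self.last_us.get(channel, -1) + 1)
        self.last_us[channel] = candidate
        seconds, micros = divmod(candidate, 1_000_000)
        return f"{seconds}.{micros:06d}"
```

For example, `ts_key("1678886400.000123")` yields `(1678886400, 123)`, and two messages stamped in the same microsecond on one channel receive consecutive values.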

When a user connects, they establish a WebSocket connection to a specific gateway server. This gateway server is part of a larger pool responsible for managing active connections in a given region. The system doesn’t bind a user to a single gateway indefinitely. If a gateway server becomes overloaded or needs maintenance, the connection is gracefully migrated to another available gateway with minimal disruption to the user. This involves a handoff process where the new gateway synchronizes the user’s state and active subscriptions from the old one.
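The placement and handoff logic can be sketched as follows, assuming a regional pool of gateways with fixed capacities and a session state consisting only of the user's subscribed channel IDs. `Gateway`, `pick_gateway`, `connect`, and `drain` are hypothetical names; a real handoff would also carry connection-level state and coordinate with the client.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Gateway:
    name: str
    region: str
    capacity: int
    sessions: Dict[str, Set[str]] = field(default_factory=dict)  # user ID -> subscribed channel IDs
    draining: bool = False

    def load(self) -> float:
        return len(self.sessions) / self.capacity


def pick_gateway(pool: List[Gateway], region: str) -> Gateway:
    """Least-loaded gateway in the region that is accepting new sessions."""
    candidates = [
        g for g in pool
        if g.region == region and not g.draining and len(g.sessions) < g.capacity
    ]
    if not candidates:
        raise RuntimeError(f"no gateway available in {region}")
    return min(candidates, key=Gateway.load)


def connect(pool: List[Gateway], region: str, user: str, channels: Set[str]) -> Gateway:
    gateway = pick_gateway(pool, region)
    gateway.sessions[user] = set(channels)
    return gateway


def drain(old: Gateway, pool: List[Gateway]) -> Dict[str, str]:
    """Hand every session on `old` to another gateway; returns user ID -> new gateway name."""
    old.draining = True  # stop accepting new sessions
    moved: Dict[str, str] = {}
    for user in list(old.sessions):
        target = pick_gateway(pool, old.region)
        # Copy subscriptions to the new gateway before removing them from the old one,
        # so the user is never left without an active subscription set.
        target.sessions[user] = set(old.sessions[user])
        del old.sessions[user]
        moved[user] = target.name
    return moved
```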

A key detail often missed is how Slack handles message fan-out. When a message is sent to a channel with thousands of members, the backend doesn’t iterate through each user’s connection individually. Instead, it leverages a publish-subscribe (pub/sub) messaging system. The message is published to a topic representing that channel. All connected gateway servers that have users subscribed to that channel are listening to the topic. When a message arrives on the topic, the gateway servers then efficiently push that message to all their connected clients who are part of that channel. This decouples the message sender from the message receivers, enabling massive scalability.
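A minimal in-memory sketch of this fan-out, assuming one topic per channel ID and gateways that subscribe once per channel they have local members in. `Broker` and `FanoutGateway` are hypothetical stand-ins for a real pub/sub system and gateway process; the `outbox` dictionary substitutes for client sockets. The point is the counter: publishing to a channel costs one send per interested gateway, not one per member.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set


class Broker:
    """Hypothetical in-memory pub/sub broker: one topic per channel ID."""

    def __init__(self) -> None:
        self.topics: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)
        self.sends = 0  # broker-to-gateway sends, counted for illustration

    def subscribe(self, topic: str, callback: Callable[[dict], None]) -> None:
        self.topics[topic].append(callback)

    def publish(self, topic: str, event: dict) -> None:
        for callback in self.topics.get(topic, []):
            self.sends += 1
            callback(event)


class FanoutGateway:
    """Gateway that listens once per channel topic and pushes to its own clients."""

    def __init__(self, broker: Broker) -> None:
        self.broker = broker
        self.local: Dict[str, Set[str]] = defaultdict(set)      # channel ID -> local user IDs
        self.outbox: Dict[str, List[dict]] = defaultdict(list)  # user ID -> pushed events

    def add_member(self, channel: str, user: str) -> None:
        if channel not in self.local:
            self.broker.subscribe(channel, self.on_event)  # first local member: start listening
        self.local[channel].add(user)

    def on_event(self, event: dict) -> None:
        for user in self.local.get(event["channel"], set()):
            self.outbox[user].append(event)
```

With three members on one gateway and two on another, a single publish produces two broker sends and five client pushes.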

The channel ID, like C123ABC456, is more than an identifier; it’s the key that drives routing and subscription management. When a user joins a channel, their client signals this to the backend, which records the membership and makes sure the user’s gateway server is subscribed to that channel’s pub/sub topic. When a message is sent, it’s routed by channel ID to the matching topic, and from there to every subscribed client via its gateway server.
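Join and leave handling on a single gateway can be sketched with a per-channel membership set, assuming the gateway should hold a topic subscription only while at least one local connection belongs to that channel. `SubscriptionTable` and the connection IDs are hypothetical; the `topics` set stands in for real subscribe and unsubscribe calls to the pub/sub system.

```python
from collections import defaultdict
from typing import Dict, Set


class SubscriptionTable:
    """Hypothetical per-gateway index: channel ID -> local connection IDs."""

    def __init__(self) -> None:
        self.members: Dict[str, Set[str]] = defaultdict(set)
        self.topics: Set[str] = set()  # channel topics this gateway is subscribed to

    def join(self, channel: str, conn: str) -> None:
        if not self.members[channel]:
            self.topics.add(channel)  # first local member: subscribe to the topic
        self.members[channel].add(conn)

    def leave(self, channel: str, conn: str) -> None:
        conns = self.members.get(channel)
        if conns is None:
            return
        conns.discard(conn)
        if not conns:
            del self.members[channel]
            self.topics.discard(channel)  # last local member left: unsubscribe

    def route(self, event: dict) -> Set[str]:
        """Connections that should receive an event, chosen purely by its channel ID."""
        return set(self.members.get(event["channel"], ()))
```

Keeping the subscription tied to local membership means a gateway stops receiving a channel's traffic as soon as none of its clients need it.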

The next hurdle in mastering Slack’s architecture is understanding how presence information (online/offline status) is managed and propagated efficiently across the system.