Consistent hashing is a clever way to keep the number of keys that must be remapped when you add or remove servers to an absolute minimum, far fewer than naive hashing would force you to move.

Let’s see it in action. Imagine we have three servers: server-1, server-2, and server-3. We want to map keys like user:123, product:abc, and session:xyz to these servers.

With traditional hashing, if we add server-4, a massive number of keys would suddenly point to the wrong server. Consistent hashing avoids this.

Here’s a simplified conceptual view using a ring. We hash both the server names and the keys onto this ring.

Keys:
user:123 -> 0.3
product:abc -> 0.7
session:xyz -> 0.9

Servers:
server-1 -> 0.2
server-2 -> 0.5
server-3 -> 0.8

A key is assigned to the nearest server counterclockwise on the ring: the server whose position is the largest value at or below the key's position, wrapping around to the highest-positioned server if no server precedes the key.

  • user:123 (0.3) goes to server-1 (0.2).
  • product:abc (0.7) goes to server-2 (0.5).
  • session:xyz (0.9) goes to server-3 (0.8).
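The lookup rule above can be sketched in a few lines of Python. This is a toy illustration with the example's hardcoded positions, and names like `lookup` are mine, not from any real library:

```python
# Toy ring lookup matching the example above: a key belongs to the
# server with the largest ring position at or below the key's position,
# wrapping around the ring if the key precedes every server.

SERVERS = {"server-1": 0.2, "server-2": 0.5, "server-3": 0.8}

def lookup(key_pos, servers):
    # Servers sorted by their position on the ring.
    ring = sorted(servers.items(), key=lambda kv: kv[1])
    # Pick the last server at or before the key's position.
    candidates = [name for name, pos in ring if pos <= key_pos]
    # Wrap around: if no server precedes the key, use the highest one.
    return candidates[-1] if candidates else ring[-1][0]

print(lookup(0.3, SERVERS))  # user:123    -> server-1
print(lookup(0.7, SERVERS))  # product:abc -> server-2
print(lookup(0.9, SERVERS))  # session:xyz -> server-3
```

Note the wrap-around case: a key at 0.1 has no server before it, so it would land on server-3 at 0.8, the last server going counterclockwise around the ring.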

Now, let’s add server-4 with a hash of 0.6.

Keys:
user:123 -> 0.3
product:abc -> 0.7
session:xyz -> 0.9

Servers:
server-1 -> 0.2
server-2 -> 0.5
server-3 -> 0.8
server-4 -> 0.6

The assignments change:

  • user:123 (0.3) still goes to server-1 (0.2).
  • product:abc (0.7) now goes to server-4 (0.6). This is the only key that had to move.
  • session:xyz (0.9) still goes to server-3 (0.8).
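We can verify the before/after comparison mechanically. This is a sketch with illustrative names (`owner`, `before`, `after` are mine), using the same largest-position-at-or-below rule as the example:

```python
# Recompute assignments after adding server-4 at 0.6 and list the keys
# whose owner actually changed.
def owner(key_pos, servers):
    ring = sorted(servers.items(), key=lambda kv: kv[1])
    below = [name for name, p in ring if p <= key_pos]
    return below[-1] if below else ring[-1][0]  # wrap around the ring

keys = {"user:123": 0.3, "product:abc": 0.7, "session:xyz": 0.9}
before = {"server-1": 0.2, "server-2": 0.5, "server-3": 0.8}
after = {**before, "server-4": 0.6}

moved = [k for k, p in keys.items() if owner(p, before) != owner(p, after)]
print(moved)  # ['product:abc'] -- the only key that changed owners
```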

The problem consistent hashing solves is the wholesale remapping, and the resulting "thundering herd" of cache misses hitting your backing store, that happens when your cache or distributed data store scales up or down. Traditional modulo hashing (hash(key) % N) means that changing N (the number of servers) shuffles almost all of the keys: a key stays put only if hash(key) % N happens to equal hash(key) % (N ± 1). Consistent hashing ensures that, on average, only a 1/N fraction of the keys needs remapping when a server is added or removed.
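To see how badly modulo hashing behaves, here is a quick measurement. It is an illustration, not production code; the helper `stable_hash` is mine, using SHA-1 only because Python's built-in `hash()` is salted per process:

```python
# Going from 3 to 4 buckets with hash(key) % N remaps most keys:
# a key stays only when hash % 3 == hash % 4, which for a uniform
# hash happens about a quarter of the time.
import hashlib

def stable_hash(key: str) -> int:
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")

keys = [f"user:{i}" for i in range(10_000)]
moved = sum(1 for k in keys if stable_hash(k) % 3 != stable_hash(k) % 4)
print(f"{moved / len(keys):.0%} of keys remapped")  # roughly 75%
```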

It works by mapping both servers and data keys to points on a virtual ring. A data key is then assigned to the nearest server counterclockwise from the key's position on the ring (the direction is just a convention; walking clockwise works equally well as long as you apply it consistently). When a server is added, it "steals" keys from only one other server: its immediate counterclockwise neighbor on the ring. When a server is removed, its keys are taken over by that same counterclockwise neighbor.

To improve load distribution and reduce the impact of uneven hash distributions, virtual nodes are often employed. Instead of mapping each physical server to a single point on the ring, each server is mapped to multiple points (e.g., server-1-v1, server-1-v2, server-1-v3). This spreads the responsibility of a single physical server across many locations on the ring, making the distribution much more uniform and ensuring that the removal or addition of a single physical server only impacts a small fraction of keys.
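A minimal sketch of virtual nodes, under the same assumptions as before (helper names like `ring_pos` and `owner` are illustrative, and the 100-replica count is an arbitrary choice):

```python
# Each physical server is hashed onto the ring many times, so its load
# is drawn from many small arcs instead of one large one.
import hashlib
from bisect import bisect_right
from collections import Counter

def ring_pos(label: str) -> int:
    return int.from_bytes(hashlib.sha1(label.encode()).digest()[:8], "big")

VNODES = 100  # virtual nodes per physical server; tune for evenness
servers = ["server-1", "server-2", "server-3"]
ring = sorted((ring_pos(f"{s}-v{i}"), s) for s in servers for i in range(VNODES))
positions = [pos for pos, _ in ring]

def owner(key: str) -> str:
    # Nearest virtual node at or below the key's position; Python's
    # index -1 conveniently wraps to the highest position on the ring.
    idx = bisect_right(positions, ring_pos(key)) - 1
    return ring[idx][1]

load = Counter(owner(f"user:{i}") for i in range(30_000))
print(load)  # each server ends up with roughly a third of the keys
```

With a single point per server, one unlucky hash can hand a server half the ring; with 100 points each, the shares concentrate close to 1/3 apiece.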

The number of keys that need to be remapped when adding or removing a server is roughly proportional to the total number of keys divided by the number of servers, not the total number of keys. This is why it’s so efficient for dynamic scaling.
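That proportionality claim can be checked with a small simulation. This is a sketch under my own assumptions (illustrative helper names, 100 virtual nodes per server so the moved fraction concentrates near its expected value instead of depending on one lucky arc):

```python
# Grow a ring from 9 to 10 servers and measure the fraction of keys
# that change owners; it should land near 1/10, not near 100%.
import hashlib
from bisect import bisect_right

def pos(label: str) -> int:
    return int.from_bytes(hashlib.sha1(label.encode()).digest()[:8], "big")

def build_ring(servers, vnodes=100):
    return sorted((pos(f"{s}-v{i}"), s) for s in servers for i in range(vnodes))

def owner(key: str, ring) -> str:
    # Nearest virtual node at or below the key; -1 wraps to the top.
    idx = bisect_right([p for p, _ in ring], pos(key)) - 1
    return ring[idx][1]

keys = [f"user:{i}" for i in range(20_000)]
old = build_ring([f"server-{i}" for i in range(1, 10)])   # N = 9
new = build_ring([f"server-{i}" for i in range(1, 11)])   # N = 10
moved = sum(owner(k, old) != owner(k, new) for k in keys)
print(f"{moved / len(keys):.1%} of keys moved")  # close to 1/10
```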

The real trick to making consistent hashing work well in practice isn’t just the ring math, but using a good hashing algorithm (like SHA-1 or MurmurHash) to get a wide distribution of hash values, and then carefully choosing the number of virtual nodes per physical node to balance load evenness against the overhead of managing many virtual nodes.
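For concreteness, here is one way (an assumption on my part, not the article's prescribed method) to turn a hash digest into a ring position in [0, 1), like the 0.2 and 0.3 values used in the examples above:

```python
import hashlib

def ring_position(label: str) -> float:
    # Hash the label and scale the first 8 bytes, read as an unsigned
    # integer, down to a float in [0, 1).
    digest = hashlib.sha1(label.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

print(ring_position("server-1"))  # deterministic value in [0, 1)
```

Any hash with a wide, even spread works here; the scaling to [0, 1) is cosmetic, and real implementations typically keep the raw integer positions instead.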

The next challenge is understanding how to handle "hot spots" if the virtual node distribution isn’t perfectly uniform.

Want structured learning?

Take the full System Design course →