Horizontal scaling, not vertical, is often the key to truly massive, unpredictable growth.

Let’s see it in action. Imagine a simple web application.

# The upstream block names the pool of backend servers to balance across.
upstream app_servers {
    server 192.168.1.10:8080;
    server 192.168.1.11:8080;
    server 192.168.1.12:8080;
}

server {
    listen 80;
    server_name example.com;

    location / {
        proxy_pass http://app_servers; # This is the magic
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

This Nginx configuration points to a group of backend servers (app_servers). When a request comes in, Nginx (acting as a load balancer) picks one of these servers to handle it, using round-robin by default. If 192.168.1.10 starts failing requests, Nginx's passive health checks mark it as unavailable and distribute traffic among 192.168.1.11 and 192.168.1.12. To add more capacity, you spin up a new server (e.g., 192.168.1.13:8080), add it to the upstream block, and reload Nginx; the load balancer then starts sending traffic its way.
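The picking step itself is simple. Here is a minimal round-robin sketch in Python (the class name is illustrative; the addresses mirror the upstream block above):

```python
from itertools import cycle

# Backend pool, mirroring the servers in the upstream block.
BACKENDS = ["192.168.1.10:8080", "192.168.1.11:8080", "192.168.1.12:8080"]

class RoundRobinBalancer:
    """Hands out backends in order, wrapping around at the end.

    This is the default strategy Nginx uses for an upstream group.
    """

    def __init__(self, backends):
        self._pool = cycle(backends)

    def pick(self):
        return next(self._pool)

lb = RoundRobinBalancer(BACKENDS)
picks = [lb.pick() for _ in range(4)]
# The fourth pick wraps back to the first backend.
```

Adding capacity is just appending another address to `BACKENDS`, which is why horizontal scaling composes so naturally with a load balancer.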

This is horizontal scaling: adding more machines to your pool of resources. It’s like adding more checkout lanes at a busy supermarket. The alternative, vertical scaling, is like making one checkout lane incredibly fast and efficient, but it has limits.

Vertical scaling means increasing the resources of a single machine: more CPU, more RAM, faster storage. For a database, this might mean upgrading from a 16-core, 128GB RAM server to a 64-core, 512GB RAM behemoth. It’s simpler to manage initially because you’re still dealing with one instance. However, you eventually hit a ceiling. There’s a maximum size for a single server, and the cost per unit of resource (CPU, RAM) tends to increase dramatically at the high end. Plus, if that single, powerful machine fails, your entire application goes down.

Horizontal scaling, on the other hand, allows for near-infinite growth. You can add hundreds or thousands of small, cheap machines. The cost scales linearly, and the failure of one machine has minimal impact because others pick up the slack. The complexity lies in managing the distribution of work and state across these many machines.

This is where sharding comes in, especially for databases. If your database is growing too large to fit on even the most powerful single server, or if read/write operations are overwhelming a single instance, you shard. Sharding is a form of horizontal scaling for data. Instead of having one massive database, you split it into smaller, more manageable pieces called shards. Each shard contains a subset of the total data.

Consider a user database. You could shard by user_id. Users 1-1,000,000 might be on Shard A, users 1,000,001-2,000,000 on Shard B, and so on. When a request comes in to fetch user data, your application logic (or a dedicated routing layer) determines which shard holds that user’s data and directs the query accordingly. This distributes the data and the load across multiple database servers.
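That routing layer can be sketched in a few lines of Python. The shard names and ranges below are hypothetical, matching the example above:

```python
# Range-based shard map: (low, high, shard name), matching the example above.
SHARD_RANGES = [
    (1, 1_000_000, "shard_a"),
    (1_000_001, 2_000_000, "shard_b"),
    (2_000_001, 3_000_000, "shard_c"),
]

def shard_for_user(user_id: int) -> str:
    """Return the shard that owns this user_id's range."""
    for low, high, shard in SHARD_RANGES:
        if low <= user_id <= high:
            return shard
    raise LookupError(f"no shard covers user_id {user_id}")
```

Range-based routing keeps adjacent IDs together, which makes range scans cheap, but as the next paragraph notes, it also means the newest users can all land on the same shard.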

The challenge with sharding is choosing the right shard key and handling cross-shard operations. If your shard key is poorly chosen (e.g., sharding by creation_date when most queries are for the latest users), you can end up with "hot shards" that are still overloaded. Cross-shard transactions are complex and often avoided because they require coordinating multiple independent databases.
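One common way to avoid hot shards is hash-based sharding: instead of contiguous ranges, a stable hash of the key scatters even sequential IDs across all shards. A minimal sketch (the function name and shard count are illustrative):

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # illustrative shard count

def hash_shard(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard number via a stable hash.

    md5 is used for its uniform, platform-stable output, not for security;
    Python's built-in hash() is salted per process, so it would route the
    same key differently across restarts.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Sequential "newest user" IDs spread across every shard instead of
# piling onto the last range.
spread = Counter(hash_shard(f"user:{i}") for i in range(10_000))
```

The trade-off is the mirror image of range sharding: hot keys are diluted, but range queries now have to fan out to every shard.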

The most surprising thing about these patterns is how often the "obvious" vertical scaling is chosen first, only to become a bottleneck later. People focus on making one thing faster, rather than making many things work together. When a system is designed for horizontal scaling from the outset, even simple components like web servers can be scaled out by simply adding more instances behind a load balancer. The underlying principle is that distributing work across multiple independent units is fundamentally more scalable than concentrating it on a single, increasingly powerful unit.

The next hurdle you’ll face is managing distributed state and ensuring consistency across horizontally scaled services and sharded data.

Want structured learning?

Take the full System Design course →