The most surprising thing about system design is that it’s less about knowing a million specific technologies and more about understanding a few fundamental trade-offs that apply everywhere.
Let’s see what happens when we design a simple URL shortener, the kind that takes a long URL and gives you a tiny one. Imagine a user pastes a long URL into our service. We generate a short code, say aBcDeF, and store it in a database mapping aBcDeF to the original long URL. When someone visits http://short.url/aBcDeF, we look up aBcDeF, find the long URL, and redirect them.
Shorten: User Request (long URL) -> API Gateway -> URL Shortener Service -> Database write (short_code -> long_url) -> Short URL Response (short_url)

Resolve: User Request (short_url) -> API Gateway -> URL Shortener Service -> Database lookup (short_code -> long_url) -> HTTP Redirect
This basic setup works, but it’s not very robust or scalable. What if millions of people use it?
The core problem this solves is indirection. We want to refer to a lot of data (a long URL) with very little data (a short code). The short code doesn't compress the URL; it's a compact key into a lookup table. That's still a win for efficiency: smaller URLs mean less data to transmit, less storage wherever the link is shared, and faster processing.
Internally, the system hinges on a few key components:
- ID Generation: How do we create unique short codes? We could use a random alphanumeric string, but that requires checking for collisions. A more deterministic approach is to use a counter. Imagine a service that dispenses sequential IDs (1, 2, 3, …). We can then convert these IDs into a base-62 representation (0-9, a-z, A-Z) to get our short codes. This guarantees uniqueness and is highly efficient.
- Data Storage: We need a fast way to map the short code to the long URL. A key-value store is ideal. Redis or DynamoDB are excellent choices. The key would be the short code, and the value would be the long URL.
- API Service: This is the entry point. It handles requests to shorten URLs and to resolve short URLs. It interacts with the ID generator and the data store.
- Load Balancing: As traffic grows, a single API service instance won’t suffice. We’ll need a load balancer (like Nginx or an AWS ELB) to distribute incoming requests across multiple API service instances.
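The counter-to-base-62 conversion from the ID Generation step can be sketched in a few lines. The alphabet ordering (0-9, a-z, A-Z) follows the text; any fixed ordering works as long as encoding and decoding agree:

```python
# Alphabet order (0-9, a-z, A-Z) matches the text above.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(n: int) -> str:
    """Convert a non-negative sequential ID into a short code."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))  # most significant digit first

def decode_base62(code: str) -> int:
    """Invert encode_base62, recovering the original counter value."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

With this alphabet, `encode_base62(123456789)` yields "8m0Kx", a five-character code; six base-62 characters already cover over 56 billion IDs, which is why such short codes go a long way.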
Consider a scenario where we’re using a simple counter for ID generation. If we have multiple API servers, and they all try to increment the same counter directly, we’ll get race conditions and duplicate IDs. To fix this, we need a centralized, atomic way to generate IDs. A dedicated "ID Generation Service" that uses a database sequence or a distributed lock mechanism can serve this purpose. Each request to the ID service gets a unique, sequential number, which is then converted to base-62.
// Example ID Generation Service Response
{
  "id": 123456789,
  "short_code": "8m0Kx"
}
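Within a single process, the atomicity requirement can be illustrated with a lock-guarded counter. This is only a sketch: a real deployment would use a database sequence or an atomic increment in a store like Redis rather than an in-process lock, and the class name here is illustrative:

```python
import threading

class IdGenerationService:
    """Sketch of a centralized ID dispenser. The lock stands in for
    whatever atomic mechanism a real service would use (a database
    sequence, a distributed lock, an atomic counter in Redis)."""

    def __init__(self, start: int = 1):
        self._next = start
        self._lock = threading.Lock()

    def next_id(self) -> int:
        # The lock makes read-increment-return one atomic step, so
        # concurrent callers can never receive the same ID.
        with self._lock:
            value = self._next
            self._next += 1
            return value
```

Without the lock, two API servers reading the counter at the same moment would both see the same value and hand out duplicate short codes, which is exactly the race condition described above.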
When a user requests a shortened URL, the API service calls the ID Generation Service, gets a short_code, stores the mapping (short_code -> long_url) in the key-value store, and returns the short_code to the user. When the user clicks the short URL, the API service looks up the short_code in the key-value store, retrieves the long_url, and issues an HTTP 301 (Moved Permanently) or 302 (Found, a temporary redirect) to the user's browser. The choice matters: browsers cache a 301 and may skip our servers on repeat clicks, while a 302 keeps every click flowing through us, which we'll want later for analytics.
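Putting the two paths together, here is a minimal sketch. An in-memory dict stands in for the key-value store, a plain counter stands in for the ID Generation Service, and the function names are illustrative, not a real API:

```python
from itertools import count

_ids = count(1)   # stand-in for the ID Generation Service's counter
store = {}        # stand-in for Redis/DynamoDB: short_code -> long_url

def shorten(long_url: str) -> str:
    """Shorten path: get an ID, store the mapping, return the code."""
    # A real service would base-62 encode this ID as described in the
    # ID Generation step; str() keeps the sketch minimal.
    code = str(next(_ids))
    store[code] = long_url
    return code

def resolve(code: str):
    """Resolve path: return (status, headers) like the HTTP response."""
    long_url = store.get(code)
    if long_url is None:
        return 404, {}
    # 302 keeps every click hitting our servers (useful for analytics);
    # 301 lets browsers cache the redirect and bypass us on repeat visits.
    return 302, {"Location": long_url}
```

Note that resolve never writes anything, which is what makes the read path so amenable to the caching discussed next.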
The most counterintuitive part of scaling this is that the bottleneck often isn't the database writes (generating new URLs), but the database reads (resolving existing URLs), especially if one popular URL gets shared everywhere. To handle this, we'd implement aggressive caching at the API service level or in a distributed cache like Memcached. A cache hit means we avoid a database lookup entirely, saving significant latency and load.
The next problem you’ll likely encounter is how to handle analytics – counting clicks on each short URL without impacting the performance of the core redirect path.