A URL shortener does more than map short URLs to long ones: it encodes a unique numeric ID in a compact, easily typed form and redirects each request to the long URL stored under that ID.

Let’s see this in action. Imagine we have a long URL: https://www.example.com/very/long/path/to/an/article/with/a/lot/of/parameters?id=12345&user=abcde

When a user visits a shortened URL, say http://short.url/ABC, the server-side logic needs to perform two key operations:

  1. Decode the short identifier: The ABC part of http://short.url/ABC isn’t just arbitrary text. It’s a Base62 encoded representation of a unique numerical ID assigned to the original long URL.
  2. Lookup and Redirect: This numerical ID is used to query a database (like a key-value store or a relational database) to retrieve the original long URL. The server then issues an HTTP 301 (Moved Permanently) or 302 (Found) redirect to the user’s browser, sending them to the full URL.

Here’s a conceptual Go snippet illustrating the core redirect logic.

package main

import (
	"database/sql"
	"errors"
	"fmt"
	"log"
	"math"
	"net/http"
	"strings"
)

// db is assumed to be an initialized *sql.DB connection.
var db *sql.DB

// alphabet fixes the Base62 digit order:
// 0-9 -> 0-9, A-Z -> 10-35, a-z -> 36-61.
const alphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

// decodeBase62 converts a short code to its numerical ID using plain
// positional notation. A hand-written decoder is used here in place of a
// third-party Base62 library so the example depends only on the standard library.
//
// Example: "ABC" = (10 * 62^2) + (11 * 62^1) + (12 * 62^0)
//                = 38440 + 682 + 12 = 39134
func decodeBase62(code string) (int64, error) {
	var id int64
	for _, c := range code {
		digit := strings.IndexRune(alphabet, c)
		if digit < 0 {
			return 0, fmt.Errorf("invalid Base62 character %q", c)
		}
		// Reject codes whose value would not fit in an int64.
		if id > (math.MaxInt64-int64(digit))/62 {
			return 0, errors.New("short code overflows int64")
		}
		id = id*62 + int64(digit)
	}
	return id, nil
}

func redirectHandler(w http.ResponseWriter, r *http.Request) {
	// Extract the short code from the URL path,
	// e.g., for "/ABC", shortCode will be "ABC".
	shortCode := strings.TrimPrefix(r.URL.Path, "/")
	if shortCode == "" {
		http.Error(w, "Short code not provided", http.StatusBadRequest)
		return
	}

	// Decode the Base62 short code to a numerical ID.
	urlID, err := decodeBase62(shortCode)
	if err != nil {
		http.Error(w, "Invalid short code format", http.StatusBadRequest)
		return
	}

	// Look up the original URL in the database using the ID.
	// The "?" placeholder is MySQL/SQLite style; other drivers may differ.
	var originalURL string
	err = db.QueryRow("SELECT original_url FROM urls WHERE id = ?", urlID).Scan(&originalURL)
	if err != nil {
		if errors.Is(err, sql.ErrNoRows) {
			http.NotFound(w, r)
		} else {
			log.Printf("database error: %v", err)
			http.Error(w, "Internal server error", http.StatusInternalServerError)
		}
		return
	}

	// Perform the HTTP redirect.
	http.Redirect(w, r, originalURL, http.StatusFound) // Use 302 for temporary, 301 for permanent
}

// --- Database Setup (Conceptual) ---
// CREATE TABLE urls (
//     id BIGINT AUTO_INCREMENT PRIMARY KEY,
//     short_code VARCHAR(10) NOT NULL UNIQUE, -- Max length depends on Base62 conversion
//     original_url TEXT NOT NULL,
//     created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
// );
//
// INSERT INTO urls (id, short_code, original_url) VALUES (39134, 'ABC', 'https://www.example.com/very/long/path/to/an/article/with/a/lot/of/parameters?id=12345&user=abcde');

// --- Main Function (Conceptual) ---
// func main() {
//     // Initialize your database connection 'db' here
//     // ...
//     http.HandleFunc("/", redirectHandler)
//     fmt.Println("Starting server on :8080")
//     log.Fatal(http.ListenAndServe(":8080", nil))
// }

The core problem this system solves is the need for compact, human-friendly identifiers for potentially very long strings. A sequential integer ID written in decimal grows long quickly. For instance, if you have a million URLs, the IDs go up to 1,000,000, which takes 7 digits in Base10. In Base62 (with the digit order 0-9, A-Z, a-z) the same ID is 4C92, only 4 characters, and the gap widens as the IDs grow.

Base62 encoding is key here. It uses 62 unique characters (typically 0-9, A-Z, a-z) to represent numbers. This is like Base10 (0-9) or Base16 (hexadecimal, 0-9, A-F), but with a larger character set. The magic comes from the exponential growth of what can be represented with a given number of characters.

  • Base10 (1 character): 0-9 (10 possibilities)
  • Base10 (2 characters): 00-99 (100 possibilities)
  • Base62 (1 character): 0-9, A-Z, a-z (62 possibilities)
  • Base62 (2 characters): 62 * 62 = 3,844 possibilities
  • Base62 (3 characters): 62 * 62 * 62 = 238,328 possibilities
  • Base62 (4 characters): 62^4 = 14,776,336 possibilities
  • Base62 (5 characters): 62^5 = 916,132,832 possibilities

This means that even with just 5 Base62 characters, you can uniquely identify over 900 million URLs, and a sixth character raises the limit to 62^6 = 56,800,235,584. A typical URL shortener service needs to handle millions, if not billions, of links, making Base62 a good fit for keeping the short URLs short and easy to type.
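The figures above can be reproduced with a few lines of Go. This is a minimal sketch: the helper names capacity and minLength are made up for illustration, and the code assumes lengths of at most 10 characters, since 62^11 no longer fits in an int64.

```go
package main

import "fmt"

// capacity returns 62^n, the number of distinct Base62 codes of exactly
// n characters. Only valid for n <= 10; 62^11 overflows int64.
func capacity(n int) int64 {
	total := int64(1)
	for i := 0; i < n; i++ {
		total *= 62
	}
	return total
}

// minLength returns the smallest code length whose capacity covers
// count distinct IDs. Assumes count <= 62^10.
func minLength(count int64) int {
	n := 1
	for capacity(n) < count {
		n++
	}
	return n
}

func main() {
	for n := 1; n <= 6; n++ {
		fmt.Printf("%d chars: %d codes\n", n, capacity(n))
	}
	fmt.Println(minLength(1_000_000))     // 4
	fmt.Println(minLength(1_000_000_000)) // 6
}
```

A million links fit in 4 characters, while a billion links need 6, because 62^5 falls just short of one billion.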

The process of generating a short URL involves:

  1. Storing the original long URL in a database.
  2. Assigning a unique, sequential numerical ID to this new entry.
  3. Encoding this numerical ID using Base62.
  4. Storing the Base62 encoded string as the "short code" associated with the original URL (often in a separate column or a dedicated index for quick lookups).
  5. Constructing the short URL: http://yourshortdomain.com/ + base62_encoded_id.
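The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not a production design: the store type is a hypothetical in-memory stand-in for the database, a mutex-protected counter plays the role of the auto-increment primary key, and the starting ID of 39134 is chosen only so that the first code matches the ABC example.

```go
package main

import (
	"fmt"
	"sync"
)

// alphabet fixes the digit order: 0-9, then A-Z, then a-z.
const alphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

// encodeBase62 converts a non-negative ID to its Base62 string.
func encodeBase62(id int64) string {
	if id == 0 {
		return "0"
	}
	var buf []byte
	for id > 0 {
		buf = append(buf, alphabet[id%62])
		id /= 62
	}
	// Digits were produced least-significant first, so reverse them.
	for i, j := 0, len(buf)-1; i < j; i, j = i+1, j-1 {
		buf[i], buf[j] = buf[j], buf[i]
	}
	return string(buf)
}

// store is a hypothetical in-memory replacement for the urls table.
type store struct {
	mu     sync.Mutex
	nextID int64
	urls   map[int64]string
}

func newStore(firstID int64) *store {
	return &store{nextID: firstID, urls: make(map[int64]string)}
}

// shorten saves longURL under the next sequential ID (steps 1 and 2),
// encodes that ID (step 3), and builds the short URL (step 5).
func (s *store) shorten(longURL string) string {
	s.mu.Lock()
	id := s.nextID
	s.nextID++
	s.urls[id] = longURL
	s.mu.Unlock()
	return "http://yourshortdomain.com/" + encodeBase62(id)
}

func main() {
	s := newStore(39134)
	fmt.Println(s.shorten("https://www.example.com/very/long/path/to/an/article/with/a/lot/of/parameters?id=12345&user=abcde"))
	// Prints: http://yourshortdomain.com/ABC
}
```

Step 4 (storing the short code in its own column) is omitted here because the code can always be recomputed from the ID; a real service might still store it to support custom aliases or indexed lookups.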

The specific character set used for Base62 can vary slightly. A common convention is 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz. The order matters for consistent encoding and decoding.

When a user requests a short URL, the server extracts the Base62 encoded string. It then decodes this string back into its original numerical ID. This ID is used to fetch the full URL from the database. Finally, an HTTP redirect (usually a 301 or 302 status code) is sent to the user’s browser, instructing it to navigate to the original, long URL.

A notable property of this design is that the "short code" isn’t a random string assigned by the service. It’s a direct, deterministic representation of a sequential primary key from a database. If you know the ID, you can always calculate the short code, and vice versa, without consulting the database for the encoding/decoding step itself. The database is needed only to look up the original URL once the ID is known.
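This determinism is easy to check with a round trip. The sketch below assumes the 0-9, A-Z, a-z digit order; encode and decode are illustrative names, and decode skips overflow checks for brevity, so it should only be trusted for codes of up to 10 characters.

```go
package main

import (
	"fmt"
	"strings"
)

const alphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

// encode converts a non-negative ID to its Base62 string.
func encode(id int64) string {
	if id == 0 {
		return "0"
	}
	var buf []byte
	for id > 0 {
		// Prepend each digit so the most significant digit ends up first.
		buf = append([]byte{alphabet[id%62]}, buf...)
		id /= 62
	}
	return string(buf)
}

// decode converts a Base62 string back to its ID.
// It reports false if the code contains a character outside the alphabet.
// No overflow checking is done in this sketch.
func decode(code string) (int64, bool) {
	var id int64
	for _, c := range code {
		d := strings.IndexRune(alphabet, c)
		if d < 0 {
			return 0, false
		}
		id = id*62 + int64(d)
	}
	return id, true
}

func main() {
	// Neither direction touches a database: the mapping is pure arithmetic.
	for _, id := range []int64{0, 61, 62, 39134, 1000000} {
		code := encode(id)
		back, ok := decode(code)
		fmt.Printf("%d -> %s -> %d (round trip ok: %t)\n", id, code, back, ok && back == id)
	}
}
```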

The next logical step to consider is how to handle massive scale, including potential race conditions when generating new IDs and ensuring fast, distributed lookups.
