Caching is often thought of as a way to make things faster, but its real superpower is making systems available when the underlying data source is slow or down.
Let’s see how it works in a simple web service. Imagine a GET /users/{id} endpoint that fetches user data from a database.
from flask import Flask, jsonify
import json
import time

import redis

app = Flask(__name__)
cache = redis.Redis(host='localhost', port=6379, db=0)

# Simulate a slow database call
def fetch_user_from_db(user_id):
    print(f"Fetching user {user_id} from DB...")
    time.sleep(2)  # Simulate 2-second latency
    return {"id": user_id, "name": f"User {user_id}", "email": f"user{user_id}@example.com"}

@app.route('/users/<int:user_id>', methods=['GET'])
def get_user(user_id):
    cache_key = f"user:{user_id}"
    cached_data = cache.get(cache_key)
    if cached_data:
        print(f"Cache hit for user {user_id}")
        return jsonify(json.loads(cached_data))
    print(f"Cache miss for user {user_id}")
    user_data = fetch_user_from_db(user_id)
    cache.set(cache_key, json.dumps(user_data), ex=60)  # Cache for 60 seconds
    return jsonify(user_data)

if __name__ == '__main__':
    app.run(debug=True)
When you hit http://127.0.0.1:5000/users/123 for the first time, you’ll see "Fetching user 123 from DB…" and it will take about 2 seconds. The response is then served. If you hit it again within 60 seconds, you’ll see "Cache hit for user 123" and the response is nearly instantaneous. The redis.Redis object is our cache, storing the JSON payload of the user data with a key like user:123. The ex=60 tells Redis to automatically expire this key after 60 seconds.
This pattern is fundamental. A Content Delivery Network (CDN) is essentially a distributed cache for static assets (images, CSS, JavaScript) across many geographical locations. Instead of every user hitting your origin server, they hit the closest CDN edge server, drastically reducing latency and load on your infrastructure. Think of Cloudflare, Akamai, or AWS CloudFront.
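On the origin side, no special CDN integration is needed: edge servers decide what to cache from standard HTTP headers. Here is a minimal Flask sketch (the route and asset are made up for illustration) that marks a response as cacheable by shared caches such as CDN edges:

```python
from flask import Flask, make_response

app = Flask(__name__)

@app.route('/assets/style.css')
def stylesheet():
    resp = make_response("body { margin: 0; }")
    resp.headers['Content-Type'] = 'text/css'
    # 'public' allows shared caches (CDN edges) to store the response;
    # 'max-age' is how many seconds it may be served without revalidation.
    resp.headers['Cache-Control'] = 'public, max-age=3600'
    return resp

# Exercise the endpoint with Flask's built-in test client.
response = app.test_client().get('/assets/style.css')
print(response.headers['Cache-Control'])  # public, max-age=3600
```

With this header in place, an edge server that has seen the asset once can serve it for the next hour without contacting the origin at all.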
Database tuning is the art of making the database itself faster, and it involves several layers. At the simplest level, it’s about query optimization. A query like SELECT * FROM users WHERE email = 'alice@example.com' is slow without an index because the database has to scan the entire users table. Adding an index on the email column, CREATE INDEX idx_email ON users (email);, lets the database jump straight to the matching rows, turning a full table scan into a much faster index lookup. (Note that an index won’t rescue a leading-wildcard pattern like LIKE '%@example.com'; a standard B-tree index can only be used when the prefix of the value is known.)
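The effect is easy to observe locally; here is a sketch using SQLite (as a stand-in for a production database) and its EXPLAIN QUERY PLAN statement to show a full scan turning into an index lookup. An equality predicate is used, the case where a B-tree index applies directly:

```python
import sqlite3

# Illustrative only: an in-memory SQLite database with 1000 users.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [(f"user{i}@example.com",) for i in range(1000)])

def plan(query):
    # EXPLAIN QUERY PLAN reports whether the query scans or uses an index;
    # the detail string is the last column of the first row.
    return conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()[0][-1]

query = "SELECT * FROM users WHERE email = 'user42@example.com'"
plan_before = plan(query)   # a full table scan, e.g. "SCAN users"
conn.execute("CREATE INDEX idx_email ON users (email)")
plan_after = plan(query)    # an index lookup via idx_email
print(plan_before)
print(plan_after)
```

The exact wording of the plan varies by SQLite version, but the before/after shift from a scan to a search on idx_email is the same behavior a production EXPLAIN would show.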
Beyond indexing, database configuration parameters are critical. For PostgreSQL, tuning shared_buffers is paramount. This parameter dictates how much RAM PostgreSQL can use to cache data pages. Setting it too low means more disk reads; too high can starve other processes. A common starting point is 25% of your system’s total RAM, e.g., shared_buffers = 1GB on a 4GB machine. This cached data means fewer expensive disk I/O operations. Similarly, work_mem controls the amount of memory used for internal sort operations and hash tables before spilling to disk. Increasing it, say from the default 4MB to 32MB, can significantly speed up complex ORDER BY or GROUP BY clauses.
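Put together, the PostgreSQL settings discussed above would appear in postgresql.conf along these lines (illustrative values for a 4GB machine, not universal recommendations):

```ini
# postgresql.conf -- illustrative values for a machine with 4GB RAM
shared_buffers = 1GB    # ~25% of total RAM for PostgreSQL's page cache
work_mem = 32MB         # per-sort/hash-table memory before spilling to disk
```

Note that work_mem is allocated per sort operation, not per connection, so a complex query on a busy server can consume several multiples of it; raise it cautiously.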
For MySQL, innodb_buffer_pool_size is the equivalent of PostgreSQL’s shared_buffers: the memory area InnoDB uses to cache table and index data. A good rule of thumb is to set it to 70-80% of available RAM on a dedicated database server, e.g., innodb_buffer_pool_size = 4G. Another historically important parameter is query_cache_size. The query cache was deprecated in MySQL 5.7 and removed entirely in 8.0, but if you’re on an older version, a well-tuned query cache can store the exact results of SELECT statements and return them immediately without re-executing the query. Setting query_cache_type = 1 and query_cache_size = 64M can provide a significant boost for read-heavy workloads with repetitive queries, though it scales poorly under write load, since every write invalidates the cached results for the affected table.
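The MySQL equivalents would sit in my.cnf along these lines (illustrative values from the discussion above, not universal recommendations):

```ini
# my.cnf -- illustrative values for a dedicated database server
[mysqld]
innodb_buffer_pool_size = 4G   # InnoDB page cache; ~70-80% of RAM when dedicated
# Only meaningful on MySQL older than 8.0 (the query cache was removed in 8.0):
query_cache_type = 1
query_cache_size = 64M
```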
The most counterintuitive aspect of caching is that stale data is often better than no data. Many systems are designed for eventual consistency rather than strict, immediate consistency. If your cache holds data that is 5 minutes old but your database is completely unavailable, serving that 5-minute-old data is usually a far better user experience than showing an error page. This is the core principle behind cache-first architectures: the cache is treated as the primary data source, and the backing store is the fallback.
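That fallback logic takes only a few lines. In this sketch, an in-memory dict stands in for Redis, and fetch_user_from_db is a hypothetical stub that simulates a database outage:

```python
import time

CACHE = {}       # key -> (value, stored_at); an in-memory stand-in for Redis
FRESH_TTL = 60   # serve without touching the DB if the entry is younger than this

class DatabaseDown(Exception):
    pass

def fetch_user_from_db(user_id):
    # Hypothetical stub: simulate an unreachable backing store.
    raise DatabaseDown("connection refused")

def get_user(user_id):
    key = f"user:{user_id}"
    entry = CACHE.get(key)
    if entry and time.time() - entry[1] < FRESH_TTL:
        return entry[0]                      # fresh hit: no DB call at all
    try:
        value = fetch_user_from_db(user_id)
        CACHE[key] = (value, time.time())
        return value
    except DatabaseDown:
        if entry:
            return entry[0]                  # stale, but better than an error
        raise                                # nothing cached: an error is all we have

# Simulate: a value cached 5 minutes ago, database now down.
CACHE["user:123"] = ({"id": 123, "name": "User 123"}, time.time() - 300)
print(get_user(123))  # serves the 5-minute-old data instead of failing
```

The key design choice is that the stale entry is not deleted on expiry; it is kept around precisely so it can be served when refreshing it fails.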
The next evolution in performance patterns often involves understanding how these different caching layers (application-level, CDN, database buffer pools) interact and sometimes conflict, leading to cache invalidation strategies.