Capacity planning is often framed as predicting the future, but it’s really about understanding your system’s current behavior under stress and extrapolating from there.

Let’s say we’re planning capacity for a web service. We’ve got a load balancer, a fleet of application servers, and a database. Here’s a snapshot of what a minute of traffic might look like:

# Load Balancer logs (simplified)
2023-10-27T10:00:00Z request_id="abc" method="GET" path="/users" backend="app-server-01" latency_ms=50
2023-10-27T10:00:01Z request_id="def" method="POST" path="/orders" backend="app-server-02" latency_ms=120
2023-10-27T10:00:02Z request_id="ghi" method="GET" path="/products" backend="app-server-03" latency_ms=75

# Application Server (app-server-01) metrics (simplified)
2023-10-27T10:00:00Z cpu_usage_percent=65.5
2023-10-27T10:00:00Z memory_usage_mb=3100
2023-10-27T10:00:00Z active_connections=150

# Database metrics (simplified)
2023-10-27T10:00:00Z query_count_per_sec=500
2023-10-27T10:00:00Z connection_count=200
2023-10-27T10:00:00Z cpu_usage_percent=80.0
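Even a snapshot like this is enough to start computing aggregates. As a minimal sketch (the parser and the hard-coded sample lines below are assumptions based on the simplified log format above, not a real load balancer's output), here is how you might pull latency out of the load-balancer lines:

```python
import re

# Hypothetical parser for the simplified load-balancer log format shown above.
LOG_PATTERN = re.compile(
    r'(?P<ts>\S+) request_id="(?P<id>[^"]+)" method="(?P<method>[^"]+)" '
    r'path="(?P<path>[^"]+)" backend="(?P<backend>[^"]+)" latency_ms=(?P<latency>\d+)'
)

lines = [
    '2023-10-27T10:00:00Z request_id="abc" method="GET" path="/users" backend="app-server-01" latency_ms=50',
    '2023-10-27T10:00:01Z request_id="def" method="POST" path="/orders" backend="app-server-02" latency_ms=120',
    '2023-10-27T10:00:02Z request_id="ghi" method="GET" path="/products" backend="app-server-03" latency_ms=75',
]

records = [LOG_PATTERN.match(line).groupdict() for line in lines]
latencies = [int(r["latency"]) for r in records]

# Average latency across the sample window.
avg_latency_ms = sum(latencies) / len(latencies)
print(round(avg_latency_ms, 1))  # 81.7
```

In practice you would feed this kind of aggregation from your metrics pipeline rather than raw logs, but the shape of the computation is the same.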

The core problem this system solves is preventing performance degradation and outages due to insufficient resources. When demand outstrips supply, users see slow responses and timeouts, and eventually the service becomes unavailable entirely. Capacity planning aims to provision resources before that point, ensuring a smooth experience for users and operational stability.

Internally, it’s a continuous loop. You monitor key performance indicators (KPIs) like request latency, error rates, CPU utilization, memory usage, and database connection counts. These metrics tell you how your system is performing now. You then use historical data and projections of future demand (e.g., expected traffic growth, seasonal peaks) to estimate future resource needs. Based on these estimates, you provision more instances, upgrade hardware, or optimize your application. Finally, you deploy these changes and continue monitoring to validate their effectiveness and identify new bottlenecks.
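The estimation step of that loop can be sketched in a few lines. Assuming compound monthly traffic growth and a known sustainable per-instance throughput (all numbers below are illustrative, not measured), a first-cut fleet-sizing calculation looks like this:

```python
import math

def instances_needed(peak_rps: float,
                     monthly_growth: float,
                     months_ahead: int,
                     rps_per_instance: float,
                     target_utilization: float = 0.6) -> int:
    """Project peak traffic forward and size the fleet with headroom."""
    projected_rps = peak_rps * (1 + monthly_growth) ** months_ahead
    # Keep each instance below target_utilization of its max throughput
    # so there is headroom for spikes and instance failures.
    usable_rps = rps_per_instance * target_utilization
    return math.ceil(projected_rps / usable_rps)

# 500 RPS today, 10% monthly growth, planning 6 months out,
# each app server sustains ~200 RPS, keep 40% headroom.
print(instances_needed(500, 0.10, 6, 200))  # 8
```

The point is not the arithmetic but the inputs: the model is only as good as your measured per-instance throughput and your demand projection, which is why the monitoring and validation steps of the loop matter.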

The levers you control are direct:

  • Number of instances: For stateless services, scaling out by adding more application server replicas is the primary method. If app-server-01 is at 70% CPU and latency is creeping up, adding another identical instance can distribute the load.
  • Instance size: For stateful services or components with fixed limits (like certain database configurations), you might need to upgrade to larger instance types with more CPU, RAM, or disk I/O.
  • Database configuration: This includes connection limits, buffer pool sizes, and even choosing a more powerful database instance.
  • Network bandwidth: Less common for typical web services but critical for data-intensive applications.
  • Software configuration: Tuning application thread pools, garbage collection parameters, or database query caches can significantly impact resource utilization.
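The first lever, scaling out a stateless tier, reduces to simple proportional reasoning: total work is roughly replicas times average CPU, so resize the fleet until the same work lands at your target per-instance CPU. This sketch assumes load spreads evenly across identical instances, which real load balancing only approximates:

```python
import math

def target_replicas(current_replicas: int,
                    avg_cpu_percent: float,
                    target_cpu_percent: float = 50.0) -> int:
    # Total load is proportional to replicas * average CPU; find the
    # replica count that brings per-instance CPU down to the target.
    total_load = current_replicas * avg_cpu_percent
    return max(1, math.ceil(total_load / target_cpu_percent))

# 3 app servers averaging 70% CPU -> scale to keep each near 50%.
print(target_replicas(3, 70.0))  # 5
```

This is the same desired-replicas formula that autoscalers use internally; doing it by hand once makes their behavior much easier to predict.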

A common trap is focusing solely on CPU and memory. For many services, the bottleneck isn’t compute, but rather I/O or network saturation. A database might have plenty of CPU headroom but be maxed out on disk IOPS, leading to slow queries and cascading failures. Similarly, a web service might be limited by the number of concurrent connections it can establish or maintain, rather than CPU. Always look at a breadth of metrics, including network throughput, disk read/write operations, and application-specific metrics like queue depths or active request counts.
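One way to make "look at a breadth of metrics" concrete is to normalize every resource against its limit and report the binding constraint. The readings and limits below are hypothetical, but the pattern applies to any set of metrics you collect:

```python
def bottleneck(metrics: dict[str, float],
               limits: dict[str, float]) -> tuple[str, float]:
    """Return the resource closest to its limit and its utilization fraction."""
    utilization = {name: metrics[name] / limits[name] for name in metrics}
    worst = max(utilization, key=utilization.get)
    return worst, utilization[worst]

# A database with plenty of CPU headroom but nearly saturated disk IOPS.
db_metrics = {"cpu_percent": 45.0, "disk_iops": 2850.0, "connections": 180.0}
db_limits = {"cpu_percent": 100.0, "disk_iops": 3000.0, "connections": 200.0}

name, frac = bottleneck(db_metrics, db_limits)
print(name, round(frac, 2))  # disk_iops 0.95
```

Here a CPU-only dashboard would show 45% and suggest ample headroom, while the actual constraint is disk I/O at 95%, exactly the trap described above.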

The next step in managing this system is understanding how to automate the provisioning and deprovisioning of resources based on these capacity models.
