The most surprising thing about profiling CPU and memory usage is that the "bottleneck" you perceive is almost never the actual bottleneck.

Let’s say you’re running a web service and notice it’s slow under load. Your first instinct might be to look at CPU. You’ve got htop open, you see a process hogging 95% CPU, and you think, "Aha! CPU bound." But what if that process is waiting for something else? Maybe it’s waiting for a database query, or a network response, or even just a lock that another thread is holding? The CPU isn’t the cause of the slowness; it’s just the symptom of the process being stuck.

Consider a simple Go web server:

package main

import (
	"database/sql"
	"fmt"
	"log"
	"net/http"
	"time"

	_ "github.com/go-sql-driver/mysql"
)

func dbQuery(db *sql.DB) {
	var id int
	// Simulate a slow query that takes 2 seconds
	err := db.QueryRow("SELECT SLEEP(2)").Scan(&id)
	if err != nil {
		log.Printf("DB error: %v", err)
	}
}

func handler(db *sql.DB) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		dbQuery(db)
		fmt.Fprintf(w, "Request took %v\n", time.Since(start))
	}
}

func main() {
	// Replace with your actual DB connection string
	db, err := sql.Open("mysql", "user:password@tcp(127.0.0.1:3306)/dbname?parseTime=true")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// sql.Open only validates the DSN; Ping forces a real connection
	// so misconfiguration fails at startup rather than on the first request.
	if err := db.Ping(); err != nil {
		log.Fatal(err)
	}

	http.HandleFunc("/", handler(db))
	log.Println("Starting server on :8080")
	log.Fatal(http.ListenAndServe(":8080", nil))
}

If you run this and hit http://localhost:8080 repeatedly, top-level tools like htop may show the process as busy. But the actual bottleneck is the db.QueryRow("SELECT SLEEP(2)") call. The Go process is spending most of its time parked, waiting for the database, not actively computing. A CPU profile here would show surprisingly little time in your own handler code; the real problem lies in the I/O latency.

To understand what’s happening, you need to look beyond just CPU and memory metrics. You need to see what your process is doing.

CPU Profiling: What’s it really doing?

CPU profiling tells you where your program spends its time executing instructions. Tools like pprof in Go, perf in Linux, or py-spy for Python are invaluable.

For Go:

  1. Enable pprof: Add import _ "net/http/pprof" to your main package. This side-effect import registers the /debug/pprof/* handlers on http.DefaultServeMux.
  2. Capture a profile: Run go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30, substituting whatever host, port, and path your server actually serves (for the example server above, that would be :8080).
  3. Analyze: Commands like top, list <function_name>, and web (to generate an SVG call graph) will show you the functions consuming the most CPU.

A typical Go pprof output might look like this:

Showing nodes accounting for 85s, 85.00% of 100s total
      flat  flat%   sum%        cum   cum%
       30s 30.00% 30.00%        30s 30.00%  runtime.nanotime
       20s 20.00% 50.00%        20s 20.00%  runtime.pollDesc.wait
       15s 15.00% 65.00%        15s 15.00%  syscall.Syscall
       10s 10.00% 75.00%        10s 10.00%  runtime.lock
        5s  5.00% 80.00%         5s  5.00%  main.dbQuery
        5s  5.00% 85.00%         5s  5.00%  net.(*conn).Read

If you see a lot of time in runtime.pollDesc.wait or syscall.Syscall, it’s a strong indicator that your process is blocked on I/O – waiting for network, disk, or other system calls. runtime.nanotime might be high if you’re doing a lot of time-related operations, but often it’s background noise. The key is to look for functions that are actively doing work, not just waiting.

Memory Profiling: Where’s it all going?

Memory profiling reveals where your application is allocating memory. This is crucial for identifying leaks or excessive usage.

For Go:

  1. Access heap profile: Run go tool pprof http://localhost:6060/debug/pprof/heap.
  2. Analyze: Similar commands (top, list, web) apply. Look for functions with high "flat" or "cum" memory allocations.

A Go heap profile might show:

Showing nodes accounting for 175MB, 87.50% of 200MB total
      flat  flat%   sum%        cum   cum%
     100MB 50.00% 50.00%      100MB 50.00%  main.processData (inline)
      50MB 25.00% 75.00%      150MB 75.00%  main.readLargeFile
      20MB 10.00% 85.00%       20MB 10.00%  bytes.(*Buffer).grow
       5MB  2.50% 87.50%        5MB  2.50%  runtime.mallocgc

Here, main.processData and main.readLargeFile are allocating significant amounts of memory. If these allocations are unexpected or not being released, you have a memory leak or an inefficient data structure.

The System View: Beyond Your Process

Sometimes, your process is perfectly efficient, but the underlying system is the bottleneck.

  • Network: Use netstat -s to check for TCP retransmissions and packet drops. High retransmission counts mean packets are being lost and your application is waiting for them to be resent.
  • Disk I/O: Tools like iostat -x 1 show disk utilization (%util), read/write rates, and average wait times (await). If your disk is at 100% utilization and await is high, your application is waiting for slow disk operations.
  • OS Limits: Check ulimit -n (open file descriptors) and /proc/sys/fs/file-max. If your application is hitting these limits, it can’t open new connections or files, causing errors and hangs.

The real trick is correlating these system-level observations with your application’s profiling data. If iostat shows a busy disk and your Go pprof shows a lot of time in syscall.read or syscall.write for file operations, you’ve found your culprit.

When you profile your application, you’re not just looking for the most expensive function. You’re looking for what the program is waiting for, or what data structures are growing unbounded. The CPU might be at 100%, but the root cause could be a single slow database query, a network hop that’s dropped packets, or a disk that can’t keep up.

The next thing you’ll likely encounter is how to profile specific types of operations, like blocking calls or goroutine leaks.
