The most surprising thing about profiling CPU and memory usage is that the "bottleneck" you perceive is almost never the actual bottleneck.
Let’s say you’re running a web service and notice it’s slow under load. Your first instinct might be to look at CPU. You’ve got htop open, you see a process hogging 95% CPU, and you think, "Aha! CPU bound." But what if most of that activity is overhead: threads spinning on a lock another thread is holding, or the runtime churning through syscalls while every request waits on a database query or a network response? The CPU isn’t the cause of the slowness; it’s a symptom of the process being stuck.
Consider a simple Go web server:
```go
package main

import (
	"database/sql"
	"fmt"
	"log"
	"net/http"
	"time"

	_ "github.com/go-sql-driver/mysql"
)

func dbQuery(db *sql.DB) {
	var id int
	// Simulate a slow query that takes 2 seconds
	err := db.QueryRow("SELECT SLEEP(2)").Scan(&id)
	if err != nil {
		log.Printf("DB error: %v", err)
	}
}

func handler(db *sql.DB) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		dbQuery(db)
		fmt.Fprintf(w, "Request took %v\n", time.Since(start))
	}
}

func main() {
	// Replace with your actual DB connection string
	db, err := sql.Open("mysql", "user:password@tcp(127.0.0.1:3306)/dbname?parseTime=true")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	http.HandleFunc("/", handler(db))
	log.Println("Starting server on :8080")
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```
If you run this and hit http://localhost:8080 repeatedly, every request takes just over two seconds, yet the process is barely computing. The actual bottleneck is the `db.QueryRow("SELECT SLEEP(2)")` call: the Go process spends almost all of its time waiting for the database. A CPU profile here would be dominated by poll-wait and syscall samples rather than hot application functions, because the real problem lies in the I/O latency.
To understand what’s happening, you need to look beyond just CPU and memory metrics. You need to see what your process is doing.
CPU Profiling: What’s it really doing?
CPU profiling tells you where your program spends its time executing instructions. Tools like pprof in Go, perf in Linux, or py-spy for Python are invaluable.
For Go:
- Enable pprof: Add `import _ "net/http/pprof"` to your `main` package.
- Access profiles: Run `go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30` (or your relevant port/path).
- Analyze: Commands like `top`, `list <function_name>`, and `web` (which generates an SVG call graph) will show you the functions consuming the most CPU.
A typical Go pprof output might look like this:
```
Showing nodes accounting for 85s, 85.00% of 100s total
      flat  flat%   sum%        cum   cum%
       30s 30.00% 30.00%        30s 30.00%  runtime.nanotime
       20s 20.00% 50.00%        20s 20.00%  runtime.pollDesc.wait
       15s 15.00% 65.00%        15s 15.00%  syscall.Syscall
       10s 10.00% 75.00%        10s 10.00%  runtime.lock
        5s  5.00% 80.00%         5s  5.00%  main.dbQuery
        5s  5.00% 85.00%         5s  5.00%  net.(*conn).Read
```
If you see a lot of time in runtime.pollDesc.wait or syscall.Syscall, it’s a strong indicator that your process is blocked on I/O – waiting for network, disk, or other system calls. runtime.nanotime might be high if you’re doing a lot of time-related operations, but often it’s background noise. The key is to look for functions that are actively doing work, not just waiting.
Memory Profiling: Where’s it all going?
Memory profiling reveals where your application is allocating memory. This is crucial for identifying leaks or excessive usage.
For Go:
- Access heap profile: Run `go tool pprof http://localhost:6060/debug/pprof/heap`.
- Analyze: Similar commands (`top`, `list`, `web`) apply. Look for functions with high "flat" or "cum" memory allocations.
A Go heap profile might show:
```
Showing nodes accounting for 175MB, 87.50% of 200MB total
      flat  flat%   sum%        cum   cum%
     100MB 50.00% 50.00%      100MB 50.00%  main.processData (inline)
      50MB 25.00% 75.00%      150MB 75.00%  main.readLargeFile
      20MB 10.00% 85.00%       20MB 10.00%  bytes.(*Buffer).Grow
       5MB  2.50% 87.50%        5MB  2.50%  runtime.mallocgc
```
Here, main.processData and main.readLargeFile are allocating significant amounts of memory. If these allocations are unexpected or not being released, you have a memory leak or an inefficient data structure.
The System View: Beyond Your Process
Sometimes, your process is perfectly efficient, but the underlying system is the bottleneck.
- Network: Use `netstat -s` to see TCP retransmissions and packet drops. High retransmission counts mean packets are being lost and your application is waiting for them to be resent.
- Disk I/O: Tools like `iostat -x 1` show disk utilization (`%util`), read/write rates, and average wait times (`await`). If your disk is at 100% utilization and `await` is high, your application is waiting on slow disk operations.
- OS Limits: Check `ulimit -n` (open file descriptors) and `/proc/sys/fs/file-max`. If your application hits these limits, it can’t open new connections or files, causing errors and hangs.
The real trick is correlating these system-level observations with your application’s profiling data. If iostat shows a busy disk and your Go pprof shows a lot of time in syscall.read or syscall.write for file operations, you’ve found your culprit.
When you profile your application, you’re not just looking for the most expensive function. You’re looking for what the program is waiting for, or what data structures are growing unbounded. The CPU might be at 100%, but the root cause could be a single slow database query, a network hop that’s dropped packets, or a disk that can’t keep up.
The next thing you’ll likely encounter is how to profile specific types of operations, like blocking calls or goroutine leaks.