strace can tell you more than when a syscall happened: with the -T flag, it also shows how long each call took to return.

Let’s see strace in action. Imagine a simple C program that writes to a file:

#include <stdio.h>
#include <stdlib.h>

int main() {
    FILE *fp = fopen("output.txt", "w");
    if (fp == NULL) {
        perror("fopen");
        return 1;
    }
    fprintf(fp, "Hello, strace!\n");
    fclose(fp);
    return 0;
}

Compile it (here as my_program) and trace it with strace -T ./my_program. The -T flag tells strace to append the time spent in each syscall to the end of its output line. Here’s a snippet of what you might see:

...
openat(AT_FDCWD, "output.txt", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3 <0.000015>
fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.000008>
write(3, "Hello, strace!\n", 15)          = 15 <0.000012>
close(3)                                  = 0 <0.000007>
...

Notice the <0.000015> after openat? That’s the duration of the openat system call, in seconds. strace reports the elapsed time between the start of the syscall and its return, so the figure includes any time the process spent blocked while the kernel waited on a device, a lock, or another process.

The core problem strace -T helps diagnose is time lost inside system calls. If a program is slow, it’s easy to blame the application logic, but sometimes the time is going to the kernel instead. For instance, a read or write call might spend an inordinate amount of time waiting for I/O, or an fsync might stall on a slow disk. strace -T quantifies this per-syscall latency, letting you pinpoint the slowest calls as candidates for optimization or investigation. Keep in mind that it only sees syscalls: work the kernel does outside of one, such as servicing page faults on an mmap'd region, does not appear in the trace.

Internally, strace is built on the ptrace system call, which lets a tracing process stop the traced process at every syscall entry and exit. For -T, strace records a timestamp when the syscall is entered and another when it returns, and prints the difference. This is elapsed (wall-clock) time for that one syscall, not CPU time and not the runtime of the whole process: it includes any time the call spent blocked in the kernel, and the tracing itself adds some overhead to every call.

The primary levers you control with strace -T are which process to trace and how to filter its output. You can attach to a running process by PID with strace -T -p <PID>, or launch a new process under strace as in the example above. You can also combine -T with other flags, such as -f to follow child processes or -e trace=write,read to show only specific syscalls, which drastically reduces the output and keeps your timing analysis focused.

The most surprising thing about strace -T is that a call which looks trivial in the source code can take milliseconds to return. This often happens with I/O-bound operations, where the application logic is a single line but the underlying storage or network is slow to respond. For example, a small write usually returns within microseconds because the kernel only copies the data into its page cache, whereas the fsync that follows blocks the application until the storage device has acknowledged the data, which can take milliseconds or longer.

If you see a syscall like select or poll with a very large duration, it’s not necessarily a problem with the syscall itself, but rather an indicator that the process was genuinely waiting for an event for that entire duration. The kernel was doing its job of sleeping until an event occurred. The real investigation would then shift to why the event didn’t occur sooner, or why the process is waiting for so long.

The next thing you’ll likely want to investigate is how to aggregate and analyze these timing metrics across many syscalls, perhaps to identify patterns of latency over time.
