eBPF tracing is the future and strace is a relic.

Here’s a simple eBPF program that traces openat() calls (the syscall modern libc uses to implement open()) and prints the filename:

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 256 * 1024);
} rb SEC(".maps");

struct event {
    int pid;
    char filename[256];
};

/* Modern libc routes open() through the openat() syscall, so trace that
   tracepoint; the filename is the second syscall argument (args[1]). */
SEC("tp/syscalls/sys_enter_openat")
int handle_open(struct trace_event_raw_sys_enter *ctx) {
    struct event *e;
    int pid = bpf_get_current_pid_tgid() >> 32;
    const char *filename_ptr = (const char *)ctx->args[1];

    e = bpf_ringbuf_reserve(&rb, sizeof(*e), 0);
    if (!e) {
        return 0;
    }

    e->pid = pid;
    /* Copy the user-space string straight into the reserved record. */
    if (bpf_probe_read_user_str(e->filename, sizeof(e->filename), filename_ptr) < 0) {
        bpf_ringbuf_discard(e, 0);
        return 0;
    }

    bpf_ringbuf_submit(e, 0);
    return 0;
}

char _license[] SEC("license") = "GPL";

Compile this with clang -target bpf -g -O2 -c open_trace.c -o open_trace.o and load it with bpftool prog load open_trace.o /sys/fs/bpf/open_trace; from there you would attach it and read the ring buffer with a libbpf-based reader. Alternatively, BCC can compile, load, and attach an equivalent program for you from a Python script. Note that BCC uses its own C dialect (TRACEPOINT_PROBE, BPF_RINGBUF_OUTPUT) rather than the libbpf/vmlinux.h style above, so the embedded program looks a little different:

import ctypes as ct
from bcc import BPF

bpf_text = """
struct event {
    int pid;
    char filename[256];
};

// Ring buffer map, 8 pages (32 KiB)
BPF_RINGBUF_OUTPUT(rb, 8);

// Modern libc routes open() through openat(), so trace that tracepoint
TRACEPOINT_PROBE(syscalls, sys_enter_openat) {
    struct event e = {};
    e.pid = bpf_get_current_pid_tgid() >> 32;
    bpf_probe_read_user_str(e.filename, sizeof(e.filename), args->filename);
    rb.ringbuf_output(&e, sizeof(e), 0);
    return 0;
}
"""

b = BPF(text=bpf_text)  # TRACEPOINT_PROBE programs attach automatically

print("Tracing openat() calls...")

class Event(ct.Structure):
    _fields_ = [("pid", ct.c_int), ("filename", ct.c_char * 256)]

def print_event(ctx, data, size):
    event = ct.cast(data, ct.POINTER(Event)).contents
    filename = event.filename.decode('utf-8', errors='ignore')
    print(f"PID: {event.pid}, Filename: {filename}")

b["rb"].open_ring_buffer(print_event)
while True:
    try:
        b.ring_buffer_poll()
    except KeyboardInterrupt:
        exit()

Run this Python script, and then in another terminal, run touch /tmp/testfile. Because the program traces every process on the system, you’ll see opens from other processes too, but among them output like:

Tracing openat() calls...
PID: 12345, Filename: /tmp/testfile

This demonstrates eBPF’s ability to hook into kernel tracepoints, read user-space data (the filename), and efficiently pass it back to userspace via a ring buffer with minimal overhead.

The fundamental problem strace and eBPF tracing solve is understanding what a process is doing at a low level, specifically by observing system calls. strace intercepts system calls by attaching to the process and using ptrace. When a process makes a system call, the kernel stops the process, returns control to strace, which then decodes the system call arguments and prints them. After strace is done, it resumes the process. This context switching and decoding is what makes strace relatively slow and can affect the behavior of the traced application.

eBPF, on the other hand, operates differently. It allows you to run sandboxed programs inside the Linux kernel. Instead of intercepting and stopping a process, you attach eBPF programs to specific kernel events (like tracepoints or kprobes). These eBPF programs execute directly within the kernel’s context, can access kernel data structures, and then send data out to userspace (e.g., via ring buffers or perf events). This avoids the costly context switches and process suspension that strace relies on, leading to significantly lower overhead.

The core idea behind eBPF is providing a safe, programmable interface to the kernel. You write eBPF programs in a restricted C-like language, which are then verified by the kernel for safety (e.g., to prevent infinite loops or memory corruption). Once verified, these programs can be loaded and attached to various hook points within the kernel. This allows for dynamic instrumentation of the kernel without modifying its source code or recompiling it.

You control eBPF tracing through the choice of hook points and the logic within your eBPF programs. Hook points can be:

  • Tracepoints: Predefined static instrumentation points in the kernel that signal specific events (e.g., syscalls:sys_enter_openat, tcp:tcp_retransmit_skb). These are generally stable and provide rich context.
  • Kprobes/Kretprobes: Dynamic probes that can be attached to almost any kernel function entry (kprobe) or return (kretprobe). These are powerful but can be less stable across kernel versions.
  • Uprobes/Uretprobes: Similar to kprobes but for user-space functions.
  • Network Hooks: Specific points in the network stack (e.g., XDP, tc/sched_cls hooks) for network traffic analysis.

The eBPF program logic then determines what data is collected, how it’s processed (e.g., aggregation, filtering), and how it’s sent to userspace. This gives you fine-grained control over what you’re observing.

The one thing most people don’t realize is that eBPF programs are not directly "attached" to a specific process in the way strace is. When you attach an eBPF program to a kernel tracepoint like sys_enter_open, that program will run for every process that triggers that tracepoint system-wide. You then filter and correlate the collected data in userspace based on PIDs, timestamps, or other metadata captured by the eBPF program itself. This global visibility is a double-edged sword: incredibly powerful for system-wide analysis, but requires careful management to avoid overwhelming your userspace collector with irrelevant data.

The next step is to explore how eBPF can be used for more complex network troubleshooting, such as tracing TCP retransmissions.

Want structured learning?

Take the full Strace course →