strace’s delay injection feature is a surprisingly powerful, yet often overlooked, tool for simulating real-world network and I/O latency directly at the syscall level.

Let’s see it in action. Imagine you have a critical application that makes a lot of read() syscalls to a network device or a slow disk. To simulate a sluggish network, you can inject a delay into these read() calls.

Here’s a simplified Python script that repeatedly reads from standard input:

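import os
import sys
import time

print("Starting read loop. Press Ctrl+C to exit.")
fd = sys.stdin.fileno()
while True:
    try:
        # os.read() maps directly onto the read() syscall, so each loop
        # iteration is one syscall that strace can delay. (sys.stdin.read(1024)
        # would instead block until 1024 characters or EOF arrive.)
        data = os.read(fd, 1024)  # Read up to 1024 bytes
        if not data:
            print("EOF reached.")
            break
        # Simulate some processing
        time.sleep(0.01)
    except KeyboardInterrupt:
        print("\nExiting.")
        break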

Now, let’s run this script and then attach strace to inject a 50-millisecond delay into every read() syscall.

First, find the PID of the running Python script. pgrep -f matches against the full command line, and -n selects only the most recently started match:

pgrep -n -f "python your_script_name.py"

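Let’s say the PID is 12345. Now attach strace, restricting the trace to read() and injecting the delay when each call exits. A bare delay value is interpreted as microseconds, so 50 milliseconds is written as 50000:

strace -p 12345 -e trace=read -e inject=read:delay_exit=50000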

When you interact with the Python script (e.g., by typing a line into its terminal and pressing Enter, which completes a read() call), you’ll notice an extra pause of about 50 milliseconds on each read operation, even though the script itself only sleeps for 10 milliseconds via time.sleep(0.01). strace stops the process as each read() syscall exits and holds it for 50 milliseconds before letting it continue, regardless of how quickly the underlying system call completed.
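To make the injected pause measurable rather than just noticeable, the read can be timed from inside the program. The following is a minimal sketch, not part of the original script: the helper name timed_read is hypothetical, and the demo uses a pipe so that the read returns almost instantly when no tracer is attached.

```python
import os
import time


def timed_read(fd, size=1024):
    """Perform one read() on fd and return (data, elapsed_seconds)."""
    start = time.perf_counter()
    data = os.read(fd, size)
    elapsed = time.perf_counter() - start
    return data, elapsed


# Demo on a pipe: data is already available, so the untraced read is fast.
r, w = os.pipe()
os.write(w, b"hello")
data, elapsed = timed_read(r)
print(f"read {len(data)} bytes in {elapsed * 1000:.3f} ms")
os.close(r)
os.close(w)
```

Run under strace with a delay injected into read(), the reported time should grow by roughly the injected amount plus some tracing overhead.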

This capability is invaluable for testing how your application behaves under conditions of high latency or intermittent unresponsiveness. Instead of relying on flaky network conditions or slow hardware, you can deterministically introduce delays to uncover race conditions, timeouts, or user experience degradations that only manifest when operations take longer than expected.

The core problem strace delay injection solves is the difficulty of reliably reproducing performance-related bugs. Many bugs, especially those involving concurrency or resource contention, are intermittent and depend on specific timing. Simulating these timings without strace often involves complex network manipulation tools (like tc for network traffic shaping) or specialized hardware, which can be cumbersome to set up and manage. strace offers a lightweight, user-space solution that targets specific system calls.

Internally, strace uses ptrace to attach to a running process and stop it at every syscall entry and exit. When the -e inject=SYSCALL:delay_exit=DELAY option is used, strace holds the traced process at the syscall-exit stop for the specified delay before letting it resume. The kernel has already done the work, but the result reaches the user-space program late. With delay_enter instead, the pause happens at the syscall-entry stop, before the kernel starts executing the call. Either way, the application perceives the syscall as taking longer than it would have without the injection.

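The inject option is quite flexible. The syntax is -e inject=SYSCALL:delay_enter=DELAY or -e inject=SYSCALL:delay_exit=DELAY, and both can appear in the same expression. A bare number is interpreted as microseconds; recent strace releases also accept unit suffixes such as us, ms, or s, so check man strace for your version. For example, to inject a 2-second delay into all write() syscalls: strace -p <pid> -e trace=write -e inject=write:delay_exit=2000000. To slow several syscalls by different amounts, repeat the option: strace -p <pid> -e trace=read,write -e inject=read:delay_exit=100000 -e inject=write:delay_exit=50000. This allows for granular control over which operations are slowed down.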
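Because long microsecond values are easy to mistype, it can help to generate the command line. The sketch below assumes the SYSCALL:delay_enter=USECS and SYSCALL:delay_exit=USECS forms from the strace man page; the helper names to_usec and build_strace_cmd are hypothetical, and the function only builds the argument list without running strace.

```python
import shlex

# Multipliers for converting a delay into the microseconds strace expects.
_UNIT_TO_USEC = {"us": 1, "ms": 1_000, "s": 1_000_000}


def to_usec(value, unit):
    """Convert a delay such as (50, "ms") to whole microseconds."""
    return round(value * _UNIT_TO_USEC[unit])


def build_strace_cmd(pid, delays, when="exit"):
    """Build an strace argv that delays each syscall named in `delays`.

    delays maps syscall name -> (value, unit), e.g. {"read": (50, "ms")}.
    when is "enter" or "exit" and selects delay_enter or delay_exit.
    """
    if when not in ("enter", "exit"):
        raise ValueError("when must be 'enter' or 'exit'")
    cmd = ["strace", "-p", str(pid), "-e", "trace=" + ",".join(delays)]
    for syscall, (value, unit) in delays.items():
        usec = to_usec(value, unit)
        # One -e inject=... option per syscall, each with its own delay.
        cmd += ["-e", f"inject={syscall}:delay_{when}={usec}"]
    return cmd


# Example: 100 ms on read(), 50 ms on write(), for the placeholder PID 12345.
print(shlex.join(build_strace_cmd(12345, {"read": (100, "ms"), "write": (50, "ms")})))
```

The returned list can be passed to subprocess.run if you want to launch strace from a test harness.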

A common pitfall is assuming that the injected delay replaces the syscall’s normal execution time. It doesn’t; the delay is added on top of it. With delay_exit, strace holds the process after the kernel has finished the syscall but before the result is delivered back to the userland process. So, if a read() syscall normally takes 1ms and you inject 50ms, the userland process sees the read() as taking about 51ms, plus a small tracing overhead. For a blocking read(), the time spent waiting for data counts as well: the delay is added to however long the call actually blocked.
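The arithmetic can be written down as a tiny model, which is handy for predicting whether a given injection will push a call past an application timeout. This is an illustrative sketch only: the function names are hypothetical, and the model ignores ptrace overhead and scheduling jitter.

```python
def perceived_latency_ms(kernel_ms, delay_enter_ms=0, delay_exit_ms=0):
    """Latency seen by the caller: injected delays add to the real time."""
    return delay_enter_ms + kernel_ms + delay_exit_ms


def exceeds_timeout(kernel_ms, timeout_ms, delay_enter_ms=0, delay_exit_ms=0):
    """True if the delayed call would outlast an application-level timeout."""
    total = perceived_latency_ms(kernel_ms, delay_enter_ms, delay_exit_ms)
    return total > timeout_ms


# The example from the text: a 1 ms read() with a 50 ms exit delay.
print(perceived_latency_ms(1, delay_exit_ms=50))
# Would that trip a hypothetical 40 ms timeout in the application?
print(exceeds_timeout(1, timeout_ms=40, delay_exit_ms=50))
```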

When you’re done injecting delays, remember to detach strace (Ctrl+C in the strace terminal) or it will continue to intercept syscalls. A clean detach leaves the process running normally. If strace is instead killed abruptly (for example, with kill -9), the process it was tracing may be left in a stopped state. In such cases, find the process and use kill -CONT <pid> to resume it, or kill -9 <pid> if it remains unresponsive.
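Checking whether a process was left stopped can be scripted. The sketch below assumes Linux, where /proc/<pid>/status exposes State and TracerPid fields; the helper names are hypothetical, and it sends SIGCONT only when the process is stopped and no tracer is still attached.

```python
import os
import signal


def parse_proc_status(text):
    """Return (state_letter, tracer_pid) from /proc/<pid>/status content."""
    state, tracer = None, 0
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key == "State":
            state = value.split()[0]  # e.g. "T" from "T (stopped)"
        elif key == "TracerPid":
            tracer = int(value.strip())
    return state, tracer


def resume_if_stopped(pid):
    """Send SIGCONT if pid is stopped ('T') and no tracer is attached."""
    with open(f"/proc/{pid}/status") as f:
        state, tracer = parse_proc_status(f.read())
    if state == "T" and tracer == 0:
        os.kill(pid, signal.SIGCONT)
        return True
    return False
```

Calling resume_if_stopped(12345) after an strace crash would resume the example process only if it is actually stopped, and it leaves a still-traced process alone.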

The next hurdle you’ll likely encounter is how to inject delays into syscalls that are made by child processes spawned by your application.
