A Linux seccomp filter restricts which system calls a process can make. When it blocks one, the process usually just dies or gets an unexpected error, with no explanation of which call was refused or why, which is a debugging nightmare.

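When seccomp blocks a system call, what happens depends on the action the filter returns. A kill action (SECCOMP_RET_KILL_PROCESS or SECCOMP_RET_KILL_THREAD) terminates the process or thread as if by SIGSYS, SECCOMP_RET_TRAP delivers a catchable SIGSYS, and SECCOMP_RET_ERRNO makes the call fail with an error code chosen by the filter. The kernel records the blocked system call number and architecture in the SIGSYS siginfo, but all you normally see is a shell message such as "Bad system call", and nothing says which filter rule matched. This is where strace comes in: it shows the system call that was in flight and decodes the SIGSYS details, provided you know what to look for.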

Here’s how to debug seccomp filter violations using strace:

1. Trace the Program with strace

strace has no dedicated option for reporting seccomp violations. The -e trace=seccomp option only selects the seccomp(2) system call, which is how filters get installed, so it shows filter setup rather than denials. To catch a denial, trace all system calls and watch for SIGSYS.

Diagnosis Command:

strace -f your_program --your-args

The -f flag is crucial here; it tells strace to follow child processes and threads, and seccomp filters are inherited across fork() and preserved across execve().

Why it works: strace prints each system call the process makes and, by default, also reports signals and process termination. Instead of just seeing the process crash, you’ll see the last system call it attempted, followed by a SIGSYS report.

2. Understanding the seccomp Filter Action

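When a filter kills the process, the tail of the strace output will typically look something like this:

...
write(1, "hello\n", 6)                  = ?
+++ killed by SIGSYS (core dumped) +++

If the filter uses SECCOMP_RET_TRAP instead, strace decodes the signal (address elided here):

--- SIGSYS {si_signo=SIGSYS, si_code=SYS_SECCOMP, si_call_addr=..., si_syscall=__NR_write, si_arch=AUDIT_ARCH_X86_64} ---
...

Diagnosis: A system call whose result is shown as = ? immediately before +++ killed by SIGSYS +++ is the call the filter refused. In the SIGSYS line, si_syscall names the blocked system call and si_arch gives the architecture (ABI) it was made under. With -f, each line is prefixed with the PID that produced it. If a call instead returns an unexpected error such as -1 EPERM, the filter may be using SECCOMP_RET_ERRNO.

Why it works: For SECCOMP_RET_TRAP, the kernel fills the SIGSYS siginfo with the system call number, architecture, and calling address, and strace decodes those fields. For kill actions no signal is delivered to a handler, but strace has already printed the call on entry, so the call that never returned is the culprit.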
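
The same siginfo fields can be read from inside the program. The following is a minimal sketch, not a production filter: it assumes the filter action is SECCOMP_RET_TRAP (a kill action never reaches a handler), and the names on_sigsys, install_sigsys_reporter, and trap_syscall are hypothetical helpers introduced for illustration. It also skips the architecture check that a real filter should perform first.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Written by the signal handler, read by normal code afterwards. */
static volatile sig_atomic_t last_blocked_syscall = -1;
static volatile unsigned int last_blocked_arch = 0;

/* SIGSYS handler: the kernel stores the blocked call in the siginfo. */
static void on_sigsys(int sig, siginfo_t *info, void *ucontext)
{
    (void)sig;
    (void)ucontext;
    last_blocked_syscall = info->si_syscall; /* system call number */
    last_blocked_arch = info->si_arch;       /* AUDIT_ARCH_* value */
}

/* Install the handler. Returns 0 on success, -1 on error. */
int install_sigsys_reporter(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigsys;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSYS, &sa, NULL);
}

/* Demo filter: trap system call `nr`, allow everything else.
 * Assumption: no architecture check, for brevity only. */
int trap_syscall(int nr)
{
    struct sock_filter insns[] = {
        /* A = seccomp_data.nr */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        /* if (A == nr) fall through to TRAP, else skip to ALLOW */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (unsigned int)nr, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof insns / sizeof insns[0]),
        .filter = insns,
    };
    /* Unprivileged processes must set no_new_privs before installing. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

A caller would run install_sigsys_reporter(), then trap_syscall(SYS_getppid), and after a syscall(SYS_getppid) find SYS_getppid in last_blocked_syscall. Running that under strace -f should show the corresponding SIGSYS line.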

3. Inspecting the seccomp Filter Rules

The actual seccomp filter rules are compiled into a classic Berkeley Packet Filter (BPF) program that the kernel runs on every system call. To understand why a system call was blocked, you need to inspect these rules. This is usually done by examining the code that loads the seccomp filter, or, with more effort, by dumping the filter from a running process.

Diagnosis Command (if you have access to the filter loading code):

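Look for seccomp_init(), seccomp_rule_add(), and seccomp_load() calls in the program’s source code. These libseccomp functions create the filter context, add individual rules, and load the filter into the kernel. Programs that do not use libseccomp install a raw filter with prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, ...) or seccomp(SECCOMP_SET_MODE_FILTER, ...).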

Diagnosis Command (using kernel debugging if filter is already loaded, more advanced):

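seccomp filters are classic BPF programs attached to the task, so they do not show up in /proc/<pid>/fdinfo/<fd> or in bpftool, which deal with eBPF objects. A suitably privileged tracer can dump them with ptrace(PTRACE_SECCOMP_GET_FILTER, ...), and a program that uses libseccomp can print its own rules in readable form with seccomp_export_pfc(). However, reading the logic of a loaded seccomp filter without source code is complex. The primary method is to analyze the code that installs the filter.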

Why it works: The seccomp filter is essentially a small BPF program. It inspects the system call number, architecture, and raw argument values, and returns an action such as allow, errno, trap, or kill; anything no rule matches gets the filter’s default action. By examining the code that defines these rules, you can pinpoint the exact condition that led to the SIGSYS or the unexpected error.
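
To make "a small BPF program" concrete, here is a sketch of what such a filter looks like when written by hand rather than generated by libseccomp. The helper name build_deny_filter and the DENY_FILTER_LEN constant are hypothetical; the macros and structures come from the kernel's <linux/filter.h> and <linux/seccomp.h> headers. It only builds the instruction array and does not install it.

```c
#include <stddef.h>
#include <string.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#define DENY_FILTER_LEN 7

/* Fill `out` with a filter that:
 *   - kills the process if the call comes from an unexpected architecture,
 *   - returns `deny_action` for system call number `nr`,
 *   - allows every other system call. */
void build_deny_filter(struct sock_filter out[DENY_FILTER_LEN],
                       unsigned int audit_arch, unsigned int nr,
                       unsigned int deny_action)
{
    const struct sock_filter insns[DENY_FILTER_LEN] = {
        /* [0] A = seccomp_data.arch */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                 offsetof(struct seccomp_data, arch)),
        /* [1] if (A == audit_arch) goto [3]; else fall through to [2] */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, audit_arch, 1, 0),
        /* [2] wrong architecture: syscall numbers would be misread */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        /* [3] A = seccomp_data.nr */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                 offsetof(struct seccomp_data, nr)),
        /* [4] if (A == nr) goto [5]; else goto [6] */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, nr, 0, 1),
        /* [5] the denied system call */
        BPF_STMT(BPF_RET | BPF_K, deny_action),
        /* [6] default action: allow */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    memcpy(out, insns, sizeof insns);
}
```

Passing SECCOMP_RET_ERRNO combined with an errno value as deny_action makes the blocked call fail quietly instead of raising SIGSYS, which is why an unexpected EPERM in strace output can also point at a seccomp filter. The array would be handed to the kernel through a struct sock_fprog, as the installing code in the traced program does.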

4. Common Causes and Fixes

  • Unexpected System Call: The most frequent cause is a program attempting a system call that the seccomp filter explicitly disallows, or one that is missing from the allowlist when the filter’s default action is to deny.

    • Diagnosis: Use strace -f your_program --your-args to see all system calls your program makes (add -c for a per-call summary). Compare this list against the seccomp filter rules.
    • Fix: Modify the seccomp filter rules to allow the necessary system call. For example, if futex is being blocked and is required:
      // Assuming you're using libseccomp
      seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(futex), 0);
      
    • Why it works: The rule adds futex to the allowlist, so the filter returns SCMP_ACT_ALLOW for it instead of the default action. The rule must be added before seccomp_load() is called to take effect.
  • Incorrect System Call Arguments: seccomp filters can be very granular, checking not just the system call number but also its arguments. A system call might be allowed in general, but blocked due to specific argument values.

    • Diagnosis: In your strace -f output, look at the arguments of the blocked call itself, that is, the last call before the SIGSYS report or the call that returns the unexpected error. Then correlate those values with the filter logic. Keep in mind that a filter can only compare raw argument values; it cannot follow pointers, so it can test the flags passed to openat but not the path string.
    • Fix: Adjust the program’s logic to use arguments that conform to the seccomp filter’s allowed set, or modify the filter to permit the specific argument values. For instance, if openat is blocked when the O_PATH flag is used:
      // Allow openat only when O_PATH is set in flags (argument index 2).
      // Needs <seccomp.h>, plus <fcntl.h> with _GNU_SOURCE for O_PATH.
      seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(openat), 1,
                       SCMP_A2(SCMP_CMP_MASKED_EQ, O_PATH, O_PATH));
      
    • Why it works: This adds a specific rule that allows openat only when the O_PATH flag is present in its argument, rather than blocking it entirely.
  • Architecture Mismatch: System calls have different numbers and argument structures across different architectures (e.g., x86_64, arm64). A filter might be written for one architecture but applied to a process running on another.

    • Diagnosis: The si_arch field in the SIGSYS line of the strace output shows the architecture the system call was made under (for example AUDIT_ARCH_X86_64). Ensure your seccomp filter has rules defined for all relevant architectures.
    • Fix: Update the seccomp filter definition to include rules for the architecture experiencing the failure. A libseccomp filter covers only the native architecture unless others are added, and custom BPF must check the architecture explicitly.
      // libseccomp: add the extra architecture before adding rules,
      // so that later rules apply to it as well (here 32-bit x86).
      // If writing raw BPF: load seccomp_data.arch and branch on it first.
      seccomp_arch_add(ctx, SCMP_ARCH_X86);
      
    • Why it works: Ensures that the system call invoked on a specific architecture is evaluated against rules designed for that architecture.
  • Incorrect seccomp Mode: The seccomp system call can be used in different modes (e.g., SECCOMP_MODE_STRICT, SECCOMP_MODE_FILTER). If the wrong mode is set, or if a filter isn’t properly installed, unexpected behavior can occur.

    • Diagnosis: Use strace -f -e trace=seccomp,prctl your_program --your-args. Look for seccomp(SECCOMP_SET_MODE_STRICT, ...), seccomp(SECCOMP_SET_MODE_FILTER, ...), or prctl(PR_SET_SECCOMP, ...) and check their return values. This helps confirm the filter is being set up as expected.
    • Fix: Ensure the program correctly initializes seccomp using seccomp_init(), adds rules using seccomp_rule_add(), and loads the filter with seccomp_load() before attempting to use restricted system calls.
    • Why it works: Correct initialization ensures the kernel is aware of and actively enforcing the defined filter.
  • Filter Not Loaded or Incorrectly Applied: The seccomp filter might not have been successfully loaded into the kernel for the target process.

    • Diagnosis: strace -f -e trace=seccomp,prctl your_program --your-args. If you see no seccomp() or prctl(PR_SET_SECCOMP, ...) calls, or if they return an error, the filter is not active. EACCES in particular means the process neither set no_new_privs nor holds CAP_SYS_ADMIN. If you see SECCOMP_SET_MODE_STRICT instead of a filter, only read, write, _exit, and sigreturn are permitted.
    • Fix: Ensure the code responsible for setting up the seccomp filter runs correctly and without error before the problematic system calls are made. Check return codes from seccomp_init, seccomp_rule_add, and seccomp_load.
    • Why it works: Guarantees that the kernel is running the intended set of rules for the process.
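  • Race Conditions: In multi-threaded or multi-process applications, the order of filter installation matters. By default a filter applies only to the thread that installs it, so threads that already exist, and child processes forked before the filter was installed, keep running without it. Children forked afterwards inherit the filter, including across execve(), and may be killed for calls the parent never makes.

    • Diagnosis: This is harder to diagnose directly with strace. With -f, compare which PID or thread installs the filter with which one receives SIGSYS. A SIGSYS in a child usually means the child inherited a filter written for the parent; a missing SIGSYS where you expected one usually means that thread or process started before the filter was installed.
    • Fix: Install the filter before creating threads or forking, or, if only the child should be restricted, install it in the child immediately after fork() and before it executes its main logic or calls execve(). The prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, ...) call should then be made in the child process context. To cover threads that already exist, use seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC, ...).
    • Why it works: This ensures the seccomp filter is active from the first system call the restricted code makes, and that it covers exactly the threads and processes you intend.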
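
For the "filter not loaded" and ordering cases above, it can help to check whether a filter is actually attached to a given process, independent of strace. The kernel exposes this as the Seccomp field in /proc/<pid>/status (0 = disabled, 1 = strict, 2 = filter). The sketch below reads that field; the function names parse_seccomp_mode and read_seccomp_mode are hypothetical helpers for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Parse the "Seccomp:" field from /proc/<pid>/status text.
 * Returns 0 (disabled), 1 (strict), 2 (filter), or -1 if absent. */
int parse_seccomp_mode(const char *status_text)
{
    const char *line = status_text;
    while (line != NULL && *line != '\0') {
        int mode;
        /* Matches "Seccomp:" only, not "Seccomp_filters:". */
        if (sscanf(line, "Seccomp: %d", &mode) == 1)
            return mode;
        line = strchr(line, '\n');
        if (line != NULL)
            line++; /* move to the start of the next line */
    }
    return -1;
}

/* Read the mode for `pid`, given as "self" or a decimal PID string.
 * Returns -1 if the status file cannot be read. */
int read_seccomp_mode(const char *pid)
{
    char path[64];
    char buf[8192];
    snprintf(path, sizeof path, "/proc/%s/status", pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_seccomp_mode(buf);
}
```

Calling read_seccomp_mode with the PID of a child or worker process shows whether it inherited the filter. A result of 0 for a process you expected to be sandboxed points to the ordering problems described in the last two items.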

After fixing one seccomp filter violation, expect to find the next: a killed process stops at the first blocked call, so each run reveals at most one system call you missed. Repeat the trace until the program runs cleanly. If the program then fails in some other way, such as a SIGSEGV, check whether its logic depended on a call the filter now answers with an error.
