When a process fails to read or write files, strace shows exactly which system call failed and which error code the kernel returned. The return value and errno on each open, read, write, or close line usually point directly at the cause.

Common Causes for open, read, write, close Failures

  1. File Not Found / Permission Denied on open():

    • Diagnosis: Run strace -e trace=open,openat,creat <your_command>. Look for open() or openat() calls returning -1 ENOENT (No such file or directory) or -1 EACCES (Permission denied).
    • Cause: The file path is incorrect, the file doesn’t exist, or the user running the process lacks read/write permission on the file or search (execute) permission on one of the directories in its path.
    • Fix:
      • Incorrect Path: Verify the absolute or relative path. For example, if your application expects /app/config.yaml but it’s in /etc/app/config.yaml, update the application’s configuration or move the file.
      • File Missing: Create the file: touch /path/to/your/file.
      • Permissions: Use chmod to grant permissions. For example, chmod a+r /path/to/your/file grants read permission to everyone (plain chmod +r is filtered through the umask), and chmod u+rw /path/to/your/file grants read/write to the owner. For directories, use chmod a+x /path/to/directory to allow traversal. Grant only as much access as the process actually needs.
    • Why it works: These commands directly alter the file system’s metadata, making the file visible and accessible to the process according to standard Unix permissions.
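
The ENOENT and EACCES cases above can also be told apart from inside a program. The following is a minimal Python sketch, not part of any real tool: the helper name explain_open_failure is hypothetical, and it only reports the deepest directory that does exist so you can see where the path goes wrong.

```python
import errno
import os


def explain_open_failure(path):
    """Try to open `path` read-only and return a short diagnosis string."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        if exc.errno == errno.ENOENT:
            # Walk upward until we reach a directory that exists.
            parent = os.path.dirname(os.path.abspath(path))
            while not os.path.exists(parent):
                parent = os.path.dirname(parent)
            return f"ENOENT: deepest existing directory is {parent}"
        if exc.errno == errno.EACCES:
            return f"EACCES: check permissions on {path} and its parent directories"
        # Any other error: report its symbolic name and message.
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        return f"{name}: {exc.strerror}"
    os.close(fd)
    return "ok"
```

Calling explain_open_failure("/app/config.yaml") on a machine where /app does not exist would report the root directory as the deepest existing one, which suggests the whole path prefix is wrong rather than just the file name.
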
  2. Disk Full on write() or creat():

    • Diagnosis: Run strace -e trace=write,openat,creat <your_command>. Look for write(), creat(), or file-creating openat() calls returning -1 ENOSPC (No space left on device).
    • Cause: The file system being written to has run out of free blocks or, less commonly, free inodes.
    • Fix:
      • Check Disk Usage: Run df -h to see which file systems are full, and df -i to check for inode exhaustion.
      • Free Space: Delete unnecessary files. For example, rm /path/to/large/old_log_file.log. If this is a persistent issue, consider resizing the partition or adding more storage.
    • Why it works: ENOSPC is a direct indication from the kernel that it cannot allocate new blocks for the file due to the underlying storage being exhausted. Freeing space allows the kernel to perform the allocation.
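
An application can check for free space before a large write instead of waiting for ENOSPC. This is a rough Python sketch using os.statvfs (Unix only); the function names free_space_report and has_room are hypothetical, and the check is advisory only, since another process can fill the disk between the check and the write.

```python
import os


def free_space_report(path="/"):
    """Return free bytes and free inodes available to unprivileged users."""
    st = os.statvfs(path)
    return {
        # f_bavail counts blocks usable by non-root users, in f_frsize units.
        "bytes_free": st.f_bavail * st.f_frsize,
        # Some file systems (for example btrfs) report 0 here by design.
        "inodes_free": st.f_favail,
    }


def has_room(path, needed_bytes):
    """Return True if the file system holding `path` has enough free bytes."""
    return free_space_report(path)["bytes_free"] >= needed_bytes
```

The byte figure roughly corresponds to the "Avail" column of df, and the inode figure to the "IFree" column of df -i.
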
  3. Too Many Open Files (open()):

    • Diagnosis: strace -e trace=open,openat <your_command>. Look for open() or openat() calls returning -1 EMFILE (Too many open files) or -1 ENFILE (Too many open files in system).
    • Cause: The process has reached its per-process limit for open file descriptors, or the entire system has reached its limit.
    • Fix:
      • Increase Per-Process Limit: Edit /etc/security/limits.conf and add lines like:
        * soft nofile 65536
        * hard nofile 65536
        
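        These settings are applied by pam_limits at login, so log out and back in for them to take effect. To change only the current shell, run ulimit -n 65536 before starting your command; this works for unprivileged users only up to the existing hard limit. Services started by systemd do not read limits.conf and take their limit from LimitNOFILE= in the unit file.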
      • Increase System-Wide Limit: Edit /etc/sysctl.conf and add/modify:
        fs.file-max = 200000
        
        Apply with sysctl -p.
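    • Why it works: The per-process limit is the RLIMIT_NOFILE resource limit (what ulimit -n reports), and fs.file-max caps the number of open file handles across the whole system. Raising them lets the kernel hand out more descriptors before it returns EMFILE or ENFILE. If the number of open descriptors keeps growing over time, the application is probably leaking them, and a higher limit only delays the failure.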
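
A process can also inspect and raise its own soft limit at startup. The sketch below uses Python's standard resource module; the function name raise_nofile_soft_limit is hypothetical. It never touches the hard limit, which normally requires privileges to raise.

```python
import resource


def raise_nofile_soft_limit(target):
    """Raise the soft RLIMIT_NOFILE toward `target`, capped at the hard limit.

    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # cannot exceed the hard limit
    if soft != resource.RLIM_INFINITY and target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

Calling raise_nofile_soft_limit(65536) early in the program has roughly the same effect as running ulimit -n 65536 in the shell that launches it, subject to the same hard-limit ceiling.
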
  4. Bad File Descriptor on read(), write(), close():

    • Diagnosis: strace -e trace=read,write,close <your_command>. Look for these calls returning -1 EBADF (Bad file descriptor).
    • Cause: The file descriptor number being used by the process is invalid. This often happens if a file descriptor was closed prematurely, or if a descriptor number is being reused incorrectly.
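    • Fix: This is usually a bug in the application itself. Audit how it manages file descriptors: make sure close() is never called twice on the same descriptor or on one that was never opened, and that no code keeps using a descriptor after closing it. A common defensive pattern is to set the variable to -1 immediately after close() and to treat negative values as "not open". Double closes are especially dangerous in multithreaded programs, because the number may already have been reused for a different file. Adding open, openat, dup, and dup2 to the trace filter shows where each descriptor number was created and where it was closed.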
    • Why it works: File descriptors are small integers that the kernel uses to identify open files. An EBADF error means the kernel received a number it doesn’t recognize as an active, open file descriptor for that process.
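
The "invalidate after close" pattern can be sketched as follows. OwnedFd is a hypothetical wrapper written for illustration, not a standard class; the same idea applies in C by assigning -1 to the variable right after close().

```python
import errno
import os


class OwnedFd:
    """Hypothetical wrapper that closes its descriptor at most once."""

    def __init__(self, fd):
        self.fd = fd

    def close(self):
        # A negative value means "already closed": do nothing.
        if self.fd >= 0:
            fd, self.fd = self.fd, -1  # invalidate before closing
            os.close(fd)


def close_twice_raw(fd):
    """Show the bug: a second raw close() fails with EBADF."""
    os.close(fd)
    try:
        os.close(fd)
    except OSError as exc:
        return errno.errorcode[exc.errno]
    return None
```

With the wrapper, a second close() is a harmless no-op, whereas the raw double close in close_twice_raw is exactly what appears in strace as close(N) = -1 EBADF.
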
  5. I/O Error on read() or write():

    • Diagnosis: strace -e trace=read,write <your_command>. Look for read() or write() calls returning -1 EIO (Input/output error).
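    • Cause: Usually a hardware-level error during the read or write operation, such as a failing disk, a bad cable, or a problem with the storage controller. Network and other non-local file systems can also return EIO when the backing server or device becomes unreachable.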
    • Fix:
      • Check System Logs: Examine dmesg or /var/log/syslog for hardware-related error messages.
      • Hardware Diagnostics: Run hardware tests on the storage device.
      • Replace Hardware: If hardware failure is confirmed, replace the faulty component (e.g., the hard drive).
    • Why it works: EIO is a low-level error reported by the device driver when it cannot communicate successfully with the storage hardware.
  6. Interrupted System Call (read(), write()):

    • Diagnosis: strace -e trace=read,write <your_command>. Look for read() or write() calls returning -1 EINTR (Interrupted system call).
    • Cause: A blocking system call was interrupted by a signal for which the process has installed a handler (e.g., a SIGINT handler triggered by Ctrl+C, or a SIGTERM handler triggered by kill).
    • Fix: The application needs to handle EINTR by retrying the system call. Many higher-level runtimes and libraries do this automatically, but custom code that calls read() or write() directly might not. The fix is within the application’s signal handling or retry logic: either install handlers with the SA_RESTART flag, or wrap the call in a loop that retries while errno == EINTR.
    • Why it works: The kernel stops the system call to run the signal handler. If the handler was installed with SA_RESTART and the call is restartable, the kernel restarts it transparently after the handler returns. Otherwise the call returns -1 with EINTR, which is not a real I/O failure, and the application is expected to issue the call again.
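
The retry loop can be sketched as below. This is an illustration of the pattern only: retry_on_eintr is a hypothetical helper, and Python itself has retried interrupted system calls automatically since version 3.5 (PEP 475), so the loop mainly mirrors what C code around read() or write() has to do by hand.

```python
import errno


def retry_on_eintr(call, *args):
    """Invoke call(*args), retrying for as long as it fails with EINTR."""
    while True:
        try:
            return call(*args)
        except OSError as exc:
            if exc.errno != errno.EINTR:
                raise  # a real error: let the caller see it
            # EINTR: a signal arrived mid-call, so simply try again.
```

A call such as retry_on_eintr(os.read, fd, 4096) would then return data or raise a genuine error, but never surface EINTR to the caller.
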

Once these are fixed, watch for follow-on failures: an application that ignored the earlier I/O errors may be running with corrupted internal state, which often surfaces as a SIGSEGV (Segmentation fault).
