You’ve likely hit a wall trying to understand how processes are talking to each other in Linux. strace is your ultimate debugger for this, showing you the raw system calls, and when it comes to Inter-Process Communication (IPC), you’re looking at a specific set of syscalls.

Let’s dive into the common IPC mechanisms strace will reveal: pipes, sockets, and shared memory.

Pipes: The Unidirectional Flow

Pipes are the simplest form of IPC, essentially a unidirectional communication channel between two related processes (usually parent-child). When you strace a process that’s using a pipe, you’ll see a few key syscalls.

1. pipe(): This is how a pipe is created.

  • What it does: Allocates two file descriptors, one for reading and one for writing.
  • strace output: pipe([10, 11]) = 0
    • 10 is the read end, 11 is the write end.
    • The 0 return code signifies success.
  • Diagnosis: If pipe() fails, it’s usually with EMFILE, meaning the process has hit its per-process file descriptor limit (ENFILE means the system-wide limit was reached instead).
    • Check: ulimit -n (for the current shell) or check /proc/<pid>/limits for the specific process.
    • Fix: Increase the limit: ulimit -n 4096 (for the current session) or adjust system-wide limits in /etc/security/limits.conf.
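    • Why it works: pipe() needs two free file descriptors, one for each end; raising the limit makes room for them.
  • Common Error: If a process writing to a pipe gets EPIPE (Broken pipe), every file descriptor referring to the pipe’s read end has been closed.
    • Diagnosis: strace -e write ... and look for write(11, "data", 4) = -1 EPIPE (Broken pipe). If the writer neither ignores nor handles SIGPIPE, the kernel sends SIGPIPE first and its default action kills the process, so strace instead shows a --- SIGPIPE --- line followed by +++ killed by SIGPIPE +++.
    • Fix: Ensure the reading process is still alive and hasn’t closed its end of the pipe prematurely. This often requires debugging the other process.
    • Why it works: With no read end open anywhere, the data could never be consumed, so the kernel raises SIGPIPE and, if the process survives it, fails the write with EPIPE.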
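
A minimal Python sketch of the EPIPE case, assuming Linux and CPython (the helper name write_after_reader_closed is made up for the demo). The os module is a thin wrapper here: os.pipe() typically shows up in strace as pipe2(..., O_CLOEXEC). Because SIGPIPE is ignored, the failed write surfaces as BrokenPipeError instead of killing the process.

```python
import errno
import os
import signal

# CPython already ignores SIGPIPE at startup; doing it explicitly shows the
# mechanism: with SIGPIPE ignored, write() fails with EPIPE instead of the
# kernel killing the process.
signal.signal(signal.SIGPIPE, signal.SIG_IGN)


def write_after_reader_closed() -> int:
    """Create a pipe, close its only read end, write anyway, return the errno."""
    read_fd, write_fd = os.pipe()  # strace typically shows pipe2([r, w], O_CLOEXEC) = 0
    os.close(read_fd)              # no read end is open anywhere now
    try:
        os.write(write_fd, b"data")  # strace: write(w, "data", 4) = -1 EPIPE (Broken pipe)
    except BrokenPipeError as exc:   # OSError subclass whose errno is EPIPE
        return exc.errno
    finally:
        os.close(write_fd)
    return 0  # not reached on Linux: the write above always fails


print(errno.errorcode.get(write_after_reader_closed()))
```

Running it under strace -f -e trace=pipe,pipe2,write shows the same failing write() you would see in a broken pipeline.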

2. read() / write() on pipe file descriptors: Once created, data flows through these.

  • What it does: Moves data between user space buffers and the kernel’s pipe buffer.
  • strace output:
    • write(11, "hello\n", 6) = 6 (Writing to the pipe)
    • read(10, "hello\n", 1024) = 6 (Reading from the pipe)
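  • Diagnosis: If write() blocks indefinitely, the pipe’s buffer is full and the reader isn’t consuming data fast enough (or at all). If read() blocks indefinitely, the pipe is empty and at least one process still holds the write end open. A classic cause is a process that forgot to close its copy of the write end after fork(), sometimes the reader itself. Once every write end is closed, read() returns 0 (end of file) instead of blocking.
    • Check: Monitor pipe buffer usage (though this is tricky directly with strace). lsof -p <pid> or ls -l /proc/<pid>/fd shows which pipes each process holds; both ends of the same pipe share a pipe:[inode] number, so you can find who still holds the write end.
    • Fix: Optimize the reading process to consume data faster, or enlarge that particular pipe with fcntl(fd, F_SETPIPE_SZ, size) (Linux 2.6.35 and later; unprivileged processes are capped by /proc/sys/fs/pipe-max-size). For a stuck read(), close unused write ends in every process.
    • Why it works: Blocking is the pipe’s flow control: the writer waits until there’s space and the reader waits until there’s data, so nothing is dropped or overwritten.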

Sockets: The Networked Communicators

Sockets are more versatile: they carry network traffic (TCP, UDP) and, as Unix domain sockets (UDS), local IPC between processes on the same machine.

1. socket(): Creates a socket endpoint.

  • What it does: Creates a communication endpoint and returns a file descriptor for it.
  • strace output: socket(AF_INET, SOCK_STREAM, 0) = 3 (TCP IPv4 socket) or socket(AF_UNIX, SOCK_STREAM, 0) = 4 (Unix domain stream socket).
    • 3 or 4 is the file descriptor.
  • Diagnosis: Similar to pipe(), hitting the file descriptor limit is a possibility.
    • Check/Fix: ulimit -n or /etc/security/limits.conf.
  • Common Error: If socket() returns -1 EPROTONOSUPPORT, the requested protocol isn’t supported for that address family and socket type (for example, asking for a SOCK_STREAM socket over UDP).
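
Here is a short Python sketch of socket() on Linux (the helper names are hypothetical). CPython passes SOCK_CLOEXEC along with the type, so strace typically shows SOCK_STREAM|SOCK_CLOEXEC. The second helper requests a stream socket over UDP, a combination Linux rejects with EPROTONOSUPPORT.

```python
import errno
import socket


def make_sockets() -> tuple[int, int]:
    """Create a TCP/IPv4 socket and a Unix domain stream socket; return their fds."""
    tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # typically traced as socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_IP)
    uds = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)  # typically traced as socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0)
    fds = (tcp.fileno(), uds.fileno())  # plain file descriptors, like a pipe's ends
    tcp.close()
    uds.close()
    return fds


def unsupported_protocol_errno() -> int:
    """Request a stream socket carried over UDP, which Linux rejects; return the errno."""
    try:
        socket.socket(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_UDP).close()
    except OSError as exc:
        return exc.errno
    return 0


print(make_sockets(), errno.errorcode.get(unsupported_protocol_errno()))
```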

2. bind(): Assigns an address (IP and port, or a file path for UDS) to a socket.

  • What it does: Makes the socket reachable.
  • strace output: bind(3, {sa_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
  • Diagnosis: bind() fails with EADDRINUSE if the address/port is already in use.
    • Check: sudo ss -tulnp | grep 8080 (for network sockets) or sudo lsof | grep /path/to/uds.sock (for UDS).
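    • Fix: Stop the process using the port/path, or choose a different address/port. For Unix domain sockets, a socket file left behind by a crashed server triggers EADDRINUSE even though nothing is listening; remove the stale file (unlink it) before binding.
    • Why it works: Unless the sockets opt into sharing with SO_REUSEPORT, only one socket can be bound to a given address/port combination. A Unix domain socket can only be bound to a path where no file exists yet.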
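
Both bind() failures are easy to reproduce in Python on Linux (the helper names and the temporary socket path are made up for the demo). The first case is a second TCP socket binding a port that a listener already owns. The second is a Unix domain socket binding a path where a previous server left its socket file behind.

```python
import errno
import os
import socket
import tempfile


def tcp_port_conflict() -> int:
    """Bind a second TCP socket to a port a listener already owns; return the errno."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    first.bind(("127.0.0.1", 0))  # port 0: the kernel picks a free port
    first.listen()
    port = first.getsockname()[1]
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        second.bind(("127.0.0.1", port))
    except OSError as exc:
        return exc.errno          # EADDRINUSE
    finally:
        second.close()
        first.close()
    return 0


def stale_uds_bind_errno() -> int:
    """Bind over a leftover socket file, recover by unlinking it, return the first errno."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "demo.sock")  # hypothetical path for the demo
    old = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    old.bind(path)
    old.close()                   # simulated crash: the socket file stays on disk
    new = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        new.bind(path)
        result = 0
    except OSError as exc:
        result = exc.errno        # EADDRINUSE, although nothing is listening
        os.unlink(path)           # the usual fix: remove the stale file...
        new.bind(path)            # ...then bind again, which now succeeds
    finally:
        new.close()
    os.unlink(path)
    os.rmdir(workdir)
    return result


print(errno.errorcode.get(tcp_port_conflict()), errno.errorcode.get(stale_uds_bind_errno()))
```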

3. listen(): Marks a socket as passive, ready to accept connections.

  • What it does: Tells the kernel to queue incoming connections.
  • strace output: listen(3, 5) = 0
    • 5 is the backlog queue size.
  • Diagnosis: If connections are dropped before accept(), the backlog might be too small, or the server is too slow to accept().
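    • Check: ss -ltn shows, for each listening socket, the connections currently waiting for accept() (Recv-Q) and the effective backlog (Send-Q); netstat -s reports how many times a listen queue overflowed.
    • Fix: Increase the backlog (listen() argument), keeping in mind that Linux silently caps it at net.core.somaxconn, or speed up the accept/processing loop.
    • Why it works: The backlog is a finite queue of connections waiting for accept(). When it is full, Linux by default drops new TCP handshakes, so clients see retries and timeouts rather than an immediate refusal (unless net.ipv4.tcp_abort_on_overflow is enabled, in which case they get resets).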
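
This Python sketch (Linux; the helper names are hypothetical) shows that the kernel, not the application, completes handshakes and parks connections in the backlog: three clients connect successfully before the server calls accept() even once. The second helper reads net.core.somaxconn, the ceiling Linux silently applies to whatever backlog listen() requests.

```python
import socket


def queued_before_accept(n_clients: int = 3, backlog: int = 5) -> int:
    """Connect n_clients before any accept(), then accept them all; return the count."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))   # port 0: the kernel picks a free port
    server.listen(backlog)          # strace: listen(<fd>, 5) = 0
    addr = server.getsockname()
    # Each connect() completes once the kernel finishes the handshake and
    # queues the connection; the server has not called accept() yet.
    clients = [socket.create_connection(addr, timeout=5) for _ in range(n_clients)]
    server.settimeout(5)
    accepted = [server.accept()[0] for _ in range(n_clients)]
    for s in accepted + clients:
        s.close()
    server.close()
    return len(accepted)


def somaxconn() -> int:
    """System-wide cap that Linux applies to the listen() backlog."""
    with open("/proc/sys/net/core/somaxconn") as f:
        return int(f.read())


print(queued_before_accept(), somaxconn())
```

Comparing the requested backlog with somaxconn() is a quick check when ss -ltn reports a Send-Q (the listening socket’s effective backlog) smaller than the code asks for.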

4. accept(): Accepts an incoming connection.

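  • What it does: Takes the next completed connection off the listen queue and returns a new socket file descriptor for talking to that client; the listening socket stays open for further connections. Many programs and runtimes call accept4(), which adds a flags argument such as SOCK_NONBLOCK or SOCK_CLOEXEC, so strace often shows accept4 rather than accept.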
  • strace output: accept4(3, {sa_family=AF_INET, sin_port=htons(54321), sin_addr=inet_addr("127.0.0.1")}, [16], SOCK_NONBLOCK) = 5
    • 5 is the new file descriptor for the client connection.
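  • Diagnosis: If accept() blocks, no completed connection is waiting in the queue: either no client is connecting, or the clients’ handshakes aren’t reaching the server. If it fails with ECONNABORTED, a queued connection was aborted (for example, reset by the client) before accept() picked it up.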
    • Check: Ensure clients are attempting to connect.
    • Fix: Debug the client or network path.

5. connect(): Initiates a connection to a remote socket.

  • What it does: Establishes a connection.
  • strace output: connect(4, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("192.168.1.100")}, 16) = 0
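  • Diagnosis: connect() can fail with ECONNREFUSED (nothing is listening on that address/port, so the host answered with a reset), ETIMEDOUT (no answer at all, often a firewall silently dropping packets or a network problem), ENETUNREACH (network is unreachable, typically no route), or EHOSTUNREACH (no route to host).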
    • Check: ping 192.168.1.100, telnet 192.168.1.100 80.
    • Fix: Ensure the server is running, the IP/port are correct, and network connectivity is present.
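
This Python sketch (Linux; the helper names are hypothetical) reproduces ECONNREFUSED locally. It binds a port without calling listen(), so the kernel answers the client’s SYN with a reset. connect_errno() uses connect_ex(), which returns the errno instead of raising. It also sets a timeout, so an unreachable host fails in bounded time rather than waiting out the kernel’s SYN retries. The demo expects ECONNREFUSED.

```python
import errno
import socket


def connect_errno(host: str, port: int, timeout: float = 2.0) -> int:
    """Return 0 if connect() succeeds, otherwise the errno it failed with."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)  # bound the wait when SYNs are silently dropped
        return s.connect_ex((host, port))


def refused_demo() -> int:
    """Connect to a port that is bound but not listening."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as holder:
        holder.bind(("127.0.0.1", 0))  # holds the port; no listen(), so SYNs get a RST
        return connect_errno("127.0.0.1", holder.getsockname()[1])


print(errno.errorcode.get(refused_demo()))
```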

Shared Memory: The Direct Memory Access

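Shared memory is the fastest IPC mechanism because, once a segment is attached, processes exchange data with ordinary memory reads and writes: there is no syscall and no kernel copy per message. The kernel maps the same physical pages into each process’s address space. This also means strace only sees the setup and teardown calls below, never the data itself.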

1. shmget(): Creates or opens a shared memory segment.

  • What it does: Returns a unique identifier (a "shmid") for a shared memory segment.
  • strace output: shmget(12345, 4096, IPC_CREAT|0666) = 98765
    • 12345 is a user-defined key.
    • 4096 is the size in bytes.
    • 98765 is the shmid.
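  • Diagnosis: shmget() fails with ENOSPC when all segment IDs are in use (SHMMNI) or when the new segment would push total shared memory past the system-wide limit (SHMALL). It fails with EINVAL when, among other cases, the requested size exceeds SHMMAX.
    • Check: ipcs -l shows system limits.
    • Fix: Raise the matching kernel parameters in /etc/sysctl.conf (kernel.shmmni for the maximum number of segments, kernel.shmall for total shared memory in pages, kernel.shmmax for the maximum segment size in bytes) and apply them with sysctl -p.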
    • Why it works: These parameters control the kernel’s allocation of shared memory resources.
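
Python’s standard library has no System V shared memory module, so this sketch (Linux only; the helper names are hypothetical) calls the C library’s shmget() and shmctl() through ctypes. It also reads the kernel.shmmax, kernel.shmall and kernel.shmmni limits that shmget() is checked against. The IPC_* values are assumed to be the Linux ones from <sys/ipc.h>.

```python
import ctypes
import os

# On Linux, the running interpreter's global symbols include the C library.
libc = ctypes.CDLL(None, use_errno=True)
libc.shmget.argtypes = (ctypes.c_int, ctypes.c_size_t, ctypes.c_int)
libc.shmget.restype = ctypes.c_int
libc.shmctl.argtypes = (ctypes.c_int, ctypes.c_int, ctypes.c_void_p)
libc.shmctl.restype = ctypes.c_int

IPC_PRIVATE, IPC_RMID = 0, 0        # Linux values from <sys/ipc.h>
IPC_CREAT, IPC_EXCL = 0o1000, 0o2000


def raise_errno() -> None:
    err = ctypes.get_errno()
    raise OSError(err, os.strerror(err))


def shm_limits() -> dict[str, int]:
    """Read the kernel.shm* sysctls that shmget() is checked against."""
    limits = {}
    for name in ("shmmax", "shmall", "shmmni"):
        with open(f"/proc/sys/kernel/{name}") as f:
            limits[name] = int(f.read())
    return limits


def create_segment(size: int, key: int = IPC_PRIVATE) -> int:
    """Create a new segment (mode 0600) and return its shmid."""
    shmid = libc.shmget(key, size, IPC_CREAT | IPC_EXCL | 0o600)
    if shmid == -1:
        raise_errno()  # e.g. ENOSPC (SHMMNI/SHMALL), EINVAL (size > SHMMAX), EEXIST
    return shmid


def remove_segment(shmid: int) -> None:
    """Same call ipcrm -m <shmid> makes: shmctl(shmid, IPC_RMID, NULL)."""
    if libc.shmctl(shmid, IPC_RMID, None) == -1:
        raise_errno()


if __name__ == "__main__":
    print(shm_limits())
    shmid = create_segment(4096)
    print("created shmid", shmid)
    remove_segment(shmid)
```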

2. shmat(): Attaches a shared memory segment to the process’s address space.

  • What it does: Maps the shared memory segment into the process’s memory map.
  • strace output: shmat(98765, NULL, 0) = 0x7f8c12345000
    • The return value is the virtual address where the segment is mapped.
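  • Diagnosis: shmat() fails with EINVAL if the shmid is invalid, EACCES if the segment’s permissions deny access, and ENOMEM if the process cannot fit the mapping into its address space.
    • Check: Ensure shmget() succeeded and the shmid is correct, compare the segment’s owner and permissions in ipcs -m with the process’s user, and check ulimit -v for virtual memory limits.
    • Fix: Correct the shmid or the segment’s permissions, or increase virtual memory limits.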

3. shmdt(): Detaches a shared memory segment.

  • What it does: Unmaps the shared memory from the process’s address space.
  • strace output: shmdt(0x7f8c12345000) = 0

4. shmctl(): Controls shared memory operations (e.g., deleting a segment).

  • What it does: Manages shared memory segments.
  • strace output: shmctl(98765, IPC_RMID, NULL) = 0 (marks the segment for destruction; the kernel frees it once the last process detaches)
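
The whole attach/detach/remove cycle fits in a short Python sketch (Linux only, via ctypes; the helper names are hypothetical). A forked child attaches a private segment with shmat(), writes into it, and detaches with shmdt(). The parent then attaches the same shmid, reads the bytes back, detaches, and marks the segment for removal with shmctl(IPC_RMID). Under strace -f you see these setup and teardown calls, but not the write and read themselves, which are plain memory accesses.

```python
import ctypes
import os

libc = ctypes.CDLL(None, use_errno=True)  # Linux: the interpreter's symbols include libc
libc.shmget.argtypes = (ctypes.c_int, ctypes.c_size_t, ctypes.c_int)
libc.shmget.restype = ctypes.c_int
libc.shmat.argtypes = (ctypes.c_int, ctypes.c_void_p, ctypes.c_int)
libc.shmat.restype = ctypes.c_void_p
libc.shmdt.argtypes = (ctypes.c_void_p,)
libc.shmdt.restype = ctypes.c_int
libc.shmctl.argtypes = (ctypes.c_int, ctypes.c_int, ctypes.c_void_p)
libc.shmctl.restype = ctypes.c_int

IPC_PRIVATE, IPC_CREAT, IPC_RMID = 0, 0o1000, 0  # Linux values from <sys/ipc.h>
SHMAT_FAILED = ctypes.c_void_p(-1).value         # shmat() returns (void *) -1 on error


def check(ok: bool) -> None:
    if not ok:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def round_trip(message: bytes, size: int = 4096) -> bytes:
    """Child writes `message` into a fresh segment; parent reads it back."""
    shmid = libc.shmget(IPC_PRIVATE, size, IPC_CREAT | 0o600)
    check(shmid != -1)
    try:
        pid = os.fork()
        if pid == 0:  # child: attach, write, detach, exit without running cleanup
            code = 1
            try:
                addr = libc.shmat(shmid, None, 0)
                if addr != SHMAT_FAILED:
                    ctypes.memmove(addr, message, len(message))
                    code = 0 if libc.shmdt(addr) == 0 else 1
            finally:
                os._exit(code)
        _, status = os.waitpid(pid, 0)
        if not (os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0):
            raise RuntimeError("child could not write to the segment")
        addr = libc.shmat(shmid, None, 0)  # same shmid, now mapped into the parent
        check(addr != SHMAT_FAILED)
        try:
            return ctypes.string_at(addr, len(message))
        finally:
            check(libc.shmdt(addr) == 0)
    finally:
        # IPC_RMID marks the segment for destruction; it is freed once nattch is 0.
        check(libc.shmctl(shmid, IPC_RMID, None) == 0)


print(round_trip(b"hello from the child"))
```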

Common Shared Memory Pitfalls:

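  • No Synchronization: Shared memory is just raw memory. If multiple processes write to it concurrently without coordination (for example System V semaphores via semget()/semop(), POSIX semaphores, or a process-shared pthread mutex stored inside the segment), you’ll get data corruption. strace won’t show this directly: reads and writes to the segment are plain memory accesses, so at most you’ll see the locking calls (semop(), or futex() when a mutex or POSIX semaphore is contended). You need higher-level debugging.
  • Orphaned Segments: If a process that created a shared memory segment crashes before calling shmctl with IPC_RMID, the segment persists in kernel memory (by default, until it is explicitly removed or the system reboots).
    • Diagnosis: ipcs -m lists active shared memory segments; those with 0 in the nattch column have no process attached.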
    • Fix: Manually remove them with ipcrm -m <shmid>.
    • Why it works: ipcrm is a utility that directly calls shmctl(shmid, IPC_RMID, NULL) on your behalf.
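
To spot orphan candidates from a script, note that Linux also exposes the ipcs -m data in /proc/sysvipc/shm. The Python sketch below (the helper name is hypothetical) parses that table by its header and returns segments with no attached processes. Treat the result as a list to review rather than something to delete blindly, since a segment can be idle between legitimate users.

```python
from pathlib import Path


def unattached_segments(table: str = "/proc/sysvipc/shm") -> list[dict[str, int]]:
    """Return System V segments whose nattch is 0 (no process has them attached)."""
    lines = Path(table).read_text().splitlines()
    header = lines[0].split()  # key shmid perms size cpid lpid nattch uid gid ...
    found = []
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        if row and int(row["nattch"]) == 0:
            found.append({k: int(row[k]) for k in ("key", "shmid", "size", "cpid", "uid")})
    return found


for seg in unattached_segments():
    # Review before running ipcrm -m: cpid is the creator's PID, which may be gone.
    print(f"shmid={seg['shmid']} size={seg['size']} cpid={seg['cpid']} uid={seg['uid']}")
```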

After you’ve fixed all these IPC issues, the next error you’ll likely encounter involves the actual application logic failing because the data exchanged via IPC is malformed or missing.