Zero-copy lets data bypass user-space buffers, slashing the time and CPU cycles spent copying data back and forth between kernel space and user space.
Let’s watch it in action. Imagine a web server serving a static file. Without zero-copy, the typical path looks like this:
- Read from disk to kernel buffer: The web server process asks the kernel to read a file. The kernel reads the file data from the disk into a buffer in kernel memory.
- Copy from kernel buffer to user buffer: The web server process then asks the kernel to send this data. The kernel copies the data from its buffer into a buffer in the web server’s user-space memory.
- Copy from user buffer to kernel socket buffer: The web server’s socket library copies the data from its user-space buffer into another buffer managed by the kernel for network transmission.
- Copy from kernel socket buffer to network card buffer: Finally, the kernel copies the data from its socket buffer into the network interface card’s (NIC) own buffer for transmission.
That’s four distinct data movements, and the two in the middle are CPU-driven memory copies across the kernel/user boundary. Each copy consumes CPU cycles and memory bandwidth.
Zero-copy techniques, like sendfile(2) on Linux, cut out the intermediate copies. Here’s how sendfile works for our web server example:
- Read from disk directly to kernel socket buffer: The web server calls sendfile(out_fd, in_fd, offset, count), where out_fd is the socket’s file descriptor and in_fd is the file’s descriptor. The kernel reads data directly from the disk’s page cache into the kernel’s socket buffer, bypassing the user-space buffer entirely.
- DMA to network card buffer: The kernel then instructs the NIC to DMA (Direct Memory Access) the data directly from the kernel socket buffer to the NIC’s own memory. This step still involves a copy, but it’s handled by hardware (DMA) and the data never touches user-space memory.
The result? Data moves from disk to network with only one copy (from kernel buffer to NIC buffer), and that copy is often handled by hardware.
The core problem zero-copy solves is the overhead of data copying between kernel and user space, which becomes a significant bottleneck for I/O-intensive applications like high-performance web servers, file servers, and message queues. By eliminating these redundant copies, applications can achieve higher throughput and lower CPU utilization.
The key levers you control are the system calls you use. Instead of read() followed by write(), you’d use sendfile() for file-to-socket transfers, or splice() to move data between a pipe and another file descriptor entirely within the kernel. Because one end of every splice() call must be a pipe, a socket-to-socket transfer without touching user space takes two splice() calls through an intermediate pipe.
When using sendfile, the offset parameter is crucial. If it is non-NULL, reading starts at *offset, and after the call the kernel writes the position of the byte following the last byte read back into *offset, leaving in_fd’s own file offset untouched. If you pass NULL, sendfile instead uses and advances in_fd’s current file offset, making it behave more like a traditional read/write sequence while remaining zero-copy. That automatic tracking is particularly useful for streaming large files where you don’t want to manage offsets for each chunk yourself. The count parameter caps the number of bytes transferred.
A common misconception is that zero-copy means no copies at all. This isn’t true. The data still needs to get from the kernel buffer to the network card’s buffer. Zero-copy techniques like sendfile with certain hardware (those supporting scatter-gather DMA) can further optimize this by allowing the kernel to pass descriptors of the memory buffers directly to the NIC, enabling the NIC to perform DMA directly from the kernel’s page cache without an intermediate copy into the NIC’s own memory. This advanced form is often called "zero-copy transmission."
The next concept you’ll encounter is how to achieve similar zero-copy benefits when data originates in user space, not directly from a file.