The real reason tcpdump drops packets isn’t about its own speed; it’s about the system’s inability to keep up with the raw traffic volume hitting the network interface.
Let’s watch tcpdump in action, but first, we need a scenario. Imagine a busy web server. We want to capture all incoming HTTP requests to diagnose a slow response issue.
First, set up a simple web server.
```shell
sudo apt-get update && sudo apt-get install -y apache2
sudo systemctl start apache2
sudo systemctl enable apache2
```
Now, generate some traffic. We’ll use ab (ApacheBench) from another machine.
```shell
ab -n 10000 -c 100 http://<your_server_ip>/
```
On the server, we’ll start tcpdump. A naive approach might be:
```shell
sudo tcpdump -i eth0 -w capture.pcap
```
If you run ab hard enough, you’ll likely see tcpdump report dropped packets, something like:
```
10000 packets captured
1500 packets dropped by kernel
```
This "dropped by kernel" message is key. Strictly speaking, it counts packets the kernel successfully received but then had to discard because the capture buffer it shares with tcpdump filled up before tcpdump could drain it. Packets the NIC itself discards (because its hardware ring was full) show up in separate interface counters, not in this line. Either way, tcpdump itself is usually not the bottleneck; the root cause is that the system as a whole cannot process packets as fast as they arrive.
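Before tuning anything, it helps to confirm where the drops are actually happening. A sketch (the interface name `eth0` and the helper function are illustrative):

```shell
# Where are the packets being lost? Compare two sets of counters:
#   ip -s link show eth0             # RX "dropped" column: discarded by the kernel
#   ethtool -S eth0 | grep -i drop   # driver/NIC-level drop counters
# A small helper to turn tcpdump's exit summary into a drop percentage:
drop_pct() {
    captured=$1 dropped=$2
    awk -v c="$captured" -v d="$dropped" \
        'BEGIN { printf "%.2f\n", 100 * d / (c + d) }'
}
drop_pct 10000 1500   # the summary above: 13.04% of packets lost
```

If the NIC counters climb, start with the hardware buffer fixes below; if only the kernel counter climbs, the later kernel- and tcpdump-side fixes are more relevant.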
Common Causes and Fixes for Dropped Packets
- **NIC hardware buffers are too small:** The physical NIC has a limited ring of memory for incoming packets. If traffic floods this ring, packets are dropped.
  - Diagnosis: Use `ethtool -g <interface_name>` (e.g., `ethtool -g eth0`) to see the current and maximum RX/TX ring sizes. Look for a current RX value well below the hardware maximum.
  - Fix: Increase the buffer sizes. For example, to set the RX ring size to 4096 descriptors:
    ```shell
    sudo ethtool -G eth0 rx 4096
    ```
  - Why it works: This allocates more memory on the NIC to queue incoming packets, giving the kernel more time to process them before they are overwritten and lost. The exact maximum depends on your NIC hardware.
  - Persistence: This change does not survive a reboot. Add it to a network interface configuration script (e.g., `/etc/network/interfaces` or a systemd-networkd unit) or a startup script.
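One way to persist the ring-size change on a systemd-based distro is a small oneshot unit. The unit name, interface, value, and ethtool path below are all illustrative; adjust for your system:

```ini
# /etc/systemd/system/nic-rx-ring.service  (name and path are illustrative)
[Unit]
Description=Set NIC RX ring size
After=network-pre.target
Before=network.target

[Service]
Type=oneshot
ExecStart=/usr/sbin/ethtool -G eth0 rx 4096

[Install]
WantedBy=multi-user.target
```

Enable it with `sudo systemctl enable nic-rx-ring.service`; check the ethtool path with `which ethtool`.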
- **Kernel network buffer limits:** Even if the NIC rings are large, the kernel’s own buffers for network traffic can become a bottleneck.
  - Diagnosis: Check current limits with `sysctl net.core.rmem_max` and `sysctl net.core.netdev_max_backlog`.
  - Fix: Increase these limits. For example:
    ```shell
    sudo sysctl -w net.core.rmem_max=16777216   # 16 MB
    sudo sysctl -w net.core.netdev_max_backlog=5000
    ```
  - Why it works: `net.core.rmem_max` sets the maximum receive socket buffer size, and `net.core.netdev_max_backlog` controls the queue length for packets received from the network device driver before they are processed by the kernel’s network stack. Increasing these allows the kernel to hold more packets temporarily.
  - Persistence: Add these to `/etc/sysctl.conf` to make them permanent.
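A drop-in file under `/etc/sysctl.d/` keeps these settings separate from the main `/etc/sysctl.conf`; the filename below is illustrative:

```ini
# /etc/sysctl.d/90-packet-capture.conf  (filename is illustrative)
net.core.rmem_max = 16777216
net.core.netdev_max_backlog = 5000
```

Apply without rebooting via `sudo sysctl --system`.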
- **CPU saturation:** If the CPU is overloaded, it can’t process incoming network packets fast enough, leading to drops at the NIC or kernel level.
  - Diagnosis: Monitor CPU usage with `top` or `htop`. Look for sustained high utilization (e.g., >80%) across all cores, especially during the capture period.
  - Fix:
    - Offload tasks: If possible, move the capture or the source of traffic to a less loaded machine.
    - Reduce packet rate: If you are generating the traffic, reduce the number of concurrent connections (`-c`) or requests per second in your load generator.
    - Optimize the application: If the server application is CPU-bound, optimize it.
    - Hardware upgrade: In extreme cases, a faster CPU or more cores may be necessary.
    - IRQ affinity (advanced): Ensure network card interrupts (IRQs) are spread across multiple CPU cores. Use `cat /proc/interrupts` to see IRQ assignments and write to `/proc/irq/<n>/smp_affinity` to adjust them.
  - Why it works: Less CPU load means more cycles are available for the kernel to handle network I/O.
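The IRQ-affinity step can be sketched as follows. The IRQ number is a placeholder you must read from `/proc/interrupts`, and writing the mask requires root:

```shell
# Map a CPU index to the hex bitmask that /proc/irq/<n>/smp_affinity expects.
cpu_mask() {
    printf '%x\n' $(( 1 << $1 ))
}
cpu_mask 0   # CPU 0 -> mask 1
cpu_mask 3   # CPU 3 -> mask 8
# Find your NIC's IRQ lines, then steer them (as root). "24" is a placeholder:
#   grep eth0 /proc/interrupts
#   echo 8 > /proc/irq/24/smp_affinity   # handle IRQ 24 on CPU 3 only
```

Note that the `irqbalance` daemon, if running, may rewrite these masks; stop it or configure it to ignore the NIC’s IRQs if the pinning doesn’t stick.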
- **The capture filter is too complex or inefficient:** While tcpdump’s BPF filters are generally efficient, extremely complex filters still consume CPU for every packet.
  - Diagnosis: Compare capture rates with and without the filter. If `sudo tcpdump -i eth0 -w capture.pcap` (no filter) drops fewer packets than `sudo tcpdump -i eth0 'port 80' -w capture.pcap`, the filter is a factor.
  - Fix: Simplify the filter or, better yet, move the filtering elsewhere.
    - Use `iptables` or `nftables`: Drop unwanted traffic to cut the machine’s overall processing load. (Note that on receive, the packet-capture tap runs before netfilter, so firewall rules won’t hide packets from tcpdump; the benefit is reduced load everywhere downstream of the capture point.)
    - Hardware filtering: Some high-end NICs support packet filtering directly in hardware.
    - Capture everything, filter later: Capture raw packets and use Wireshark or `tshark` on a more powerful machine to filter and analyze.
  - Why it works: Offloading filtering or simplifying it reduces the CPU burden on the machine running tcpdump.
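The "capture everything, filter later" pattern might look like this; the file names and filter are illustrative, and tcpdump can re-apply a BPF filter to a saved file without root:

```shell
# On the busy server: capture raw, with no per-packet filter cost (needs root):
#   sudo tcpdump -i eth0 -w capture.pcap
# On a less-loaded analysis box, filter the saved file instead:
PCAP=capture.pcap
FILTER='tcp port 80'
CMD="tcpdump -r $PCAP $FILTER -w port80.pcap"
echo "$CMD"   # run this wherever the pcap was copied
```

Reading from a file shifts the entire filtering cost off the production server.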
- **tcpdump’s own buffering (less common for drops, more for latency):** tcpdump buffers packets internally between the kernel and the output file. If it can’t write to disk fast enough, the buffer overflows and packets are dropped.
  - Diagnosis: Monitor disk I/O on the capture machine using `iostat`. A saturated disk (high `%util`, high `await`) can bottleneck writing the capture file.
  - Fix:
    - Faster storage: Write to a faster disk (SSD, NVMe) or a different disk than the OS.
    - Networked capture: If writing to a remote NFS share, ensure the network and the remote storage are not bottlenecks.
    - Increase tcpdump’s buffer: `tcpdump -B <buffer_size_in_KiB>` (e.g., `tcpdump -B 2097152` for 2 GiB).
  - Why it works: Larger buffers give tcpdump more breathing room to write captured packets to disk without overflowing its internal queues.
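Combining a larger capture buffer with file rotation keeps any single slow write from stalling the capture for long. The sizes below are illustrative; `-B` takes KiB, `-C` sets the per-file size, and `-W` limits the number of rotated files:

```shell
# 512 MiB kernel capture buffer, rotating across ten 100 MB files (needs root):
#   sudo tcpdump -i eth0 -B 524288 -C 100 -W 10 -w capture.pcap
# Since -B is in KiB, a helper to convert from MiB:
mib_to_kib() {
    echo $(( $1 * 1024 ))
}
mib_to_kib 512    # -> 524288
mib_to_kib 2048   # -> 2097152, the 2 GiB example above
```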
- **Driver issues or NIC hardware faults:** Outdated drivers or failing hardware can lead to erratic behavior, including packet drops.
  - Diagnosis: Check `dmesg` for NIC-related error messages. Update NIC drivers to the latest stable version. Test the NIC with a loopback test or by swapping it if possible.
  - Fix: Update drivers or replace faulty hardware.
  - Why it works: It ensures the NIC and its software interface are functioning correctly.
- **Interrupt storms:** Excessive network traffic can cause a very high rate of interrupts, overwhelming the CPU’s interrupt handling.
  - Diagnosis: Monitor interrupts using `vmstat 1` (watch the `in` column) or `sar -I SUM 1`. High interrupt rates correlate with high packet rates.
  - Fix:
    - Reduce traffic: The most direct solution.
    - IRQ affinity: As mentioned under CPU saturation, spreading IRQs across cores helps.
    - `rx-usecs` tuning: For some NICs, `ethtool -C eth0 rx-usecs 0` disables timer-based interrupt coalescing, making interrupts more immediate but also more frequent. This is a trade-off: it can reduce drops under heavy load but increases CPU overhead. Test carefully.
  - Why it works: By managing how often the NIC interrupts the CPU, you balance responsiveness against CPU load.
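Interrupt coalescing is the usual lever here. The settings below are illustrative and driver support varies; the helper shows why the coalescing interval effectively caps the interrupt rate:

```shell
# Inspect and tune interrupt coalescing (support varies by driver):
#   ethtool -c eth0                      # show current coalescing settings
#   sudo ethtool -C eth0 rx-usecs 100    # batch packets: fewer, later interrupts
#   sudo ethtool -C eth0 adaptive-rx on  # let the driver adapt, if supported
# With at most one interrupt per rx-usecs interval per queue, the rough
# ceiling on interrupts per second is:
irq_ceiling() {
    echo $(( 1000000 / $1 ))
}
irq_ceiling 100   # at most 10000 interrupts/s from this queue
```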
After implementing these fixes, you’ll likely encounter a different problem: understanding the sheer volume of data captured.