Your application hangs as if the TCP stack were silently dropping packets, and Wireshark shows "Zero Window", "RST", or retransmission messages.
Zero Window
This isn’t a packet drop in the traditional sense, but rather a sender stopping because the receiver’s buffer is full.
Cause 1: Receiver Overwhelmed
The most common reason is that the application on the receiving end can’t process data as fast as it’s arriving. The TCP stack on the receiver advertises a "window size" of 0 to tell the sender to pause.
- Diagnosis: Filter for `tcp.analysis.zero_window` and `tcp.window_size == 0`. Look at the sequence numbers. You’ll see a sender sending data, then a receiver acknowledging it, and then the receiver sending a packet with `Window size value: 0`. The sender will then stop sending new data until it receives a non-zero window advertisement.
- Fix: Increase the TCP receive buffer size on the receiving machine. On Linux, this is `net.core.rmem_max` and `net.ipv4.tcp_rmem`. For example, to set the maximum receive buffer to 16MB:

      sudo sysctl -w net.core.rmem_max=16777216
      sudo sysctl -w net.ipv4.tcp_rmem='4096 87380 16777216'

  This allows the receiver’s TCP stack to buffer more data before the application needs to consume it.
- Why it works: A larger buffer means the receiver can accept more data from the sender even if the application is momentarily slow, preventing the window from shrinking to zero.
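How large the buffer should be depends on the path. One common rule of thumb is the bandwidth-delay product (BDP): the number of bytes that can be in flight before the first ACK returns. The sketch below is only an illustration of that arithmetic; the function name and the 1 Gbit/s, 100 ms figures are assumptions, not measurements from any real link.

```python
def bdp_bytes(bandwidth_bits_per_sec: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes.

    bits/s * ms / 1000 gives bits in flight; dividing by 8 gives bytes.
    Integer arithmetic keeps the result exact.
    """
    return bandwidth_bits_per_sec * rtt_ms // 8000


# Hypothetical path: 1 Gbit/s with a 100 ms round-trip time.
needed = bdp_bytes(1_000_000_000, 100)
print(needed)              # bytes that can be in flight on this path
print(needed <= 16777216)  # does a 16 MiB maximum cover it?
```

If the BDP of your path is well above the configured maximum, the receive window rather than the network becomes the limit, even when the application keeps up.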
Cause 2: Application Not Reading Data
The TCP receiver is capable of accepting data, but the application bound to that socket isn’t calling recv() or read() to pull data out of the kernel buffer. This leads to the buffer filling up and the TCP stack advertising a zero window.
- Diagnosis: Use `netstat -tunp` (Linux) or Resource Monitor (Windows) on the receiving host to see which process owns the connection. On Linux, a `Recv-Q` value that keeps growing for that socket means data is sitting in the kernel unread. Then, use `strace -p <PID>` (Linux) or Process Explorer (Windows) to see if that process is actively making read calls. If you see no `read()` or `recv()` calls for a sustained period while data is arriving, this is your culprit.
- Fix: Optimize the application code to read data from the socket more frequently. This might involve adjusting polling intervals, using asynchronous I/O, or breaking down large processing tasks.
- Why it works: By consuming data from the buffer, the application frees up space, allowing the TCP stack to advertise a non-zero window and resume data flow.
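You can reproduce this stall on a single machine. The sketch below is a minimal loopback experiment, not production code: the function name and the 0.5 second "stalled" threshold are assumptions. The receiver deliberately never reads, the sender writes until the kernel stops accepting data, and then the receiver drains the socket so the sender can resume.

```python
import select
import socket


def stall_and_resume(chunk_size=65536, timeout=5.0):
    """Return (bytes_accepted_before_stall, sender_writable_after_drain)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    sender = socket.create_connection(listener.getsockname())
    receiver, _ = listener.accept()
    try:
        sender.setblocking(False)
        chunk = b"x" * chunk_size
        sent = 0
        # Phase 1: the receiver never calls recv(), so its buffer fills,
        # its advertised window closes, and the sender's buffer fills too.
        while True:
            _, writable, _ = select.select([], [sender], [], 0.5)
            if not writable:
                break  # not writable for 0.5 s: treat the sender as stalled
            try:
                sent += sender.send(chunk)
            except BlockingIOError:
                continue
        # Phase 2: behave like a healthy application and read everything.
        receiver.settimeout(timeout)
        received = 0
        while received < sent:
            data = receiver.recv(chunk_size)
            if not data:
                break
            received += len(data)
        # Phase 3: with the buffers drained, the sender can write again.
        _, writable, _ = select.select([], [sender], [], timeout)
        return sent, bool(writable)
    finally:
        for s in (sender, receiver, listener):
            s.close()


if __name__ == "__main__":
    print(stall_and_resume())
```

Capturing on the loopback interface while this runs should show the zero window advertisement during the first phase and a window update once the second phase starts reading.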
Cause 3: Network Congestion (Less Direct)
While not a direct cause of Zero Window itself, severe network congestion can contribute to it. When segments are lost or reordered in transit, the receiver has to hold the data that arrived after the gap until the missing segment is retransmitted, because it cannot hand out-of-order bytes to the application. That held data consumes receive buffer space and shrinks the advertised window, in the worst case toward zero.
- Diagnosis: Look for high retransmission rates (`tcp.analysis.retransmission`) or duplicate ACKs (`tcp.analysis.duplicate_ack`) in Wireshark for the affected connection. Check network device statistics for dropped packets or buffer overflows.
- Fix: Address network congestion. This could involve increasing bandwidth, optimizing routing, or implementing Quality of Service (QoS) to prioritize the affected traffic.
- Why it works: With less loss, data arrives in order and moves straight through the receive buffer to the application, which keeps the TCP window open and allows for sustained data transfer.
RST (Reset)
A TCP RST packet is an abrupt termination of a connection. It’s like slamming the phone down. It can be sent by either endpoint, or injected by a device on the path between them.
Cause 1: Connection to Unknown Port
The most frequent cause of RST is a client trying to connect to a service on a server that isn’t listening on that port. The server’s TCP/IP stack receives the SYN packet, checks its listening ports, finds nothing, and sends back an RST.
- Diagnosis: Filter for `tcp.flags.reset == 1`. Look at the destination port of the incoming SYN packet and the source port of the RST packet; they will match. If the destination port is, for example, `8080` and no service is running there, you’ll see the server answer the SYN with an RST (shown as `[RST, ACK]` in Wireshark).
- Fix: Ensure the correct application is running and listening on the expected port on the server. For example, if you’re trying to reach a web server on port 80, confirm the web server process is active and bound to `0.0.0.0:80` or `:::80`.
- Why it works: The RST is sent because there’s no application ready to accept the connection on that specific port. Starting the correct service makes a listener available.
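From the client side, this kind of RST surfaces as "connection refused". The sketch below is a small probe, with `probe_port` being a hypothetical helper name, that separates a refused connection (the peer answered with an RST) from a silent drop (no answer at all, which points at filtering rather than a missing listener).

```python
import socket


def probe_port(host, port, timeout=2.0):
    """Classify a TCP connect attempt as 'open', 'refused', or 'timeout'.

    Other errors (for example "no route to host") are left to propagate.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"      # handshake completed, something is listening
    except ConnectionRefusedError:
        return "refused"       # the SYN was answered with an RST
    except socket.timeout:
        return "timeout"       # the SYN got no answer at all


if __name__ == "__main__":
    print(probe_port("127.0.0.1", 8080))
```

A "refused" result means the host is reachable and the fix is on the server (start the service or correct the port); a "timeout" result suggests looking at firewalls or routing first.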
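Cause 2: Application Abruptly Closed Connection
An application aborts a connection instead of closing it gracefully, so the OS sends an RST rather than a FIN. This is often done when an application encounters an unrecoverable error or security issue. Two common triggers are closing a socket with `SO_LINGER` set to a zero timeout and, on Linux, closing a socket that still has unread data in its receive buffer.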
- Diagnosis: In Wireshark, find the RST packet. Look at the source IP and port. Then, on that host, use process monitoring tools (`strace -p <PID>` on Linux, Process Explorer on Windows) to see if the application associated with that socket made a `close()` or `shutdown()` call that resulted in an RST. Often, application logs will indicate why it decided to abort.
- Fix: Debug the application. Identify the error condition that’s causing it to send an RST and fix the underlying bug.
- Why it works: The RST is a symptom; the root cause is the application’s error handling. Fixing the error condition prevents the application from initiating the RST.
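To see the difference between a graceful and an abortive close, you can run both on loopback. This is a sketch under assumptions: `close_and_observe` is a hypothetical name, and the `struct.pack("ii", ...)` layout for `SO_LINGER` matches Linux and macOS but not Windows.

```python
import socket
import struct


def close_and_observe(abort):
    """Close the server side of a loopback connection and report what
    the client sees: 'fin' for a graceful close, 'rst' for an abort."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname(), timeout=5)
    server_side, _ = listener.accept()
    try:
        if abort:
            # l_onoff=1, l_linger=0: close() discards pending data and
            # sends an RST instead of the normal FIN sequence.
            server_side.setsockopt(
                socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0)
            )
        server_side.close()
        try:
            data = client.recv(1024)
            return "fin" if data == b"" else "data"
        except ConnectionResetError:
            return "rst"
    finally:
        client.close()
        listener.close()


if __name__ == "__main__":
    print(close_and_observe(abort=False))
    print(close_and_observe(abort=True))
```

If `strace` shows a `setsockopt` call with `SO_LINGER` shortly before `close()`, the application is aborting on purpose, and its logs are the place to find out why.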
Cause 3: Firewall Blocking Connection
A stateful firewall might inject an RST if it detects traffic that violates its rules, or if it believes the connection is no longer valid (e.g., after a timeout).
- Diagnosis: Compare the IP TTL of the RST with the TTL of other packets from the same peer. A different TTL suggests the RST was generated by a device along the path, most likely a firewall, rather than by the endpoint; traceroute can help identify the hop. Also look for RSTs arriving immediately after a SYN or during a period of otherwise normal traffic.
- Fix: Reconfigure the firewall to allow the traffic. This might involve opening the specific port, allowing the source/destination IPs, or adjusting stateful inspection rules.
- Why it works: The firewall stops interfering with or actively terminating the connection, allowing the legitimate TCP handshake or data transfer to proceed.
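The TTL comparison is easy to automate once you export the `ip.ttl` and `tcp.flags.reset` fields for one source address from the capture. The helper below is a hypothetical sketch with made-up sample values; it only flags candidates, since a device can copy the endpoint’s TTL and route changes can shift it legitimately.

```python
from collections import Counter


def suspicious_resets(packets):
    """packets: iterable of (ttl, is_rst) tuples, all from one source IP.

    Returns the TTLs of RST packets that differ from the TTL most often
    seen on that peer's non-RST packets.
    """
    packets = list(packets)
    normal = Counter(ttl for ttl, is_rst in packets if not is_rst)
    if not normal:
        return []  # no baseline to compare against
    baseline = normal.most_common(1)[0][0]
    return [ttl for ttl, is_rst in packets if is_rst and ttl != baseline]


# Illustrative values only: three data packets and one RST.
sample = [(52, False), (52, False), (52, False), (61, True)]
print(suspicious_resets(sample))
```

An RST whose TTL matches the baseline is more likely to come from the endpoint itself, which points back to Cause 1 or Cause 2.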
Cause 4: Duplicate SYN or Out-of-Order SYN
If a client sends a SYN packet, and it gets lost or significantly delayed, the client might resend it. If the first SYN eventually arrives after the connection has been established and data has been exchanged, the server might send an RST in response to the duplicate SYN, as it doesn’t expect a new connection attempt on an established connection’s sequence numbers. Stacks that follow RFC 5961 answer such a SYN with a challenge ACK instead, so this cause mostly shows up with older or embedded stacks.
- Diagnosis: Filter for `tcp.flags.syn == 1 or tcp.flags.reset == 1` (joining the two with `and` would match only packets that carry both flags). Look for a SYN packet followed shortly by an RST packet, where the RST’s sequence number is close to the SYN’s sequence number. The SYN does not fit the state of the established connection, so the server rejects it.
- Fix: Ensure reliable network delivery for the initial SYN packet. This usually means fixing underlying network issues causing packet loss or extreme latency.
- Why it works: By ensuring the initial SYN packet arrives promptly and reliably, the server never receives a duplicate or out-of-order SYN, thus avoiding the RST.
Retransmit
A retransmission occurs when the sender doesn’t receive an acknowledgment (ACK) for a segment of data within a certain time frame and sends the data again.
Cause 1: Packet Loss
This is the most straightforward cause: the data packet (or its corresponding ACK) was lost somewhere in transit.
- Diagnosis: Filter for `tcp.analysis.retransmission`. Observe the sequence numbers. You’ll see a segment sent, then a gap in the sequence numbers, followed by the exact same segment being sent again. If you also see `tcp.analysis.duplicate_ack` before the retransmission, the receiver got later segments but not this one, and is repeating its last ACK to prompt retransmission of the missing data (three duplicate ACKs trigger a fast retransmit, which Wireshark flags as `tcp.analysis.fast_retransmission`). If there are no duplicate ACKs and the retransmission follows a timeout, either the last data in flight or the ACK itself was lost.
- Fix: Improve network reliability. This could involve checking physical network cables, diagnosing faulty network interface cards (NICs), or addressing congestion on routers and switches.
- Why it works: Once the underlying packet loss is resolved, ACKs are received reliably, and retransmissions stop.
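The core of retransmission detection is simple: a data segment that ends at or below the highest sequence number already seen carries no new bytes. The sketch below is a deliberately simplified version of that idea, not Wireshark’s actual heuristic; it ignores sequence number wraparound and cannot tell a retransmission from a segment that merely arrived out of order. The function name and sample numbers are illustrative.

```python
def find_retransmissions(segments):
    """segments: iterable of (seq, length) for data segments flowing in
    one direction, in capture order.

    Returns the indices of segments that carry only bytes already seen.
    """
    highest_end = None
    hits = []
    for index, (seq, length) in enumerate(segments):
        end = seq + length
        if length > 0 and highest_end is not None and end <= highest_end:
            hits.append(index)
        if highest_end is None or end > highest_end:
            highest_end = end
    return hits


# Illustrative trace: the segment starting at 101 is sent twice.
trace = [(1, 100), (101, 100), (201, 100), (101, 100), (301, 100)]
print(find_retransmissions(trace))
```

Running this over sequence numbers exported from a capture taken at the sender gives a quick retransmission count per connection, which is useful for comparing before and after a network fix.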
Cause 2: Delayed ACKs
TCP employs delayed ACKs to improve efficiency. Instead of sending an ACK for every single data packet received, the receiver might wait for a short period (typically up to 200ms) or until it receives another data packet to bundle ACKs together. If the sender’s retransmission timer (RTO) expires before the delayed ACK arrives, a retransmission will occur.
- Diagnosis: Filter for `tcp.analysis.retransmission` and check the ACK timing around each hit (the `tcp.analysis.ack_rtt` field shows how long each ACK took). You’ll see a data packet, then a period where no ACK is seen by the sender, followed by a retransmission. If you then see an ACK for the original segment arriving shortly after the retransmission, it was likely a delayed ACK issue.
- Fix: Reduce or disable delayed ACKs on the receiving host. The `net.ipv4.tcp_delack_*` sysctls are not part of mainline Linux (a few vendor kernels add their own variants), so check `sysctl -a | grep delack` before relying on them. The options that mainline Linux does provide are the per-socket `TCP_QUICKACK` option in the receiving application and the per-route `quickack` attribute, which turns off delayed ACKs for traffic using that route (`<gateway>` and `<interface>` are placeholders for your own values):

      sudo ip route change default via <gateway> dev <interface> quickack 1

  Disabling delayed ACKs increases the number of ACK packets and can reduce throughput.
- Why it works: By reducing the delay before sending ACKs, or by disabling delayed ACKs, the sender receives acknowledgments faster, preventing its retransmission timer from expiring prematurely.
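Where you can change the receiving application, Linux also lets you request immediate ACKs per socket with the `TCP_QUICKACK` option. The sketch below assumes a Linux host; the helper names are hypothetical. The kernel documents this flag as not permanent, so the sketch sets it again after each read.

```python
import socket


def enable_quickack(sock):
    """Ask the kernel to ACK immediately on this socket.

    Returns False on platforms where TCP_QUICKACK is not available.
    """
    option = getattr(socket, "TCP_QUICKACK", None)
    if option is None:
        return False
    sock.setsockopt(socket.IPPROTO_TCP, option, 1)
    return True


def recv_with_quickack(sock, bufsize=65536):
    """recv() wrapper that re-arms quickack, because the kernel may fall
    back to delayed ACKs on its own after the option is set."""
    data = sock.recv(bufsize)
    enable_quickack(sock)
    return data
```

In a receive loop you would call `enable_quickack(conn)` once after `accept()` and then use `recv_with_quickack(conn)` in place of `conn.recv()`. This trades extra ACK packets for lower ACK latency, so it suits request/response traffic better than bulk transfers.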
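Cause 3: Network Latency Exceeding Retransmission Timeout (RTO)
If the round-trip time (RTT) between the sender and receiver rises above the sender’s current RTO, the sender will retransmit even if no packets were actually lost. The RTO is calculated dynamically from a smoothed RTT plus a variance term, so a steady high RTT is absorbed by the estimate; the trouble comes from sudden spikes and jitter that outrun it. After each timeout the sender also doubles the RTO (exponential backoff), so it grows during periods of packet loss or high latency.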
- Diagnosis: Look at the RTT values in Wireshark (the `tcp.analysis.ack_rtt` field, or Statistics > TCP Stream Graphs > Round Trip Time). Wireshark does not show the sender’s RTO; on a Linux sender, `ss -ti` reports the current `rtt` and `rto` for each connection. If RTT is regularly close to or exceeding RTO, this is the cause.
- Fix: Reduce network latency and jitter. This might involve optimizing routing, upgrading network links, or moving servers closer to clients. If latency is inherent to the path (e.g., satellite links), consider using TCP optimizations like those found in some commercial TCP stacks, or protocols like QUIC that need fewer round trips to set up a connection.
- Why it works: A lower, more stable RTT ensures that acknowledgments arrive before the sender’s RTO expires, preventing spurious retransmissions.
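To see why spikes matter more than a steady high RTT, it helps to run the standard RTO calculation by hand. The sketch below follows the smoothing formulas from RFC 6298 (alpha = 1/8, beta = 1/4, K = 4) but ignores clock granularity and the upper bound on RTO. The class name is hypothetical, and the 200 ms floor is Linux’s minimum; the RFC itself recommends 1 second.

```python
class RtoEstimator:
    ALPHA = 1 / 8  # weight of a new sample in the smoothed RTT
    BETA = 1 / 4   # weight of a new sample in the RTT variance
    K = 4          # variance multiplier

    def __init__(self, min_rto=1.0, initial_rto=1.0):
        self.srtt = None
        self.rttvar = None
        self.min_rto = min_rto
        self.rto = initial_rto

    def update(self, rtt_sample):
        """Feed one RTT measurement (seconds) and return the new RTO."""
        if self.srtt is None:
            # First measurement seeds both estimators.
            self.srtt = rtt_sample
            self.rttvar = rtt_sample / 2
        else:
            # Variance is updated first, using the previous SRTT.
            deviation = abs(self.srtt - rtt_sample)
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * deviation
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt_sample
        self.rto = max(self.min_rto, self.srtt + self.K * self.rttvar)
        return self.rto

    def on_timeout(self):
        """Exponential backoff: double the RTO after a retransmission timeout."""
        self.rto *= 2
        return self.rto


if __name__ == "__main__":
    est = RtoEstimator(min_rto=0.2)
    for _ in range(20):
        est.update(0.1)        # a stable 100 ms path
    print(est.rto)             # the variance term has decayed
    print(0.5 > est.rto)       # would a 500 ms spike beat the timer?
```

On a stable path the variance term decays and the RTO settles near its floor, so a single RTT spike well above that floor triggers a timeout even though nothing was lost.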
The next common issue you’ll encounter after fixing these is often related to the application layer, like TCP Connection Reset by Peer or Application Layer Protocol Errors.