TCP retransmissions aren’t always a sign of packet loss; often they just mean an acknowledgment arrived later than the sender was willing to wait.

Let’s watch TCP in action. Imagine client 192.168.1.100 is downloading a file from server 10.0.0.50.

Client 192.168.1.100                                     Server 10.0.0.50
--------------------------------------------------------------------------------
SYN (Seq=x) -------------------------------------------->
                                                        SYN-ACK (Seq=y, Ack=x+1)
<--------------------------------------------------------
ACK (Ack=y+1) ------------------------------------------>
Data (Seq=x+1, Len=1460) ------------------------------>
                                                        ACK (Ack=x+1+1460)
<--------------------------------------------------------
Data (Seq=x+1461, Len=1460) ---------------------------->
                                                        ACK (Ack=x+1461+1460)
<--------------------------------------------------------

This dance is TCP’s reliable delivery mechanism. When the client sends Data, the server needs to acknowledge it. If the client doesn’t get that ACK back in time, it assumes the Data was lost and resends it. That’s a retransmission. But what if the Data was received and the ACK was sent, only to be delayed in transit? TCP can’t tell the difference; it just sees a missing ACK and acts defensively.
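The arithmetic behind those Ack numbers is simple: each ACK names the next byte the receiver expects, which is the Seq plus the Len of the segment it covers. A quick sketch in plain Python, with an initial sequence number picked purely for illustration:

```python
def next_ack(seq: int, length: int) -> int:
    """The ACK for a segment names the next byte the receiver expects."""
    return seq + length

x = 1000  # client's initial sequence number (arbitrary, for illustration)

# SYN and FIN each consume one sequence number despite carrying no data,
# which is why the SYN-ACK carries Ack=x+1.
assert next_ack(x, 1) == x + 1

# Data segments advance the ACK by their payload length, matching the diagram.
ack1 = next_ack(x + 1, 1460)     # ACK for the first 1460-byte segment
ack2 = next_ack(x + 1461, 1460)  # ACK for the second segment
print(ack1, ack2)                # 2461 3921
```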

The core problem TCP solves is ensuring data arrives in order, without corruption, and without loss, over an unreliable network. It achieves this through sequence numbers, acknowledgments, and retransmissions. When you see retransmissions in Wireshark, you’re seeing TCP’s internal panic button being pressed. The key is to figure out why it’s panicking. Is the network truly dropping packets, or is it just transient congestion or a slow link causing delays that trigger TCP’s retransmission timeout?
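To see how a late ACK alone triggers a retransmission, here’s a toy timeline (the numbers are hypothetical, not real kernel values): the sender’s retransmission timer fires purely on elapsed time, so an ACK arriving after the timeout produces a resend even though nothing was lost.

```python
RTO = 1.0  # retransmission timeout in seconds (illustrative only)

def sender_events(ack_arrival_time: float) -> list[str]:
    """Simulate one segment: send at t=0, retransmit if no ACK before RTO."""
    events = ["t=0.0 send Seq=1"]
    if ack_arrival_time > RTO:
        # Timer expires first: TCP cannot tell "lost" from "late" and resends.
        events.append(f"t={RTO} RTO expired -> retransmit Seq=1")
    events.append(f"t={ack_arrival_time} ACK arrives")
    return events

print(sender_events(0.3))  # ACK in time: no retransmission
print(sender_events(1.4))  # ACK merely delayed: spurious retransmission
```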

Here’s how to dig into it. First, capture traffic between the client and server using Wireshark. A good starting filter is tcp.port == <your_port> (e.g., tcp.port == 80 for HTTP, or tcp.port == 443 for HTTPS). You can also filter by IP: ip.addr == 192.168.1.100 and ip.addr == 10.0.0.50.

Once you have the capture, look for "TCP Retransmission" flagged in the Info column, or filter directly with tcp.analysis.retransmission. This is your smoking gun.

Common Causes and How to Fix Them:

  1. Actual Packet Loss on the Network: This is the most straightforward, but often the hardest to fix. A router along the path is dropping packets due to congestion.

    • Diagnosis: Look for repeated retransmissions of the same segment, along with bursts of duplicate ACKs (tcp.analysis.duplicate_ack in Wireshark), which mean the receiver is seeing gaps in the stream. Use ping -c 1000 -s 1400 <server_ip> and traceroute <server_ip> from the client to check for packet loss and latency. A non-trivial loss rate from these tools points to a network issue.
    • Fix: This requires network engineering. You might need to increase bandwidth, optimize routing, or implement Quality of Service (QoS) to prioritize TCP traffic. On the client or server, you can try to tune TCP parameters, but it’s a band-aid. For example, on Linux, you can adjust net.ipv4.tcp_rmem and net.ipv4.tcp_wmem (e.g., sudo sysctl -w net.ipv4.tcp_rmem="4096 87380 6291456" and sudo sysctl -w net.ipv4.tcp_wmem="4096 65536 6291456"), which allows TCP to use larger receive and send buffers and keep more data in flight on high-latency paths.
    • Why it works: Larger buffers let the sender keep more unacknowledged data in flight and let the receiver hold out-of-order segments while a lost one is recovered, so a single drop stalls the stream less. They don’t stop the loss itself; that still needs fixing on the network.
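When sizing those buffers, the number that matters is the bandwidth-delay product: the buffer must hold at least one round trip’s worth of data or TCP can’t keep the pipe full. A rough calculator, with illustrative figures:

```python
def bdp_bytes(bandwidth_bits_per_sec: float, rtt_seconds: float) -> int:
    """Bandwidth-delay product: bytes in flight needed to fill the path."""
    return int(bandwidth_bits_per_sec * rtt_seconds / 8)

# Example: a 100 Mbit/s path with a 60 ms RTT needs ~750 KB in flight,
# comfortably inside the 6 MB maximum from the sysctl example above.
needed = bdp_bytes(100e6, 0.060)
print(needed)               # 750000
print(needed <= 6291456)    # True
```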
  2. Send Window Too Small for the Path: TCP limits in-flight data to the smaller of its congestion window (CWND) and the receiver’s advertised window. If that limit is small relative to the path’s bandwidth-delay product, the connection stalls between ACKs, throughput collapses, and the bursty timing can trigger spurious timeouts even without packet loss.

    • Diagnosis: In Wireshark, select a packet from the affected stream and expand the TCP layer in the packet details pane. Look at the Window (Win) field: if it’s consistently small (e.g., a few thousand bytes) while the path bandwidth is high, this could be the culprit. Note that CWND is internal sender state and never appears in a capture; what you can see is the receiver’s advertised window. The TCP Window Full or TCP ZeroWindow messages in Wireshark are strong indicators.
    • Fix: On the client or server, ensure TCP Auto-Tuning is enabled. For Linux, check net.ipv4.tcp_window_scaling (should be 1) and net.ipv4.tcp_congestion_control (e.g., cubic or bbr). If it’s disabled, enable it: sudo sysctl -w net.ipv4.tcp_window_scaling=1. If you’re on an older OS or have custom settings, you might need to manually increase TCP buffer sizes as mentioned in point 1.
    • Why it works: Without window scaling, the advertised window is capped at 65,535 bytes. Scaling lets TCP advertise much larger windows, so it can keep enough data in flight to fill the network pipe before pausing for an acknowledgment.
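The throughput ceiling a window imposes is easy to compute: a sender can have at most one window in flight per round trip, so throughput ≤ window / RTT. A quick check of why an unscaled 64 KB window chokes a fast link (the RTT here is hypothetical):

```python
def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound: one window per round trip, converted to bits per second."""
    return window_bytes * 8 / rtt_seconds

rtt = 0.050  # 50 ms RTT, illustrative

# Without window scaling, the advertised window tops out at 65535 bytes,
# capping this path at roughly 10.5 Mbit/s no matter how fast the link is.
print(max_throughput_bps(65535, rtt) / 1e6)

# With a scale shift of 7 (window = 65535 << 7), the ceiling rises 128-fold.
print(max_throughput_bps(65535 << 7, rtt) / 1e6)
```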
  3. High Latency: If the network path has high Round Trip Time (RTT), the TCP retransmission timer might expire before the acknowledgment arrives, even if no packets were lost.

    • Diagnosis: Use ping -c 100 <server_ip> to measure latency. If the average RTT is consistently high (e.g., > 100ms), and you see retransmissions, this is a likely cause. Wireshark’s "TCP Retransmission" entries will often have a large time difference between the initial packet and the retransmission.
    • Fix: Reduce latency by choosing a more direct network path, upgrading link speeds, or moving servers closer. On the client/server, tuning TCP’s retransmission timeout (RTO) is complex and generally not recommended unless you have deep expertise. However, modern TCP algorithms like BBR can adapt better to high-latency, high-bandwidth links. On Linux (kernel 4.9 or later), try sudo sysctl -w net.ipv4.tcp_congestion_control=bbr.
    • Why it works: BBR (Bottleneck Bandwidth and Round-trip propagation time) aims to discover the available bandwidth and minimum RTT, then paces sending to avoid overwhelming the network, making it more resilient to high latency.
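The timeout itself is derived from measured RTTs. RFC 6298 keeps a smoothed RTT (SRTT) and a variance estimate (RTTVAR), then sets RTO = SRTT + 4·RTTVAR. A sketch of that update rule shows how even one jittery sample inflates the timer:

```python
def update_rto(srtt, rttvar, sample, alpha=0.125, beta=0.25):
    """One RFC 6298 update step; returns (srtt, rttvar, rto) in seconds.
    The RFC also clamps RTO to a 1 s minimum; omitted here to show the raw value."""
    rttvar = (1 - beta) * rttvar + beta * abs(srtt - sample)  # variance first, per the RFC
    srtt = (1 - alpha) * srtt + alpha * sample
    return srtt, rttvar, srtt + 4 * rttvar

srtt, rttvar = 0.100, 0.025  # assumed starting state, illustrative
for sample in (0.100, 0.300, 0.100):  # one jittery 300 ms measurement in the middle
    srtt, rttvar, rto = update_rto(srtt, rttvar, sample)
    print(f"sample={sample:.3f}s  srtt={srtt:.3f}s  rto={rto:.3f}s")
```

Notice that the RTO stays elevated for several round trips after the spike, which is exactly why a few delayed ACKs on a high-latency path can leave a conservative timer behind them.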
  4. Firewall or NAT Device Stateful Inspection Issues: Some firewalls or NAT devices can interfere with TCP streams, especially if they have buggy implementations or are overloaded. They might drop packets, delay ACKs, or reset connections.

    • Diagnosis: Look for retransmissions that occur immediately after traffic passes through a known firewall or NAT device. Check the firewall logs for any dropped packets or connection errors related to the client/server IPs and ports. If the problem stops when you bypass the firewall (e.g., test from a machine on the same subnet), it’s a strong clue.
    • Fix: Update the firewall/NAT device firmware. If possible, reduce the stateful inspection load or temporarily disable certain features for testing. In some cases, you might need to replace the device.
    • Why it works: A correctly functioning firewall allows legitimate TCP traffic to pass through without interference. Fixing firmware or configuration ensures the device isn’t actively causing the problem.
  5. Application-Level Delays: Sometimes, the application on the receiving end is slow to process data and acknowledge it. This can cause the TCP receive buffer to fill up, leading to "ZeroWindow" advertised by the receiver.

    • Diagnosis: In Wireshark, filter for tcp.analysis.zero_window or tcp.analysis.zero_window_probe. This indicates the receiver advertised a window size of 0, meaning its receive buffer is full. If this persists, the sender will eventually retransmit. Check the application logs on the server.
    • Fix: Optimize the application on the receiving end to process data faster. Increase the application’s buffer sizes or thread pool if applicable. On the server, you can also increase TCP receive buffer sizes (net.ipv4.tcp_rmem on Linux, as mentioned in point 1) to give the application more breathing room.
    • Why it works: A larger receive buffer allows the server to accept more data from the network even if the application is slightly delayed in processing it, preventing the sender from timing out.
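You can also raise the receive buffer per-socket from the application, within the kernel’s configured maximum. A minimal sketch using the standard socket API (the OS may clamp or round the value you request; Linux reports roughly double it to account for bookkeeping overhead):

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Request a 1 MB receive buffer. The kernel clamps this to its configured
# maximum (net.core.rmem_max on Linux), so check what you actually got.
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 1 << 20)
bufsize = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)

print(bufsize)  # the effective buffer size granted by the kernel
s.close()
```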
  6. Duplicated Packets: While less common for causing retransmissions directly, duplicated packets can confuse TCP state machines and, in rare cases, lead to spurious timeouts if the duplicate is misinterpreted.

    • Diagnosis: In Wireshark, look for identical packets with the same sequence number arriving very close together. This is often a symptom of network equipment issues.
    • Fix: This usually points to a faulty network interface card (NIC) or a misconfigured switch/router that’s duplicating packets. Replace suspect hardware or reconfigure network devices.
    • Why it works: Removing duplicated packets ensures the TCP receiver gets a clean, sequential stream of data, preventing confusion and potential retransmission triggers.
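Scripting the duplicate check is straightforward once you export packet summaries: two segments from the same flow with identical sequence number and length, seen almost back-to-back, are suspects. A toy detector over (timestamp, seq, len) tuples (the field layout here is my own, not a Wireshark export format):

```python
def find_duplicates(packets, window=0.001):
    """Flag segments whose (seq, len) pair repeats within `window` seconds."""
    last_seen = {}  # (seq, length) -> timestamp of the previous sighting
    dupes = []
    for ts, seq, length in packets:
        key = (seq, length)
        if key in last_seen and ts - last_seen[key] <= window:
            dupes.append((ts, seq))
        last_seen[key] = ts
    return dupes

trace = [(0.000, 1, 1460), (0.0002, 1, 1460), (0.010, 1461, 1460)]
print(find_duplicates(trace))  # [(0.0002, 1)]
```

The `window` parameter encodes the distinction that matters here: true network duplicates arrive sub-millisecond apart, while genuine retransmissions repeat the same seq/len at RTO-scale gaps and should not be flagged.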

After addressing these, the next thing you might encounter is "TCP Keep-Alive" traffic in your captures: probes on a connection that has gone idle or stale, which is a different problem from data actively being lost.

Want structured learning?

Take the full Wireshark course →