TCP’s congestion window, CWND, isn’t just a buffer size; it’s the dynamically adjusted throttle that prevents the internet from grinding to a halt.
Imagine you’re sending a massive file. You don’t just blast it all at once, right? TCP does something similar, but it’s constantly probing how much data the network can handle before it starts dropping packets. This "probing" is managed by the congestion window (CWND).
Let’s see it in action. First we’ll use netstat to list active TCP connections (netstat won’t show the CWND itself; we’ll get to that shortly).
netstat -tunp | grep ESTABLISHED
You’ll see lines like this:
tcp 0 0 192.168.1.100:54321 172.217.160.142:443 ESTABLISHED 12345/firefox
Now, to see the CWND, we need to dig into the kernel’s per-socket TCP state. On Linux, /proc/net/tcp lists sockets, but it doesn’t expose CWND. The practical tool is ss with its -i (--info) flag.
ss -tni 'dport = :443' | grep cwnd
This might output something like:
cubic wscale:7,7 rto:220 rtt:18.5/2.1 mss:1448 cwnd:25 bytes_acked:1315072 rcv_space:14600
Here, cwnd:25 indicates the current congestion window is 25 segments. A segment’s size is capped by the Maximum Segment Size (MSS), which is typically 1460 bytes on Ethernet (1448 when TCP timestamps consume 12 bytes of option space, as in the mss:1448 field above). Using the full 1460: 25 segments * 1460 bytes/segment = 36,500 bytes.
The core problem TCP congestion control solves is the "congestion collapse" scenario. If every sender on the internet blasted data at full speed, routers would quickly run out of buffer space, drop packets, and the network would become unusable. TCP’s CWND is the distributed mechanism that prevents this.
TCP operates in two main phases for managing CWND: Slow Start and Congestion Avoidance.
Slow Start is how a new connection begins. The CWND starts small (typically 10 segments on modern Linux, per RFC 6928; older stacks used 1 to 4) and grows by one segment for every ACK received, which roughly doubles it each round trip. This is an exponential growth phase: if CWND is 4, then after a round trip of ACKs it becomes 8, then 16, then 32, and so on. The phase is aggressive but short-lived, ending when the CWND reaches the slow-start threshold (ssthresh) or a loss occurs.
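The exponential growth above can be sketched as a toy model (illustrative only; real stacks track far more state, and the initial-window and ssthresh values here are just example numbers):

```python
# Toy slow-start model: cwnd grows by one segment per ACK,
# which doubles it every round trip, until it reaches ssthresh.
def slow_start(initial_cwnd=10, ssthresh=64):
    """Return the cwnd (in segments) at the start of each RTT."""
    cwnd = initial_cwnd
    history = [cwnd]
    while cwnd < ssthresh:
        # One extra segment per ACK => the window doubles per RTT,
        # but it never overshoots the slow-start threshold here.
        cwnd = min(cwnd * 2, ssthresh)
        history.append(cwnd)
    return history

print(slow_start())  # [10, 20, 40, 64]
```

Note how few round trips it takes to go from 10 segments to the threshold; that is why Slow Start is short-lived on most paths.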
Congestion Avoidance is what happens after Slow Start. The growth becomes additive. For every Round Trip Time (RTT) where all ACKs are received, the CWND increases by just one segment. So, if CWND is 30, it might become 31 after an RTT of successful ACKs. This linear increase is much more cautious.
When a packet is lost, TCP assumes congestion, and the reaction depends on how the loss was detected. Three duplicate ACKs trigger fast retransmit: TCP Reno (and its successors) sets the slow-start threshold (ssthresh) to half the current CWND, cuts the CWND to that value, and continues in Congestion Avoidance. A retransmission timeout is treated as a stronger signal: the CWND collapses to one segment and the connection re-enters Slow Start. This back-off is a critical part of how TCP yields when it senses trouble.
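Putting the phases together, a Reno-style step function might look like this (a simplified sketch in segments; the starting values are invented for illustration):

```python
def aimd_step(cwnd, ssthresh, loss=None):
    """One RTT of simplified Reno-style congestion control (in segments).

    loss: None, "dupack" (three duplicate ACKs), or "timeout" (RTO).
    Returns the new (cwnd, ssthresh).
    """
    if loss == "dupack":                      # fast retransmit: halve and keep going
        ssthresh = max(cwnd // 2, 2)
        return ssthresh, ssthresh             # continue in congestion avoidance
    if loss == "timeout":                     # RTO: back to square one
        ssthresh = max(cwnd // 2, 2)
        return 1, ssthresh                    # re-enter slow start
    if cwnd < ssthresh:                       # slow start: exponential growth
        return min(cwnd * 2, ssthresh), ssthresh
    return cwnd + 1, ssthresh                 # congestion avoidance: +1 per RTT

cwnd, ssthresh = 30, 30
cwnd, ssthresh = aimd_step(cwnd, ssthresh)            # additive increase -> 31
print(cwnd)
cwnd, ssthresh = aimd_step(cwnd, ssthresh, "dupack")  # multiplicative decrease -> 15
print(cwnd)
```

This additive-increase/multiplicative-decrease (AIMD) shape is what gives classic TCP throughput its familiar sawtooth pattern.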
The specific algorithm used for congestion control (e.g., Cubic, Reno, BBR) significantly influences how CWND grows and shrinks. Cubic is the default on many Linux systems and aims to be more aggressive in finding bandwidth on high-bandwidth, high-latency networks by using a cubic function for growth.
sysctl net.ipv4.tcp_congestion_control
This command will show you the active algorithm, e.g., net.ipv4.tcp_congestion_control = cubic.
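For a sense of what Cubic’s growth looks like, here is a sketch of the window curve from RFC 8312 (C = 0.4 and the decrease factor 0.7 are the RFC’s default constants; this plots the idealized curve, not a real implementation):

```python
# CUBIC window curve (RFC 8312): W(t) = C * (t - K)^3 + W_max,
# where K is the time needed to climb back to W_max after a loss.
C = 0.4      # scaling constant (RFC 8312 default)
BETA = 0.7   # multiplicative decrease factor (RFC 8312 default)

def cubic_window(t, w_max):
    """Window (in segments) t seconds after a loss cut it to BETA * w_max."""
    k = (w_max * (1 - BETA) / C) ** (1 / 3)   # time at which W(t) == w_max again
    return C * (t - k) ** 3 + w_max

# Concave growth back toward w_max, a plateau near it, then convex probing beyond:
for t in (0.0, 1.0, 2.0, 3.0):
    print(round(cubic_window(t, w_max=100), 1))  # 70.0, 86.7, 95.6, 99.3
```

The flat region near w_max is the point: Cubic lingers around the last known capacity instead of crawling up to it linearly, which is what makes it effective on high-bandwidth, high-latency paths.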
The other limit on a sender is the receiver’s advertised window (rwnd): the buffer space available on the receiving end of the connection. The amount of unacknowledged data TCP can have in flight is the minimum of the sender’s CWND and the receiver’s rwnd. This is flow control (as opposed to congestion control), and it ensures the receiver isn’t overwhelmed.
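That sending limit can be expressed directly (a simplified model; real stacks compare bytes already in flight against this bound, and the numbers here are illustrative):

```python
def sendable_bytes(cwnd_segments, mss, rwnd_bytes):
    """Upper bound on unacknowledged data: min(congestion window, receiver window)."""
    return min(cwnd_segments * mss, rwnd_bytes)

print(sendable_bytes(25, 1460, 65535))  # network is the bottleneck -> 36500
print(sendable_bytes(25, 1460, 8192))   # a tiny receiver buffer caps the flow -> 8192
```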
Here’s a detail that often trips people up: in Linux’s accounting, the CWND is measured in segments, not bytes, and a segment’s byte size is capped by the MSS. A CWND of 25 means the sender can have up to 25 MSS-sized segments "in flight" (sent but not yet acknowledged). If the MSS is 1460 bytes, that’s 25 * 1460 = 36,500 bytes. If the MSS is smaller (e.g., 536 bytes, the classic IPv4 default used when path MTU discovery isn’t available), 25 segments is far less data.
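A quick back-of-the-envelope check of how much the MSS matters for the same window (1448 assumes TCP timestamps; 536 is the classic IPv4 default):

```python
def in_flight_bytes(cwnd_segments, mss):
    """Maximum unacknowledged data for a given window, in bytes."""
    return cwnd_segments * mss

# Same 25-segment window, very different amounts of data in flight:
for mss in (1460, 1448, 536):
    print(f"mss={mss}: up to {in_flight_bytes(25, mss)} bytes")
```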
The interaction between CWND and the receiver’s advertised window (rwnd) is key. If rwnd is very small, it caps the effective send window no matter how large the CWND grows, making the connection perform poorly even when the network itself has plenty of capacity. This is why tuning receiver buffers (net.core.rmem_max, net.ipv4.tcp_rmem) is sometimes necessary for high-throughput applications.
Understanding how CWND adjusts based on packet loss and acknowledgments is fundamental to diagnosing network performance issues, especially for high-throughput, long-distance connections.
Next, you’ll likely encounter the concept of "bufferbloat" and how it interacts with TCP’s congestion control mechanisms.