TCP Window Scaling is a mechanism that allows the TCP receive window size to be larger than 64KB, which is the original maximum defined in RFC 793. This is critical for achieving high throughput on modern networks, especially those with high bandwidth and long round-trip times (RTT).
Let’s see it in action. Imagine two servers, serverA (192.168.1.100) and serverB (192.168.1.200), connected by a link that simulates a high RTT and bandwidth. We’ll use iperf3 to test throughput.
First, on serverB (the receiver), start iperf3 in server mode:
iperf3 -s
On serverA (the sender), we’ll initiate a test. Without window scaling, the throughput will be severely limited.
iperf3 -c 192.168.1.200 -t 10 -P 1
If window scaling is not enabled or is ineffective, throughput is capped at roughly one 64KB window per round trip, no matter how fast the link itself is; on a 100 ms RTT path that works out to only about 5 Mbps. Now, let’s enable window scaling on both systems.
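To see why the unscaled window is such a bottleneck, the cap is easy to compute directly. A quick sketch (the 100 ms RTT is an assumed example value):

```python
# Maximum TCP throughput is bounded by one receive window per round trip.
UNSCALED_WINDOW = 65_535  # bytes: the maximum of the 16-bit header field

def max_throughput_mbps(window_bytes: float, rtt_seconds: float) -> float:
    """Upper bound on throughput (Mbps) for a given window and RTT."""
    return window_bytes * 8 / rtt_seconds / 1e6

# With a 100 ms RTT, an unscaled 64KB window caps out around 5 Mbps.
print(round(max_throughput_mbps(UNSCALED_WINDOW, 0.100), 1))  # → 5.2
```

Note that the cap depends only on window and RTT; a 10 Gbps link with the same RTT is limited to the same ~5 Mbps per connection without scaling.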
On Linux, this is controlled by kernel parameters. You can check the current setting with:
sysctl net.ipv4.tcp_window_scaling
If it’s 0, it’s disabled. To enable it (and raise the buffer limits so larger windows can actually be used), you’d run:
sudo sysctl -w net.ipv4.tcp_window_scaling=1
sudo sysctl -w net.ipv4.tcp_rmem="4096 87380 6291456"
sudo sysctl -w net.ipv4.tcp_wmem="4096 65536 6291456"
Setting tcp_window_scaling=1 enables the feature. The tcp_rmem and tcp_wmem parameters define the minimum, default, and maximum receive/send buffer sizes for TCP sockets, respectively; the last value, 6291456 (6MB), is the ceiling the kernel will grow a buffer to. Note that sysctl -w takes effect immediately but does not survive a reboot; add the same settings to /etc/sysctl.conf (or a file under /etc/sysctl.d/) to make them persistent. When window scaling is enabled, the 16-bit window size in the TCP header is interpreted as shifted left by a scale factor. That factor is not set by tcp_window_scaling, which is only an on/off switch; the kernel chooses it during connection setup based on the maximum receive buffer size, picking the smallest shift that lets the whole buffer be advertised. With a scale factor of 7, a 64KB window can effectively become 64KB * 2^7 = 8MB. This allows the sender to transmit much more data before needing an acknowledgment, keeping the network pipe full.
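The left-shift arithmetic is simple enough to sanity-check directly. A minimal sketch (the values are illustrative):

```python
def effective_window(advertised: int, scale: int) -> int:
    """Effective receive window: the 16-bit advertised value shifted left."""
    assert 0 <= advertised <= 0xFFFF, "header field is only 16 bits"
    assert 0 <= scale <= 14, "the shift count is capped at 14"
    return advertised << scale

print(effective_window(65_535, 0))  # → 65535   (no scaling: the classic 64KB cap)
print(effective_window(65_535, 7))  # → 8388480 (~8MB, the example above)
```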
After ensuring window scaling is enabled on both serverA and serverB (and potentially any intermediate network devices that might interfere), re-run the iperf3 test:
iperf3 -c 192.168.1.200 -t 10 -P 1
You should now see a significantly higher throughput, potentially reaching close to the link’s capacity, e.g., 900 Mbps or more, depending on your network hardware.
The core problem window scaling solves is the "Bandwidth-Delay Product" (BDP). BDP is calculated as Bandwidth * Round-Trip Time. For example, a 1 Gbps link with a 100 ms RTT has a BDP of 1 Gbps * 0.1 s = 100 megabits, or 12.5 MB. This means that to keep the pipe full, you need to be able to keep at least 12.5 MB of data in flight. The original 64KB TCP window is far too small for this. Window scaling, by allowing larger receive windows, directly addresses this by increasing the amount of data that can be in transit.
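The BDP arithmetic generalizes easily. A small sketch using the example link from the text:

```python
def bdp_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: bits in flight on the link, converted to bytes."""
    return bandwidth_bps * rtt_seconds / 8

# 1 Gbps link, 100 ms RTT: 100 megabits in flight = 12.5 MB.
bdp = bdp_bytes(1e9, 0.100)
print(bdp)           # → 12500000.0
print(bdp / 65_535)  # how many unscaled 64KB windows that is (~190)
```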
The TCP header has a 16-bit field for the receive window size, limiting it to 65,535 bytes. When TCP window scaling (RFC 1323, updated by RFC 7323) is negotiated during the TCP handshake, each side advertises its shift count in the WINDOW_SCALE option on its SYN segment; the factor is fixed for the life of the connection. The receiver shifts its true window right by its own factor before writing it into the 16-bit field, and the peer shifts the received value left by the same factor to recover it. The shift count can be at most 14, allowing windows of up to about 1 GB. For example, with a scale factor of 7, a reported window of 65,535 bytes is effectively 65,535 * 2^7 bytes, roughly 8 MB. The operating system’s TCP stack manages the actual buffer sizes (net.ipv4.tcp_rmem and net.ipv4.tcp_wmem on Linux) so that data filling the scaled window can actually be buffered.
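Linux derives the shift count from the maximum receive buffer: the smallest shift that lets the whole buffer be represented in 16 bits. A simplified sketch of that selection logic (the real kernel code also accounts for buffer overhead):

```python
def choose_window_scale(max_rcv_buffer: int) -> int:
    """Smallest shift s such that max_rcv_buffer >> s fits in 16 bits (capped at 14)."""
    scale = 0
    while (max_rcv_buffer >> scale) > 0xFFFF and scale < 14:
        scale += 1
    return scale

print(choose_window_scale(65_535))      # → 0 (no scaling needed)
print(choose_window_scale(6_291_456))   # → 7 (the 6MB tcp_rmem max from earlier)
print(choose_window_scale(16_777_216))  # → 9 (a 16MB buffer)
```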
Most modern operating systems enable TCP window scaling by default. However, it can be disabled or misconfigured on older systems, or by network devices like firewalls or WAN optimizers that perform TCP stateful inspection and might not correctly handle or pass through the WINDOW_SCALE option during the handshake. If you enable it and see no improvement, the issue might be that the scale option is being dropped or that the receiver’s buffer sizes are still too small to take advantage of the scaled window.
The interaction between net.ipv4.tcp_rmem (or tcp_wmem) and the actual window size advertised can be subtle. While tcp_window_scaling=1 enables the protocol mechanism, the kernel’s buffer limits (tcp_rmem’s third value) dictate the maximum buffer the kernel will allocate, and part of that buffer is reserved for kernel overhead rather than advertised as window. If you have a very large BDP and the tcp_rmem values are set too low, you might still not achieve optimal throughput, even with window scaling enabled. The tcp_rmem values give the range within which the kernel dynamically tunes each connection’s receive buffer, with the third value as the ceiling. For example, setting net.ipv4.tcp_rmem="4096 87380 16777216" would allow TCP receive buffers to grow up to 16MB.
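Putting the two ideas together, you can check whether a given tcp_rmem maximum can cover a link’s BDP. A rough sketch using the example figures from this section:

```python
def buffer_covers_bdp(rmem_max: int, bandwidth_bps: float, rtt_seconds: float) -> bool:
    """Rough check: can the max receive buffer hold one bandwidth-delay product?

    Simplified: since the kernel reserves part of the buffer for overhead,
    in practice rmem_max should sit comfortably above the BDP, not at it.
    """
    bdp = bandwidth_bps * rtt_seconds / 8  # bytes in flight
    return rmem_max >= bdp

# 1 Gbps at 100 ms RTT needs 12.5 MB in flight.
print(buffer_covers_bdp(6_291_456, 1e9, 0.100))   # → False (the 6MB max is too small)
print(buffer_covers_bdp(16_777_216, 1e9, 0.100))  # → True  (a 16MB max suffices)
```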
The next thing you’ll likely encounter when optimizing high-bandwidth, high-latency links is the impact of TCP’s congestion control algorithms.