The TCP kernel parameters aren’t just knobs to tweak; they’re the fundamental rules that govern how your machine talks on the network, and most of them are surprisingly conservative by default.

Let’s see this in action. Imagine a web server under load. We can observe its TCP state and buffer usage directly.

# Watch TCP connections, ESTABLISHED state, and buffer usage
watch -n 1 "ss -s && ss -tano state established"

This gives us a real-time view: ss -s prints a summary of socket statistics, including counts of sockets by protocol and state. ss -tano state established lists established TCP connections: -t selects TCP sockets, -a includes all of them, -n shows numeric addresses instead of resolving hostnames, and -o adds timer information. You’ll see the number of connections grow, and importantly, the Recv-Q and Send-Q columns: bytes sitting in the receive and send buffers. As traffic increases, these queues fill up. If they fill too often, packets get dropped, and performance tanks.
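
To see the buffer limits themselves rather than just the queued bytes, ss can also dump per-socket memory details. A diagnostic sketch (the exact skmem fields vary slightly by kernel version):

```shell
# Show established TCP sockets with memory details. The skmem field
# reports rb (receive buffer limit), tb (send buffer limit),
# r (bytes currently in the receive queue), and t (bytes in flight
# in the send queue) -- useful for spotting sockets pinned at their limit.
ss -tmn state established
```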

The core problem these parameters solve is managing the inherent unreliability of packet-switched networks. TCP’s job is to provide a reliable, ordered stream of data over this shaky foundation. It does this through acknowledgments, retransmissions, and flow control. Kernel parameters directly influence the aggressiveness and capacity of these mechanisms.

Here’s how it works internally:

  1. Congestion Control: When the network is congested, TCP slows down to avoid overwhelming routers. Algorithms like Cubic (default on Linux) try to estimate congestion and adjust the sending rate.
  2. Flow Control: The receiver tells the sender how much data it can accept. This is managed by receive buffers. If the receiver can’t read data from the buffer fast enough, it advertises a smaller window, slowing the sender down.
  3. Retransmission Timers: If an acknowledgment isn’t received within a certain time, TCP assumes the packet was lost and retransmits it. The timer’s accuracy is crucial.

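All three mechanisms are visible per connection. A quick way to inspect them live (field names assume a reasonably recent iproute2):

```shell
# -i prints internal TCP info for each socket: the congestion control
# algorithm in use, cwnd (congestion window, mechanism 1), rtt (smoothed
# round-trip time feeding the retransmission timer, mechanism 3), and
# retrans counters. The advertised receive window (mechanism 2) shows
# up in the wscale and rcv_space fields.
ss -tin state established
```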
Tuning these parameters can dramatically improve throughput and reduce latency, especially on high-bandwidth, high-latency links (often called "long fat networks" or LFNs).
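
On an LFN, the number that matters is the bandwidth-delay product: the amount of data that must be in flight to keep the pipe full. A back-of-the-envelope calculation, using example figures:

```shell
# BDP = bandwidth (bytes/s) * round-trip time (s).
# Example: 10 Gbit/s link with 100 ms RTT.
bandwidth_bits_per_s=$((10 * 1000 * 1000 * 1000))
rtt_ms=100
bdp_bytes=$((bandwidth_bits_per_s / 8 * rtt_ms / 1000))
echo "BDP: ${bdp_bytes} bytes (~$((bdp_bytes / 1000000)) MB)"
# → BDP: 125000000 bytes (~125 MB)
```

If the send buffer is smaller than the BDP, the sender stalls waiting for ACKs and the link can never be saturated, no matter how fast it is.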

Tuning net.ipv4.tcp_rmem and net.ipv4.tcp_wmem

These two parameters define the range of memory (in bytes) allocated for TCP receive and send buffers, respectively. They are specified as three values: min default max.

  • Diagnosis: High packet retransmission counts (netstat -s | grep -i retrans), or ss -s showing heavy socket usage while per-socket inspection (ss -tm) shows many connections with receive or send buffers close to their limits.

  • Common Cause 1: Insufficient Buffer Sizes. Default values are often too small for modern networks.

    • Diagnosis Command: sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem
    • Example Default: net.ipv4.tcp_rmem = 4096 87380 6291456
    • Fix: Increase the max values. Size them against the bandwidth-delay product (BDP): for a 10Gbps link with 100ms latency, that is 10 Gbit/s * 0.1 s = 1 Gbit ≈ 125 MB. Ideally the max buffer accommodates the full BDP; the 16 MiB below is a common, more conservative starting point.
      sudo sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
      sudo sysctl -w net.ipv4.tcp_wmem="4096 87380 16777216"
      
    • Why it works: Larger buffers allow the sender to keep more data "in flight" without waiting for acknowledgments, maximizing the utilization of high-bandwidth, high-latency links. The max value sets the upper limit the kernel can dynamically allocate for individual TCP sockets.
  • Common Cause 2: Default values not aggressive enough for high-performance servers. The default value might be too low, preventing sockets from quickly scaling up their buffers.

    • Diagnosis: Observe ss -tm and notice the per-socket buffer allocations (the rb and tb values in the skmem field) staying relatively low even under load.
    • Fix: Increase the default value.
      sudo sysctl -w net.ipv4.tcp_rmem="4096 131072 16777216"
      sudo sysctl -w net.ipv4.tcp_wmem="4096 131072 16777216"
      
    • Why it works: A higher default allows sockets to start with more buffer space, reducing the time spent in smaller windows and accelerating the ramp-up to maximum throughput.
  • Common Cause 3: Aggressive tcp_mem limits. net.ipv4.tcp_mem sets limits on the total TCP memory usage across the system. If these are too low, the kernel might start dropping packets or refusing connections even if individual socket buffers aren’t full.

    • Diagnosis: sysctl net.ipv4.tcp_mem. Current usage is visible in /proc/net/sockstat (the mem field on the TCP line, also counted in pages); trouble starts when it frequently approaches the third value, the high watermark.
    • Fix: Increase the high watermark. These values are in pages (typically 4096 bytes), not bytes. On a system with 64GB RAM you can afford far more headroom.
      # Example: Set high watermark to ~32GB (8388608 pages)
      sudo sysctl -w net.ipv4.tcp_mem="5767168 7690137 8388608"
      
    • Why it works: This provides more headroom for the kernel to manage the aggregate memory demands of all TCP connections, preventing system-wide memory pressure from impacting TCP performance.
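
Settings applied with sysctl -w are lost on reboot. To persist any of the tunings above, drop a file under /etc/sysctl.d/ (the filename here is illustrative; the values are from the earlier examples):

```shell
# Persist TCP buffer tuning across reboots.
sudo tee /etc/sysctl.d/99-tcp-tuning.conf > /dev/null <<'EOF'
net.ipv4.tcp_rmem = 4096 131072 16777216
net.ipv4.tcp_wmem = 4096 131072 16777216
EOF
# Load all sysctl configuration files immediately, without rebooting.
sudo sysctl --system
```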

Tuning net.ipv4.tcp_congestion_control

This parameter selects the congestion control algorithm. Cubic is the default and generally good, but others might be better for specific scenarios.

  • Diagnosis: If you’re experiencing poor performance on LFNs, or if your application is highly sensitive to latency spikes.
  • Common Cause 1: Cubic not optimal for extremely high BDP. Cubic can be slow to ramp up on very large bandwidth-delay product networks.
    • Diagnosis Command: sysctl net.ipv4.tcp_congestion_control
    • Fix: Try bbr (Bottleneck Bandwidth and Round-trip propagation time).
      # Check if BBR is available
      sudo sysctl net.ipv4.tcp_available_congestion_control
      # Load BBR if not loaded
      sudo modprobe tcp_bbr
      # Set BBR as the congestion control algorithm
      sudo sysctl -w net.ipv4.tcp_congestion_control=bbr
      
    • Why it works: BBR aims to directly measure the bottleneck bandwidth and round-trip time, rather than inferring congestion from packet loss like Cubic. This can lead to faster ramp-up and more stable throughput on high-BDP links.
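
One caveat worth knowing: on kernels before 4.13, BBR required the fq packet scheduler for pacing; newer kernels pace within TCP itself, but pairing BBR with fq is still a common recommendation. A sketch of setting it up and confirming what live connections actually use:

```shell
# Use the fq qdisc as the default packet scheduler (helps BBR pace traffic).
sudo sysctl -w net.core.default_qdisc=fq
# Count established connections by congestion control algorithm;
# the algorithm name appears in the per-socket internal info.
ss -tin state established | grep -o 'bbr\|cubic' | sort | uniq -c
```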

Tuning net.ipv4.tcp_fastopen

TCP Fast Open (TFO) allows data to be sent in the initial SYN packet, skipping a full round trip before the first bytes of data flow.

  • Diagnosis: High latency for the first request to a server from a client.
  • Common Cause 1: Fast Open disabled.
    • Diagnosis Command: sysctl net.ipv4.tcp_fastopen
    • Fix: Enable it. 1 means client-side, 2 means server-side, 3 means both.
      # Enable client and server Fast Open
      sudo sysctl -w net.ipv4.tcp_fastopen=3
      
    • Why it works: By sending data along with the SYN, it reduces latency for the initial data transfer, making initial connections feel much faster.
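
The sysctl alone isn’t sufficient: the server application must also opt in via the TCP_FASTOPEN socket option, and the client must request it. With curl, whose Linux builds support a --tcp-fastopen flag, you can exercise the path end to end (the URL is a placeholder):

```shell
# First request: the client obtains a Fast Open cookie via a normal
# 3-way handshake. Later requests can then carry data in the SYN.
curl --tcp-fastopen -s -o /dev/null https://example.com/
# Kernel counters confirm whether Fast Open is actually being used,
# as client (Active) or server (Passive).
nstat -az TcpExtTCPFastOpenActive TcpExtTCPFastOpenPassive
```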

Tuning net.ipv4.tcp_synack_retries and net.ipv4.tcp_retries2

These control how many times TCP retransmits SYN-ACKs during the handshake and data segments on established connections, respectively.

  • Diagnosis: Clients reporting "connection timed out" or "connection refused" sporadically, especially during network instability or high load.
  • Common Cause 1: Too few retries. The default is often too low, especially on unstable networks or with high latency.
    • Diagnosis Commands: sysctl net.ipv4.tcp_synack_retries and sysctl net.ipv4.tcp_retries2
    • Example Default: net.ipv4.tcp_synack_retries = 5, net.ipv4.tcp_retries2 = 15
    • Fix: Increase them.
      sudo sysctl -w net.ipv4.tcp_synack_retries=8
      sudo sysctl -w net.ipv4.tcp_retries2=30
      
    • Why it works: More retries give packets more chances to arrive over lossy or high-latency links, reducing spurious connection failures. tcp_synack_retries applies only to the server side of the initial handshake; tcp_retries2 governs unacknowledged data on established connections. Be aware that the timeouts compound quickly: the default of 15 already corresponds to roughly 15 minutes of retrying, so raise tcp_retries2 cautiously.
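
To build intuition for what tcp_retries2 means in wall-clock time, you can approximate the exponential backoff yourself. This sketch assumes an initial RTO of 200 ms doubling up to the 120 s cap, a simplification of the kernel’s actual RTO calculation, so treat the result as a rough lower bound:

```shell
# Rough total time before tcp_retries2=15 gives up on an established
# connection: the RTO doubles on each retransmission, capped at 120 s.
rto_ms=200; total_ms=0
for i in $(seq 1 15); do
    total_ms=$((total_ms + rto_ms))
    rto_ms=$((rto_ms * 2))
    [ "$rto_ms" -gt 120000 ] && rto_ms=120000
done
echo "~$((total_ms / 1000)) s of retrying before the connection is dropped"
```

Doubling the retry count mostly adds 120 s increments at the tail, which is why a large tcp_retries2 can keep dead connections alive for a very long time.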

The next surprise after tuning these parameters is often a sudden jump in usable network bandwidth, which can expose bottlenecks elsewhere in your application or infrastructure that the TCP limits were previously masking.
