The surprising truth about scaling to a million TCP connections is that it’s less about the sheer number of connections and more about how efficiently the kernel manages the state for each one.
Let’s see this in action. Imagine a simple Go program that just accepts connections and immediately closes them.
```go
package main

import (
	"fmt"
	"net"
	"os"
)

func main() {
	port := "8080"
	listener, err := net.Listen("tcp", ":"+port)
	if err != nil {
		fmt.Fprintf(os.Stderr, "Error listening: %v\n", err)
		os.Exit(1)
	}
	defer listener.Close()
	fmt.Printf("Listening on port %s\n", port)
	for {
		conn, err := listener.Accept()
		if err != nil {
			fmt.Fprintf(os.Stderr, "Error accepting: %v\n", err)
			continue
		}
		go func(c net.Conn) {
			// Immediately close the connection
			defer c.Close()
		}(conn)
	}
}
```
If you run this on a typical Linux machine without any tuning, you’ll likely hit a wall well before a million connections. The operating system has a lot of work to do for each TCP connection: tracking its state (SYN_SENT, ESTABLISHED, CLOSE_WAIT, etc.), managing its send and receive buffers, and allocating memory for its associated data structures.
The core problem is that the kernel, by default, is conservative with its resources to ensure stability on a wide range of hardware. When you start pushing for massive concurrency, these defaults become bottlenecks.
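You can inspect those defaults before changing anything. Here is a minimal, Linux-specific sketch that reads a few of the relevant parameters directly from `/proc/sys` (the `readSysctl` helper is our own, not a standard API):

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// readSysctl returns the value of a kernel parameter from /proc/sys,
// e.g. key "net.core.somaxconn" maps to /proc/sys/net/core/somaxconn.
func readSysctl(key string) (string, error) {
	path := "/proc/sys/" + strings.ReplaceAll(key, ".", "/")
	b, err := os.ReadFile(path)
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(b)), nil
}

func main() {
	// A few of the defaults that become bottlenecks at high scale.
	for _, key := range []string{
		"fs.file-max",
		"net.core.somaxconn",
		"net.ipv4.ip_local_port_range",
		"net.ipv4.tcp_fin_timeout",
	} {
		if v, err := readSysctl(key); err == nil {
			fmt.Printf("%-35s = %s\n", key, v)
		}
	}
}
```

On an untuned machine you will typically see a `somaxconn` in the low thousands and a port range of roughly 28,000 ports, which is exactly what the sections below adjust.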
Here’s how the kernel manages connection state:
- `struct sock`: The primary kernel data structure representing a socket; each TCP connection has one. It holds all the state, including IP addresses, ports, sequence numbers, window sizes, and pointers to other related data.
- TCP Control Block (TCB): While not a distinct C struct in all kernels, it conceptually represents the TCP-specific state within `struct sock`. This includes congestion control parameters, retransmission timers, and other TCP options.
- Network buffers (`sk_buff`): For each connection, the kernel needs to manage memory for sending and receiving data. `sk_buff` structures hold network packet data. High connection counts mean many small `sk_buff` allocations, which can lead to memory fragmentation and high overhead.
- File descriptors: Each active socket is represented by a file descriptor in user space. The `ulimit -n` setting controls the maximum number of file descriptors a process can have, and the system-wide `fs.file-max` limits the total number of open files.
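The aggregate cost of all this per-connection state is visible in `/proc/net/sockstat`. A small Linux-specific sketch (the `tcpSockstat` helper is our own name for the parser):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// tcpSockstat parses the "TCP:" line of /proc/net/sockstat into a map,
// e.g. {"inuse": "5", "tw": "0", "mem": "1", ...}. The "mem" field is
// in pages (usually 4 KiB), so it reflects kernel buffer memory in use.
func tcpSockstat() (map[string]string, error) {
	f, err := os.Open("/proc/net/sockstat")
	if err != nil {
		return nil, err
	}
	defer f.Close()

	stats := map[string]string{}
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 || fields[0] != "TCP:" {
			continue
		}
		// The line reads "TCP: inuse N orphan N tw N alloc N mem N".
		for i := 1; i+1 < len(fields); i += 2 {
			stats[fields[i]] = fields[i+1]
		}
	}
	return stats, sc.Err()
}

func main() {
	stats, err := tcpSockstat()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("TCP sockets in use: %s, in TIME_WAIT: %s, buffer pages: %s\n",
		stats["inuse"], stats["tw"], stats["mem"])
}
```

Watching the `mem` and `tw` counters while you ramp up connections is a quick way to see where memory is actually going.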
To scale to 1M+ connections, you need to adjust several kernel parameters. These parameters control how much memory the kernel can use for network buffers, how many file descriptors are available, and how it handles ephemeral ports.
The primary levers you’ll pull are:
- Ephemeral Port Range: When a machine initiates a TCP connection, its OS dynamically assigns a source port from the ephemeral range. This range needs to be large enough to avoid running out of ports if your server is also acting as a client (e.g., for outbound connections or proxying).
  - Diagnosis: Check the current range: `sysctl net.ipv4.ip_local_port_range`
  - Fix: Expand the range. A common setting for high-scale servers is `sysctl -w net.ipv4.ip_local_port_range="1024 65535"`. This gives you almost 64,000 ports to work with for outgoing connections.
  - Why it works: This ensures that even if your server initiates many connections, it has a vast pool of available source ports to choose from, preventing "port exhaustion" errors on the client side of those connections.
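To see exactly how many source ports the current setting gives you, you can read and parse the range yourself. A short sketch (Linux-specific; `usablePorts` is our own helper name):

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// usablePorts parses the two numbers in net.ipv4.ip_local_port_range
// (e.g. "1024\t65535") and returns how many source ports each
// (source IP, destination IP, destination port) tuple can draw from.
func usablePorts(rangeValue string) (int, error) {
	fields := strings.Fields(rangeValue)
	if len(fields) != 2 {
		return 0, fmt.Errorf("unexpected format: %q", rangeValue)
	}
	lo, err := strconv.Atoi(fields[0])
	if err != nil {
		return 0, err
	}
	hi, err := strconv.Atoi(fields[1])
	if err != nil {
		return 0, err
	}
	return hi - lo + 1, nil
}

func main() {
	raw, err := os.ReadFile("/proc/sys/net/ipv4/ip_local_port_range")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	n, err := usablePorts(string(raw))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("ephemeral ports available per destination: %d\n", n)
}
```

Note that the limit applies per destination (IP, port) pair, which is why proxies fanning out to a single upstream exhaust ports long before servers that talk to many destinations.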
- TCP TIME_WAIT State: When a TCP connection closes, the side that performs the active close (sends the first FIN) enters the `TIME_WAIT` state for a period (60 seconds on Linux, nominally 2*MSL). This ensures that delayed packets from a previous incarnation of the connection don't interfere with a new connection on the same socket pair. With millions of connections, you can have millions of sockets stuck in `TIME_WAIT`, consuming memory and port numbers.
  - Diagnosis: Check the current `TIME_WAIT` count: `ss -tan | grep TIME-WAIT | wc -l`
  - Fix: Enable `tcp_tw_reuse` with `sysctl -w net.ipv4.tcp_tw_reuse=1`, and consider `sysctl -w net.ipv4.tcp_fin_timeout=30` (reducing the FIN timeout also helps, though `tcp_tw_reuse` is more direct). Avoid `tcp_tw_recycle`: it breaks clients behind NAT, and it was removed from the kernel entirely in 4.12.
  - Why it works: `tcp_tw_reuse=1` lets the kernel reuse a local port still in `TIME_WAIT` for a new outgoing connection when TCP timestamps prove the new connection cannot be confused with the old one. `tcp_fin_timeout` reduces how long orphaned sockets stay in `FIN_WAIT_2`; despite its name, it does not shorten `TIME_WAIT` itself.
- Maximum Number of Open Files (File Descriptors): Each network connection consumes a file descriptor. You need to increase both the per-process limit and the system-wide limit.
  - Diagnosis: Check the per-process limit with `ulimit -n` and the system-wide limit with `sysctl fs.file-max`.
  - Fix: For the per-process limit, edit `/etc/security/limits.conf` and add lines like:

    ```
    * soft nofile 1048576
    * hard nofile 1048576
    ```

    Then increase the system-wide limit: `sysctl -w fs.file-max=2097152` (set it to at least twice your target connection count). `limits.conf` changes apply only to new login sessions, so restart your services; `sysctl` changes take effect immediately but don't persist across reboots unless added to `/etc/sysctl.conf`.
  - Why it works: This directly increases the number of concurrent connections your processes can open and the total number of file handles the kernel can manage across all processes.
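A server can also raise its own soft limit up to the hard limit at startup, which avoids depending on the shell that launched it. A sketch using `syscall.Setrlimit` (the `raiseNofileLimit` helper name is ours; note that Go 1.19+ runtimes already do this automatically, so this mainly matters for older toolchains or for logging the effective limit):

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// raiseNofileLimit lifts the soft RLIMIT_NOFILE up to the hard limit,
// which is as far as an unprivileged process may go. The hard limit
// itself comes from limits.conf (or systemd's LimitNOFILE).
func raiseNofileLimit() (uint64, error) {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return 0, err
	}
	rl.Cur = rl.Max // raise soft limit to the hard limit
	if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return 0, err
	}
	return rl.Cur, nil
}

func main() {
	limit, err := raiseNofileLimit()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("file descriptor limit now %d\n", limit)
}
```

Logging this number at startup is a cheap way to catch a misconfigured deployment before it fails at 1,024 connections in production.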
- TCP Buffer Sizes: Each connection has receive and send buffers. If these are too small, throughput suffers. If they are too large, memory usage per connection can become prohibitive at high scales. The kernel can dynamically tune these, but setting sensible minimums and maximums is important.
  - Diagnosis: Check the current values: `sysctl net.core.rmem_max`, `sysctl net.core.wmem_max`, `sysctl net.ipv4.tcp_rmem`, `sysctl net.ipv4.tcp_wmem`.
  - Fix: Increase `net.core.rmem_max` and `net.core.wmem_max` to a reasonably large value, e.g. `sysctl -w net.core.rmem_max=16777216` and `sysctl -w net.core.wmem_max=16777216`. Also tune `net.ipv4.tcp_rmem` and `net.ipv4.tcp_wmem`: a common setting is `sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"` and `sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"`. The three values are the minimum, default, and maximum buffer sizes in bytes.
  - Why it works: This allows TCP to use larger buffers, which can improve throughput over high-latency networks and reduce the number of packets needed to fill the pipe. The kernel's auto-tuning mechanism will then use these larger maximums effectively.
- TCP Congestion Control: While not directly a scaling limit, an efficient congestion control algorithm is crucial. For high-performance servers, `net.ipv4.tcp_congestion_control` is typically `cubic` (the default) or `bbr`.
  - Diagnosis: Check the current algorithm: `sysctl net.ipv4.tcp_congestion_control`.
  - Fix: If `cubic` is not performing well, consider `bbr` if your kernel supports it (4.9+) and your network has a high bandwidth-delay product: `sysctl -w net.ipv4.tcp_congestion_control=bbr`.
  - Why it works: `bbr` aims to improve throughput and reduce latency by directly measuring bandwidth and round-trip time, rather than relying solely on packet loss as a congestion signal.
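Before switching, it's worth confirming which algorithms the running kernel actually offers. A Linux-specific sketch reading the same values `sysctl` would (the `availableCongestionControl` helper name is ours):

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// availableCongestionControl reports which TCP congestion control
// algorithms the running kernel can use, and which one is active.
func availableCongestionControl() (active string, available []string, err error) {
	a, err := os.ReadFile("/proc/sys/net/ipv4/tcp_congestion_control")
	if err != nil {
		return "", nil, err
	}
	avail, err := os.ReadFile("/proc/sys/net/ipv4/tcp_available_congestion_control")
	if err != nil {
		return "", nil, err
	}
	return strings.TrimSpace(string(a)), strings.Fields(string(avail)), nil
}

func main() {
	active, available, err := availableCongestionControl()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("active: %s, available: %v\n", active, available)
	// If "bbr" is missing from the available list, the module must be
	// loaded first (modprobe tcp_bbr) before the sysctl will accept it.
}
```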
- Network Queue Management (Backlog): The kernel maintains a queue for incoming connection requests (the SYN queue) and a queue for completed handshakes waiting to be accepted by the application (the accept queue). These backlogs need to be large enough to absorb bursts of new connections.
  - Diagnosis: Check the current limits: `sysctl net.core.somaxconn` (cap on the accept queue of listening sockets) and `sysctl net.ipv4.tcp_max_syn_backlog` (SYN queue size).
  - Fix: Increase `net.core.somaxconn`: `sysctl -w net.core.somaxconn=4096`. Increase `net.ipv4.tcp_max_syn_backlog`: `sysctl -w net.ipv4.tcp_max_syn_backlog=2048`. Note that the `backlog` argument your application passes to `listen()` is silently capped at `somaxconn`, so raising the sysctl has no effect if the application asks for less (Go's `net.Listen` reads `somaxconn` for you automatically).
  - Why it works: A larger backlog ensures that incoming connection requests, especially during a traffic spike, are not dropped by the kernel before the application can accept them.
Tuning these parameters allows the kernel to efficiently manage the state and resources for a massive number of concurrent TCP connections.
After tuning, the next hurdle you’ll likely encounter is application-level processing or the limits of your CPU and memory to handle the actual work for each connection.