A TCP half-open connection is one where one endpoint believes the connection is still established while the other endpoint has already closed its end or vanished entirely. Left unchecked, these dead connections leak resources and exhaust connection-tracking tables.
Let’s see this in action. Imagine a web server that gets slammed with requests and then some clients just disappear without properly closing their connections.
```sh
# On the server, observe active connections
netstat -tunap | grep ESTABLISHED | grep <client_ip>

# After a while, some may show up as CLOSE_WAIT
netstat -tunap | grep CLOSE_WAIT | grep <server_port>
```
Here, ESTABLISHED shows active connections. CLOSE_WAIT means the remote end has already closed its side (the server received a FIN) and the kernel is now waiting for the local application to call `close()`. Until it does, the server's side of the connection is stuck in CLOSE_WAIT, holding on to its resources.
The core problem is that TCP is a stateful protocol: each connection consumes memory, a file descriptor, and an entry in any connection-tracking table along the path (like conntrack in Linux). Two failure modes feed the leak. If the client sends a FIN but the server application never calls `close()`, the socket sits in CLOSE_WAIT indefinitely. If the client disappears without sending a FIN or RST at all, the server's socket stays in ESTABLISHED, and nothing cleans it up unless keepalives or application timeouts are in place. Either way, stale entries pile up until new, legitimate connections can no longer be established.
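To get a quick picture of how the connection table is filling up, you can tally socket states straight from `/proc/net/tcp`. This is a minimal sketch: the fourth field (`st`) is the state in hex, where `01` is ESTABLISHED and `08` is CLOSE_WAIT.

```sh
#!/bin/sh
# Tally TCP sockets per state by parsing /proc/net/tcp.
# Field 4 ("st") is the state in hex: 01=ESTABLISHED, 03=SYN_RECV,
# 06=TIME_WAIT, 08=CLOSE_WAIT, 0A=LISTEN.
awk 'FNR > 1 { states[$4]++ }
     END { for (s in states) printf "state %s: %d sockets\n", s, states[s] }' \
    /proc/net/tcp
```

`ss -tan state close-wait` gives the same information with state names instead of hex codes, but the `/proc` parse works even on minimal systems without iproute2.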
Common Causes and Fixes:
- **Application bug: not closing sockets.** The most frequent culprit. The application accepts a connection but fails to call `close()` on the socket descriptor when it is done, or when the client disconnects unexpectedly.
  - Diagnosis: run `strace -p <pid> -e trace=close` on the suspected application process. If you see no `close()` calls for the sockets handling client connections, or the application is busy doing other things and never checks for client disconnects, this is likely the cause.
  - Fix: modify the application code so `close()` is called on the socket descriptor after the request is served or when an error or disconnect is detected. For example, in C: `close(client_socket_fd);`.
  - Why it works: explicitly closing the socket signals the operating system to initiate the TCP teardown, releasing the resources associated with that connection.
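A quick way to watch for this leak from outside the process is to count how many of its file descriptors are sockets; a count that only ever grows while CLOSE_WAIT accumulates points at a missing `close()`. A minimal sketch (the PID argument is a placeholder for your server process; it defaults to the current shell just for demonstration):

```sh
#!/bin/sh
# Count how many open file descriptors of a process are sockets.
# Pass the suspect server's PID as $1 (hypothetical example value).
pid="${1:-$$}"
total=$(ls /proc/"$pid"/fd | wc -l)
socks=$(ls -l /proc/"$pid"/fd | grep -c 'socket:')
echo "pid $pid: $socks socket fds out of $total total fds"
```

Run it periodically (or under `watch`) and compare against the process's file-descriptor limit.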
- **Client crashes / power loss.** Clients can crash, lose network connectivity, or have their power cut without ever sending a TCP FIN or RST packet.
  - Diagnosis: this is hard to confirm directly on the server without correlating client-side events. However, a sudden spike in stale connections with no corresponding application errors suggests external factors.
  - Fix: rely on TCP keepalives and shorter idle timeouts. Configure keepalives at the OS level (note that these sysctls only affect sockets that enable the `SO_KEEPALIVE` option):

    ```sh
    # Example for Linux sysctl
    sysctl -w net.ipv4.tcp_keepalive_time=600   # Idle time before the first probe (10 minutes)
    sysctl -w net.ipv4.tcp_keepalive_intvl=60   # Interval between probes
    sysctl -w net.ipv4.tcp_keepalive_probes=5   # Probes before declaring the peer dead
    ```

    Then set shorter application-level or firewall idle timeouts. For example, on an Nginx server:

    ```nginx
    http {
        # ...
        keepalive_timeout 65;  # Default is 75s
        # ...
    }
    ```

  - Why it works: TCP keepalives periodically send small probe packets to verify that the other end is still responsive. If no ACK is received after the configured number of probes, the kernel declares the connection dead and closes it, so silently vanished clients no longer pin sockets in ESTABLISHED forever.
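Before changing anything, it helps to see what the host is currently using. The current keepalive parameters can be read straight from `/proc` (a minimal sketch; Linux defaults are typically 7200 s / 75 s / 9 probes):

```sh
#!/bin/sh
# Print the current TCP keepalive parameters. They apply only to
# sockets that have set the SO_KEEPALIVE option.
for f in tcp_keepalive_time tcp_keepalive_intvl tcp_keepalive_probes; do
    printf '%s = %s\n' "$f" "$(cat /proc/sys/net/ipv4/"$f")"
done
```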
- **Network devices (firewalls / load balancers) timing out.** Intermediate network devices keep their own connection-tracking tables, with idle timeouts that may differ from the server's. If a device expires a connection before the server or client does, it can leave one side half-open.
  - Diagnosis: check the idle timeout settings on your firewalls and load balancers; if the server's connection timeout is longer than a firewall's, this can happen. Run tcpdump on the server, e.g. `tcpdump -nn 'tcp[tcpflags] & (tcp-fin|tcp-rst) != 0'`, to see whether FIN/RST packets are actually arriving from the client or being sent by the server.
  - Fix: synchronize idle timeouts across your network infrastructure. Ensure the server's TCP and application timeouts are less than or equal to the timeouts on firewalls and load balancers. For stateful firewalls, configure them to send RST packets when a state entry expires.
  - Why it works: consistent timeouts prevent intermediate devices from dropping connections unilaterally, allowing the end hosts to manage the connection lifecycle gracefully.
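On the server side you can at least see what the kernel's own conntrack table uses for established flows, and compare that number with the middlebox settings. A sketch, assuming Linux (the file only exists when the `nf_conntrack` module is loaded):

```sh
#!/bin/sh
# Show the kernel's conntrack idle timeout for ESTABLISHED TCP flows
# (in seconds). Compare it with firewall/load-balancer idle timeouts.
f=/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_established
if [ -r "$f" ]; then
    echo "conntrack ESTABLISHED timeout: $(cat "$f")s"
else
    echo "nf_conntrack not loaded on this host"
fi
```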
- **SYN flood attacks.** These do not directly cause CLOSE_WAIT, but they exhaust the system's ability to establish new connections by filling the SYN backlog queue. Legitimate connections then fail to establish at all, compounding the impact of any existing half-open connections.
  - Diagnosis: monitor `/proc/net/netstat` for `SyncookiesSent` and `SyncookiesRecv`. A large number of sockets in SYN-RECV (visible via `ss -tan state syn-recv`) is another indicator.
  - Fix: enable SYN cookies and tune `net.ipv4.tcp_max_syn_backlog` and `net.ipv4.tcp_synack_retries`:

    ```sh
    sysctl -w net.ipv4.tcp_syncookies=1
    sysctl -w net.ipv4.tcp_max_syn_backlog=2048  # Example value
    sysctl -w net.ipv4.tcp_synack_retries=3      # Example value
    ```

  - Why it works: SYN cookies allow the server to acknowledge SYN packets without allocating significant per-connection state until a valid ACK is received, preventing the backlog from being filled by malicious SYN packets.
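The syncookie counters live on the paired `TcpExt:` lines of `/proc/net/netstat` (one line of field names, one line of values), so extracting them takes a little awk. A minimal sketch:

```sh
#!/bin/sh
# Pull the Syncookies* counters out of /proc/net/netstat: the first
# TcpExt: line holds field names, the second holds matching values.
awk '/^TcpExt:/ {
         if (!have_hdr) { for (i = 1; i <= NF; i++) hdr[i] = $i; have_hdr = 1 }
         else { for (i = 2; i <= NF; i++)
                    if (hdr[i] ~ /^Syncookies/) print hdr[i], $i }
     }' /proc/net/netstat
```

Nonzero and climbing `SyncookiesSent` means the backlog overflowed and the kernel fell back to cookies.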
- **System resource exhaustion (memory / file descriptors).** If the server is critically low on memory or has hit its open-file-descriptor limit, it may fail to process socket events properly, leaving connections lingering.
  - Diagnosis: check `dmesg` for OOM-killer messages. Use `ulimit -n` for the current shell's limit and `cat /proc/<pid>/limits` for a specific process. Monitor memory usage with `free -h` and `top`.
  - Fix: increase system memory, tune memory-management kernel parameters, or raise the file-descriptor limit for the application's user or process. Note that `/proc/<pid>/limits` is read-only; to change a running process's limit, use `prlimit`:

    ```sh
    # Raise the open-files limit of a running process (e.g. PID 1234)
    prlimit --pid 1234 --nofile=65536:65536   # Not persistent across restarts

    # To make it persistent, edit /etc/security/limits.conf:
    # *  soft  nofile  65536
    # *  hard  nofile  65536
    ```

  - Why it works: with sufficient resources, the application can correctly manage its socket states and perform the necessary cleanup operations.
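A quick check of the limit a live process is actually running with, read from its `/proc` entry (a sketch; it uses the current shell's PID as a stand-in for the server's):

```sh
#!/bin/sh
# Show a process's open-files limit; soft and hard values appear in
# the "Max open files" row of /proc/<pid>/limits (a read-only file).
pid="${1:-$$}"
grep 'Max open files' /proc/"$pid"/limits
```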
- **Kernel bugs or misconfigurations.** In rare cases, a bug in the TCP/IP stack implementation or an unusual kernel parameter setting can cause connections to stall.
  - Diagnosis: check kernel logs (`dmesg`, `/var/log/kern.log`) for TCP-related errors or warnings, and compare your kernel version and configuration against known issues.
  - Fix: upgrade the kernel to a stable, supported version, and revert any non-standard or experimental tuning parameters.
  - Why it works: a stable, well-tested kernel ensures the networking stack handles connection state transitions reliably.
After fixing these issues, you may well start seeing TIME_WAIT states instead.