TCP Keepalive is a mechanism to detect when a connection has gone "stale" or "dead" without being explicitly closed, ensuring resources aren’t held up indefinitely.
Imagine you’re playing a game online, and your internet connection drops suddenly. Your game client doesn’t know this immediately. It might keep sending packets that will never arrive, and the server might keep waiting for responses that will never come. TCP Keepalive is designed to prevent this kind of silent, resource-draining failure. It’s not about keeping a connection alive; it’s about detecting if it’s already dead.
Let’s see it in action. We’ll set up a simple client and server, then kill the server process mid-conversation and watch how the client finds out.
Server Side (Python)
import socket
import time
import os
HOST = '127.0.0.1'
PORT = 12345
server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server_socket.bind((HOST, PORT))
server_socket.listen(1)
print(f"Server listening on {HOST}:{PORT}")
conn, addr = server_socket.accept()
print(f"Connected by {addr}")
# Keep the connection open for a bit
try:
    while True:
        data = conn.recv(1024)
        if not data:
            print("Client disconnected gracefully.")
            break
        print(f"Received: {data.decode()}")
        conn.sendall(b"Server got your message!")
except ConnectionResetError:
    print("Connection reset by peer.")
except Exception as e:
    print(f"An error occurred: {e}")
finally:
    print("Server closing connection.")
    conn.close()
    server_socket.close()
Client Side (Python)
import socket
import time
HOST = '127.0.0.1'
PORT = 12345
client_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    client_socket.connect((HOST, PORT))
    print(f"Connected to {HOST}:{PORT}")
    # Enable TCP Keepalive on the client socket
    client_socket.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # On Linux, you can also tune these:
    # client_socket.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)   # Idle time before first probe (seconds)
    # client_socket.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)  # Interval between probes (seconds)
    # client_socket.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)     # Number of probes before disconnect
    for i in range(5):
        message = f"Hello from client {i+1}"
        print(f"Sending: {message}")
        client_socket.sendall(message.encode())
        time.sleep(2)
        data = client_socket.recv(1024)
        print(f"Received: {data.decode()}")
        time.sleep(5)  # Wait a bit before next send
    print("Client finished sending messages. Waiting for server response...")
    # Block in recv() so the kernel can report a closed or dead connection.
    # A loop that only sleeps never touches the socket and would never notice.
    data = client_socket.recv(1024)
    if not data:
        print("Server closed the connection.")
except ConnectionRefusedError:
    print("Connection refused. Is the server running?")
except Exception as e:
    print(f"An error occurred: {e}")
finally:
    print("Client closing socket.")
    client_socket.close()
Now, run the server, then run the client. You’ll see the client send messages and get responses. After a few messages, go to your server terminal and kill the Python process (e.g., Ctrl+C).
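Observe the client. Killing a process does not leave its sockets open: the server’s kernel closes them and sends a FIN (or RST) to the client, so this is an abrupt close rather than a truly dead peer. If the client is still exchanging messages, one of its next calls fails with a BrokenPipeError or ConnectionResetError, or recv returns an empty byte string, and keepalive plays no part in that. To see keepalive itself at work, the peer has to vanish without its kernel saying anything, for example by running the server on another machine and unplugging its network cable, or by dropping the traffic with a firewall rule. The idle client then notices nothing until the kernel’s keepalive probes have gone unanswered and the connection is marked dead, at which point the next socket operation fails, typically with a timeout error. The exact timing depends on your OS’s keepalive settings.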
The fundamental problem TCP Keepalive solves is the "silent failure." Without it, a TCP connection could remain in an ESTABLISHED state in the operating system’s tables on both ends, even if the network path is broken or one of the hosts has crashed. This ties up resources (memory, file descriptors) and can lead to applications waiting indefinitely for data that will never arrive.
Internally, TCP Keepalive works by having the operating system periodically send a small, "empty" TCP segment to the peer. This segment is designed to elicit a response. If the peer is alive and the network path is clear, it will respond with an acknowledgment (ACK). If the peer is dead or the path is broken, the probe segment will be lost. After a configured number of unacknowledged probes, the OS will consider the connection broken and notify the application, typically by returning an error on the next socket operation (like send or recv).
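The probing schedule described above implies a rough estimate of how long a silent peer goes unnoticed: the idle time before the first probe, plus one interval for each unanswered probe. The sketch below only does that arithmetic; the function name is hypothetical, and real kernels may differ by a few seconds because of timer granularity.

```python
def worst_case_detection_seconds(idle: int, interval: int, probes: int) -> int:
    """Approximate time from the last traffic until the kernel gives up on a silent peer."""
    # The first probe goes out after `idle` seconds of silence; each of the
    # `probes` unanswered probes is then given `interval` seconds to be answered.
    return idle + interval * probes


# Commonly cited Linux defaults: 7200 s idle, 75 s interval, 9 probes.
default_total = worst_case_detection_seconds(7200, 75, 9)
print(default_total)       # 7875 seconds
print(default_total / 60)  # 131.25 minutes
```

With those defaults an idle connection to a vanished peer lingers for a little over two hours, which is why applications that care about dead peers usually lower the values.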
The key parameters you can tune are:
SO_KEEPALIVE: This boolean socket option enables the keepalive mechanism for a specific connection. If it’s 0, keepalives are off. If it’s 1, they are on.
TCP_KEEPIDLE (Linux, some BSDs; macOS names this option TCP_KEEPALIVE): The time (in seconds) the connection must be idle before the first keepalive probe is sent. A common default is 7200 seconds (2 hours).
TCP_KEEPINTVL (Linux, macOS, some BSDs): The interval (in seconds) between subsequent keepalive probes if the previous ones go unanswered. A common default is 75 seconds.
TCP_KEEPCNT (Linux, macOS, some BSDs): The number of unacknowledged keepalive probes that must be sent before the connection is considered dead and the application is notified. A common default is 9.
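To check which defaults a particular Linux machine actually uses, the values can be read from the /proc filesystem. This is a minimal sketch, assuming a Linux host; the function name and dictionary keys are made up for illustration, and on other systems every value simply comes back as None.

```python
from pathlib import Path


def read_keepalive_defaults(base: str = "/proc/sys/net/ipv4") -> dict:
    """Return the kernel-wide keepalive defaults, or None for values that cannot be read."""
    names = {
        "idle_seconds": "tcp_keepalive_time",
        "interval_seconds": "tcp_keepalive_intvl",
        "probe_count": "tcp_keepalive_probes",
    }
    defaults = {}
    for key, filename in names.items():
        try:
            defaults[key] = int((Path(base) / filename).read_text().strip())
        except (OSError, ValueError):
            # Not Linux, or the file is missing, unreadable, or not a number.
            defaults[key] = None
    return defaults


if __name__ == "__main__":
    print(read_keepalive_defaults())
```

The `base` parameter exists only so the function can be pointed at a different directory, for example in a test.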
For example, on Linux, to make keepalives more aggressive and detect dead connections faster, you might set:
# Set idle time to 60 seconds
sudo sysctl -w net.ipv4.tcp_keepalive_time=60
# Set interval to 10 seconds
sudo sysctl -w net.ipv4.tcp_keepalive_intvl=10
# Set probe count to 5
sudo sysctl -w net.ipv4.tcp_keepalive_probes=5
These are kernel-wide defaults, and they only affect sockets that have SO_KEEPALIVE enabled. To apply different values per socket, use setsockopt in your application code, as in the commented-out lines of the client example; those calls override the system defaults for that specific socket.
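The per-socket approach can be wrapped in a small helper. The sketch below is one possible shape, not a canonical API: the function name and its default values (60, 10, 5, mirroring the sysctl example) are assumptions, and the hasattr checks are there because Python only exposes the tuning constants that the platform provides.

```python
import socket


def enable_keepalive(sock: socket.socket, idle: int = 60, interval: int = 10, count: int = 5) -> None:
    """Turn on keepalive for one socket and tune it where the platform allows."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Linux and some BSDs call the idle option TCP_KEEPIDLE; macOS calls it TCP_KEEPALIVE.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    elif hasattr(socket, "TCP_KEEPALIVE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPALIVE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)


# The options can be set before the socket is connected.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))  # nonzero when enabled
sock.close()
```

Where a constant is missing, the helper silently falls back to the system default for that parameter, which keeps the same code usable across platforms at the cost of less predictable timing.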
The most surprising thing about TCP Keepalive is that it doesn’t actually prevent a connection from going dead; it’s purely a detection mechanism. The probes are sent by the kernel, not the application, and only after the connection has been idle for the configured time. If your application is actively sending data, it will detect a dead peer on its own: the unacknowledged data is retransmitted, and once the retransmissions give up, a send or recv call fails. Keepalive is for those times when the application isn’t actively communicating and needs the OS to tell it if the other side has vanished.
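Once TCP Keepalive has declared a connection dead, the next socket operation your application attempts fails. On Linux this is typically a TimeoutError (errno ETIMEDOUT) when the probes simply went unanswered, or a ConnectionResetError when the peer came back up, no longer knew the connection, and answered a probe with a reset. Writes attempted after that first error has been reported raise BrokenPipeError.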
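An application may want to tell these failures apart when logging or deciding whether to reconnect. The following is a minimal sketch, assuming Linux-style errno values, where a keepalive timeout commonly surfaces as ETIMEDOUT (TimeoutError in Python); the function name and the wording of the messages are hypothetical.

```python
import errno


def describe_socket_failure(exc: OSError) -> str:
    """Map an OSError raised by send() or recv() to a short, human-readable cause."""
    if exc.errno == errno.ETIMEDOUT:
        return "peer stopped responding (probes or retransmissions went unanswered)"
    if exc.errno == errno.ECONNRESET:
        return "peer answered with a reset (it no longer knows this connection)"
    if exc.errno == errno.EPIPE:
        return "connection was already shut down when we tried to write"
    return f"other socket error: {exc}"


# OSError picks the matching subclass from the errno, so these are the same
# exception types a real socket call would raise.
print(describe_socket_failure(OSError(errno.ETIMEDOUT, "Connection timed out")))
print(describe_socket_failure(OSError(errno.ECONNRESET, "Connection reset by peer")))
print(describe_socket_failure(OSError(errno.EPIPE, "Broken pipe")))
```

Checking errno rather than the exception class avoids confusing a kernel-level ETIMEDOUT with the timeout raised by a socket that has its own settimeout deadline, which carries no errno.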