SSH keepalives prevent your SSH connection from being unceremoniously dropped by network intermediaries or by the SSH server/client itself due to inactivity.
You’re setting up a new server or tweaking an existing one, and you’ve noticed your SSH sessions are intermittently getting disconnected, often after a period of inactivity. This isn’t some deep system failure; it’s usually just a network device or the SSH daemon deciding that an idle connection is a waste of resources and politely (or not so politely) closing it. The fix is to have your SSH client periodically send a tiny, harmless packet to the server, just to say "I’m still here!"
Here’s the breakdown of common causes and their fixes:
Cause 1: Network State Timeouts (Firewalls/NAT Gateways)
- Diagnosis: This is the most frequent culprit. Firewalls and Network Address Translation (NAT) gateways maintain state tables for active connections. If a connection shows no traffic for a certain period (often 5-30 minutes), the gateway purges its state entry for that connection. The next time your client or server tries to send data, the gateway no longer recognizes the connection and drops the packet, effectively killing the SSH session.

- Fix: Configure your SSH client to send keepalive messages at a regular interval. On Linux/macOS, edit your SSH client configuration file, typically ~/.ssh/config. PuTTY on Windows has its own equivalent setting ("Seconds between keepalives" in the Connection panel). Add these lines:

      ServerAliveInterval 60
      ServerAliveCountMax 3

- Why it works: ServerAliveInterval 60 tells your SSH client to send a keepalive message to the server whenever it has received nothing from the server for 60 seconds. ServerAliveCountMax 3 means that if the server fails to answer 3 consecutive keepalives (about 3 minutes of silence), the client assumes the connection is dead and disconnects. This regular traffic prevents the network devices from timing out the connection state.
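To show where those two directives live in the file, here is a minimal Python sketch that renders a client-side stanza as text. The function name `render_host_block` and the `Host *` pattern (apply to every host) are assumptions made for illustration; the original only specifies the two keepalive lines.

```python
def render_host_block(host_pattern: str, interval: int, count_max: int) -> str:
    """Return an ssh_config stanza that enables client-side keepalives.

    host_pattern is whatever follows the Host keyword; "*" (every host)
    is an assumption here, not something the article prescribes.
    """
    if interval <= 0 or count_max <= 0:
        raise ValueError("interval and count_max must be positive")
    lines = [
        f"Host {host_pattern}",
        f"    ServerAliveInterval {interval}",
        f"    ServerAliveCountMax {count_max}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the stanza so it can be reviewed before pasting into ~/.ssh/config.
    print(render_host_block("*", 60, 3), end="")
```

The sketch only prints the text rather than editing ~/.ssh/config directly, so nothing is overwritten by accident.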
Cause 2: SSH Server Idle Timeout
- Diagnosis: The SSH server itself might be configured with an idle timeout. This is less common than network timeouts but can happen on hardened systems.

- Fix: On the SSH server, edit the daemon configuration (e.g., /etc/ssh/sshd_config) and set ClientAliveInterval and ClientAliveCountMax:

      ClientAliveInterval 120
      ClientAliveCountMax 5

  Then reload the SSH service: sudo systemctl reload sshd (the unit is named ssh on Debian/Ubuntu; use sudo service ssh reload on older systems).

- Why it works: ClientAliveInterval 120 instructs the SSH server to send a keepalive message to the client whenever it has heard nothing from the client for 120 seconds. ClientAliveCountMax 5 means the server considers the connection broken once 5 such messages in a row go unanswered, roughly 10 minutes of silence. A responsive client answers each message, so the server also helps keep the connection alive.
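When checking what a server is actually configured to do, it helps to remember that sshd uses the first value it finds for each keyword and falls back to defaults (ClientAliveInterval 0, ClientAliveCountMax 3) when a keyword is absent. The following Python sketch applies that rule to the text of a config file. It is a simplification under stated assumptions: it reads only the global section (it stops at the first Match line), does not follow Include directives, and expects plain integer values. The function name is hypothetical.

```python
# OpenSSH defaults for the two server-side keepalive directives.
DEFAULTS = {"clientaliveinterval": 0, "clientalivecountmax": 3}


def effective_client_alive(config_text: str) -> dict:
    """Return the global ClientAlive* values from sshd_config text.

    Simplified sketch: first occurrence of each keyword wins, parsing
    stops at the first Match block, Include files are not followed,
    and values are assumed to be plain integers.
    """
    values = dict(DEFAULTS)
    seen = set()
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        key = parts[0].lower()  # keywords are case-insensitive
        if key == "match":
            break  # everything after this is conditional, not global
        if key in DEFAULTS and key not in seen and len(parts) == 2:
            values[key] = int(parts[1].strip())
            seen.add(key)
    return values
```

A result of 0 for the interval means the server sends no keepalives at all. On a live system, running sshd in its extended test mode (sshd -T) reports the values the daemon actually resolved, which is more reliable than any hand-rolled parser.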
Cause 3: Inconsistent Client/Server Settings
- Diagnosis: You might have one setting configured (e.g., client-side ServerAliveInterval) but not the other (server-side ClientAliveInterval), or the two might be set to very different values, so that one side gives up on a slow connection much sooner than the other.

- Fix: Aim for consistency. If you set ServerAliveInterval 60 on the client, consider setting ClientAliveInterval 120 on the server. The client’s interval should generally be shorter. The two mechanisms run independently, and either one alone generates enough traffic to stop the connection from going idle; configuring both means the session stays protected even when you connect from a machine that lacks your client-side settings.

- Why it works: Complementary keepalive intervals ensure that both ends of the connection actively participate in maintaining its liveness, reducing the chance of a timeout from either side or from any intermediary.
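One way to reason about a client/server pair is to ask how long the wire can stay silent. As a rough sketch, assuming each side probes independently and a value of 0 means "disabled", the longest quiet period is set by the shorter of the enabled intervals. The function name below is hypothetical.

```python
from typing import Optional


def longest_quiet_gap(server_alive_interval: int,
                      client_alive_interval: int) -> Optional[int]:
    """Approximate the longest idle period (seconds) on the connection.

    server_alive_interval is the client-side ServerAliveInterval,
    client_alive_interval is the server-side ClientAliveInterval.
    Returns None when both are disabled (0), i.e. nothing generates
    keepalive traffic at the SSH layer.
    """
    enabled = [s for s in (server_alive_interval, client_alive_interval) if s > 0]
    return min(enabled) if enabled else None
```

With the values suggested above, `longest_quiet_gap(60, 120)` gives 60: the client's probes arrive first, and the server's 120-second timer acts as a backstop.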
Cause 4: Network Congestion or Packet Loss
- Diagnosis: Even with keepalives, severe packet loss or extreme network congestion can cause keepalive packets to be dropped, leading the client or server to believe the connection is dead. This is harder to diagnose directly from SSH logs but might be indicated by general network slowness or intermittent connectivity issues.

- Fix: While you can’t fix the underlying network from within SSH, you can adjust keepalive timings to be more resilient. Increase ServerAliveInterval (e.g., to 180 or 300 seconds) and potentially ServerAliveCountMax (e.g., to 10). This gives more breathing room for intermittent network issues. Keep the interval shorter than the shortest idle timeout on the path, though, or the keepalives no longer protect against the state timeouts described in Cause 1.

  Client-side:

      ServerAliveInterval 180
      ServerAliveCountMax 10

  Server-side:

      ClientAliveInterval 300
      ClientAliveCountMax 10

  Reload the SSH service after server-side changes.

- Why it works: The time a side waits before declaring the connection dead is roughly the interval multiplied by the count. With the client-side values above, the client tolerates about 30 minutes of unanswered keepalives (180 × 10 seconds) instead of about 3 minutes (60 × 3 seconds), which makes the SSH session more tolerant of transient network problems.
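The trade-off between tolerance and idle timeouts can be made concrete with a little arithmetic. This Python sketch computes the approximate give-up window for a setting and checks the interval against an idle timeout on the path. The 300-second timeout used in the usage note is an assumption taken from the low end of the 5-30 minute range mentioned earlier, not a measured value, and both function names are hypothetical.

```python
def giveup_window(interval: int, count_max: int) -> int:
    """Approximate seconds of unanswered keepalives before disconnecting."""
    return interval * count_max


def beats_idle_timeout(interval: int, idle_timeout: int) -> bool:
    """True if keepalives are sent more often than the path's idle timeout.

    idle_timeout is whatever the firewall or NAT gateway enforces; it is
    usually unknown and has to be estimated or looked up.
    """
    return 0 < interval < idle_timeout
```

For example, `giveup_window(60, 3)` is 180 and `giveup_window(180, 10)` is 1800, while `beats_idle_timeout(300, 300)` is false: a 300-second interval gives no margin against a gateway that expires state after 5 minutes.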
Cause 5: Client or Server Software Bugs/Configuration Errors
- Diagnosis: Though rare, a bug in the SSH client or server software, or a misconfiguration that isn’t a direct timeout setting, could cause premature disconnections. This might manifest as specific error messages in /var/log/auth.log (or equivalent) on the server or in verbose client output (ssh -vvv user@host).

- Fix: Ensure your SSH client and server software are up-to-date. Check sshd_config for any unusual directives that might relate to connection management. If using a specific client like PuTTY, ensure its settings are correct.

- Why it works: Updated software often includes bug fixes. Correcting specific configuration errors removes the faulty logic that’s prematurely terminating the connection.
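A quick way to rule out client-side misconfiguration is to ask the OpenSSH client which options it would actually use for a host. Reasonably recent OpenSSH clients support `ssh -G <host>`, which prints the resolved configuration as lowercase "keyword value" lines and exits without connecting. The sketch below filters that output down to the keepalive-related options; the function names are hypothetical, and it assumes an OpenSSH client new enough to support `-G` is on the PATH.

```python
import subprocess

# Options relevant to keepalive behaviour, as printed by `ssh -G`.
KEEPALIVE_KEYS = ("serveraliveinterval", "serveralivecountmax", "tcpkeepalive")


def parse_ssh_g(output: str) -> dict:
    """Extract keepalive-related options from `ssh -G` output text."""
    result = {}
    for line in output.splitlines():
        key, _, value = line.strip().partition(" ")
        if key.lower() in KEEPALIVE_KEYS:
            result[key.lower()] = value.strip()
    return result


def effective_client_options(host: str) -> dict:
    """Run `ssh -G host` and return the keepalive options it resolves.

    Assumes an OpenSSH client with -G support; raises
    CalledProcessError if ssh exits with a non-zero status.
    """
    completed = subprocess.run(
        ["ssh", "-G", host],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_ssh_g(completed.stdout)
```

Calling `effective_client_options("host")` with one of your own host names shows whether the ServerAliveInterval you set is really being picked up, or whether an earlier Host block is overriding it.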
Cause 6: Keepalive Packets Being Blocked
- Diagnosis: Some very aggressive intrusion detection systems (IDS) or security appliances might flag repeated, small packets sent at fixed intervals from a single source to a single destination as suspicious activity (e.g., automated beaconing).

- Fix: If you suspect this, you can try slightly varying the keepalive interval or using a different method. For example, OpenSSH supports TCPKeepAlive yes (which uses the TCP layer’s keepalives) in addition to or instead of ServerAliveInterval.

  Client-side:

      TCPKeepAlive yes
      ServerAliveInterval 120
      ServerAliveCountMax 5

  Note: TCPKeepAlive defaults to yes in OpenSSH, so setting it explicitly in ~/.ssh/config mainly documents the intent. The probe timing comes from the operating system, and the default idle time before the first probe is often long (two hours on Linux), so TCP keepalives alone rarely beat a NAT timeout unless the OS settings are tuned.

- Why it works: TCPKeepAlive relies on the operating system’s TCP stack to send keepalive probes, which are bare TCP segments carrying no application data, whereas ServerAliveInterval produces encrypted SSH protocol messages. Because the two look different on the wire, switching between them might avoid IDS rules that match one pattern, though this is not guaranteed.
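To make the TCP-level mechanism less abstract, here is a small Python sketch of what enabling keepalive probes looks like on an ordinary socket. It is an illustration of the OS feature, not of OpenSSH's internals. The timing options (`TCP_KEEPIDLE`, `TCP_KEEPINTVL`, `TCP_KEEPCNT`) are platform-specific, so the sketch only sets them where Python exposes them; the default values chosen are arbitrary assumptions.

```python
import socket


def enable_tcp_keepalive(sock: socket.socket,
                         idle: int = 120,
                         interval: int = 30,
                         probes: int = 5) -> None:
    """Turn on TCP keepalive probes for a socket.

    idle:     seconds of silence before the first probe
    interval: seconds between probes
    probes:   unanswered probes before the OS drops the connection
    The three timing options exist on Linux; other platforms may lack
    them, in which case the OS-wide defaults apply.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
```

The ssh_config option is only an on/off switch, so for SSH sessions the probe timing is governed by system-wide settings (on Linux, the `net.ipv4.tcp_keepalive_*` sysctls) rather than by anything in ~/.ssh/config.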
After implementing these, you should find your SSH sessions remain stable even through long periods of inactivity. The next error you might encounter, if you’ve truly fixed all timeout issues, is likely related to authentication or resource exhaustion on the server, rather than a broken pipe.