NFS client requests can be dropped by the server’s network stack if the server is overloaded, leading to client-side timeouts and retries.

Common Causes and Fixes for NFS Timeouts

  1. NFS Server CPU Saturation: The NFS server’s CPU is too busy to process incoming requests promptly.

    • Diagnosis: On the NFS server, run top or htop and look for processes consuming high CPU, especially nfsd or rpc.statd.
    • Fix: Increase the number of nfsd threads. Edit /etc/nfs.conf and set threads under the [nfsd] section (e.g., threads=128); on older systems, set RPCNFSDCOUNT in /etc/sysconfig/nfs instead. Restart the NFS server service: systemctl restart nfs-server. More threads allow the server to handle more concurrent NFS requests.
    • Diagnosis: Check NFS server load using sar -u 1 5 for CPU utilization.
    • Fix: If CPU usage consistently exceeds 80%, consider offloading some NFS traffic to other servers or upgrading the server’s CPU.
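A quick way to check this on the server is to read the nfsd thread count and saturation counters straight from procfs. A minimal sketch (the /proc paths below exist only while nfsd is running, so the check is guarded and safe to run on any host):

```shell
# Report nfsd thread usage on the NFS server.
# /proc/fs/nfsd/threads and /proc/net/rpc/nfsd exist only while nfsd runs,
# so the guard makes this safe to run anywhere.
nfsd_thread_report() {
    if [ -r /proc/fs/nfsd/threads ]; then
        echo "nfsd threads: $(cat /proc/fs/nfsd/threads)"
        # the 'th' line counts threads and time spent with all threads busy
        grep '^th' /proc/net/rpc/nfsd 2>/dev/null || true
    else
        echo "nfsd not running on this host"
    fi
}
nfsd_thread_report
```

At runtime, rpc.nfsd 128 changes the count immediately; the /etc/nfs.conf setting makes it persistent across restarts.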
  2. NFS Server Network Interface Saturation: The server’s network interface is overwhelmed with traffic.

    • Diagnosis: On the NFS server, use iftop -i <interface_name> (e.g., iftop -i eth0) to monitor network bandwidth usage per connection.
    • Fix: Increase the server’s network bandwidth. This might involve upgrading the network interface card (NIC) or the network switch port. Verify the NIC’s negotiated speed and duplex with ethtool <interface_name>, and change them with ethtool -s if autonegotiation picked the wrong settings.
    • Diagnosis: Check for dropped packets on the server’s interface using ip -s link show <interface_name>. Look for increases in the dropped or overrun counters.
    • Fix: If dropped packets are high, it indicates the interface is not keeping up. This points to either a hardware bottleneck or a driver issue. Ensure you are using a recent, stable NIC driver.
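The dropped counter can also be read directly from sysfs, which makes it easy to snapshot and compare between runs. A small sketch that reports RX drops for every interface:

```shell
# Snapshot RX drop counters for every interface from sysfs; rising values
# between two runs mean the NIC or driver is shedding packets.
nic_drop_report() {
    found=0
    for ifdir in /sys/class/net/*; do
        [ -r "$ifdir/statistics/rx_dropped" ] || continue
        found=1
        printf '%s rx_dropped=%s\n' "${ifdir##*/}" \
            "$(cat "$ifdir/statistics/rx_dropped")"
    done
    [ "$found" -eq 1 ] || echo "no interfaces with statistics found"
}
nic_drop_report
```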
  3. NFS Server Memory Pressure: The server is swapping or experiencing high memory utilization, impacting nfsd performance.

    • Diagnosis: On the NFS server, run free -h or top and check the available memory and swap usage.
    • Fix: Increase the server’s RAM. If swapping is occurring, even a small amount, it can drastically slow down I/O operations.
    • Diagnosis: Monitor memory usage with sar -r 1 5.
    • Fix: Tune kernel parameters related to memory management, such as vm.dirty_ratio and vm.dirty_background_ratio, if memory is consistently high but not exhausted.
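As a sketch of that writeback tuning (the values here are illustrative, not recommendations; tune them for your RAM size and workload), a drop-in sysctl file might look like:

```
# /etc/sysctl.d/90-nfs-writeback.conf -- illustrative values only
vm.dirty_background_ratio = 5   # start background writeback at 5% of RAM dirty
vm.dirty_ratio = 10             # block writers once 10% of RAM is dirty
# apply with: sysctl --system
```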
  4. NFS Server Disk I/O Bottleneck: The underlying storage on the NFS server cannot keep up with read/write requests.

    • Diagnosis: On the NFS server, use iostat -xz 1 5 to monitor disk utilization (%util), await times (await), and queue sizes (avgqu-sz).
    • Fix: Upgrade the server’s storage. This could mean moving from HDDs to SSDs, using faster SSDs, or implementing a RAID configuration that balances performance and redundancy.
    • Diagnosis: Check specific filesystem performance with iotop.
    • Fix: If using a network-attached storage (NAS) device, ensure its internal performance is adequate and its network connection to the NFS server is not a bottleneck.
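The %util check can be automated. A sketch that flags saturated devices, assuming %util is the last column of iostat -xz output (true for recent sysstat releases; older versions may lay the columns out differently):

```shell
# Flag devices whose %util exceeds a threshold in `iostat -xz` output.
# Assumes %util is the last column (holds for recent sysstat releases).
flag_busy_disks() {
    awk -v limit=80 'NR > 1 && $NF + 0 > limit { print $1, $NF }'
}
# Canned sample standing in for: iostat -xz 1 1 (device lines only)
flag_busy_disks <<'EOF'
Device   r/s   w/s    rkB/s   wkB/s    await  %util
sda      10.0  5.0    120.0   80.0     2.1    12.5
sdb      250.0 900.0  3000.0  91000.0  35.7   97.3
EOF
```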
  5. NFS Client Network Issues: Packet loss or high latency between the client and server.

    • Diagnosis: On the NFS client, use ping -c 100 <nfs_server_ip> to check for packet loss and latency.
    • Fix: Troubleshoot the network path. This might involve checking intermediate switches, routers, or firewall rules that could be introducing latency or dropping packets.
    • Diagnosis: Use mtr <nfs_server_ip> to identify specific hops with high latency or packet loss.
    • Fix: If jumbo frames are enabled, make sure the same MTU is configured on every device along the path (client NIC, switches, router ports, server NIC); an MTU mismatch anywhere in between is a common source of dropped packets.
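Parsing the loss figure out of ping’s summary line makes the client-side check scriptable (for example, alerting whenever loss is non-zero). A minimal sketch:

```shell
# Extract the packet-loss percentage from ping output.
ping_loss_pct() {
    # Matches the "X% packet loss" token that GNU and BusyBox ping both print.
    grep -o '[0-9.]*% packet loss' | cut -d% -f1
}
# Canned summary line standing in for: ping -c 100 <nfs_server_ip>
echo '100 packets transmitted, 97 received, 3% packet loss, time 99012ms' \
    | ping_loss_pct
```

In practice you would pipe the real command through it: ping -c 100 <nfs_server_ip> | ping_loss_pct.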
  6. NFS Mount Options on the Client: Inappropriate or overly aggressive mount options can cause timeouts.

    • Diagnosis: Examine the client’s /etc/fstab or the output of mount | grep nfs for options like rsize, wsize, hard, intr, timeo, and retrans.
    • Fix: Experiment with different rsize and wsize values (for example, rsize=32768,wsize=32768). Use hard mounts for reliability. The intr option can simply be dropped: kernels newer than 2.6.25 ignore it and always allow SIGKILL to interrupt a pending NFS request. If timeouts are too aggressive, raise timeo, the initial retransmit timeout in tenths of a second (e.g., timeo=140 is 14 seconds), and adjust retrans, the number of retransmissions the client attempts before reporting a major timeout (e.g., retrans=3).
    • Fix: Re-mount the filesystem with the new options: mount -o remount,rsize=32768,wsize=32768 /mnt/nfs_share.
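To reason about how long a client waits before declaring a major timeout, a simplified model helps: assume the retransmit timeout starts at timeo (in tenths of a second) and doubles on each retry. This ignores the kernel’s per-transport caps on the backoff, so treat it as an upper-bound estimate only:

```shell
# Rough worst-case wait before the client reports a major timeout, assuming
# the retransmit timeout starts at timeo (tenths of a second) and doubles on
# each retry. Ignores the kernel's per-transport caps: upper bound only.
nfs_retry_window() {
    timeo=$1 retrans=$2
    total=0 t=$timeo i=0
    while [ "$i" -le "$retrans" ]; do
        total=$((total + t))
        t=$((t * 2))
        i=$((i + 1))
    done
    echo "$((total / 10)) seconds"
}
nfs_retry_window 140 3   # timeo=140, retrans=3
```

With timeo=140 and retrans=3 this model gives 210 seconds of worst-case waiting, which is why overly generous values can make a flaky server look like a hung client.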
  7. NFS Server Kernel/Module Issues: Bugs in the NFS server kernel module or related RPC services.

    • Diagnosis: Check system logs (/var/log/messages, dmesg) on the NFS server for any NFS-related errors or warnings.
    • Fix: Ensure the NFS server’s kernel and user-space utilities are updated to the latest stable versions for your distribution. Sometimes, specific kernel versions have known NFS performance issues.
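The log check above can be wrapped in one guarded helper that works with either the journal or classic syslog files:

```shell
# Pull NFS- and RPC-related messages from the kernel log, whichever logging
# stack is present; prints a marker at the end so it always succeeds.
nfs_log_scan() {
    if command -v journalctl >/dev/null 2>&1; then
        journalctl -k --no-pager 2>/dev/null | grep -iE 'nfs|rpc' || true
    elif [ -r /var/log/messages ]; then
        grep -iE 'nfs|rpc' /var/log/messages || true
    fi
    echo "scan complete"
}
nfs_log_scan
```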

After fixing these issues, the next errors you might encounter are ESTALE (stale file handle: the object behind a client’s file handle was removed or the export was changed on the server) or ENOSPC (the exported filesystem is out of space). Both indicate that requests are reaching the server, but the server cannot complete the operation.


NFS traffic analysis in Wireshark is less about watching packets fly by and more about interpreting the conversation between client and server to understand why a particular operation took longer than expected, or why it failed entirely.

Let’s say you’re debugging a slow ls command on an NFS mount. You’d capture traffic on the client, filter for nfs, and then look for the LOOKUP and READDIR RPC calls.

Here’s what a typical READDIR sequence might look like, from the client’s perspective:

1. Client -> Server: NFS3 CALL: READDIR(dir_handle, offset=0, count=8192)
2. Server -> Client: NFS3 REPLY: READDIR(directory_entries...)

The crucial metric here is the time between the CALL and the REPLY for that specific RPC. In Wireshark this shows up as a large "Time delta from previous displayed packet" on the REPLY; the RPC dissector also computes it directly as the rpc.time field ("Time from request"). If that delta is large for many READDIR replies, it means the server is slow to respond.
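In display-filter terms, the slow-reply hunt might look like this (assuming the standard Wireshark RPC/NFS dissectors, which expose the call-to-reply latency as rpc.time):

```
nfs && rpc.time > 0.1        # NFS replies arriving >100 ms after their call
tcp.analysis.retransmission  # lost or late segments in the same capture
```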

But what if the server doesn’t respond? This is where you see TCP retransmissions or UDP packet loss indicators in Wireshark.

Consider this captured sequence:

Client IP.51234 -> Server IP.2049: UDP: NFS3 CALL: READDIR(dir_handle, offset=0, count=8192)
[TCP Retransmission] (if the mount uses TCP, which is the default transport for NFSv3 on modern Linux) or a repeated CALL carrying the same RPC XID (if the mount uses UDP, which has no transport-level retransmission, so the NFS client itself re-sends the request)
... (many seconds later) ...
Client IP.51234 -> Server IP.2049: UDP: NFS3 CALL: READDIR(dir_handle, offset=0, count=8192)
Server IP.2049 -> Client IP.51234: UDP: NFS3 REPLY: READDIR(directory_entries...)

The gap between the initial CALL and the eventual REPLY (or a retransmission of the CALL) is the problem. Wireshark will show a large "Time delta" for the first READDIR CALL if the server took a long time to send its reply. If the packet is lost entirely, you might see the client send the same CALL again, and Wireshark will show the delta between the two identical calls.
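The CALL/REPLY pairing Wireshark performs can be sketched by hand over an exported packet list. The snippet below matches calls to replies by RPC XID and prints the latency, using canned data in place of a real capture; the three-column format (time-in-seconds, xid, CALL|REPLY) is an assumption for illustration, standing in for a tshark field export:

```shell
# Pair NFS CALLs with their REPLYs by RPC XID and print per-call latency,
# mimicking Wireshark's rpc.time computation. Input columns (an assumed
# format for illustration): time-in-seconds, xid, CALL|REPLY.
rpc_latency() {
    awk '$3 == "CALL"  { start[$2] = $1 }
         $3 == "REPLY" && ($2 in start) {
             printf "xid %s latency %.3fs\n", $2, $1 - start[$2]
         }'
}
rpc_latency <<'EOF'
0.000 0x1a2b CALL
0.004 0x1a2b REPLY
1.000 0x3c4d CALL
8.200 0x3c4d REPLY
EOF
```

The multi-second gap on the second XID is exactly the kind of outlier you would then correlate with server-side metrics.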

To understand why that delta is large, you need to correlate this with server-side metrics.

Mental Model: The NFS Conversation

  1. Client Initiates: The client needs to perform an operation (read a file, list a directory, create a file). It packages this as an NFS Remote Procedure Call (RPC).
  2. Network Transit: The RPC request travels over the network. This is where latency, packet loss, and congestion become factors.
  3. Server Receives & Processes: The NFS server daemon (nfsd) receives the RPC. It then has to:
    • Parse the request: Understand what the client wants.
    • Check permissions/ACLs: Ensure the client is allowed to do this.
    • Interact with the filesystem: Read from disk, write to disk, create inodes, etc. This is often the slowest part.
    • Generate a reply: Package the result of the operation.
  4. Network Transit (Return): The RPC reply travels back to the client.
  5. Client Receives & Processes: The client receives the reply and completes the operation (e.g., displays file contents, updates directory listing).

Key Wireshark Fields for NFS Analysis:

  • nfs.status: The NFS-level status in a reply; 0 is NFS3_OK/NFS4_OK, and non-zero values map to errors such as NFS3ERR_NOENT or NFS3ERR_STALE. RPC-layer failures live one layer down, in the reply’s accept state: PROG_UNAVAIL is 1, PROG_MISMATCH is 2, PROC_UNAVAIL is 3, GARBAGE_ARGS is 4, and SYSTEM_ERR is 5.
  • nfs.fh: The file handle. Useful for tracking operations on the same file.
  • nfs.stateid: For NFSv4, crucial for tracking the state of a file or directory.
  • nfs.read.count, nfs.write.count: The size of data being read or written. Large values here on a slow connection will increase the Time delta.
  • nfs.dir.offset, nfs.dir.count: For READDIR calls, these indicate how much of the directory has been read and how much the client is requesting.
  • tcp.analysis.retransmission: If you see these, segments are being lost somewhere between client and server. UDP has no equivalent analysis field; on UDP mounts, look instead for repeated CALLs carrying the same rpc.xid.
  • frame.time_delta: The time since the previous packet. A large delta on an NFS REPLY relative to its CALL means the server took a long time to respond; a large delta between two identical CALLs means the client gave up waiting and retransmitted.

The most surprising thing about NFS performance debugging is how often the problem isn’t in the NFS protocol itself, but in the underlying network or the server’s ability to interact with its storage. An NFS READDIR call might look simple, but if it triggers a disk seek on a slow HDD, or if the network fabric between client and server has a single congested link, that one READDIR can take seconds. You’re not just looking at NFS packets; you’re looking at all packets between client and server and correlating them with server-side load and I/O.

The next step in understanding this is to delve into specific NFS operations like getattr, setattr, and write, and how their performance characteristics differ and what they reveal about server load and network conditions.
