Running strace against a JVM is usually about figuring out why the JVM itself, rather than your Java code, is misbehaving at the operating system level.

Common Causes and Fixes for strace Java JVM Syscall Trace Issues

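An abnormal syscall trace typically means the JVM is stuck or performing poorly because of how it interacts with the kernel. Often the cause is not your Java code but the JVM's own use of the operating system. A JVM is heavily multi-threaded, so attach with -f (strace -f -p <JVM_PID>): without it, strace follows only the thread whose ID equals the PID, which is usually an idle launcher thread. With -c, the summary table is printed only when you detach, for example with Ctrl-C.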

  1. Excessive read/write calls to /dev/null or similar idle devices:

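    • Diagnosis: Run strace -f -c -p <JVM_PID>, let it sample for a few seconds, then press Ctrl-C to print the summary. Look for a disproportionately high number of read or write calls, possibly millions. The summary does not show which file each call targets, so follow up with strace -f -y -e trace=read,write -p <JVM_PID>; the -y flag prints the path behind each file descriptor, which reveals whether the traffic is going to /dev/null or a similar "null" device.
    • Fix: This usually means output is being generated and then thrown away (chatty logging, console output, unbuffered streams), or occasionally a JVM bug. Ensure your JVM is up to date. If the output is unwanted, turn it down at the source (logger level, console appender) instead of paying for every write. If it is wanted, send it to a real file at launch: java -Xlog:gc*:file=gc.log ... > app.out 2>&1. On JDK 8 and earlier the equivalent GC flags are -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log.
    • Why it works: A write to /dev/null still costs a full syscall even though the kernel discards the data. Producing less output, or routing it through a buffered file writer, removes those calls instead of merely hiding their result.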
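
As a rough sketch of the follow-up step, the helper below summarises a saved trace by syscall and file-descriptor target. The function name rw_by_target, the JVM_PID variable, and the /tmp/jvm-rw.trace path are placeholders chosen for illustration; the parsing assumes the trace was captured with -y so that each descriptor is printed as fd<path>.

```shell
# Capture (run separately; JVM_PID and the output path are placeholders):
#   strace -f -y -e trace=read,write -o /tmp/jvm-rw.trace -p "$JVM_PID"
# Stop it with Ctrl-C after a few seconds.

# Print one line per (syscall, fd, target path), busiest first.
# Expects lines such as:  1234 write(1</dev/null>, "...", 10) = 10
rw_by_target() {
    grep -oE '(read|write)\([0-9]+<[^>]*>' "$1" \
        | sed -E 's/\(([0-9]+)<([^>]*)>/ fd=\1 \2/' \
        | sort | uniq -c | sort -rn
}
```

Calling rw_by_target /tmp/jvm-rw.trace should put the dominant target on the first line; if that target is /dev/null, the JVM is paying for output nobody reads.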
  2. High poll/epoll_wait activity with no events:

    • Diagnosis: Use strace -f -T -e trace=poll,epoll_wait -p <JVM_PID> and watch for poll or epoll_wait calls that return 0 almost immediately. A return value of 0 means the timeout expired with no file descriptors ready, and -T shows how long each call took. A steady stream of such calls means the JVM is constantly checking for network or I/O events that aren't happening.
    • Fix: The usual causes are an event loop that polls with a zero or very short timeout (for example a Selector loop calling selectNow()), busy-waiting in native code, or aggressive thread polling. Network problems can contribute too: check listening sockets with netstat -tulnp and established connections with ss -tanp for anything unexpected. Sometimes updating the JVM or the library that owns the event loop resolves the issue. In rare cases, tuning JVM thread pool sizes or GC behavior might indirectly influence this.
    • Why it works: Each empty poll is a round trip into the kernel that accomplishes nothing. Letting the loop block with a sensible timeout, or removing the faulty connection that keeps waking it, stops the JVM from repeatedly asking the kernel whether anything is happening when nothing is.
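
To put a number on "frequent", one option is to count how many polling calls in a saved trace came back empty. This is a minimal sketch: empty_polls and the /tmp/jvm-poll.trace path are hypothetical names, and calls that strace splits into "unfinished"/"resumed" pairs are skipped for simplicity.

```shell
# Capture (run separately; JVM_PID and the output path are placeholders):
#   strace -f -T -e trace=poll,epoll_wait -o /tmp/jvm-poll.trace -p "$JVM_PID"

# For each polling syscall, print how many calls completed and how many
# of those returned 0 (timeout, nothing ready).
empty_polls() {
    awk '
        /<unfinished/ { next }   # split calls are ignored in this sketch
        match($0, /(ppoll|poll|epoll_pwait|epoll_wait)\(/) {
            name = substr($0, RSTART, RLENGTH - 1)
            total[name]++
            if ($0 ~ /\) += 0( |$)/) empty[name]++
        }
        END {
            for (n in total)
                printf "%s total=%d empty=%d\n", n, total[n], empty[n] + 0
        }
    ' "$1" | sort
}
```

If empty is close to total and the -T durations are in the microsecond range, the loop is spinning rather than waiting.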
  3. Excessive futex calls (Fast Userspace Mutex) with EAGAIN or ETIMEDOUT:

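    • Diagnosis: strace -f -c -p <JVM_PID> will show a massive number of futex calls, many of them counted in the errors column. To see which errors, run strace -f -e trace=futex -p <JVM_PID>. ETIMEDOUT only means a timed wait expired; idle thread pools, Thread.sleep, and parked threads produce it constantly, so it is not a problem on its own. A high rate of EAGAIN, or a call count that climbs with load, suggests threads are fighting over the same locks. A true deadlock looks different: the affected threads sit inside a single futex call that never returns.
    • Fix: Heavy futex traffic points to contention within the JVM's native threading or synchronization primitives, and it is often a symptom of extremely high lock contention in your application. Take a few thread dumps with jstack (which also reports Java-level deadlocks) and look for many threads BLOCKED on the same monitor, or use a profiler such as VisualVM. Ensure you're not holding locks across long-running operations. A JVM update might also contain fixes for native futex handling.
    • Why it works: Java monitors and java.util.concurrent locks fall back to futex when a thread has to wait. By identifying and resolving the Java-level lock contention, you reduce how often threads need to wait, which cuts the syscall overhead.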
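
The two helpers below sketch how a futex trace might be tied back to Java threads. futex_errnos tallies the error codes seen on futex calls, and tid_to_nid converts a thread ID from strace's output into the nid value shown in a jstack thread dump. Both function names and the trace path are placeholders. The dual output of tid_to_nid rests on an assumption: older JDKs print nid in hexadecimal, and some newer builds appear to print it in decimal, so both forms are produced.

```shell
# Capture (run separately; JVM_PID and the output path are placeholders):
#   strace -f -e trace=futex -o /tmp/jvm-futex.trace -p "$JVM_PID"

# Count futex failures by errno name, most frequent first.
futex_errnos() {
    grep -E 'futex(\(| resumed>)' "$1" \
        | grep -oE '= -1 E[A-Z]+' \
        | awk '{ print $3 }' \
        | sort | uniq -c | sort -rn
}

# Turn a thread ID from the trace into the strings to search for in
# jstack output (hex form for older JDKs, decimal as a fallback).
tid_to_nid() {
    printf 'nid=0x%x or nid=%d\n' "$1" "$1"
}
```

For example, if thread 6699 dominates the trace, tid_to_nid 6699 gives the strings to grep for in the output of jstack <JVM_PID>, which identifies the Java thread and the lock it is waiting on.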
  4. Frequent open/stat calls on non-existent or transient files:

    • Diagnosis: strace -f -c -p <JVM_PID> shows a high count of open or stat syscalls, usually with a large errors column. On current Linux systems these mostly appear under the names openat, newfstatat, or statx. The summary does not list filenames, so run strace -f -e trace=%file -p <JVM_PID> to see them (older strace versions spell this trace=file). If the paths are temporary, dynamic, or seem out of place (e.g., related to JMX or temporary directories) and the calls fail with ENOENT or EACCES, this is a clue.
    • Fix: This can be caused by JMX being configured incorrectly, temporary file cleanup issues, or native libraries the JVM uses aggressively checking for file existence. Ensure temporary directory permissions are correct and that JMX is configured with explicit ports if possible. Check whether the JVM startup flags set -Djava.io.tmpdir to a directory that is missing or not writable.
    • Why it works: Ensuring correct temporary directory access and proper JMX configuration stops the JVM from repeatedly failing to open or stat files it expects to exist or be accessible.
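
A quick way to see which paths the JVM keeps failing to find is to tally the ENOENT results in a saved trace. This is a sketch under assumptions: missing_paths and the trace path are hypothetical names, and only the first quoted argument on each line is treated as the path.

```shell
# Capture (run separately; JVM_PID and the output path are placeholders):
#   strace -f -e trace=%file -o /tmp/jvm-file.trace -p "$JVM_PID"

# List paths whose lookup failed with ENOENT, most frequent first.
missing_paths() {
    sed -nE 's/^[^"]*"([^"]*)".*= -1 ENOENT.*/\1/p' "$1" \
        | sort | uniq -c | sort -rn
}
```

A single path repeated thousands of times usually points at one misconfigured setting, whereas a long tail of distinct paths is more typical of classpath or library search behaviour.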
  5. High mmap/munmap activity:

    • Diagnosis: strace -f -c -p <JVM_PID> shows a very high number of mmap (memory map) and munmap calls. This suggests the JVM is frequently allocating and releasing large chunks of memory. Typical sources are thread creation (each thread's stack is a mapping), heap resizing, and large native allocations by libraries or internal data structures. Running strace -f -e trace=mmap,munmap -p <JVM_PID> shows the size of each mapping, which helps tell these apart.
    • Fix: This can be related to garbage collection behavior or native code that is very dynamic with its memory usage. Ensure you have adequate memory available and that the JVM's heap size (-Xmx) and garbage collector are appropriately tuned for your workload. If the heap keeps growing and shrinking, setting -Xms equal to -Xmx may help; if threads are created per task, a thread pool avoids repeated stack mappings. Sometimes, a specific JVM version has known issues with memory management that a patch or upgrade can fix.
    • Why it works: By tuning memory allocation and GC, you reduce the churn of memory regions, leading to fewer mmap/munmap calls.
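
To judge whether the mappings are many small ones or a few large ones, the call counts and total sizes can be summed from a saved trace. The sketch below uses hypothetical names (map_churn, the trace path) and relies on the length being the second argument of both mmap and munmap.

```shell
# Capture (run separately; JVM_PID and the output path are placeholders):
#   strace -f -e trace=mmap,munmap -o /tmp/jvm-mmap.trace -p "$JVM_PID"

# Per syscall: number of calls and total requested size in KiB.
map_churn() {
    awk '
        match($0, /(mmap|munmap)\(/) {
            name = substr($0, RSTART, RLENGTH - 1)
            rest = substr($0, RSTART + RLENGTH)   # text after "name("
            split(rest, arg, /, */)
            len = arg[2]
            sub(/\).*/, "", len)                  # munmap: drop ") = 0"
            calls[name]++
            bytes[name] += len
        }
        END {
            for (n in calls)
                printf "%s calls=%d total_kib=%.0f\n", n, calls[n], bytes[n] / 1024
        }
    ' "$1" | sort
}
```

Roughly equal mmap and munmap totals over a short window indicate churn; a growing gap between them is closer to a native memory leak than to a syscall-overhead problem.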
  6. Repeated readlink("/proc/self/fd/...") calls:

    • Diagnosis: strace -f -e trace=readlink -s 256 -p <JVM_PID> shows frequent readlink calls on file descriptors within /proc/self/fd/ (the -s 256 keeps long paths from being truncated). This is often how the JVM (or native libraries) inspects open file handles. An excessive number might indicate it's probing information it doesn't need or is stuck in a loop.
    • Fix: This is usually an internal JVM behavior for introspection or debugging. Ensure you don’t have aggressive debugging flags enabled that might cause this. If it’s persistent and high, it might be a JVM bug. Check for known issues in your JVM version related to file descriptor management or JMX.
    • Why it works: Reducing unnecessary introspection or resolving a JVM bug that causes excessive file descriptor probing stops the readlink syscalls.
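
Knowing which descriptors are being probed often narrows down the component responsible. As an illustrative sketch (probed_targets and the trace path are placeholder names), the helper below counts readlink calls on /proc/self/fd/ entries by the target they resolved to.

```shell
# Capture (run separately; JVM_PID and the output path are placeholders):
#   strace -f -e trace=readlink -s 256 -o /tmp/jvm-readlink.trace -p "$JVM_PID"

# Count /proc/self/fd lookups by resolved target, most frequent first.
# Expects lines such as:
#   1234 readlink("/proc/self/fd/45", "/var/log/app/app.log", 4096) = 20
probed_targets() {
    sed -nE 's|.*readlink\("/proc/self/fd/[0-9]+", "([^"]*)".*|\1|p' "$1" \
        | sort | uniq -c | sort -rn
}
```

If one file or socket accounts for nearly all the lookups, searching the JVM flags and loaded native libraries for whatever owns that descriptor is a reasonable next step.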

Once these syscall-level symptoms are fixed, the next problems to surface are usually application-level performance issues or resource exhaustion, because noisy traces like the ones above often mask deeper problems.
