The Valkey Slow Log is a debugging tool that records commands that exceed a specified execution time, helping you identify and optimize performance bottlenecks.
Let’s see it in action. Imagine you’re running a Valkey instance and suspect certain commands are taking too long. You can configure the slow log threshold and then inspect its contents.
First, set the slow log threshold. Two configuration parameters control it: `slowlog-log-slower-than`, the execution time in microseconds above which a command is logged, and `slowlog-max-len`, the maximum number of entries to keep. For instance, to log commands taking longer than 10 milliseconds (10,000 microseconds) and keep the last 1000 entries:
valkey-cli CONFIG SET slowlog-log-slower-than 10000
valkey-cli CONFIG SET slowlog-max-len 1000
Now, let’s simulate a slow command. A KEYS * command on a large dataset can be slow. If it exceeds our 10ms threshold, it will appear in the slow log.
To view the slow log, use SLOWLOG GET <count>. count specifies how many entries to retrieve (it defaults to 10 if omitted).
valkey-cli SLOWLOG GET 5
The output will look something like this:
1) 1) (integer) 18          # Unique ID of the entry
   2) (integer) 1678886400  # Unix timestamp when the command was executed
   3) (integer) 51234       # Execution time in microseconds
   4) 1) "keys"             # The command itself, with its arguments
      2) "*"
   5) "127.0.0.1:52341"     # Client address and port
   6) ""                    # Client name, if set via CLIENT SETNAME
2) 1) (integer) 17          # ... and so on for other slow commands
   2) (integer) 1678886395
   3) (integer) 12876
   4) 1) "zrange"
      2) "myzset"
      3) "0"
      4) "100"
   5) "127.0.0.1:52341"
   6) ""
This output gives each entry’s unique ID, when it ran, how long it took (in microseconds), the command with its arguments, and the client’s address and name. Note that every logged execution time exceeds our 10,000-microsecond threshold; anything faster is never recorded.
The primary problem the Slow Log solves is "mystery lag" – your application’s performance degrades, but you can’t pinpoint which Valkey operation is the culprit. Ordinary server logs record errors, not latency, leaving you blind to gradual performance degradation. The Slow Log provides visibility into these latent issues.
The core of the Slow Log mechanism is a circular buffer. When slowlog-max-len is reached, new entries overwrite the oldest ones. This ensures you always have the most recent slow commands. The slowlog-log-slower-than configuration is the filter; anything faster is discarded.
The mental model is simple: a vigilant guard (the Slow Log) watching every command that passes through the gate. If a command takes too long (exceeds slowlog-log-slower-than), the guard writes down its details (ID, timestamp, duration, command) in a logbook (the circular buffer). If the logbook fills up, the oldest entries are erased to make room for new ones.
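The guard-and-logbook model can be sketched in a few lines. This is an illustrative simulation, not Valkey’s actual C implementation; the names `SlowLog`, `threshold_us`, and `max_len` are invented for the sketch.

```python
from collections import deque

class SlowLog:
    """Toy model of the slow log: a size-bounded logbook with a latency filter."""

    def __init__(self, threshold_us, max_len):
        self.threshold_us = threshold_us      # models slowlog-log-slower-than
        self.entries = deque(maxlen=max_len)  # models slowlog-max-len: oldest entries drop off

    def record(self, entry_id, timestamp, duration_us, command):
        # The filter: anything at or below the threshold is discarded.
        if duration_us > self.threshold_us:
            self.entries.append((entry_id, timestamp, duration_us, command))

    def get(self, count):
        # Like SLOWLOG GET: newest entries first.
        return list(self.entries)[-count:][::-1]

log = SlowLog(threshold_us=10_000, max_len=2)
log.record(1, 1678886390, 500, ["get", "k"])                    # too fast: filtered out
log.record(2, 1678886395, 15_000, ["zrange", "z", "0", "100"])
log.record(3, 1678886400, 51_000, ["keys", "*"])
log.record(4, 1678886405, 20_000, ["hgetall", "h"])             # logbook full: entry 2 is evicted
print([e[0] for e in log.get(5)])                               # → [4, 3]
```

The `deque(maxlen=...)` gives the circular-buffer behavior for free: appending to a full deque silently evicts the oldest entry, exactly the overwrite semantics described above.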
Here’s how to interpret and act on common slow log entries:
- `KEYS *` or `SCAN` on large datasets:
  - Diagnosis: You’ll see `KEYS *` or `SCAN` with a high execution time.
  - Cause: Iterating over the entire keyspace is an O(N) operation and can block the server, especially with millions of keys.
  - Fix: Avoid `KEYS *` in production. Use `SCAN` with a small `COUNT` and iterate in your application to collect keys. If you absolutely need all keys, schedule the operation during off-peak hours.
  - Why it works: `SCAN` does a small, bounded amount of work per call, and by processing the keyspace incrementally you avoid long-running blocking commands.
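The incremental pattern looks roughly like this. A plain list stands in for the keyspace, and `scan_batch` is a made-up helper that mimics SCAN’s cursor contract (a returned cursor of 0 means the iteration is complete):

```python
def scan_batch(keys, cursor, count):
    """Hypothetical stand-in for SCAN: return (next_cursor, batch of keys)."""
    batch = keys[cursor:cursor + count]
    next_cursor = cursor + count
    if next_cursor >= len(keys):
        next_cursor = 0  # like SCAN, signal completion with cursor 0
    return next_cursor, batch

keyspace = [f"user:{i}" for i in range(10)]  # stand-in for the server's keyspace

cursor, all_keys = 0, []
while True:
    # Each call does a small, bounded amount of work instead of one giant KEYS *.
    cursor, batch = scan_batch(keyspace, cursor, count=3)
    all_keys.extend(batch)
    if cursor == 0:
        break
print(len(all_keys))  # → 10
```

A real client would call `SCAN cursor COUNT 3` against the server in that loop; the control flow – call, collect, repeat until the cursor returns to 0 – is the same.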
- Complex Lua scripts:
  - Diagnosis: An `EVAL` or `EVALSHA` command with a high duration.
  - Cause: The Lua script has inefficient logic, performs many Valkey operations, or is extremely CPU-intensive.
  - Fix: Optimize the script’s logic. Break complex operations into smaller, sequential `EVAL` calls or separate commands. Profile the script if possible.
  - Why it works: Reducing the computational complexity, or the number of Valkey calls the script makes, directly lowers execution time.
- Large `ZRANGE` or `ZREVRANGE` operations:
  - Diagnosis: `ZRANGE myzset 0 100000` with high latency.
  - Cause: Retrieving a very large number of elements from a sorted set.
  - Fix: Use `ZRANGE` with a smaller range (e.g., `ZRANGE myzset 0 100`) and paginate your results in the application. Alternatively, if you need a range based on scores, use `ZRANGEBYSCORE` with appropriate limits.
  - Why it works: Fetching fewer elements per call reduces the I/O and processing overhead of each command, allowing other commands to be served in between.
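The pagination pattern can be sketched like this; `zrange_page` is a hypothetical helper over a plain Python list standing in for the sorted set’s members:

```python
def zrange_page(members, start, stop):
    """Mimic ZRANGE's inclusive start/stop index semantics."""
    return members[start:stop + 1]

members = [f"m{i}" for i in range(1000)]  # stand-in for a large sorted set
page_size = 100

pages = []
start = 0
while start < len(members):
    # Equivalent to ZRANGE myzset start start+99 — a small, bounded read.
    page = zrange_page(members, start, start + page_size - 1)
    pages.append(page)
    start += page_size

print(len(pages), len(pages[0]))  # → 10 100
```

In a real client, each loop iteration would be one `ZRANGE myzset <start> <stop>` call; the application decides how many pages it actually needs.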
- `SMEMBERS` or `HGETALL` on very large sets/hashes:
  - Diagnosis: `SMEMBERS myset` or `HGETALL myhash` showing high execution times.
  - Cause: Retrieving all members of a set, or all fields and values of a hash, when the collection is enormous.
  - Fix: As with sorted sets, avoid fetching entire large collections. Use `SSCAN` or `HSCAN` for incremental retrieval. If you only need to check a specific member or field, use `SISMEMBER` or `HEXISTS`.
  - Why it works: `SSCAN` and `HSCAN` are cursor-based, letting you retrieve elements in batches without blocking the server for extended periods.
- High-frequency `INCR` or `HINCRBY` operations:
  - Diagnosis: Multiple `INCR` commands appearing in the slow log; even if each is individually fast, in aggregate they add latency.
  - Cause: A very high write rate concentrated on a small number of hot keys. Valkey executes commands on a single main thread, so a flood of tiny commands queues up behind one another.
  - Fix: Batch updates where possible (e.g., use `MSET` for multiple keys, `HSET` for multiple fields of a hash, or accumulate counts in the application and flush them with `INCRBY`). For extreme scale, consider sharded counters or alternative data structures.
  - Why it works: `INCR` is atomic and cheap, but a very high frequency of these operations on the same key can still saturate the network or CPU. Batching reduces the number of network round trips and shortens the command queue.
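One batching pattern is to aggregate increments in the application and flush them periodically, turning many `INCR` round trips into a few `INCRBY` commands. `CounterBatcher` below is an invented sketch; `flush` just returns the commands it would send, and the actual network call to Valkey is left out.

```python
from collections import Counter

class CounterBatcher:
    """Aggregate increments locally, then emit one command per key on flush."""

    def __init__(self):
        self.pending = Counter()

    def incr(self, key, amount=1):
        self.pending[key] += amount  # no network round trip here

    def flush(self):
        # One INCRBY per key instead of one INCR per event.
        commands = [("INCRBY", key, n) for key, n in self.pending.items()]
        self.pending.clear()
        return commands

b = CounterBatcher()
for _ in range(1000):
    b.incr("page:home")      # 1000 events, zero round trips so far
b.incr("page:about", 5)

cmds = b.flush()
print(cmds)  # → [('INCRBY', 'page:home', 1000), ('INCRBY', 'page:about', 5)]
```

The trade-off is a window of loss: counts accumulated since the last flush disappear if the application crashes, so this suits metrics-style counters better than billing-grade ones.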
- Network latency combined with Valkey processing:
  - Diagnosis: Commands that are typically fast show up in the slow log with execution times only slightly above the threshold, or your application sees latency that the slow log can’t explain.
  - Cause: The slow log measures only server-side execution time. Entries barely over the threshold suggest the server was momentarily busy; a clean slow log despite slow clients points to the round-trip time between client and server.
  - Fix: Ensure your Valkey client is located geographically close to your Valkey server. Optimize network infrastructure. Use Valkey pipelining to send multiple commands in one go, reducing the number of round trips.
  - Why it works: Pipelining groups commands, sending them as a batch and receiving responses together, significantly reducing the overhead of individual network requests.
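The effect of pipelining can be illustrated with a toy client that counts round trips; `FakeClient` and its `execute` method are invented for the sketch, not a real Valkey client API.

```python
class FakeClient:
    """Toy client: every execute() call models one network round trip."""

    def __init__(self):
        self.round_trips = 0

    def execute(self, *commands):
        self.round_trips += 1  # one round trip, however many commands it carries
        return [f"OK:{c[0]}" for c in commands]

client = FakeClient()

# Without pipelining: one round trip per command.
for i in range(5):
    client.execute(("SET", f"k{i}", i))
unpipelined = client.round_trips  # 5 round trips for 5 commands

# With pipelining: the same commands batched into a single round trip.
client.round_trips = 0
batch = [("SET", f"k{i}", i) for i in range(5)]
replies = client.execute(*batch)

print(unpipelined, client.round_trips, len(replies))  # → 5 1 5
```

With a real client library, the same idea applies: queue commands on a pipeline object, send them once, and read all replies together.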
It’s crucial to remember that the slow log only captures server-side execution time. High client-side latency due to network issues or application processing might not always be reflected directly in the slow log’s execution time metric, although it contributes to the overall perceived slowness.
Once you’ve identified and fixed slow commands, remember to clear the slow log with SLOWLOG RESET if you want a clean slate for future analysis. The next problem you might encounter is understanding how to effectively use Valkey’s replication to scale read operations.