Valkey’s "Big Key Problem" isn’t just about a single giant key causing memory issues; it’s also about a cluster of keys that, while individually small, collectively consume an inordinate amount of memory and, more importantly, disproportionately impact cluster operations.
Imagine you have a Valkey cluster, humming along nicely. Then, someone starts writing data, but not just any data. They’re writing thousands, maybe millions, of keys that are all related to a single concept. Think of it like this: you have a users:123:posts key, then users:123:comments, users:123:likes, users:123:followers, and so on, for every single user. Individually, these keys are tiny. A single post ID, a comment ID, a follower ID – that’s not much. But when you multiply that by millions of users, and each user has dozens of these related keys, you’ve got a problem.
The real pain isn’t just the memory. It’s the operational cost. When Valkey needs to perform operations like SCAN, KEYS (which you should never use in production, but it illustrates the point), or even certain cluster rebalancing operations, it has to iterate through all these keys. If a large portion of your keys belong to a single "big key" set, these operations can grind your cluster to a halt. A SCAN that should take milliseconds might take minutes, or worse, block other critical operations. This is because Valkey, by default, treats each key as an independent entity. It doesn’t inherently understand that users:123:posts and users:123:comments are conceptually linked and might be accessed together.
Let’s dive into how to find these.
Detecting Big Keys
The most straightforward way is to use Valkey’s built-in SCAN command, but with a twist. We’re not just looking for keys by pattern; we’re looking for keys that belong to a conceptual group. The key here is to identify patterns that indicate this grouping.
Run this from a Valkey client connected to your cluster:
valkey-cli --scan --pattern "users:*:posts" | wc -l
valkey-cli --scan --pattern "users:*:comments" | wc -l
valkey-cli --scan --pattern "users:*:likes" | wc -l
You’ll want to do this for all suspected patterns. If you see counts in the millions for multiple patterns that share a common prefix (like users:*), you’ve likely found a big key problem.
Why it works: The --scan --pattern combination tells valkey-cli to iterate through keys matching a glob pattern using cursor-based SCAN calls, and wc -l counts the matching lines, giving you the total number of keys for that pattern. Note that SCAN only walks the keyspace of the node you’re connected to, so in a cluster you’ll need to repeat this against each primary. High counts across related patterns signal the issue.
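If you have many suspected prefixes, you can pull the key names once and group them in a short script instead of re-scanning per pattern. A minimal Python sketch; the group_by_pattern helper is a hypothetical name, and the hard-coded key list stands in for output piped from --scan:

```python
from collections import Counter

def group_by_pattern(keys):
    """Collapse colon-delimited key names into wildcard patterns
    by replacing the entity-ID segment with '*', then count them."""
    counts = Counter()
    for key in keys:
        parts = key.split(":")
        if len(parts) >= 3:
            # e.g. 'users:123:posts' -> 'users:*:posts'
            counts[f"{parts[0]}:*:{parts[2]}"] += 1
        else:
            counts[key] += 1
    return counts

# In practice this list would come from `--scan` output.
keys = ["users:123:posts", "users:123:comments",
        "users:456:posts", "users:456:likes"]
print(group_by_pattern(keys).most_common())
```

Millions of hits concentrated under one wildcard prefix are the signature described above.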
Another powerful tool is valkey-cli --bigkeys. This command scans all keys in a database and reports, per data type, the key with the most elements (or the longest string). Note that it measures element counts, not memory; use --memkeys for per-key memory estimates. While it won’t directly show you the "big key problem" in terms of many small keys belonging to a conceptual group, it will reveal if any individual key is excessively large (e.g., a hash with thousands of fields, or a string that’s megabytes in size).
Run this:
valkey-cli --bigkeys
Why it works: --bigkeys uses SCAN to iterate the keyspace, categorizes keys by type, and tracks the largest key of each type by element count (or string length). It’s essential for identifying single, monstrous keys that might also be impacting performance.
Breaking Up Big Keys
Once identified, the strategy is to "break up" the big key into smaller, more manageable units. This usually involves changing your data model.
Scenario 1: Many small keys related to a single entity (e.g., users:123:posts, users:123:comments)
The fix is to consolidate related data. Instead of thousands of individual keys per user, use a Valkey Hash or a Valkey List/Sorted Set.
Example Fix using Hashes:
Instead of:
SET users:123:posts:456 "Post data..."
SET users:123:comments:789 "Comment data..."
Use (HSET accepts multiple field-value pairs; the older HMSET is deprecated):
HSET users:123 posts:456 "Post data..." comments:789 "Comment data..."
Or, if the values are complex and you want to keep them separate but grouped:
HSET users:123 post_ids "456,789,1011"
SET users:123:posts:456 "Post data..."
SET users:123:comments:789 "Comment data..."
Why it works: A Hash stores multiple field-value pairs within a single Valkey key. This reduces the total number of keys in your cluster, making operations like SCAN or cluster rebalancing much faster because they only need to iterate over the users:123 key and its fields, rather than thousands of individual users:123:* keys.
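A migration from flat keys to hashes mostly comes down to deciding where to split the key name. A hypothetical helper is sketched below; splitting after the second segment is an assumption based on the users:&lt;id&gt;:... naming used above:

```python
def to_hash_field(flat_key):
    """Split 'users:123:posts:456' into the hash key 'users:123'
    and the field name 'posts:456'."""
    parts = flat_key.split(":")
    hash_key = ":".join(parts[:2])
    field = ":".join(parts[2:])
    return hash_key, field

# A migration loop would then SCAN the old keys and, for each one,
# GET the value and HSET it under (hash_key, field).
print(to_hash_field("users:123:posts:456"))
```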
Example Fix using Lists/Sorted Sets:
If you have a collection of IDs (like post IDs for a user) and want to retrieve them in order or with pagination:
Instead of:
SADD users:123:post_ids 456
SADD users:123:post_ids 789
SADD users:123:post_ids 1011
Use a Sorted Set, where each score (1000, 1001, 1002 below) is a timestamp or other ordering value:
ZADD users:123:post_ids 1000 456
ZADD users:123:post_ids 1001 789
ZADD users:123:post_ids 1002 1011
Then retrieve the first page of ten with ZRANGE users:123:post_ids 0 9 WITHSCORES (both indices are inclusive).
Why it works: A Sorted Set stores members with an associated score, allowing for ordered retrieval and efficient range queries. This consolidates potentially millions of individual ID keys into a single Sorted Set key, drastically reducing the key count and improving performance for queries involving collections of IDs.
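Pagination against the Sorted Set then becomes simple index arithmetic, since ZRANGE takes inclusive start and stop indices. A small sketch of the offset calculation:

```python
def zrange_bounds(page, page_size):
    """Inclusive (start, stop) indices for a ZRANGE call
    fetching the given zero-based page."""
    start = page * page_size
    return start, start + page_size - 1

# Page 0 of 10 items -> ZRANGE key 0 9
# Page 2 of 10 items -> ZRANGE key 20 29
print(zrange_bounds(0, 10), zrange_bounds(2, 10))
```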
Scenario 2: A single key with excessive data (e.g., a Hash with 1 million fields, a String of 10MB)
The fix here is to shard the data within that key.
Example Fix for a large Hash:
Instead of:
HSET user:123 field1 "val1" field2 "val2" ... field1000000 "val1M"
Break it into multiple Hashes:
HSET user:123:fields:1 field1 "val1" ... field1000 "val1000"
HSET user:123:fields:2 field1001 "val1001" ... field2000 "val2000"
…and so on.
You’ll need a way to manage these sub-keys, for example by tracking them in a Valkey Set: SADD user:123:subkeys user:123:fields:1 user:123:fields:2 ....
Why it works: By splitting a single large Hash into multiple smaller Hashes, you distribute the fields across different Valkey keys. Operations that previously had to process all 1 million fields in one go can now operate on smaller chunks, improving responsiveness and reducing the impact of a single large key on Valkey operations.
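Rather than maintaining an explicit registry of sub-keys, you can route each field to a bucket deterministically, so writers and readers both compute the same sub-key from the field name alone. A sketch, assuming CRC32 of the field name modulo a fixed bucket count (both choices are illustrative, not a prescribed scheme):

```python
import zlib

def bucket_key(base_key, field, buckets=1024):
    """Route a hash field to one of `buckets` sub-hashes via a CRC32
    of the field name; the same field always maps to the same key."""
    n = zlib.crc32(field.encode("utf-8")) % buckets
    return f"{base_key}:fields:{n}"

# HSET would then target bucket_key("user:123", "field1") instead of
# the single monolithic user:123 hash.
print(bucket_key("user:123", "field1"))
```

The trade-off is that operations spanning all fields (e.g., a full HGETALL) now require iterating every bucket.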
Example Fix for a large String:
If you’re storing large binary blobs or text directly in a String key, consider breaking them into chunks.
Instead of:
SET large_blob "..." (where "…" is 10MB)
Use:
SET large_blob:0 "chunk1"
SET large_blob:1 "chunk2"
…
SET large_blob:N "chunkN"
Why it works: Large individual String values can be problematic for memory allocation and network transfer. Splitting them into smaller, manageable String keys allows Valkey to handle them more efficiently and reduces the risk of a single large operation causing latency spikes.
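Chunking and reassembly are symmetric operations, so a pair of small helpers keeps the two sides consistent. A sketch with a fixed chunk size; the 1 MB figure is an assumption, tune it to your payloads:

```python
CHUNK_SIZE = 1024 * 1024  # 1 MB per chunk; an illustrative choice

def split_blob(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Return (index, chunk) pairs; each chunk is stored as large_blob:<index>."""
    return [(i // chunk_size, data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]

def join_blob(indexed_chunks):
    """Reassemble chunks fetched back in any order."""
    return b"".join(chunk for _, chunk in sorted(indexed_chunks))

blob = b"x" * (3 * CHUNK_SIZE + 5)          # 3 full chunks plus a remainder
chunks = split_blob(blob)
assert join_blob(reversed(chunks)) == blob  # order-independent reassembly
```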
The next hurdle you’ll likely face after cleaning up big keys is optimizing your access patterns to leverage these new, smaller data structures effectively, perhaps involving batching operations to minimize network round trips.