Valkey’s memory usage isn’t just a sum of its parts; it’s a dynamic interplay where the structural overhead around a value can matter more than the value itself.
Let’s see Valkey in action, specifically how it handles memory for different data types and how we can poke around to understand it. Imagine a Valkey instance with a few keys:
# Setting up some data
valkey-cli> SET mykey "a_very_long_string_value_that_will_take_up_some_memory"
OK
valkey-cli> HSET myhash field1 "value1" field2 "another_value"
(integer) 2
valkey-cli> LPUSH mylist "item1" "item2" "item3"
(integer) 3
valkey-cli> ZADD myzset 1 "member1" 2 "member2"
(integer) 2
valkey-cli> SET my_expiring_key "some_data" EX 60
OK
Now, how do we even begin to understand where the memory is going? The INFO memory command is our primary tool.
valkey-cli> INFO memory
# Memory
used_memory:120576
used_memory_human:117.75K
used_memory_rss:2345678
used_memory_rss_human:2.24M
used_memory_peak:9876543
used_memory_peak_human:9.42M
used_memory_lua:0
used_memory_lua_human:0B
used_memory_scripts:0
used_memory_scripts_human:0B
mem_fragmentation_ratio:19.45
maxmemory:0
maxmemory_policy:noeviction
This gives us a high-level overview. used_memory is what Valkey reports it’s using for its data structures, excluding allocator fragmentation. used_memory_rss is the Resident Set Size: what the OS sees the process consuming, including fragmentation and allocator overhead. used_memory_peak is the highest used_memory reached since startup. mem_fragmentation_ratio is simply used_memory_rss divided by used_memory; on a sizeable dataset a ratio well above 1.5 suggests fragmentation, while on a near-empty instance like this one it is inflated by fixed process overhead. Key counts don’t live in this section at all; INFO keyspace reports them per database, e.g. db0:keys=5,expires=1.
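INFO output is plain field:value text, so it’s easy to consume programmatically. Here is a minimal sketch of parsing the Memory section and deriving the fragmentation ratio ourselves; the field names match real INFO output, and the byte values are the illustrative ones from above.

```python
# Parse the field:value body of INFO memory and derive the
# fragmentation ratio (rss / used_memory) -- the same figure Valkey
# reports as mem_fragmentation_ratio.

def parse_info(raw: str) -> dict:
    """Turn INFO's 'field:value' lines into a dict, skipping comments."""
    info = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        field, _, value = line.partition(":")
        info[field] = value
    return info

sample = """# Memory
used_memory:120576
used_memory_rss:2345678
used_memory_peak:9876543
"""

info = parse_info(sample)
ratio = int(info["used_memory_rss"]) / int(info["used_memory"])
print(f"fragmentation ratio: {ratio:.2f}")
```

In practice you would feed this the string returned by your client’s INFO call rather than a hard-coded sample.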
But this doesn’t tell us which keys are the problem. For that, we turn to the scanning modes of the CLI: valkey-cli --bigkeys and valkey-cli --memkeys.
valkey-cli --bigkeys is a quick diagnostic. It walks the entire keyspace with SCAN and reports, per data type, the biggest key it found, measured by element count (or byte length, for strings) rather than by actual memory.
valkey-cli --bigkeys
# Sample output (summary section, abridged):
#
# -------- summary -------
#
# Sampled 5 keys in the keyspace!
# Total key length in bytes is 38 (avg len 7.60)
#
# Biggest string found '"mykey"' has 54 bytes
# Biggest   hash found '"myhash"' has 2 fields
# Biggest   list found '"mylist"' has 3 items
# Biggest   zset found '"myzset"' has 2 members
#
# 2 strings with 63 bytes (40.00% of keys, avg size 31.50)
# 1 hashs with 2 fields (20.00% of keys, avg size 2.00)
# 1 lists with 3 items (20.00% of keys, avg size 3.00)
# 0 sets with 0 members (00.00% of keys, avg size 0.00)
# 1 zsets with 2 members (20.00% of keys, avg size 2.00)
This gives us a hint, but --bigkeys measures size in elements and string bytes, not in actual memory: a hash with a handful of enormous fields can rank below one with a thousand tiny ones. For real memory accounting, valkey-cli --memkeys is invaluable: it performs the same SCAN but sizes each key with the MEMORY USAGE command.
valkey-cli --memkeys
This scans the whole keyspace and reports the biggest keys by measured memory. It’s crucial to understand that this scan can be resource-intensive and should ideally be run during off-peak hours or on a read replica.
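Conceptually, --memkeys is just SCAN plus MEMORY USAGE with a running top-N. The sketch below shows that loop; FakeClient is a hypothetical in-memory stand-in so the example runs without a server, but its method shapes mirror what client libraries such as redis-py/valkey-py expose as scan_iter() and memory_usage(). The byte counts are illustrative.

```python
import heapq

class FakeClient:
    """Stand-in for a real client: keys mapped to MEMORY USAGE bytes."""
    def __init__(self, sizes):
        self._sizes = sizes
    def scan_iter(self):
        yield from self._sizes          # incremental SCAN over all keys
    def memory_usage(self, key):
        return self._sizes[key]         # MEMORY USAGE <key>

def top_keys_by_memory(client, n=10):
    """Return the n largest keys as (bytes, key) pairs, biggest first."""
    heap = []                           # min-heap of the n largest seen
    for key in client.scan_iter():
        size = client.memory_usage(key)
        if len(heap) < n:
            heapq.heappush(heap, (size, key))
        elif size > heap[0][0]:
            heapq.heapreplace(heap, (size, key))
    return sorted(heap, reverse=True)

client = FakeClient({"mykey": 104, "myhash": 108, "mylist": 136,
                     "myzset": 152, "my_expiring_key": 88})
for size, key in top_keys_by_memory(client, n=3):
    print(f"{key}: {size} bytes")
```

The heap keeps memory bounded even on keyspaces with millions of keys: only the current top N candidates are retained.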
The output of valkey-cli --memkeys mirrors the --bigkeys format, but every number is bytes of memory as reported by MEMORY USAGE:
# Sample output (summary section, abridged):
#
# -------- summary -------
#
# Sampled 5 keys in the keyspace!
# Total key length in bytes is 38 (avg len 7.60)
#
# Biggest string found '"mykey"' has 104 bytes
# Biggest   hash found '"myhash"' has 108 bytes
# Biggest   list found '"mylist"' has 136 bytes
# Biggest   zset found '"myzset"' has 152 bytes
To drill into a single suspect key, take the same measurement by hand and check which internal encoding it is using:
valkey-cli> MEMORY USAGE mykey
(integer) 104
valkey-cli> MEMORY USAGE myhash
(integer) 108
valkey-cli> OBJECT ENCODING mykey
"raw"
valkey-cli> OBJECT ENCODING myhash
"listpack"
Note that MEMORY USAGE samples nested elements by default (SAMPLES 5); pass SAMPLES 0 to measure every element of a large aggregate exactly.
(Note: The byte counts above are illustrative. Actual sizes depend on Valkey version, internal encoding, and string/element sizes.)
The fundamental mechanism at play here is Valkey’s internal memory allocation and data encoding. Strings, for example, can be stored as raw byte arrays or, for short strings and integer-like values, use the more compact embstr and int encodings. Hashes, lists, sets, and sorted sets use various internal data structures (hash tables, skip lists, intsets, and listpacks, the successor to the older ziplists) that each carry their own overhead. MEMORY USAGE and --memkeys probe these structures and report their estimated memory footprint.
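The string-encoding decision can be approximated with a few lines. This sketch assumes the defaults in recent Redis/Valkey releases: values that round-trip as a 64-bit integer get the int encoding, strings up to 44 bytes are embedded in the object header as embstr, and everything longer is raw. Verify against your own instance with OBJECT ENCODING.

```python
# Rough predictor of what OBJECT ENCODING would report for a string
# value. Thresholds assume recent defaults: 44-byte embstr cutoff and
# 64-bit signed integer range for the "int" encoding.

def predict_string_encoding(value: str) -> str:
    try:
        n = int(value)
        if -(2**63) <= n < 2**63:      # fits in a C long long -> "int"
            return "int"
    except ValueError:
        pass
    # Short strings share one allocation with the object header
    # ("embstr"); longer ones get a separately allocated buffer ("raw").
    return "embstr" if len(value.encode()) <= 44 else "raw"

print(predict_string_encoding("12345"))     # int
print(predict_string_encoding("hello"))     # embstr
print(predict_string_encoding("x" * 100))   # raw
```

The embstr form saves an allocation and a pointer per string, which is exactly the kind of per-element saving that adds up across millions of keys.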
To truly understand the memory footprint of a complex data structure like a hash or a list, you often need to break it down. A list’s memory isn’t just the sum of the lengths of its string elements: it includes overhead for the list structure itself, and if the list is short and its elements are small, Valkey stores it as a single compact listpack. As the list grows past the configured thresholds, it converts to a quicklist, a linked list of listpack nodes that balances access speed against memory efficiency. OBJECT ENCODING tells you which form a given key is currently in, and MEMORY USAGE measures the result.
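The conversion thresholds are ordinary config values, so the encoding a key will end up with can be predicted on the back of an envelope. The sketch below hard-codes the stock defaults as assumptions (verify with CONFIG GET on your instance): list-max-listpack-size 128, hash-max-listpack-entries 128, hash-max-listpack-value 64; real Valkey also promotes a list whose individual elements are large.

```python
# Back-of-the-envelope predictors for when a list or hash leaves its
# compact listpack encoding. Thresholds are assumed stock defaults.

LIST_MAX_LISTPACK_SIZE = 128      # list-max-listpack-size
HASH_MAX_LISTPACK_ENTRIES = 128   # hash-max-listpack-entries
HASH_MAX_LISTPACK_VALUE = 64      # hash-max-listpack-value (bytes)

def predicted_list_encoding(num_items: int) -> str:
    # A short list fits in one listpack; past the limit it becomes a
    # quicklist. (Element size also matters: oversized elements force
    # the conversion too, which this sketch ignores.)
    return "listpack" if num_items <= LIST_MAX_LISTPACK_SIZE else "quicklist"

def predicted_hash_encoding(fields: dict) -> str:
    # Both field names and values must stay under the size limit.
    small = (len(fields) <= HASH_MAX_LISTPACK_ENTRIES and
             all(len(str(s)) <= HASH_MAX_LISTPACK_VALUE
                 for kv in fields.items() for s in kv))
    return "listpack" if small else "hashtable"

print(predicted_list_encoding(3))        # listpack -- our 3-item mylist
print(predicted_list_encoding(10_000))   # quicklist
print(predicted_hash_encoding({"field1": "value1",
                               "field2": "another_value"}))  # listpack
```

Crossing either threshold is a one-way, whole-structure conversion, which is why a single oversized field can noticeably change a hash’s footprint.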
The most surprising thing about Valkey’s memory usage is how much of it is per-element overhead rather than the elements’ data, especially in structures that have outgrown their compact encodings. A tiny integer stored as a string element in a quicklist-encoded list can cost several times its own length once string headers and node pointers are counted; a thousand five-byte items is a very different proposition from one five-kilobyte string.
If you’re seeing high memory usage for what appears to be a small number of keys, investigate the type of those keys and the size of their elements. A hash with thousands of fields, each holding a tiny string, can collectively consume far more memory than a single large string key, and a list with thousands of small elements can be a memory hog once it has abandoned listpack encoding. The goal is to identify which elements within a complex data type contribute most to its size.
Once you’ve identified large keys, the next step is often optimizing them. This might involve serializing multiple small values into a single larger string (e.g., using JSON or Protocol Buffers), or re-evaluating whether all of the data needs to live in Valkey at all.
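The consolidation idea can be sketched in a few lines: instead of one key or one hash field per attribute, pack related small values into one JSON string and pay the per-key overhead once. The field names below are illustrative.

```python
import json

# Five tiny related values, which might otherwise be five keys or five
# hash fields, each carrying its own structural overhead.
profile = {"name": "ada", "lang": "en", "theme": "dark",
           "tz": "UTC", "beta": True}

# Pack them into a single compact string: one SET, one GET, one
# allocation's worth of key overhead.
blob = json.dumps(profile, separators=(",", ":"))
print(f"one key, {len(blob)} bytes of payload")

# Reading it back is a single GET plus a parse.
restored = json.loads(blob)
assert restored == profile
```

The trade-off is that you can no longer read or update one attribute server-side; this pattern fits data that is always read and written together.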