The most surprising thing about rate limiting with Valkey is that it often performs best exactly when clients are hammering the limit: rejecting a request is a single server-side script call, usually far cheaper than serving the request would have been.
Let’s watch it in action. Imagine a simple API endpoint that allows 10 requests per minute per user. We’ll implement a token-bucket rate limiter on top of Valkey, built on an atomic Lua script.
Here’s a simplified representation of the user-facing API:
```python
import time

import valkey

# Assume Valkey is running on localhost:6379
client = valkey.Valkey(host='localhost', port=6379, db=0)

# The Lua script is the core of the atomic operation. It refills the
# bucket based on elapsed time, then consumes a token if one is
# available, or returns 0 if not.
LUA_SCRIPT = """
local key = KEYS[1]
local max_tokens = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local current_time = tonumber(redis.call('TIME')[1])

-- Get current tokens and last refill time, initializing if the key is new
local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local current_tokens, last_refill_time
if bucket[1] == false then
    current_tokens = max_tokens
    last_refill_time = current_time
else
    current_tokens = tonumber(bucket[1])
    last_refill_time = tonumber(bucket[2])
end

-- Add the tokens accrued since the last refill, capping at max_tokens.
-- Only advance the refill clock when whole tokens were actually added;
-- otherwise frequent calls would keep resetting it and the bucket
-- would never refill.
local time_elapsed = current_time - last_refill_time
local tokens_to_add = math.floor(time_elapsed * (refill_rate / 60))
if tokens_to_add > 0 then
    current_tokens = math.min(max_tokens, current_tokens + tokens_to_add)
    last_refill_time = current_time
end

-- Consume a token if one is available
local allowed = 0
if current_tokens >= 1 then
    current_tokens = current_tokens - 1
    allowed = 1
end

-- Store the updated state either way, and expire stale keys
-- (e.g., 2 minutes after the last refill)
redis.call('HSET', key, 'tokens', current_tokens, 'last_refill', last_refill_time)
redis.call('EXPIRE', key, 120)
return allowed
"""

# Load the script once; EVALSHA then avoids resending the script body
script_sha = client.script_load(LUA_SCRIPT)

def get_user_id():
    # In a real app, this would come from authentication
    return "user:123"

def call_api_endpoint():
    user_id = get_user_id()
    rate_limiter_key = f"rate_limit:{user_id}"
    max_tokens = 10
    refill_rate = 10  # tokens per minute

    # Call the script via EVALSHA for performance
    result = client.evalsha(script_sha, 1, rate_limiter_key,
                            max_tokens, refill_rate)
    if result == 1:
        remaining = client.hget(rate_limiter_key, 'tokens').decode()
        print(f"Request allowed. Tokens remaining: {remaining}")
        return True
    else:
        print("Rate limit exceeded.")
        return False

# Simulate making requests
print("--- Making 12 requests in quick succession ---")
for i in range(12):
    if call_api_endpoint():
        pass  # Do API work
    else:
        print(f"Request {i+1} blocked.")
    time.sleep(0.1)  # Small delay to simulate real-world timing

print("\n--- Waiting for refill ---")
time.sleep(60)  # Wait for the minute to pass

print("\n--- Making another 10 requests ---")
for i in range(10):
    if call_api_endpoint():
        pass  # Do API work
    else:
        print(f"Request {i+1} blocked.")
    time.sleep(0.1)
```
This example shows how Valkey, by using Lua scripts, guarantees that checking for and consuming a token is an atomic operation. There’s no race condition where two requests could both see a token available, and then both try to consume it, leading to over-consumption. The entire check-and-decrement logic happens on the server, as a single, indivisible command.
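To make the race concrete, here is a minimal, deterministic sketch of the bad interleaving that a non-atomic read-check-write would allow. A plain Python dict stands in for Valkey, and the two reads are deliberately ordered to model two clients racing:

```python
# Two clients interleave a non-atomic check-then-decrement on a bucket
# holding a single remaining token (a dict stands in for Valkey here).
bucket = {"tokens": 1}
allowed = 0

read_a = bucket["tokens"]          # client A reads: sees 1 token
read_b = bucket["tokens"]          # client B reads before A writes: also sees 1
if read_a >= 1:
    bucket["tokens"] = read_a - 1  # A consumes the token
    allowed += 1
if read_b >= 1:
    bucket["tokens"] = read_b - 1  # B consumes the *same* token
    allowed += 1

print(allowed, bucket["tokens"])   # → 2 0: two requests allowed on one token
```

Running the whole check-and-decrement as one Lua script removes this window entirely, because no other command can execute between the read and the write.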
The mental model here is a bucket that holds tokens. Tokens are added to the bucket at a fixed rate. When a request comes in, it tries to take a token from the bucket. If there’s a token, the request is allowed, and a token is removed. If the bucket is empty, the request is denied. The max_tokens parameter ensures the bucket doesn’t overflow, and the refill_rate determines how quickly it fills up.
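The same mental model fits in a few lines of in-process Python, which can help when reasoning about parameter choices before touching Valkey. This is a sketch, not the server-side implementation; the class name and the caller-supplied clock are mine, chosen so refill behavior is easy to test deterministically:

```python
class TokenBucket:
    """In-process token bucket; `now` is a caller-supplied clock in seconds."""

    def __init__(self, max_tokens, refill_rate_per_min):
        self.max_tokens = max_tokens
        self.refill_per_sec = refill_rate_per_min / 60.0
        self.tokens = float(max_tokens)
        self.last_refill = 0.0

    def allow(self, now):
        # Refill based on elapsed time, capping at the bucket size
        elapsed = now - self.last_refill
        self.tokens = min(self.max_tokens,
                          self.tokens + elapsed * self.refill_per_sec)
        self.last_refill = now
        # Consume a token if one is available
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(max_tokens=10, refill_rate_per_min=10)
print(sum(bucket.allow(now=0) for _ in range(12)))  # → 10 of 12 allowed
print(bucket.allow(now=12))  # two tokens refilled after 12 seconds → True
```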
The key levers you control are max_tokens and refill_rate. max_tokens defines the burst capacity – how many requests can be handled in a short, intense burst. refill_rate defines the sustained throughput – how many requests can be handled over a longer period. Choosing these values is a balance: too low, and you’ll block legitimate users; too high, and you risk overwhelming your downstream services.
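You can get a feel for that trade-off with a quick simulation. The sketch below (a hypothetical helper, using integer refill math similar to the Lua script's floor-based formula) counts how many of one-request-per-second arrivals succeed over a minute for two burst capacities at the same sustained rate:

```python
def allowed_in_window(seconds, max_tokens, refill_per_min):
    """Count allowed requests for one attempt per second over `seconds`."""
    tokens, last_refill, allowed = max_tokens, 0, 0
    for now in range(seconds):
        # Integer floor division mirrors math.floor(elapsed * rate / 60)
        added = (now - last_refill) * refill_per_min // 60
        if added > 0:
            tokens = min(max_tokens, tokens + added)
            last_refill = now
        if tokens >= 1:
            tokens -= 1
            allowed += 1
    return allowed

# Same sustained rate (10/min), different burst capacity
print(allowed_in_window(60, max_tokens=10, refill_per_min=10))  # → 19
print(allowed_in_window(60, max_tokens=30, refill_per_min=10))  # → 39
```

The sustained rate contributes roughly the same ~10 refilled tokens to both runs; the extra 20 allowed requests in the second run come entirely from the larger burst capacity.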
What most people don’t realize is that the TIME command in Redis/Valkey returns two values: the Unix timestamp in seconds and the elapsed microseconds within that second. The script above uses only the whole-second component, so refills land on one-second boundaries. Combining both fields into a fractional timestamp allows fractional token accrual and a smoother refill across the minute, rather than tokens arriving in whole-second chunks.
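For illustration, here's how the two fields of a TIME reply could be combined into a fractional timestamp on the client side (the helper name is mine; the same arithmetic works inside the Lua script):

```python
def fractional_time(time_reply):
    """Combine TIME's [seconds, microseconds] reply into one float."""
    seconds, microseconds = int(time_reply[0]), int(time_reply[1])
    return seconds + microseconds / 1_000_000

# TIME replies with two values, e.g. via client.time() or redis.call('TIME')
print(fractional_time([b"1700000000", b"250000"]))  # → 1700000000.25
```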
The next concept to explore is how to manage multiple rate limiters for different resources or users, and how to aggregate their usage.