The most surprising thing about Valkey’s rate limiting is that it often performs better when you’re hitting the limits.

Let’s watch it in action. Imagine a simple API endpoint that allows 10 requests per minute per user. We’ll implement a token-bucket rate limiter on top of Valkey, with the check-and-consume logic in an atomic Lua script.

Here’s a simplified representation of the user-facing API:

import valkey
import time

# Assume a Valkey server is running on localhost:6379
client = valkey.Valkey(host='localhost', port=6379, db=0, decode_responses=True)

def get_user_id():
    # In a real app, this would come from authentication
    return "user:123"

# The Lua script is the core of the atomic operation. It refills the
# bucket based on elapsed time, then consumes a token if one is
# available -- all in a single, server-side step.
LUA_SCRIPT = """
local key = KEYS[1]
local max_tokens = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])

local current_time = tonumber(redis.call('TIME')[1])

-- Read current tokens and last refill time; initialize if the key is new
local data = redis.call('HMGET', key, 'tokens', 'last_refill')
local current_tokens, last_refill_time

if not data[1] then
    current_tokens = max_tokens
    last_refill_time = current_time
else
    current_tokens = tonumber(data[1])
    last_refill_time = tonumber(data[2])
end

-- Refill fractionally for the time elapsed, capping at max_tokens.
-- Keeping tokens as a float matters: if we floored to whole tokens,
-- frequent calls would reset last_refill before a full token had
-- accrued, and the bucket would never refill under load.
local time_elapsed = current_time - last_refill_time
current_tokens = math.min(max_tokens, current_tokens + time_elapsed * (refill_rate / 60))
last_refill_time = current_time

-- Consume a token if one is available
local allowed = 0
if current_tokens >= 1 then
    current_tokens = current_tokens - 1
    allowed = 1
end

-- Persist state either way, and expire idle keys after 2 minutes
redis.call('HSET', key, 'tokens', current_tokens, 'last_refill', last_refill_time)
redis.call('EXPIRE', key, 120)
return allowed
"""

# Register the script once; the client caches its SHA and calls it
# with EVALSHA from then on.
rate_limit_script = client.register_script(LUA_SCRIPT)

def call_api_endpoint():
    user_id = get_user_id()
    rate_limiter_key = f"rate_limit:{user_id}"
    max_tokens = 10
    refill_rate = 10  # tokens per minute

    result = rate_limit_script(keys=[rate_limiter_key], args=[max_tokens, refill_rate])
    
    if result == 1:
        print(f"Request allowed. Tokens remaining: {client.hget(rate_limiter_key, 'tokens')}")
        return True
    else:
        print("Rate limit exceeded.")
        return False

# Simulate making requests
print("--- Making 12 requests in quick succession ---")
for i in range(12):
    if call_api_endpoint():
        pass # Do API work
    else:
        print(f"Request {i+1} blocked.")
    time.sleep(0.1) # Small delay to simulate real-world timing

print("\n--- Waiting for refill ---")
time.sleep(60) # Wait for the minute to pass

print("\n--- Making another 10 requests ---")
for i in range(10):
    if call_api_endpoint():
        pass # Do API work
    else:
        print(f"Request {i+1} blocked.")
    time.sleep(0.1)

This example shows how Valkey, by using Lua scripts, guarantees that checking for and consuming a token is an atomic operation. There’s no race condition where two requests could both see a token available, and then both try to consume it, leading to over-consumption. The entire check-and-decrement logic happens on the server, as a single, indivisible command.
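To see what goes wrong without that guarantee, here is a deterministic sketch (plain Python, no server) of the interleaving the atomic script prevents: two requests both read the token count before either writes it back.

```python
# Simulated bucket state shared by two "concurrent" requests.
bucket = {"tokens": 1}
allowed = 0

# Both requests run their check phase before either writes:
read_a = bucket["tokens"]  # request A sees 1 token
read_b = bucket["tokens"]  # request B also sees 1 token

# Both saw a token available, so both consume one.
if read_a >= 1:
    bucket["tokens"] = read_a - 1
    allowed += 1
if read_b >= 1:
    bucket["tokens"] = read_b - 1
    allowed += 1

print(allowed)  # 2 requests allowed, though only 1 token existed
```

With the check and decrement fused into one server-side script, the second request cannot read the count until the first has finished writing it.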

The mental model here is a bucket that holds tokens. Tokens are added to the bucket at a fixed rate. When a request comes in, it tries to take a token from the bucket. If there’s a token, the request is allowed, and a token is removed. If the bucket is empty, the request is denied. The max_tokens parameter ensures the bucket doesn’t overflow, and the refill_rate determines how quickly it fills up.
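That mental model can be sketched as a small in-process class, independent of Valkey (the `TokenBucket` name and its explicit `now` clock parameter are ours, for testability; they aren’t part of the article’s server-side code):

```python
class TokenBucket:
    """In-memory token bucket: capacity is burst size, refill_rate is tokens/second."""

    def __init__(self, capacity, refill_rate, now=0.0):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)  # bucket starts full
        self.last_refill = now

    def allow(self, now):
        # Refill fractionally for the elapsed time, capped at capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# A 10-token bucket refilling 10 tokens per 60 seconds, hit 12 times
# at 0.1-second intervals -- mirroring the simulation above:
bucket = TokenBucket(capacity=10, refill_rate=10 / 60)
results = [bucket.allow(now=i * 0.1) for i in range(12)]
print(results.count(True))  # 10 allowed, 2 blocked
```

The burst drains the bucket in about a second, and the trickle of refilled tokens (one every six seconds at this rate) isn’t enough to admit the eleventh request.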

The key levers you control are max_tokens and refill_rate. max_tokens defines the burst capacity – how many requests can be handled in a short, intense burst. refill_rate defines the sustained throughput – how many requests can be handled over a longer period. Choosing these values is a balance: too low, and you’ll block legitimate users; too high, and you risk overwhelming your downstream services.
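A little arithmetic makes the trade-off concrete. The hypothetical helper below (not part of the article’s code) summarizes what a given configuration implies: how big a burst it absorbs, what it sustains per second, and how long a fully drained bucket takes to recover.

```python
def describe_limits(max_tokens, refill_rate_per_min):
    """Summarize the behavior implied by a token-bucket configuration."""
    sustained_per_sec = refill_rate_per_min / 60
    # After the whole burst is spent, time until the bucket is full again:
    full_recovery_secs = max_tokens * 60 / refill_rate_per_min
    return {
        "burst": max_tokens,
        "sustained_per_sec": sustained_per_sec,
        "full_recovery_secs": full_recovery_secs,
    }

# The article's configuration: 10-token burst, 10 tokens/minute refill.
print(describe_limits(10, 10))
# A spikier profile: same sustained rate, triple the burst headroom,
# but a drained bucket takes three minutes to fully recover.
print(describe_limits(30, 10))
```

Raising `max_tokens` without touching `refill_rate` only changes how spiky traffic may be, not how much of it you admit over time.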

What most people don’t realize is that the TIME command in Redis/Valkey returns two values: Unix seconds and the microseconds elapsed within that second. The script above uses only the seconds component, which is already enough for a smooth, second-by-second fractional refill rather than a big chunk added only at the top of the minute. Folding in the microseconds as well would give sub-second refill granularity.
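Combining the pair into one fractional timestamp is a one-liner; this sketch uses a helper name of our choosing (client-side, valkey-py’s client.time() returns the same two-element reply):

```python
def to_fractional_seconds(time_reply):
    """Combine a TIME-style (seconds, microseconds) pair into a float timestamp."""
    seconds, microseconds = time_reply
    return seconds + microseconds / 1_000_000

# e.g. a reply of (1700000000, 250000) is a quarter second into that second:
t = to_fractional_seconds((1700000000, 250000))
print(t)  # 1700000000.25

# Tokens accrued over that quarter second at 10 tokens/minute:
refill_rate_per_sec = 10 / 60
print(0.25 * refill_rate_per_sec)
```

Using this as `current_time` in the Lua script would let the bucket refill continuously instead of in one-second steps.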

The next concept to explore is how to manage multiple rate limiters for different resources or users, and how to aggregate their usage.

Want structured learning?

Take the full Valkey course →