Valkey, when run in production via Docker, often feels like a black box until you realize its performance is fundamentally limited by networking and memory settings you never tuned.

Let’s set up a Valkey instance in Docker for production and then tune it.

Production Valkey Docker Setup

First, we need a docker-compose.yml file. We’ll mount a configuration file and persistent data.

version: '3.8'
services:
  valkey:
    image: valkey/valkey:8   # official image; pin a version rather than :latest
    container_name: valkey_prod
    command: ["valkey-server", "/usr/local/etc/valkey/valkey.conf"]
    ports:
      - "6379:6379"
    volumes:
      - ./valkey.conf:/usr/local/etc/valkey/valkey.conf:ro
      - ./valkey_data:/data
      - ./valkey_log:/var/log/valkey
    restart: always
    environment:
      - TZ=UTC
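
A healthcheck stanza is also worth adding so Docker (and anything orchestrating it) can tell readiness from mere liveness. A sketch, assuming valkey-cli is on the image's PATH (it is in the official image); it goes under the valkey service:

```yaml
    healthcheck:
      test: ["CMD", "valkey-cli", "ping"]   # a healthy instance answers PONG
      interval: 10s
      timeout: 3s
      retries: 5
```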

Next, create a valkey.conf file. This is where the real tuning happens.

# valkey.conf

port 6379
bind 0.0.0.0

# Persistence
appendonly yes
appendfilename "appendonly.aof"
appendfsync everysec

# Memory Management
maxmemory 2gb
maxmemory-policy allkeys-lru

# Networking
tcp-backlog 511
tcp-keepalive 300

# General
daemonize no
pidfile /var/run/valkey.pid
logfile /var/log/valkey/valkey.log
databases 16

# RDB
save 900 1
save 300 10
save 60 10000
dbfilename dump.rdb
dir /data

Finally, create the directories for data and logs:

mkdir -p valkey_data valkey_log

Now, start Valkey:

docker-compose up -d

Tuning Valkey in Production Docker

The defaults are rarely suitable for production. Here’s what to tune:

1. Memory Management (maxmemory and maxmemory-policy)

  • Diagnosis: Check current memory usage with docker exec valkey_prod valkey-cli info memory. A used_memory approaching maxmemory, or a nonzero and growing evicted_keys, means you’re hitting the limit.
  • Fix: In valkey.conf, set maxmemory 4gb (adjust to roughly 70-80% of your container’s allocated RAM) and choose an explicit maxmemory-policy.
  • Why it works: maxmemory prevents Valkey from consuming all available RAM, which could otherwise trigger the kernel’s OOM killer. allkeys-lru (Least Recently Used), used in the config above, keeps the most recently accessed keys and suits most cache-like workloads; allkeys-random is a cheaper alternative when access patterns are roughly uniform.
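
As a rule of thumb for the 70-80% figure, a small sketch (the CONTAINER_RAM_MB value is hypothetical; substitute your container’s actual limit):

```shell
#!/bin/sh
# Sketch: derive a maxmemory line as ~75% of the RAM allotted to the
# container, leaving headroom for copy-on-write during RDB/AOF rewrites.
CONTAINER_RAM_MB=4096                        # hypothetical 4 GiB container limit
MAXMEMORY_MB=$(( CONTAINER_RAM_MB * 75 / 100 ))
echo "maxmemory ${MAXMEMORY_MB}mb"           # -> maxmemory 3072mb
```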

2. Persistence (appendonly, appendfsync, save)

  • Diagnosis: Monitor disk I/O. If appendonly is off, a crash loses everything written since the last RDB snapshot. If appendfsync is too aggressive (always), every write forces an fsync and can thrash your disk.
  • Fix: Ensure appendonly yes. Set appendfsync everysec (a good balance between durability and performance) and tune save directives to less frequent intervals like save 900 1 (1 change in 900 seconds).
  • Why it works: appendonly yes logs every write operation to a file, allowing replay on recovery. everysec fsyncs the append-only file once per second, trading at most about a second of writes for a large I/O reduction. The save directives create RDB snapshots at the given intervals; these load faster on restart than a long AOF and double as compact backups.
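
To watch these settings in action, the INFO persistence fields below are the ones worth alerting on. A sketch with sample output inlined for illustration; in production, pipe docker exec valkey_prod valkey-cli info persistence into the grep instead:

```shell
#!/bin/sh
# Sample `INFO persistence` output (inlined here so the sketch is self-contained).
INFO='aof_enabled:1
aof_last_write_status:ok
rdb_last_bgsave_status:ok
rdb_changes_since_last_save:42'

# A non-ok status, or an ever-growing rdb_changes_since_last_save, means
# persistence is failing or snapshots are not keeping up with write volume.
printf '%s\n' "$INFO" | grep -E 'aof_last_write_status|rdb_last_bgsave_status'
```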

3. Networking (tcp-backlog, tcp-keepalive)

  • Diagnosis: Check docker exec valkey_prod valkey-cli info stats. A climbing rejected_connections, or total_connections_received growing far faster than connected_clients (from info clients), points to backlog or connection-churn problems, as do network errors in logs and application-side timeouts.
  • Fix: Increase tcp-backlog 1024 (or higher, depending on expected connections) and set tcp-keepalive 600.
  • Why it works: tcp-backlog is the queue size for incoming TCP connections waiting to be accepted; a higher value absorbs connection bursts. tcp-keepalive sets how often (in seconds) Valkey sends TCP keepalive probes on idle connections, so dead peers are detected and their connections reclaimed rather than lingering indefinitely.
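
One caveat worth checking: the kernel silently caps tcp-backlog at net.core.somaxconn, so raising the directive alone may do nothing. A quick check on the Docker host:

```shell
#!/bin/sh
# tcp-backlog is capped by the kernel's accept-queue limit. If this prints
# less than your configured tcp-backlog (e.g. 1024), raise it on the host:
#   sudo sysctl -w net.core.somaxconn=1024
cat /proc/sys/net/core/somaxconn
```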

4. Logging (logfile, loglevel)

  • Diagnosis: If you’re not seeing errors or want more detail, check logfile. If logs are excessively verbose and filling up disk, adjust loglevel.
  • Fix: Ensure logfile /var/log/valkey/valkey.log is set and create the directory. For production, loglevel notice is usually sufficient. If debugging, temporarily switch to loglevel verbose.
  • Why it works: Directing logs to a file under /var/log/valkey gives you persistent, inspectable logs; mount that path to a host directory (such as the valkey_log folder created earlier) if you want them to survive the container. Alternatively, since daemonize is no, setting logfile "" sends logs to stdout so docker logs can collect them. The notice level provides important operational information without excessive detail.

5. RDB vs. AOF (dbfilename, dir, appendfilename)

  • Diagnosis: Understand your recovery point objective (RPO). If losing at most about a second of writes is acceptable, AOF with everysec fits. If minutes of potential loss are acceptable and you want smaller files and faster restarts, RDB alone might suffice.
  • Fix: Ensure both appendonly yes and save directives are configured. The dir /data directive is crucial for ensuring persistence files land in the mounted volume.
  • Why it works: Valkey offers two persistence mechanisms: RDB (point-in-time snapshots) and AOF (write-ahead logging). Running both provides a good balance of performance and durability. RDB is generally faster to load, while AOF offers better durability.
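
If you decide one mechanism is enough, the corresponding valkey.conf changes are small (a sketch; apply one of the two alternatives, not both):

```
# AOF only (at most ~1s of loss with everysec): keep appendonly yes and
# disable RDB snapshots entirely:
#   save ""
#
# RDB only (smaller files, faster restarts, minutes of possible loss):
#   appendonly no
```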

6. Container Resource Limits

  • Diagnosis: Use docker stats valkey_prod to monitor CPU and Memory usage. If Valkey is consistently hitting container limits, performance will suffer.
  • Fix: Add a deploy.resources section to your docker-compose.yml for the Valkey service (honored by Docker Compose v2 and by Swarm):
    deploy:
      resources:
        limits:
          cpus: '1.0'
          memory: 6G
        reservations:
          cpus: '0.5'
          memory: 4G

  • Why it works: Explicit CPU and memory limits keep a runaway Valkey from starving other processes on the host, and pairing the container’s memory limit with a lower maxmemory means Valkey evicts keys instead of being OOM-killed by the kernel. Reservations ensure a minimum amount of resources remains available to the container.

The next common issue you’ll face after ensuring your Valkey instance is stable and performant is dealing with network latency between your application containers and the Valkey container, especially in multi-host Docker Swarm or Kubernetes environments.

Want structured learning?

Take the full Valkey course →