Valkey run in production via Docker often feels like a black box until you realize its performance is limited less by the software itself than by the networking and memory settings you never tuned.
Let’s set up a Valkey instance in Docker for production and then tune it.
Production Valkey Docker Setup
First, we need a `docker-compose.yml` file. We'll mount in a configuration file and a persistent data directory.
```yaml
version: '3.8'
services:
  valkey:
    image: valkey/valkey:8.0
    container_name: valkey_prod
    # The image starts with no config file by default, so point
    # valkey-server at the one we mount below.
    command: ["valkey-server", "/usr/local/etc/valkey/valkey.conf"]
    ports:
      - "6379:6379"
    volumes:
      - ./valkey.conf:/usr/local/etc/valkey/valkey.conf:ro
      - ./valkey_data:/data
      - ./valkey_log:/var/log/valkey
    restart: always
    environment:
      - TZ=UTC
```
Next, create a `valkey.conf` file. This is where the real tuning happens.
```conf
# valkey.conf
port 6379
bind 0.0.0.0
# NOTE: with bind 0.0.0.0 exposed to a network, set requirepass (or use ACLs).

# Persistence
appendonly yes
appendfilename "appendonly.aof"
appendfsync everysec

# Memory Management
maxmemory 2gb
maxmemory-policy allkeys-lru

# Networking
tcp-backlog 511
tcp-keepalive 300

# General
daemonize no
pidfile /var/run/valkey.pid
logfile /var/log/valkey/valkey.log
databases 16

# RDB
save 900 1
save 300 10
save 60 10000
dbfilename dump.rdb
dir /data
```
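Size strings like `2gb` above follow Valkey's convention that `kb`/`mb`/`gb` suffixes are powers of 1024, while bare `k`/`m`/`g` are powers of 1000. A minimal shell sketch of that conversion, handy when scripting capacity checks (it covers only the common suffixes, and case-sensitively):

```shell
# Convert a Valkey-style size string to bytes: kb/mb/gb are powers of 1024,
# bare k/m/g are powers of 1000, matching how valkey-server parses maxmemory.
to_bytes() {
  local n=$1
  case $n in
    *kb) echo $(( ${n%kb} * 1024 )) ;;
    *mb) echo $(( ${n%mb} * 1024 * 1024 )) ;;
    *gb) echo $(( ${n%gb} * 1024 * 1024 * 1024 )) ;;
    *k)  echo $(( ${n%k} * 1000 )) ;;
    *m)  echo $(( ${n%m} * 1000 * 1000 )) ;;
    *g)  echo $(( ${n%g} * 1000 * 1000 * 1000 )) ;;
    *)   echo "$n" ;;  # plain byte count
  esac
}

to_bytes 2gb   # 2147483648 bytes, i.e. the maxmemory set above
```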
Finally, create the host directories for data and logs:

```shell
mkdir -p valkey_data valkey_log
```
Now, start Valkey:

```shell
docker-compose up -d
```
Tuning Valkey in Production Docker
The defaults are rarely suitable for production. Here’s what to tune:
1. Memory Management (`maxmemory` and `maxmemory-policy`)

- Diagnosis: Check current memory usage with `docker exec valkey_prod valkey-cli info memory`. If `used_memory` is approaching `maxmemory`, or `evicted_keys` (under `info stats`) keeps climbing, you're hitting limits.
- Fix: In `valkey.conf`, set `maxmemory 4gb` (adjust to 70-80% of the container's allocated RAM) and choose an explicit `maxmemory-policy`.
- Why it works: `maxmemory` prevents Valkey from consuming all available RAM, which can end in OOM-killer intervention. For cache-like workloads, `allkeys-lru` (Least Recently Used) is the usual choice because it keeps the most recently accessed keys; `allkeys-random` is a cheaper alternative when access patterns are roughly uniform.
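The 70-80% rule is easy to script. A sketch that derives a `maxmemory` value from the container's memory limit; the 6 GiB figure and the 75% factor are illustrative assumptions, not Valkey requirements:

```shell
# Derive maxmemory from a container memory limit, leaving ~25% headroom for
# fork-based persistence (copy-on-write during BGSAVE/AOF rewrite) and
# client output buffers. Both numbers below are illustrative.
container_limit_bytes=$(( 6 * 1024 * 1024 * 1024 ))   # hypothetical 6 GiB limit
maxmemory_bytes=$(( container_limit_bytes * 75 / 100 ))
echo "maxmemory $(( maxmemory_bytes / 1024 / 1024 ))mb"   # prints: maxmemory 4608mb
```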
2. Persistence (`appendonly`, `appendfsync`, `save`)

- Diagnosis: Monitor disk I/O. If `appendonly` is off, a restart loses everything written since the last RDB snapshot. If `appendfsync` is too aggressive (e.g., `always`), it can thrash your disk.
- Fix: Ensure `appendonly yes`. Set `appendfsync everysec` (a good balance between durability and performance) and keep the `save` directives at sensible intervals such as `save 900 1` (snapshot if at least 1 change in 900 seconds).
- Why it works: `appendonly yes` logs every write operation to a file, allowing it to be replayed on recovery. `everysec` syncs the append-only file to disk once per second, sacrificing at most about a second of writes for a large I/O reduction. The `save` directives create compact point-in-time snapshots, which load much faster on restart than replaying a long AOF.
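`INFO` output is a series of `field:value` lines, CRLF-terminated, which makes it easy to script checks against. A small sketch using a canned sample so it runs offline; in production you would feed it the output of `docker exec valkey_prod valkey-cli info persistence` instead:

```shell
# Pull individual fields out of INFO-style output. The sample values below
# are hypothetical stand-ins for real valkey-cli output.
sample=$'aof_enabled:1\r\naof_last_write_status:ok\r\nrdb_changes_since_last_save:42\r\n'

get_field() {
  printf '%s' "$sample" | tr -d '\r' | awk -F: -v k="$1" '$1 == k { print $2 }'
}

get_field aof_enabled                   # 1 means appendonly is on
get_field rdb_changes_since_last_save   # writes not yet captured by an RDB snapshot
```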
3. Networking (`tcp-backlog`, `tcp-keepalive`)

- Diagnosis: A climbing `total_connections_received` (in `info stats`) alongside a flat `connected_clients` (in `info clients`) can indicate connections being dropped from the accept queue. Network-related errors in logs or application timeouts point here too.
- Fix: Increase `tcp-backlog 1024` (or higher, depending on expected connection bursts) and set `tcp-keepalive 600`.
- Why it works: `tcp-backlog` is the queue size for incoming TCP connections waiting to be accepted, so a higher value absorbs connection bursts. `tcp-keepalive` makes the kernel periodically probe idle connections and drop dead peers, preventing half-open connections from hogging file descriptors.
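One caveat: the kernel silently caps the listen backlog at `net.core.somaxconn`, so raising the directive alone may do nothing (Valkey, like Redis, logs a startup warning when this happens). A sketch of the effective value:

```shell
# The kernel caps the listen backlog at net.core.somaxconn; check the live
# value on Linux with: cat /proc/sys/net/core/somaxconn
# The effective queue depth is the smaller of the two values.
effective_backlog() {
  if [ "$1" -lt "$2" ]; then echo "$1"; else echo "$2"; fi
}

effective_backlog 1024 4096   # 1024 - the tcp-backlog directive wins
effective_backlog 1024 128    # 128  - raise somaxconn or the directive is moot
```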
4. Logging (`logfile`, `loglevel`)

- Diagnosis: If you're not seeing errors, or want more detail, check `logfile`. If logs are excessively verbose and filling up the disk, adjust `loglevel`.
- Fix: Ensure `logfile /var/log/valkey/valkey.log` is set and that the directory exists. For production, `loglevel notice` is usually sufficient; switch temporarily to `loglevel verbose` when debugging.
- Why it works: Writing logs to a file under a mounted path (`valkey_log` in our compose file) keeps them persistent across container restarts and easy to inspect. The `notice` level records important operational events without excessive detail.
5. RDB vs. AOF (`dbfilename`, `dir`, `appendfilename`)

- Diagnosis: Understand your recovery point objective (RPO). AOF with `everysec` bounds data loss to about a second; if losing a few minutes of writes is acceptable and you want smaller, faster-loading files, RDB alone might suffice.
- Fix: Ensure both `appendonly yes` and the `save` directives are configured. The `dir /data` directive is crucial: it is what puts the persistence files in the mounted volume.
- Why it works: Valkey offers two persistence mechanisms: RDB (point-in-time snapshots) and AOF (write-ahead logging). Running both provides a good balance of performance and durability: RDB is generally faster to load, while AOF offers better durability.
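With RDB-only persistence, the worst-case data loss is roughly the longest configured `save` window, since with a trickle of writes only the widest rule ever fires. A quick sketch using the three `save` lines from the config above:

```shell
# Worst-case RDB-only RPO is bounded by the longest configured save window.
# The three rules below are the ones set in valkey.conf above.
worst_case=$(printf '%s\n' "900 1" "300 10" "60 10000" \
  | awk '$1 > max { max = $1 } END { print max }')
echo "worst-case RPO with RDB only: ${worst_case}s"   # prints: ...900s
```

With `appendonly yes` and `appendfsync everysec`, that bound drops to about one second, which is why this guide runs both.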
6. Container Resource Limits

- Diagnosis: Use `docker stats valkey_prod` to monitor CPU and memory usage. If Valkey consistently hits its container limits, performance will suffer.
- Fix: Add a `deploy.resources` section to the Valkey service in `docker-compose.yml` (honored by `docker compose` v2 and by Swarm):

```yaml
deploy:
  resources:
    limits:
      cpus: '1.0'
      memory: 6G
    reservations:
      cpus: '0.5'
      memory: 4G
```

- Why it works: An explicit memory limit stops Valkey from starving other processes on the host, and reservations guarantee it a minimum share. Keep `maxmemory` comfortably below the container limit; otherwise the kernel's OOM killer will terminate the container before Valkey ever starts evicting keys.
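`docker stats` also supports `--format`, which makes threshold alerts scriptable. A sketch against a canned sample value so it runs offline; the real reading would come from `docker stats --no-stream --format '{{.MemPerc}}' valkey_prod`:

```shell
# Threshold check on container memory usage. The sample value is hypothetical;
# in production, substitute the real reading:
#   mem_perc=$(docker stats --no-stream --format '{{.MemPerc}}' valkey_prod)
mem_perc="83.50%"
mem_int=${mem_perc%%.*}   # integer part only: "83"
if [ "$mem_int" -ge 80 ]; then
  echo "WARN: valkey_prod at ${mem_perc} of its memory limit"
fi
```

Wiring this into a cron job or your monitoring agent gives you early warning before the container limit, rather than Valkey's own `maxmemory`, becomes the binding constraint.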
The next common issue you’ll face after ensuring your Valkey instance is stable and performant is dealing with network latency between your application containers and the Valkey container, especially in multi-host Docker Swarm or Kubernetes environments.