Upgrading Vault to a new minor version can be done without any downtime, but it requires careful planning and execution, especially if you’re using HA.
Let’s see how a simple HA setup works and how we can upgrade it.
Imagine two Vault servers, vault-0 and vault-1, behind a load balancer.
    # Basic Vault server config (one file per node)
    listener "tcp" {
      address     = "0.0.0.0:8200"
      tls_disable = 1 # demo only; enable TLS in production
    }

    storage "raft" {
      path    = "/opt/vault/data"
      node_id = "vault-0" # or "vault-1"
    }

    # No ha_storage stanza: raft storage provides HA coordination itself.
    api_addr     = "http://vault-0:8200"  # or vault-1
    cluster_addr = "https://vault-0:8201" # or vault-1; assumed default cluster port
The load balancer directs traffic to whichever server is currently the leader. Vault uses Raft for its HA consensus, and only the leader can process write operations. The other servers are followers, replicating the state.
Here’s a snapshot of vault status on the leader:
    Key                     Value
    ---                     -----
    Seal Type               shamir
    Initialized             true
    Sealed                  false
    Total Raft Proposals    12345
    Raft Leader             vault-0
    Raft Index              67890
    Raft Term               1
    HA Enabled              true
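Notice the line Raft Leader: vault-0. The exact fields vary by Vault version; many releases report the node’s role as HA Mode (active or standby) rather than naming the leader. Either way, knowing which node leads is crucial: to upgrade without downtime, there must always be a healthy, unsealed node available to act as the leader, and enough nodes running to elect one.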
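The strategy is to upgrade one server at a time, confirming after each upgrade that the cluster is healthy and has a leader. HashiCorp’s guidance is to upgrade standby nodes first and the active node last; this walkthrough starts with vault-0, the current leader, to show what happens when leadership moves.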
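To decide the order, it helps to know which node currently leads. The sketch below assumes the tabular output of `vault operator raft list-peers` (columns Node, Address, State, Voter) and prints followers first and the leader last, which matches the usual recommendation to upgrade standbys before the active node. The function name `upgrade_order` is hypothetical.

```shell
#!/usr/bin/env bash
# Print node names in upgrade order: followers first, leader last.
# Reads the output of `vault operator raft list-peers` on stdin.
upgrade_order() {
  awk '
    NR <= 2 { next }                        # skip the header and its underline
    $3 == "leader" { leader = $1; next }    # remember the leader for the end
    NF >= 3 { print $1 }                    # followers and any non-voters
    END { if (leader != "") print leader }
  '
}
```

Typical usage would be `vault operator raft list-peers | upgrade_order`, run with a token that is allowed to read the Raft configuration.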
Step 1: Upgrade vault-0
- Stop vault-0: sudo systemctl stop vault
- Upgrade Vault binary: Replace the existing vault binary with the new version (e.g., 1.15.3 to 1.16.0).
- Start vault-0: sudo systemctl start vault
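At this point, vault-0 starts up with the new binary. Because the seal type is shamir, it comes up sealed and must be unsealed with vault operator unseal before it can rejoin the cluster. It does not automatically reclaim leadership: while it is down, the remaining nodes elect a new leader among themselves, provided they still form a quorum.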
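The stop, replace, start sequence for a single node can be sketched as a small script. This is a minimal sketch under stated assumptions: the install path /usr/local/bin/vault, the location of the new binary, and the helper names `run` and `upgrade_node` are all hypothetical, and the service is assumed to be a systemd unit named vault as in the steps above. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Upgrade the Vault binary on the local node.
# $1: path to the new binary (hypothetical download location)
# $2: install path; the default /usr/local/bin/vault is an assumption
upgrade_node() {
  local new_binary="$1"
  local install_path="${2:-/usr/local/bin/vault}"

  run sudo systemctl stop vault
  run sudo install -m 0755 "$new_binary" "$install_path"
  run sudo systemctl start vault
  # With a Shamir seal the node starts sealed. Unseal it with
  # `vault operator unseal` (once per required key share) before
  # expecting it to rejoin the cluster.
}
```

A dry run such as `DRY_RUN=1 upgrade_node /tmp/vault-1.16.0` is a cheap way to review the exact commands before touching a real node.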
Check vault status on vault-1 (which is now likely the leader):
    Key                     Value
    ---                     -----
    Seal Type               shamir
    Initialized             true
    Sealed                  false
    Total Raft Proposals    12346
    Raft Leader             vault-1
    Raft Index              67891
    Raft Term               2
    HA Enabled              true
Once unsealed, vault-0 rejoins the cluster as a follower. The load balancer now directs all traffic to vault-1.
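Before moving on to the next node, it is worth waiting until the upgraded node is actually back. One way to do that, sketched below, is to poll Vault's health endpoint until it answers 200 (active) or 429 (standby), which are the default status codes for unsealed nodes. The helper names `health_code` and `wait_for_node`, the retry count, and the delay are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Print the HTTP status code of Vault's health endpoint for one node.
health_code() {
  curl -s -o /dev/null -w '%{http_code}' "$1/v1/sys/health"
}

# Poll until the node reports active (200) or standby (429), or give up.
# $1: node address, $2: max attempts (default 30), $3: delay in seconds (default 2)
wait_for_node() {
  local addr="$1" attempts="${2:-30}" delay="${3:-2}" code="" i=0
  while [ "$i" -lt "$attempts" ]; do
    code="$(health_code "$addr")"
    case "$code" in
      200|429) echo "$addr is back (HTTP $code)"; return 0 ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  echo "$addr did not become healthy (last HTTP code: $code)" >&2
  return 1
}
```

For example, `wait_for_node http://vault-0:8200 && echo "safe to continue"` would gate Step 2 on vault-0 having rejoined. A sealed node answers 503, so the loop keeps waiting until the node is unsealed.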
Step 2: Upgrade vault-1
Now, repeat the process for vault-1.
- Stop vault-1: sudo systemctl stop vault
- Upgrade Vault binary: Replace the existing vault binary with the new version.
- Start vault-1: sudo systemctl start vault
While vault-1 is down, leadership passes to vault-0 as soon as a quorum is available. vault-1 restarts with the new binary and, once unsealed, rejoins as a follower; it does not automatically reclaim leadership.
Important Considerations:
- Load Balancer Health Checks: Configure your load balancer’s health check so that only healthy, unsealed Vault nodes receive traffic. A basic TCP check on port 8200 is insufficient, because a sealed node still accepts connections. Vault exposes a /v1/sys/health endpoint that is ideal for this: by default it returns 200 only on the active node, 429 on a standby, and 503 on a sealed node.
- Package Manager vs. Binary Upgrade: This procedure assumes you’re directly replacing the Vault binary. If you’re using a package manager, the commands will differ. Always check the official upgrade documentation for the specific version you’re moving to.
- Raft Storage: This example uses storage "raft". If you’re using an external storage backend like Consul or etcd, the procedure differs: you upgrade the Vault servers themselves while the backend remains stable.
- Raft Membership Changes: Vault’s integrated storage replicates the full data set to every node, so there is no rebalancing or resharding step. Changing the Raft configuration or cluster membership (adding or removing peers) at the same time as an upgrade is a more complex procedure, though, and usually requires a brief maintenance window or very careful orchestration.
- Testing: Always test your upgrade procedure in a staging environment that mirrors your production setup as closely as possible.
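The health check point above can be made concrete with a small helper that interprets the status code returned by /v1/sys/health. The mapping uses Vault's default status codes; the function names `health_state` and `should_route`, and the leader-only routing policy, are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Translate a /v1/sys/health status code into a node state (default codes).
health_state() {
  case "$1" in
    200) echo "active" ;;
    429) echo "standby" ;;
    472) echo "dr-secondary" ;;
    473) echo "performance-standby" ;;
    501) echo "uninitialized" ;;
    503) echo "sealed" ;;
    *)   echo "unreachable-or-unknown" ;;
  esac
}

# Succeed only when a load balancer with a leader-only policy
# should send traffic to the node that returned this code.
should_route() {
  [ "$(health_state "$1")" = "active" ]
}
```

In practice the code would come from something like `curl -s -o /dev/null -w '%{http_code}' http://vault-0:8200/v1/sys/health`. Most load balancers can apply the same rule directly by treating only HTTP 200 as healthy.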
The most critical part is ensuring that during the brief moment a server is down, the remaining nodes can elect a new leader and continue serving requests. The load balancer’s ability to quickly detect the leader and direct traffic is paramount.
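If your Raft cluster has only two nodes, as in this example, stopping either node leaves the survivor without a quorum: Raft needs a majority of voters (two of two) to elect a leader, so the cluster cannot serve requests until the stopped node is back and unsealed. For true zero-downtime upgrades, you need at least three Vault nodes in your HA cluster, so that two remain to form a quorum while the third is upgraded.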
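The arithmetic behind this is short enough to check directly. Raft needs a majority of voting nodes, floor(N/2) + 1, to elect a leader; the sketch below (with hypothetical function names `quorum` and `failure_tolerance`) prints how many nodes each cluster size can lose.

```shell
#!/usr/bin/env bash
# Raft quorum size for a cluster of N voting nodes: floor(N/2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of nodes that can be down while the cluster still has a quorum.
failure_tolerance() {
  echo $(( $1 - $(quorum "$1") ))
}

for n in 2 3 5; do
  echo "nodes=$n quorum=$(quorum "$n") tolerates=$(failure_tolerance "$n")"
done
# nodes=2 quorum=2 tolerates=0
# nodes=3 quorum=2 tolerates=1
# nodes=5 quorum=3 tolerates=2
```

A two-node cluster tolerates zero failures, so a rolling upgrade necessarily interrupts it; three nodes is the smallest size that survives one node being offline.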
The next challenge is often dealing with the upgraded client libraries or ensuring your applications are compatible with the new Vault API features.