Upgrading Vault to a new minor version can be done without any downtime, but it requires careful planning and execution, especially if you’re using HA.

Let’s see how a simple HA setup works and how we can upgrade it.

Imagine two Vault servers, vault-0 and vault-1, behind a load balancer.

# Basic Vault server config
listener "tcp" {
  address     = "0.0.0.0:8200"
  tls_disable = 1 # demo only; enable TLS in production
}
storage "raft" {
  path    = "/opt/vault/data"
  node_id = "vault-0" # or "vault-1"
}
api_addr     = "http://vault-0:8200"  # or vault-1
cluster_addr = "https://vault-0:8201" # or vault-1; used for Raft traffic between nodes

The load balancer directs traffic to whichever server is currently the leader. Vault uses Raft for its HA consensus, and only the leader (the active node) processes requests. The other servers are followers (standbys): they replicate the state and forward any client requests they receive to the leader.

Here’s a trimmed snapshot of vault status on the leader:

Key                     Value
---                     -----
Seal Type               shamir
Initialized             true
Sealed                  false
Storage Type            raft
HA Enabled              true
HA Cluster              https://vault-0:8201
HA Mode                 active
Raft Committed Index    67890
Raft Applied Index      67890

Notice HA Mode: active and the vault-0 address under HA Cluster: vault-0 is the current leader. This is crucial. To upgrade without downtime, we need to ensure there’s always at least one healthy, unsealed Vault server available to act as the leader.

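The strategy is to upgrade one server at a time, ensuring the cluster remains healthy and a leader is always elected. HashiCorp’s upgrade guidance is to upgrade standby nodes first and the active node last, which keeps leadership changes to a minimum; the walkthrough below starts with the leader, vault-0, so expect a leadership change at each step.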

Step 1: Upgrade vault-0

  1. Stop vault-0: sudo systemctl stop vault
  2. Upgrade Vault binary: Replace the existing vault binary with the new version (e.g., 1.15.3 to 1.16.0).
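  3. Start vault-0: sudo systemctl start vault
  4. Unseal vault-0: vault operator unseal (repeat until the key threshold is met). With a Shamir seal, a restarted node comes up sealed and cannot rejoin the cluster until it is unsealed.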

Stopping vault-0 removes the current leader, so the remaining nodes hold an election and pick a new leader among themselves, provided they still form a quorum. When vault-0 comes back with the new binary and is unsealed, it does not reclaim leadership automatically.

Check vault status on vault-1 (which is now likely the leader):

Key                     Value
---                     -----
Seal Type               shamir
Initialized             true
Sealed                  false
Storage Type            raft
HA Enabled              true
HA Cluster              https://vault-1:8201
HA Mode                 active
Raft Committed Index    67891
Raft Applied Index      67891

vault-0 will rejoin the cluster as a follower. The load balancer will now direct all traffic to vault-1.
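Before moving on to the next node, it is worth confirming that vault-0 is actually back and unsealed rather than assuming it. The following is a minimal Python sketch, not a definitive implementation. It assumes the node is reachable at its api_addr and relies on the default status codes of Vault’s /v1/sys/health endpoint (200 for an unsealed active node, 429 for an unsealed standby). The function names and retry settings are illustrative.

```python
import time
import urllib.error
import urllib.request


def health_status(base_url, timeout=2.0):
    """Return the HTTP status code of /v1/sys/health, or None if unreachable."""
    url = f"{base_url}/v1/sys/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx answers (429 standby, 503 sealed, ...) arrive as HTTPError.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout: the node is not up yet.
        return None


def wait_until_rejoined(probe, attempts=30, delay=2.0, sleep=time.sleep):
    """Poll probe() until it reports an unsealed node (200 or 429)."""
    for _ in range(attempts):
        if probe() in (200, 429):
            return True
        sleep(delay)
    return False


if __name__ == "__main__":
    # Hypothetical address taken from the example config's api_addr.
    ok = wait_until_rejoined(lambda: health_status("http://vault-0:8200"))
    print("vault-0 rejoined" if ok else "vault-0 still down or sealed")
```

The probe is passed in as a callable so the retry loop can be exercised without a running Vault server.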

Step 2: Upgrade vault-1

Now, repeat the process for vault-1.

  1. Stop vault-1: sudo systemctl stop vault
  2. Upgrade Vault binary: Replace the existing vault binary with the new version.
  3. Start vault-1: sudo systemctl start vault
  4. Unseal vault-1: vault operator unseal (repeat until the key threshold is met).

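Stopping vault-1 triggers another election, and vault-0, already running the new version, takes over as leader. Once vault-1 restarts with the new binary and is unsealed, it rejoins as a follower, and the load balancer directs traffic to vault-0.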
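The two steps above follow the same pattern, which can be sketched as a small loop. This is an illustrative Python outline under stated assumptions, not a tool Vault ships: the stop, install, start, and is_healthy callables are hypothetical hooks you would implement yourself (for example by running systemctl over SSH and polling the health endpoint). The property the sketch demonstrates is that the loop aborts as soon as one node fails to come back, so a second node is never taken down while the first is still missing.

```python
from typing import Callable, Iterable, List


def rolling_upgrade(
    nodes: Iterable[str],
    stop: Callable[[str], None],
    install: Callable[[str], None],
    start: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Upgrade nodes one at a time; abort if a node does not come back."""
    upgraded: List[str] = []
    for node in nodes:
        stop(node)     # e.g. sudo systemctl stop vault
        install(node)  # replace the vault binary
        start(node)    # e.g. sudo systemctl start vault, then unseal
        if not is_healthy(node):
            raise RuntimeError(
                f"{node} did not rejoin; aborting before touching other nodes"
            )
        upgraded.append(node)
    return upgraded


if __name__ == "__main__":
    # Dry run with stand-in hooks that only record what would happen.
    log = []
    done = rolling_upgrade(
        ["vault-0", "vault-1"],
        stop=lambda n: log.append(f"stop {n}"),
        install=lambda n: log.append(f"install {n}"),
        start=lambda n: log.append(f"start {n}"),
        is_healthy=lambda n: True,
    )
    print(done)
    print(log)
```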

Important Considerations:

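  • Load Balancer Health Checks: Ensure your load balancer’s health check only passes for healthy, unsealed Vault nodes. A basic TCP check on port 8200 is insufficient, because a sealed node still accepts connections. Vault exposes a /v1/sys/health endpoint that is ideal for this: by default it returns 200 on the unsealed active node, 429 on an unsealed standby, and 503 on a sealed node.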
  • Binary vs. Package Upgrade: This procedure assumes you’re directly replacing the Vault binary. If you’re using a package manager, the commands will differ. Always check the official upgrade documentation for the specific version you’re moving to.
  • Raft Storage: This example uses storage "raft". If you’re using an external storage backend like Consul or etcd, the upgrade procedure is different, focusing on upgrading the Vault servers themselves while the backend remains stable.
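  • Larger Changes: For major version upgrades or significant changes to Raft configuration, read the version-specific upgrade notes and take a Raft snapshot first (vault operator raft snapshot save <file>). Vault’s integrated storage replicates the full dataset to every node, so there is no rebalancing or resharding step, but such upgrades can still call for a brief maintenance window or very careful orchestration.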
  • Testing: Always test your upgrade procedure in a staging environment that mirrors your production setup as closely as possible.
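To make the health-check point concrete, here is a small Python sketch of the decision a load balancer has to make from the status code of /v1/sys/health. The codes listed are Vault’s defaults; the function name and the allow_standby switch are assumptions for illustration, and most real load balancers express this as configuration rather than code.

```python
# Default status codes returned by Vault's /v1/sys/health endpoint.
HEALTH_CODES = {
    200: "initialized, unsealed, active",
    429: "unsealed, standby",
    472: "disaster recovery secondary",
    473: "performance standby",
    501: "not initialized",
    503: "sealed",
}


def should_receive_traffic(status_code, allow_standby=False):
    """Decide whether a node with this health status belongs in the pool."""
    if status_code == 200:
        return True
    if status_code == 429:
        # Standbys forward requests to the leader, so routing to them works,
        # but sending traffic straight to the leader avoids the extra hop.
        return allow_standby
    # Sealed, uninitialized, unreachable (None), or anything unexpected.
    return False


if __name__ == "__main__":
    for code, meaning in HEALTH_CODES.items():
        print(code, meaning, "->", should_receive_traffic(code))
```

With the defaults, only the active node passes, which matches the leader-only routing described earlier.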

The most critical part is ensuring that during the brief moment a server is down, the remaining nodes can elect a new leader and continue serving requests. The load balancer’s ability to quickly detect the leader and direct traffic is paramount.

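If your Raft cluster has only two nodes, as in this example, stopping either node costs the cluster its quorum: Raft needs a majority of nodes to elect a leader, and in a two-node cluster the majority is both nodes. While one node is down, the survivor cannot become leader, so Vault is unavailable until the stopped node returns and is unsealed. For true zero-downtime, you need at least three Vault nodes in your HA cluster.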
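The arithmetic behind that requirement is short enough to write down. This Python sketch uses the standard Raft majority rule; the function names are just for illustration.

```python
def quorum(nodes: int) -> int:
    """Smallest number of nodes that forms a majority."""
    return nodes // 2 + 1


def failure_tolerance(nodes: int) -> int:
    """How many nodes can be down while a leader can still be elected."""
    return nodes - quorum(nodes)


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} node(s): quorum={quorum(n)}, tolerates={failure_tolerance(n)} down")
```

A two-node cluster tolerates zero nodes down, while three nodes tolerate one, which is exactly what a one-at-a-time rolling upgrade needs.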

The next challenge is often dealing with the upgraded client libraries or ensuring your applications are compatible with the new Vault API features.
