Vault’s Raft storage backend is how it keeps its data safe and available when you set up multiple Vault servers in a cluster for High Availability (HA).

storage "raft" {
  path    = "/vault/data"
  node_id = "vault-01"

  retry_join {
    leader_api_addr     = "https://vault-01.example.com:8200"
    leader_ca_cert_file = "certs/ca.pem"
  }
  retry_join {
    leader_api_addr     = "https://vault-02.example.com:8200"
    leader_ca_cert_file = "certs/ca.pem"
  }
  retry_join {
    leader_api_addr     = "https://vault-03.example.com:8200"
    leader_ca_cert_file = "certs/ca.pem"
  }
}

This stanza from a Vault server configuration file sets up Vault to use Raft for storing its state across multiple nodes. Each server has its own storage "raft" stanza in its own configuration file. The path is where Vault will store its Raft data on disk, and node_id is a unique identifier for this specific Vault server within the Raft cluster. The retry_join blocks are crucial; they tell each Vault server how to find and join an existing Raft cluster. By specifying leader_api_addr and leader_ca_cert_file for several candidate nodes, Vault can attempt to connect to any of them to discover the cluster membership. Once a server successfully joins, it participates in the Raft consensus protocol, replicating data and taking part in leader elections.

Here’s what a Vault server’s configuration file (config.hcl) looks like when using Raft storage:

storage "raft" {
  # Directory on disk where this node keeps its Raft log and state.
  path    = "/opt/vault/data"
  # Must be unique for each server in the cluster.
  node_id = "vault-01"

  # Candidate nodes to contact when joining the cluster.
  retry_join {
    leader_api_addr     = "https://vault-01.example.com:8200"
    leader_ca_cert_file = "/opt/vault/certs/ca.pem"
  }
  retry_join {
    leader_api_addr     = "https://vault-02.example.com:8200"
    leader_ca_cert_file = "/opt/vault/certs/ca.pem"
  }
  retry_join {
    leader_api_addr     = "https://vault-03.example.com:8200"
    leader_ca_cert_file = "/opt/vault/certs/ca.pem"
  }
}

listener "tcp" {
  address       = "0.0.0.0:8200"
  tls_cert_file = "/opt/vault/certs/vault-01.pem"
  tls_key_file  = "/opt/vault/certs/vault-01-key.pem"
}

# Address advertised to clients.
api_addr     = "https://vault-01.example.com:8200"
# Address advertised to other Vault servers for Raft traffic.
cluster_addr = "https://vault-01.example.com:8201"

The storage block specifies raft as the backend. path is the directory on disk where Raft will store its log and state. node_id must be unique for each server in the cluster. The retry_join stanza is how new nodes discover and join an existing Raft cluster. You list the addresses of known nodes in the cluster, along with the CA certificate used to verify their TLS certificates. Vault will attempt to connect to these nodes to learn about the cluster. The listener block configures Vault to listen for API requests. api_addr is the address this node advertises to clients, and cluster_addr is the address it advertises to other Vault servers.

When you have multiple Vault servers in a Raft cluster, one server is elected as the Raft leader. All write operations to Vault must go through this leader; a request that reaches a follower is forwarded to it. The leader then replicates these operations to the follower nodes using the Raft consensus protocol. If the leader fails, another node can be elected as the new leader, and as long as a majority of nodes is still available the cluster continues to operate without losing committed writes. The retry_join mechanism is key for bootstrapping this process; once a cluster is formed, a new node can join by contacting any of the existing members.

The most surprising thing about Vault’s Raft implementation is that what Raft replicates is not the Vault state itself but the log of every change made to that state. Each node applies the committed log entries, in order, to its own local copy of the state, so any node holding the same log arrives at the same state. This is a fundamental aspect of how Raft works: it’s a distributed log replication protocol.
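To make the "state is rebuilt from the log" idea concrete, here is a minimal sketch in Python. It is not Vault's storage code: the entry shape (operation, key, value) and the replay function are assumptions made up for illustration.

```python
from typing import Optional

# Hypothetical log entry shape for illustration: (operation, key, value).
LogEntry = tuple[str, str, Optional[str]]


def replay(log: list[LogEntry]) -> dict[str, str]:
    """Rebuild key/value state by applying log entries in order."""
    state: dict[str, str] = {}
    for op, key, value in log:
        if op == "put":
            if value is None:
                raise ValueError("put requires a value")
            state[key] = value
        elif op == "delete":
            # Deleting a missing key is treated as a no-op.
            state.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return state


log: list[LogEntry] = [
    ("put", "secret/a", "1"),
    ("put", "secret/b", "2"),
    ("delete", "secret/a", None),
]
state = replay(log)
```

Two nodes that replay the same log in the same order end up with identical state, which is why replicating the log is enough.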

Imagine you have three Vault servers: vault-01, vault-02, and vault-03. You start vault-01 first, and once it is initialized and unsealed it becomes the initial leader. Then you start vault-02. Its retry_join config points to vault-01. vault-02 contacts vault-01, and they establish a Raft connection. vault-02 becomes a follower. Now, you start vault-03. Its retry_join config might point to either vault-01 or vault-02. When vault-03 connects, it will learn about the existing cluster and join as another follower. Each joining node also has to be unsealed before it can take part. If vault-01 (the leader) goes down, the remaining nodes (vault-02 and vault-03) still form a majority of the three, so they will hold an election and one of them will become the new leader.
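The election in this scenario succeeds because two of three nodes are still a majority. The arithmetic behind that can be sketched in a few lines of Python; the function names are illustrative, not part of Vault.

```python
def quorum(voters: int) -> int:
    """Smallest number of nodes that forms a majority of the voters."""
    return voters // 2 + 1


def failure_tolerance(voters: int) -> int:
    """How many voters can be lost while a majority remains."""
    return voters - quorum(voters)


# Print cluster size, quorum size, and tolerated failures for common sizes.
for size in (1, 3, 5):
    print(size, quorum(size), failure_tolerance(size))
```

A three-node cluster tolerates one failure and a five-node cluster tolerates two. Going from three nodes to four raises the quorum to three without raising the number of failures tolerated, which is why odd cluster sizes are the usual choice.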

When you perform an operation, like writing a secret:

  1. A client sends a request to the current Vault leader (e.g., vault-01).
  2. vault-01 validates the request and appends it to its Raft log.
  3. vault-01 sends this log entry to its followers (vault-02, vault-03).
  4. Once a majority of nodes (in this case, at least two out of three) have acknowledged receiving the log entry, vault-01 commits the entry.
  5. vault-01 then applies the change to its local state and responds to the client with success.
  6. The followers learn from the leader that the entry is committed, apply it to their own state, and become consistent with the leader.
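The steps above can be sketched as a toy simulation. This is a simplified model under stated assumptions, not Vault's or any Raft library's implementation: the Node class and replicate function are hypothetical, there are no terms or elections, and an unreachable follower simply fails to acknowledge.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    reachable: bool = True
    log: list[str] = field(default_factory=list)

    def append(self, entry: str) -> bool:
        """Store the entry and acknowledge it, unless the node is unreachable."""
        if not self.reachable:
            return False
        self.log.append(entry)
        return True


def replicate(leader: Node, followers: list[Node], entry: str) -> bool:
    """Return True if the entry reached a majority and can be committed."""
    leader.log.append(entry)  # step 2: the leader appends to its own log
    acks = 1                  # the leader counts toward the majority
    for follower in followers:  # step 3: send the entry to each follower
        if follower.append(entry):
            acks += 1
    cluster_size = len(followers) + 1
    return acks >= cluster_size // 2 + 1  # step 4: commit on majority


leader = Node("vault-01")
followers = [Node("vault-02"), Node("vault-03", reachable=False)]
committed = replicate(leader, followers, "put secret/a")
```

With one follower unreachable, the leader and the other follower still make two of three, so the write is committed. With both followers unreachable, the same call returns False and the client would not get a success response.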

This ensures that writes are durable and that the cluster can maintain a consistent view of its data. The cluster_addr in the Vault configuration is used by the Vault servers themselves to communicate with each other for Raft replication and leader election. It’s essential that this address is reachable by all other Vault nodes in the cluster.

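The retry_join mechanism does more than make a single join attempt. A node that has not yet joined keeps retrying the listed addresses until one join succeeds, so it can be started before its peers are reachable and still end up in the cluster. This makes bootstrapping resilient to transient network issues and to nodes starting in any order. Once a node has joined, the cluster membership is kept in the Raft configuration itself, and reconnecting after a network interruption is handled by Raft's own peer communication rather than by retry_join. You can configure retry_join to point to any combination of known nodes in the cluster; once Vault successfully connects to one, it learns the full membership of the Raft cluster.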
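The retry behavior can be pictured as a loop over candidate addresses. The sketch below is only a model of that idea: retry_join and attempt_join here are hypothetical Python names, and the round limit and delay are assumptions, not Vault settings.

```python
import time
from typing import Callable, Optional


def retry_join(
    candidates: list[str],
    attempt_join: Callable[[str], bool],
    max_rounds: int = 3,
    delay_seconds: float = 0.0,
) -> Optional[str]:
    """Try each candidate in order, round after round, until one join succeeds.

    Returns the address that accepted the join, or None if every round failed.
    """
    for _ in range(max_rounds):
        for address in candidates:
            if attempt_join(address):
                return address
        time.sleep(delay_seconds)  # wait before the next round
    return None


candidates = [
    "https://vault-01.example.com:8200",
    "https://vault-02.example.com:8200",
    "https://vault-03.example.com:8200",
]

# Stand-in for a real join attempt: pretend only vault-02 is reachable.
joined_via = retry_join(candidates, lambda address: "vault-02" in address)
```

Listing several candidates means the join does not depend on any single node being up at the moment the new server starts.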

The Raft log cannot grow without bound. As it grows, Vault periodically compacts it by taking a snapshot of the current state and discarding the log entries that the snapshot already covers. This is handled automatically by Vault’s Raft implementation. However, if the disk fills up, Raft operations will fail, leading to instability. Monitoring disk space on every node that participates in the Raft storage is crucial.
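A disk check for the Raft data directory can be very small. The sketch below uses Python's standard shutil.disk_usage; the 80 percent threshold and the function names are assumptions for illustration, and a real deployment would more likely feed this into an existing monitoring system.

```python
import shutil


def disk_usage_percent(path: str) -> float:
    """Percentage of the filesystem holding `path` that is currently used."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def is_low_on_space(path: str, threshold_percent: float = 80.0) -> bool:
    """True when usage on the filesystem holding `path` meets the threshold."""
    return disk_usage_percent(path) >= threshold_percent


# "." stands in for the Raft data directory, e.g. the configured path.
percent_used = disk_usage_percent(".")
```

Running the check against the directory configured as path on each node gives an early warning before Raft writes start to fail.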

A common misconception is that api_addr is used for inter-node communication. While clients use api_addr to talk to Vault, the Vault servers themselves use cluster_addr to communicate with each other for Raft operations. If cluster_addr is not set or is incorrect, you’ll see errors related to nodes not being able to join the cluster or replicate state.
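A quick sanity check can catch the most obvious mix-ups between the two addresses before a node is started. This is a hypothetical helper, not a Vault feature, and it only checks two simple conditions: that cluster_addr is set, and that it does not point at the same host and port as api_addr.

```python
from urllib.parse import urlsplit


def check_addresses(api_addr: str, cluster_addr: str) -> list[str]:
    """Return a list of human-readable problems; empty means none were found."""
    if not cluster_addr:
        return ["cluster_addr is not set"]
    problems: list[str] = []
    api = urlsplit(api_addr)
    cluster = urlsplit(cluster_addr)
    # The API listener and the cluster port serve different traffic,
    # so the two addresses should not be identical.
    if (api.hostname, api.port) == (cluster.hostname, cluster.port):
        problems.append("api_addr and cluster_addr use the same host and port")
    return problems


problems = check_addresses(
    "https://vault-01.example.com:8200",
    "https://vault-01.example.com:8201",
)
```

An empty list means neither condition was hit. It does not prove that other nodes can actually reach cluster_addr, which still has to be verified on the network.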

The next hurdle you’ll likely encounter is understanding how to manage TLS certificates for your Vault cluster to ensure secure communication between nodes and from clients.
