TimescaleDB high availability with Patroni is less about preventing downtime and more about managing downtime gracefully and automatically.
Let’s see what that looks like in practice. Imagine a simple TimescaleDB cluster with two nodes, pg_node_1 and pg_node_2, managed by Patroni.
Configuration Snippet (Patroni patroni.yml):
```yaml
scope: my_timescaledb_cluster
namespace: /service/

restapi:
  listen: 0.0.0.0:8008

etcd:
  host: etcd1:2379

postgresql:
  listen: 0.0.0.0:5432
  data_dir: /var/lib/postgresql/14/main
  pg_hba:
    - host replication replicator 0.0.0.0/0 md5
    - host all all 0.0.0.0/0 md5
  parameters:
    max_connections: 100
    shared_buffers: 256MB
    wal_level: replica
    hot_standby: "on"
    archive_mode: "on"
    archive_command: 'cp %p /var/lib/postgresql/wal-archive/%f'
    recovery_target_timeline: 'latest'
  replication:
    replication_slots:
      - name: patroni_slot_pg_node_1
        database: postgres
        ssl: true
```
Here, pg_node_1 is currently the primary. Any application connecting to pg_node_1 on port 5432 will be writing data. pg_node_2 is a replica, streaming WAL records from pg_node_1. Patroni runs on both nodes, constantly monitoring the PostgreSQL process and the distributed configuration store (in this case, etcd).
Patroni uses etcd to store cluster state: who the current primary is, what the leader’s API endpoint is, and so on. When you query http://pg_node_1:8008/primary (Patroni’s REST API) while pg_node_1 holds the leader lock, you get an HTTP 200 with JSON describing its state. The same request against http://pg_node_2:8008/primary returns a 503, which is exactly what a load balancer’s health check needs in order to send writes to the right node.
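That health-check behaviour is easy to build on. Here is a minimal sketch of primary discovery (the hostnames mirror the example cluster; the injectable `probe` callable is a stand-in for a real HTTP call, and in production you would use HAProxy or a Patroni-aware connection driver instead):

```python
from typing import Callable, Iterable, Optional

def find_primary(nodes: Iterable[str],
                 probe: Callable[[str], int]) -> Optional[str]:
    """Return the first node whose Patroni /primary check answers 200.

    `probe` takes a REST endpoint URL and returns an HTTP status code;
    injecting it keeps the sketch testable without a live cluster.
    """
    for node in nodes:
        if probe(f"http://{node}:8008/primary") == 200:
            return node
    return None  # no reachable primary right now

# Stubbed probe standing in for real HTTP calls (e.g. urllib.request):
status = {"http://pg_node_1:8008/primary": 200,
          "http://pg_node_2:8008/primary": 503}
print(find_primary(["pg_node_1", "pg_node_2"],
                   lambda url: status.get(url, 503)))  # pg_node_1
```

After a failover, only the stubbed status codes change; the discovery logic stays the same, which is the whole point of routing on the health check rather than on a fixed hostname.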
The magic happens when pg_node_1 goes down. Patroni on pg_node_2 notices that the leader key in etcd has expired and that the primary is no longer reachable. It then starts a leader election within etcd: whichever node acquires the leader lock becomes the new leader and initiates a failover, promoting pg_node_2 to primary. Once pg_node_2 is ready, it updates etcd with its new status, and its /primary health check starts answering 200, so health-checked traffic now flows to pg_node_2.
This entire process, from primary failure detection to a replica being promoted and available for connections, typically takes tens of seconds to a couple of minutes, depending on your network, etcd performance, and PostgreSQL configuration.
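The detection part of that window is bounded by Patroni’s timing parameters. A rough worst-case sketch, assuming Patroni’s default values for the leader-key TTL and the monitoring loop interval (both tunable in the cluster configuration):

```python
# Assumed Patroni defaults; tune these in the cluster (DCS) configuration.
ttl = 30          # seconds before the leader key in etcd expires
loop_wait = 10    # seconds between Patroni's monitoring loops

# Worst case: the primary dies right after renewing its leader key, and a
# replica only notices on its next monitoring loop after the key expires.
worst_case_detection = ttl + loop_wait
print(worst_case_detection)  # 40 seconds, before promotion even starts
```

Promotion itself and any WAL replay the replica still needs add to that, which is why the end-to-end figure can stretch toward minutes on a loaded cluster.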
The core problem Patroni solves is the coordination of failover. Without it, you’d manually have to:
- Detect the primary failure.
- Decide which replica to promote.
- Manually promote the replica.
- Update your application connection strings or DNS.
- Potentially reconfigure other replicas.
Patroni automates all of this by using a distributed consensus system (like etcd, ZooKeeper, or Consul) as a central source of truth for cluster leadership and state.
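The election itself boils down to an atomic create-if-absent on the leader key. A toy sketch with an in-process object standing in for etcd (real Patroni relies on the DCS’s atomic primitives plus key TTLs, not a Python lock):

```python
import threading

class ToyDCS:
    """Stands in for etcd: one atomic create-if-absent on the leader key."""
    def __init__(self):
        self._lock = threading.Lock()
        self.leader = None

    def try_acquire_leader(self, candidate: str) -> bool:
        # Atomic compare-and-swap: succeeds only if no leader is set.
        with self._lock:
            if self.leader is None:
                self.leader = candidate
                return True
            return False

dcs = ToyDCS()
results = {}
threads = [threading.Thread(
               target=lambda n=n: results.update({n: dcs.try_acquire_leader(n)}))
           for n in ("pg_node_1", "pg_node_2")]
for t in threads: t.start()
for t in threads: t.join()
print(dcs.leader, results)  # exactly one candidate wins the election
```

However the race interleaves, exactly one node ends up holding the lock; everyone else sees a losing compare-and-swap and stays (or becomes) a replica.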
The critical consideration for TimescaleDB specifically, beyond standard PostgreSQL HA, is that TimescaleDB’s internal catalog and chunk metadata must stay consistent across replicas. Because TimescaleDB partitions hypertables into chunks and manages them in its own catalog, a clean failover is paramount. Patroni, by driving PostgreSQL’s built-in streaming replication and promotion mechanisms, replicates TimescaleDB’s catalog and chunk data together, since physical replication copies everything at the block level. The archive_command and recovery_target_timeline: 'latest' settings in patroni.yml are crucial for letting a replica that has fallen significantly behind catch up from the WAL archive and stay on the correct timeline.
The most surprising thing about setting up Patroni HA is how little you actually configure TimescaleDB itself for HA. Patroni is a PostgreSQL HA solution first and foremost. Its success with TimescaleDB relies on PostgreSQL’s robust streaming replication and Point-in-Time Recovery (PITR) capabilities, which TimescaleDB leverages. You’re essentially making PostgreSQL highly available, and TimescaleDB benefits from that.
When you’re setting up replication slots (replication_slots in patroni.yml), you need to ensure the database specified is one that exists and is accessible for creating the slot. Often, this is the default postgres database. If you were to have a separate, critical database for your TimescaleDB hypertables, you might specify that instead, like database: my_timescale_db. The name for the slot should be unique per replica to avoid conflicts.
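That uniqueness requirement is easy to check mechanically before deploying a config. A small sketch (the slot literal mirrors the snippet above; `validate_slots` is a hypothetical helper, not part of Patroni):

```python
def validate_slots(slots: list[dict]) -> list[str]:
    """Return a list of problems found in a replication-slot config."""
    problems = []
    seen = set()
    for slot in slots:
        name = slot.get("name")
        if not name:
            problems.append("slot entry missing a name")
        elif name in seen:
            problems.append(f"duplicate slot name: {name}")
        else:
            seen.add(name)
        if not slot.get("database"):
            problems.append(f"slot {name!r} has no database set")
    return problems

# Two replicas accidentally configured with the same slot name:
slots = [{"name": "patroni_slot_pg_node_1", "database": "postgres"},
         {"name": "patroni_slot_pg_node_1", "database": "my_timescale_db"}]
print(validate_slots(slots))  # flags the duplicated slot name
```

Running such a check in CI catches the copy-paste mistake of cloning one node’s config to another without renaming its slot.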
The real power comes from how Patroni handles network partitions and stale replicas. When a node comes back online after being disconnected, Patroni attempts to re-sync it. If the node has fallen too far behind, or its replication slot was dropped during prolonged downtime, Patroni may refuse to rejoin it as a replica to avoid data divergence. You’d then need to re-initialize that node manually (for example with patronictl reinit, which discards its data directory and takes a fresh copy from the current primary).
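The rejoin decision can be sketched as a simple timeline-and-slot check. This is a deliberate simplification of what Patroni actually evaluates (it also considers pg_rewind feasibility and exact WAL positions), with hypothetical inputs:

```python
def can_rejoin_as_replica(node_timeline: int,
                          primary_timeline: int,
                          slot_still_exists: bool) -> bool:
    """Simplified rejoin check: a returning node may stream from the new
    primary only if it has not diverged onto a newer timeline and the
    replication slot (hence the WAL it needs) is still available."""
    if node_timeline > primary_timeline:
        return False  # node diverged: it kept writing on its own timeline
    if not slot_still_exists:
        return False  # the WAL it needs may already be recycled
    return True

# A node fenced cleanly, whose slot survived, can rejoin and catch up:
print(can_rejoin_as_replica(1, 2, True))   # True
# A node that diverged must be re-initialized instead:
print(can_rejoin_as_replica(3, 2, True))   # False
```

The important property is the refusal path: Patroni would rather leave a node out of the cluster than let two divergent histories merge silently.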
The next hurdle you’ll likely encounter is managing read-only workloads, specifically how to direct them to replicas that might not be immediately available after a failover.