TimescaleDB’s streaming replication doesn’t just copy data; it fundamentally changes how you think about your database’s availability and scalability.
Let’s see it in action. Imagine you have a primary TimescaleDB instance running on db1.example.com and you want to set up a replica for high availability and to offload read traffic.
First, on your primary (db1.example.com), you need to configure postgresql.conf and pg_hba.conf.
In postgresql.conf:
wal_level = replica
max_wal_senders = 5
wal_keep_size = 1024MB # Or a suitably large value to avoid WAL gaps
These settings are crucial. wal_level = replica tells PostgreSQL to write enough information to the Write-Ahead Log (WAL) to allow for replication. max_wal_senders determines how many standby servers can connect and stream WAL data. wal_keep_size ensures that the primary retains enough WAL files so that a replica can catch up if it falls behind, preventing replication errors.
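If you'd rather not hand-edit postgresql.conf, the same settings can be applied with ALTER SYSTEM, which writes them into postgresql.auto.conf. A sketch, run as a superuser on the primary (wal_level and max_wal_senders still require a restart to take effect):

```sql
-- Each ALTER SYSTEM writes to postgresql.auto.conf, not postgresql.conf.
ALTER SYSTEM SET wal_level = 'replica';      -- needs a restart
ALTER SYSTEM SET max_wal_senders = 5;        -- needs a restart
ALTER SYSTEM SET wal_keep_size = '1024MB';   -- picked up on reload

-- Reload for settings that don't require a restart:
SELECT pg_reload_conf();
```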
In pg_hba.conf (on the primary):
host replication repl_user 192.168.1.0/24 md5
This line grants the repl_user (which you’ll create) permission to connect for replication purposes from the subnet 192.168.1.0/24. The md5 at the end selects MD5 password authentication; on PostgreSQL 14 and later, scram-sha-256 is the preferred password method.
Next, create the replication user on the primary:
CREATE ROLE repl_user WITH REPLICATION LOGIN PASSWORD 'your_secure_password';
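Before running pg_basebackup, it's worth confirming that the replica host can actually open a replication connection. One way to sketch this check from db2.example.com (psql can speak the replication protocol when you pass replication=true in the connection string; IDENTIFY_SYSTEM is a replication-protocol command):

```shell
# From the replica host: verify pg_hba.conf and credentials allow a
# walsender (replication) connection. Prompts for repl_user's password.
psql "host=db1.example.com user=repl_user replication=true" \
     -c "IDENTIFY_SYSTEM;"
```

If this is rejected, fix pg_hba.conf or the password before bothering with a full base backup.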
Now, on the replica server (db2.example.com), you’ll initialize it from a base backup of the primary. The pg_basebackup tool is your friend here.
pg_basebackup -h db1.example.com -U repl_user -D /var/lib/postgresql/14/main/ -P -v -R
This command does several things:
- -h db1.example.com: Specifies the primary server.
- -U repl_user: The replication user.
- -D /var/lib/postgresql/14/main/: The data directory for the replica. Ensure this directory is empty or does not exist before running.
- -P: Shows progress.
- -v: Verbose output.
- -R: This is the magic flag. It creates a standby.signal file and writes replication settings into postgresql.auto.conf in the replica’s data directory, pre-configured for streaming replication.
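After -R has done its work, the replica's postgresql.auto.conf contains a connection string pointing back at the primary; the exact contents depend on your environment, but it looks roughly like this illustrative sketch:

```
primary_conninfo = 'host=db1.example.com port=5432 user=repl_user password=your_secure_password'
```

The standby.signal file itself is empty; its mere presence is what tells PostgreSQL to start up as a standby.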
The -R flag is particularly neat because it automatically writes the necessary primary_conninfo (and primary_slot_name, if you passed a replication slot via --slot) into postgresql.auto.conf on the replica, pointing back to the primary. Note that all of this applies to physical streaming replication; logical replication is configured quite differently, through publications on the source and CREATE SUBSCRIPTION on the target, not through primary_conninfo.
With TimescaleDB, there’s a subtle but powerful difference compared to standard PostgreSQL. When you set up a replica using pg_basebackup with the -R flag, it creates a physical standby. This means the replica is an exact bit-for-bit copy of the primary. All your TimescaleDB hypertables, regular tables, and time-series data are replicated. You can connect to this replica and run SELECT queries, effectively offloading read traffic.
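A quick sanity check once the replica is up: pg_is_in_recovery() tells you whether the server you're connected to is a standby. A sketch, run through psql against db2.example.com:

```sql
-- Returns 't' on a standby, 'f' on a primary.
SELECT pg_is_in_recovery();

-- Standbys are read-only; any write attempt fails with
-- "ERROR: cannot execute INSERT in a read-only transaction"
```

This is also a handy health check for load balancers routing read traffic.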
To promote a standby to a primary (failover), you could stop the PostgreSQL process on the standby, remove the standby.signal file, and start PostgreSQL again, but the cleaner, supported approach is pg_ctl promote (or SELECT pg_promote() from SQL on PostgreSQL 12+), which promotes the running standby without a restart. In a real HA scenario, you’d delegate promotion to a tool like Patroni.
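The promotion itself is a single command; two equivalent sketches (the data directory path assumes the Debian-style PostgreSQL 14 layout used earlier):

```shell
# Option 1: promote via pg_ctl on the standby host
pg_ctl promote -D /var/lib/postgresql/14/main/

# Option 2: promote from SQL (PostgreSQL 12+)
psql -c "SELECT pg_promote();"
```

Either way, the standby stops replaying WAL, removes standby.signal itself, and begins accepting writes on a new timeline.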
The mental model is that the replica is a follower that’s constantly catching up to the leader. It receives WAL records from the primary and replays them. For read-only queries, you can direct traffic to the replica. If the primary fails, you can promote the replica to become the new primary, minimizing downtime.
What most people don’t realize is that pg_basebackup -R doesn’t touch postgresql.conf at all: it appends primary_conninfo to postgresql.auto.conf. That file is read after postgresql.conf, so its settings take precedence, and it’s managed by PostgreSQL itself (ALTER SYSTEM writes there too), which makes it the right home for machine-generated replication settings rather than hand edits to postgresql.conf.
The next challenge you’ll likely face is managing replication lag and ensuring data consistency in the face of network partitions or primary unavailability.
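On the primary, pg_stat_replication exposes each standby's sent and replayed WAL positions, and pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) turns them into bytes of lag. The arithmetic behind that function is worth internalizing: a pg_lsn value such as 0/5000060 is just a 64-bit byte position split into two hex words. A minimal shell sketch (the LSN values below are made up for illustration):

```shell
#!/usr/bin/env bash
# Convert a pg_lsn value (e.g. "0/5000060") to an absolute byte position:
# high word * 2^32 + low word, both halves being hexadecimal.
lsn_to_bytes() {
  local hi=${1%%/*} lo=${1##*/}
  echo $(( 16#$hi * 4294967296 + 16#$lo ))
}

# Example: lag between the primary's current LSN and a replica's replay LSN.
primary_lsn="0/5000060"
replica_lsn="0/5000000"
lag_bytes=$(( $(lsn_to_bytes "$primary_lsn") - $(lsn_to_bytes "$replica_lsn") ))
echo "replication lag: ${lag_bytes} bytes"   # prints: replication lag: 96 bytes
```

In practice you'd feed real values from pg_stat_replication into this, or simply call pg_wal_lsn_diff in SQL; the point is that lag is just a difference of byte offsets.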