PostgreSQL 12 Streaming Replication
In this PostgreSQL tutorial, we will be looking at the step-by-step guide on how to set up PostgreSQL 12 streaming replication on Ubuntu. Like many popular database platforms, PostgreSQL offers its own replication solution that offers data high-availability and is fairly easy to implement.
Replication in PostgreSQL
In PostgreSQL 12 streaming replication is asynchronous by default. Being asynchronous, there is a high possibility of data loss if the primary node should go offline. Between the primary and secondary node, it usually takes a couple of milliseconds (depending on your network speed) to send the committed data to the secondary node. To avoid the data loss that occurs in the asynchronous mode, we can put our streaming replication in synchronous mode. In synchronous mode, data is committed to the primary and one or more secondary node simultaneously thus lessening the risk of data loss.
For this tutorial we will be using two nodes with Ubuntu 18.04 installed as the operating system to implement our streaming replication cluster. To install PostgreSQL 12 on Ubuntu please see our Step by step guide on how To Install PostgreSQL 12 on Ubuntu 16.04/18.04/20.04. Please note that this tutorial will work on other version of Ubuntu and other Linux distribution.
Primary Node: 18.104.22.168
Secondary Node: 22.214.171.124
Configure the master node
On the primary node switch account to the postgres user. Connect to the postgres interactive shell by executing the
psqlcommand. Once in the interactive shell enter the command to set the IP addresses from which the primary node will listen to for connections:
psql -c "ALTER SYSTEM SET listen_addresses TO '*';
Create replication user
For this step we will be creating a replication user named replicator that will be used to facilitate connections between the primary and secondary node. To do this we will be using the createuser utility program. to proceed we need to exit the the PostgreSQL interactive shell by typing \q. we should now be back on our Linux command-line, if so, copy paste and execute the command below and enter a password for the user.
createuser --replication -P -e replicator
Note: There are many ways to create the user, this is just my preferred way of doing it. For other ways to create users in PostgreSQL check out my other post: 3 easy ways to create users in Postgresql, that gives a detailed tutorial on ways to create users in PostgreSQL.
Edit the pg_hba.conf file
Now that our replication user has been created it is time to configure the user to access our secondary node. To do this we have to edit and set rules in our pg_hba.conf file. In Ubuntu you can find the pg_hba.conf file in the
/etc/postgresql/12/maindirectory. For more info on how to configure this file, please see my post: Understanding the pg_hba conf file in PostgreSQL. Open the pg_hba.conf file and add the following line :
host replication replicator <<Secondary node IP>> md5
In my case I would enter:
host replication replicator 126.96.36.199/0 md5
See the gif below on how to achieve this.
Restart PostgreSQL 12
For the changes to take effect we need to restart or reload the PostgreSQL 12 services. We can do this by running:
sudo systemctl reload postgresql.service
Or connect to the Postgres interactive shell and enter:
Back up standby
In this step we will take a backup of the database on the secondary node before deleting the data. copy and execute the codes below individually.
sudo systemctl stop postgresql.service sudo systemctl status postgresql.service cp -R /var/lib/postgresql/12/main /var/lib/postgresql/12/main_bak1 rm -rf /var/lib/postgresql/12/main/*
Backup from primary to standby
Now that we have taken a backup of the secondary node and deleted the existing database. We will now need to backup and restore the primary database on to the secondary node using the pg_basebackup utility tool.
See the command below that should be executed to achieve this:
pg_basebackup -h 188.8.131.52 -D /var/lib/postgresql/12/main -U replicator -P -v -R -X stream -C -S node2
-h – Represents the node that will be backed up. In our case this will be our primary node.
-D – The data directory where the backup will be transferred to.
-U – Connection User
-P – Turns on progress reporting.
-v – Enables verbose mode
-R – Creates a standby.signal file and append connection settings to postgresql.auto.conf
-X – includes the required write-ahead log files in the backup
-C – enables the creation of a replication slot named by the -S option before starting the backup.
-S – specifies the replication slot name.
We can now head back over to the primary node and check the database replication slot by running:
SELECT * FROM pg_replication_slots;
If you are following this tutorial to the “T” then you should see a replication slot by the name of node2.
Now we need to head back over to the secondary node and start up the services on that database by running the command:
sudo systemctl start postgresql
By now there should be a connection between our 2 nodes, to confirm this we check the pg_stat_wal_receiver view by running the following sql command:
SELECT * FROM pg_stat_wal_receiver;
Or from the command-line:
psql -c "\x" -c "SELECT * FROM pg_stat_wal_receiver;" psql -c "\x" -c "SELECT * FROM pg_stat_replication;"
Now that we have completed setting up or Postgresql 12 streaming replication, it is now time to test it. Head over to the primary node and connect to the database. Once connected we will proceed to creating a database by running the command:
CREATE DATABASE FIRSTDB;
After executing the command, head over to the secondary node and check if the database was replicated on to the server by connecting to the database and running the command
Enable Synchronous Replication
To minimize data loss we can enable Synchronous Replication. Synchronous replication allows the database to write data to the primary and secondary database at the same time. When in this mode transactions are confirmed to be successful when the data is transferred to at least one other synchronous standby server.
To set enable synchronous replication, we need to connect to the primary databse and execute the command:
ALTER SYSTEM SET synchronous_standby_names TO '*';
For the changes to take effect we need to reload the posgresql.conf file by connecting to the database on the primary node and running:
or from the command-line execute:
psql -c "\x" -c "Select pg_reload_conf();"
Now when you execute the command below, we should see that the database is in sync mode :
psql -c "\x" -c "SELECT * FROM pg_stat_replication;"
Following this tutorial you should have a working PostgreSQL 12 streaming replication on Ubuntu. To add more nodes you can replicate the steps above.