PostgreSQL 12 Streaming replication on Ubuntu easy step-by-step guide

PostgreSQL 12 Streaming replication on Ubuntu easy step-by-step guide

PostgreSQL 12 Streaming Replication

In this PostgreSQL tutorial, we will be looking at the step-by-step guide on how to set up PostgreSQL 12 streaming replication on Ubuntu. Like many popular database platforms, PostgreSQL offers its own replication solution that offers data high-availability and is fairly easy to implement.

 

Replication in PostgreSQL

In PostgreSQL 12 streaming replication is asynchronous by default. Being asynchronous, there is a high possibility of data loss if the primary node should go offline. Between the primary and secondary node, it usually takes a couple of milliseconds (depending on your network speed) to send the committed data to the secondary node.  To avoid the data loss that occurs in the asynchronous mode, we can put our streaming replication in synchronous mode. In synchronous mode, data is committed to the primary and one or more secondary node simultaneously thus lessening the risk of data loss.

 

Prerequisite

For this tutorial we will be using two nodes with Ubuntu 18.04 installed as the operating system to implement our streaming replication cluster. To install PostgreSQL 12 on Ubuntu please see our Step by step guide on how To Install PostgreSQL 12 on Ubuntu 16.04/18.04/20.04. Please note that this tutorial will work on other version of Ubuntu and other Linux distribution.

Primary Node:  18.219.169.4

Secondary Node: 18.217.202.39

Configure the master node

On the primary node switch account to the postgres user. Connect to the postgres interactive shell by executing the psqlcommand. Once in the interactive shell enter the command to set the IP addresses from which the primary node will listen to for connections:

psql -c "ALTER SYSTEM SET listen_addresses TO '*';

Set IP addresses to list for connections in PostgreSQL 12

Create replication user

For this step we will be creating a replication user named replicator that will be used to facilitate connections between the primary and secondary node.  To do this we will be using the createuser utility program. to proceed we need to exit the the PostgreSQL  interactive shell by typing \q. we should now be back on our Linux command-line, if so, copy paste and execute the command below and enter a password for the user.

createuser --replication -P -e replicator

Create replicator user in PostgreSQL 12

Note: There are many ways to create the user, this is just my preferred way of doing it. For other ways to create users in PostgreSQL check out my other post: 3 easy ways to create users in Postgresql, that gives a detailed tutorial on ways to create users in PostgreSQL.

 

Edit the pg_hba.conf file

Now that our replication user has been created it is time to configure the user to access our secondary node. To do this we have to edit and set rules in our pg_hba.conf file. In Ubuntu you can find the pg_hba.conf file in the /etc/postgresql/12/maindirectory. For more info on how to configure this file, please see my post: Understanding the pg_hba conf file in PostgreSQL. Open the pg_hba.conf file and add the following line :

host    replication     replicator      <<Secondary node IP>>     md5

In my case I would enter:

host    replication     replicator      18.217.202.39/0    md5

See the gif below on how to achieve this.

Editing the pg_hba.conf file

Restart PostgreSQL 12

For the changes to take effect we need to restart or reload the PostgreSQL 12 services. We can do this by running:

sudo systemctl reload postgresql.service

Or connect to the Postgres interactive shell and enter:

Select pg_reload_conf();

Relaod the postgresql database

Back up standby

In this step we will take a backup of the database on the secondary node before deleting the data. copy and execute the codes below individually.

sudo systemctl stop postgresql.service

sudo systemctl status postgresql.service

cp -R /var/lib/postgresql/12/main /var/lib/postgresql/12/main_bak1

rm -rf /var/lib/postgresql/12/main/*

Backup from primary to standby

Now that we have taken a backup of the secondary node and deleted the existing database. We will now need to backup and restore the primary database on to the secondary node using the pg_basebackup utility tool.

See the command below that should be executed to achieve this:

pg_basebackup -h 18.219.169.4 -D /var/lib/postgresql/12/main -U replicator -P -v -R -X stream -C -S node2

-h  – Represents the node that will be backed up. In our case this will be our primary node.

-D – The data directory where the backup will be transferred to.

-U – Connection User

-P – Turns on progress reporting.

-v – Enables verbose mode

-R – Creates a standby.signal file and append connection settings to postgresql.auto.conf

-X – includes the required write-ahead log files in the backup

-C – enables the creation of a replication slot named by the -S option before starting the backup.

-S – specifies the replication slot name.

restore the primary database on to the secondary node

Lising the database file in postgresql 12

We can now head back over to the primary node and check the database replication slot by running:

SELECT * FROM pg_replication_slots;

If you are following this tutorial to the “T” then you should see a replication slot by the name of node2.

Replication slots in postgresql 12

Now we need to head back over to the secondary node and start up the services on that database by running the command:

sudo systemctl start postgresql

By now there should be a connection between our 2 nodes, to confirm this we check the pg_stat_wal_receiver view by running the following sql command:

SELECT * FROM pg_stat_wal_receiver;

Or from the command-line:

psql -c "\x" -c "SELECT * FROM pg_stat_wal_receiver;"
psql -c "\x" -c "SELECT * FROM pg_stat_replication;"

 

Now that we have completed setting up or Postgresql 12 streaming replication, it is now time to test it. Head over to the primary node and connect to the database. Once connected we will proceed to creating a database by running the command:

CREATE DATABASE FIRSTDB;

Creating a database on the primary node

After executing the command, head over to the secondary node and check if the database was replicated on to the server by connecting to the database and running the command \l.

Replicated database on the secondary node

Enable Synchronous Replication

To minimize data loss we can enable Synchronous Replication. Synchronous replication allows the database to write data to the primary and secondary database at the same time.  When in this mode transactions are confirmed to be successful when the data is transferred to at least one other synchronous standby server.

To set enable synchronous replication, we need to connect to the primary  databse and execute the command:

ALTER SYSTEM SET synchronous_standby_names TO  '*';

For the changes to take effect we need to reload the posgresql.conf file by connecting to the database on the primary node and running:

Select pg_reload_conf();

or from the command-line execute:

psql -c "\x" -c "Select pg_reload_conf();"

Now when you execute the command below, we should see that the database is in sync mode :

psql -c "\x" -c "SELECT * FROM pg_stat_replication;"

Checking if postgresql 12 cluster is in sync mode

Conclusion

Following this tutorial you should have a working PostgreSQL 12 streaming replication on Ubuntu. To add more nodes you can replicate the steps above.