PostgreSQL backup tutorial & database backup types explained

Backing up a production environment is a critical part of maintaining it. This post covers the various methods and techniques used to back up a PostgreSQL database; in a separate article we will discuss the various methods used to restore PostgreSQL databases.

There are three different approaches to backing up PostgreSQL databases, namely:
SQL Dump
Continuous Archiving
File system level backup

PostgreSQL SQL Dump

An SQL dump is a text file, generated with the pg_dump utility, that contains SQL commands. When executed on the server, these commands recreate the database in the same state it was in at the time the dump was taken.
To use the pg_dump utility, see the command below:
pg_dump name_of_db > outputfile
The pg_dump utility can be run on the local host or on any remote host that has access to the database. To operate properly, pg_dump needs at least read access to all tables that are to be backed up, so in practice it is usually run as a superuser.

By default, pg_dump connects to the database on the local host, or to whatever is set in the PGHOST and PGPORT environment variables. To override the defaults, use the -h hostname and -p port options. Similarly, the -U username option specifies which database user to connect as. If -U is not given, pg_dump connects with a database user name equal to the current operating system user name, or to the value set in the PGUSER variable.
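As a quick illustration, the two invocations below are equivalent; the host, user, and database names are placeholders, so adjust them for your environment:

export PGHOST=db.example.com
export PGPORT=5432
export PGUSER=postgres
pg_dump mydb > mydb.sql                                          # uses the PGHOST/PGPORT/PGUSER values above
pg_dump -h db.example.com -p 5432 -U postgres mydb > mydb.sql    # same connection, expressed with options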

pg_dump example
pg_dump -U username -h hostname -W -F p -d DBname > outputfile.sql

Below is a list of options that are used with the pg_dump utility.

pg_dump --help 
pg_dump options
Usage:
pg_dump [OPTION]... [DBNAME]

General options:
-f, --file=FILENAME output file or directory name
-F, --format=c|d|t|p output file format (custom, directory, tar,
plain text (default))
-j, --jobs=NUM use this many parallel jobs to dump
-v, --verbose verbose mode
-V, --version output version information, then exit
-Z, --compress=0-9 compression level for compressed formats
--lock-wait-timeout=TIMEOUT fail after waiting TIMEOUT for a table lock
--no-sync do not wait for changes to be written safely to disk
-?, --help show this help, then exit

Options controlling the output content:
-a, --data-only dump only the data, not the schema
-b, --blobs include large objects in dump
-B, --no-blobs exclude large objects in dump
-c, --clean clean (drop) database objects before recreating
-C, --create include commands to create database in dump
-E, --encoding=ENCODING dump the data in encoding ENCODING
-n, --schema=SCHEMA dump the named schema(s) only
-N, --exclude-schema=SCHEMA do NOT dump the named schema(s)
-o, --oids include OIDs in dump
-O, --no-owner skip restoration of object ownership in
plain-text format
-s, --schema-only dump only the schema, no data
-S, --superuser=NAME superuser user name to use in plain-text format
-t, --table=TABLE dump the named table(s) only
-T, --exclude-table=TABLE do NOT dump the named table(s)
-x, --no-privileges do not dump privileges (grant/revoke)
--binary-upgrade for use by upgrade utilities only
--column-inserts dump data as INSERT commands with column names
--disable-dollar-quoting disable dollar quoting, use SQL standard quoting
--disable-triggers disable triggers during data-only restore
--enable-row-security enable row security (dump only content user has
access to)
--exclude-table-data=TABLE do NOT dump data for the named table(s)
--if-exists use IF EXISTS when dropping objects
--inserts dump data as INSERT commands, rather than COPY
--no-publications do not dump publications
--no-security-labels do not dump security label assignments
--no-subscriptions do not dump subscriptions
--no-synchronized-snapshots do not use synchronized snapshots in parallel jobs
--no-tablespaces do not dump tablespace assignments
--no-unlogged-table-data do not dump unlogged table data
--quote-all-identifiers quote all identifiers, even if not key words
--section=SECTION dump named section (pre-data, data, or post-data)
--serializable-deferrable wait until the dump can run without anomalies
--snapshot=SNAPSHOT use given snapshot for the dump
--strict-names require table and/or schema include patterns to
match at least one entity each
--use-set-session-authorization
use SET SESSION AUTHORIZATION commands instead of
ALTER OWNER commands to set ownership

Connection options:
-d, --dbname=DBNAME database to dump
-h, --host=HOSTNAME database server host or socket directory
-p, --port=PORT database server port number
-U, --username=NAME connect as specified database user
-w, --no-password never prompt for password
-W, --password force password prompt (should happen automatically)
--role=ROLENAME do SET ROLE before dump

If no database name is supplied, then the PGDATABASE environment
variable value is used.

Important note: If your database schema relies on OIDs (for instance, as foreign keys) you must instruct pg_dump to dump the OIDs as well. To do this, use the -o command-line option.
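To illustrate a few of these options in combination (the database, schema, table, and file names below are placeholders):

pg_dump -F c -Z 5 -f mydb.dump mydb                  # custom format, compression level 5
pg_dump -F d -j 4 -f mydb_backup_dir mydb            # directory format, dumped with 4 parallel jobs
pg_dump -s -n public mydb > public_schema_only.sql   # schema only, restricted to the public schema
pg_dump -a -t customers mydb > customers_data.sql    # data only, restricted to the customers table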

Pros and Cons of pg_dump

Pros:
pg_dump's output can generally be re-loaded into newer versions of PostgreSQL.
pg_dump is the only method that will work when transferring a database to a different machine architecture, such as going from a 32-bit to a 64-bit server.
pg_dump does not block other operations on the database while it is working.
pg_dump can back up single database objects, such as individual tables.

Cons:
Restoration can take a long time because all of the SQL statements in the dump have to be re-executed.
An SQL dump cannot be used for point-in-time recovery; it only restores the data as it was at the time the backup was taken.
Operations that need an exclusive lock, such as most forms of ALTER TABLE, are blocked while the dump runs.

pg_dumpall
pg_dumpall is a utility program used to back up all databases in a given cluster into a single script file. It not only backs up each database, it also backs up cluster-level data such as tablespace and role definitions. Like pg_dump, pg_dumpall needs read permission on all tables, so it is recommended that the user executing it has superuser privileges. Please see the usage of the command below:
pg_dumpall > outputfile

See a list of options below for the pg_dumpall utility.

pg_dumpall --help
Usage:
pg_dumpall [OPTION]...

General options:
-f, --file=FILENAME output file name
-v, --verbose verbose mode
-V, --version output version information, then exit
--lock-wait-timeout=TIMEOUT fail after waiting TIMEOUT for a table lock
-?, --help show this help, then exit

Options controlling the output content:
-a, --data-only dump only the data, not the schema
-c, --clean clean (drop) databases before recreating
-g, --globals-only dump only global objects, no databases
-o, --oids include OIDs in dump
-O, --no-owner skip restoration of object ownership
-r, --roles-only dump only roles, no databases or tablespaces
-s, --schema-only dump only the schema, no data
-S, --superuser=NAME superuser user name to use in the dump
-t, --tablespaces-only dump only tablespaces, no databases or roles
-x, --no-privileges do not dump privileges (grant/revoke)
--binary-upgrade for use by upgrade utilities only
--column-inserts dump data as INSERT commands with column names
--disable-dollar-quoting disable dollar quoting, use SQL standard quoting
--disable-triggers disable triggers during data-only restore
--if-exists use IF EXISTS when dropping objects
--inserts dump data as INSERT commands, rather than COPY
--no-publications do not dump publications
--no-role-passwords do not dump passwords for roles
--no-security-labels do not dump security label assignments
--no-subscriptions do not dump subscriptions
--no-sync do not wait for changes to be written safely to disk
--no-tablespaces do not dump tablespace assignments
--no-unlogged-table-data do not dump unlogged table data
--quote-all-identifiers quote all identifiers, even if not key words
--use-set-session-authorization
use SET SESSION AUTHORIZATION commands instead of
ALTER OWNER commands to set ownership

Connection options:
-d, --dbname=CONNSTR connect using connection string
-h, --host=HOSTNAME database server host or socket directory
-l, --database=DBNAME alternative default database
-p, --port=PORT database server port number
-U, --username=NAME connect as specified database user
-w, --no-password never prompt for password
-W, --password force password prompt (should happen automatically)
--role=ROLENAME do SET ROLE before dump

If -f/--file is not used, then the SQL script will be written to the standard
output.
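For example (the user and file names below are placeholders):

pg_dumpall -U postgres -f cluster_backup.sql          # roles, tablespaces, and every database in the cluster
pg_dumpall -U postgres --globals-only > globals.sql   # cluster-wide objects only: roles and tablespaces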

Continuous Archiving / Point In Time Recovery

Point-in-time recovery (PITR) in PostgreSQL is a recovery method that allows you to restore your database to a specific time in the past. Before we can do point-in-time recovery we need to enable archiving of the WAL (write-ahead log). Along with enabling archiving, we also need to set wal_level to replica and define an archive_command in the postgresql.conf file. Below is a list of parameters that need to be edited in postgresql.conf.

wal_level = replica
# - Archiving
archive_mode = on
archive_command = 'cp %p /mnt/server/archivedir/%f'  # Unix
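Changing wal_level and archive_mode requires a server restart, while archive_command only needs a configuration reload. A quick way to confirm the settings afterwards (the service name may differ on your system):

sudo systemctl restart postgresql
psql -U postgres -c "SHOW wal_level;"
psql -U postgres -c "SHOW archive_mode;"
psql -U postgres -c "SHOW archive_command;"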

Breakdown of the archive command

%p is replaced by the path name of the file to archive. The path name is relative to the current working directory (the cluster's data directory).

%f is replaced by the file name

/mnt/server/archivedir/  – the path to which the WAL files will be copied.

Translation of the archive command
cp %p /mnt/server/archivedir/%f = cp pg_xlog/00000001000000A900000077 /mnt/server/archivedir/00000001000000A900000077 (the WAL directory is named pg_wal rather than pg_xlog in PostgreSQL 10 and later)

A similar command will be generated for each new file to be archived.
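Note that cp will silently overwrite a file that already exists in the archive directory. The PostgreSQL documentation recommends an archive command that refuses to do so; a commonly used variant looks like this:

archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'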

Executing a PITR backup

Connect to the database as a superuser and run the command:

SELECT pg_start_backup('Name Of Backup');

If you want the backup to start as soon as possible, pass true as the second argument; this causes the required checkpoint to be performed as quickly as possible instead of being spread out over time:

SELECT pg_start_backup('label', true);

To stop the backup and perform an automatic switch to the next WAL segment, execute the command below:

SELECT pg_stop_backup();
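Putting the pieces together, a minimal sketch of the whole sequence might look like the following; the data directory and backup paths are assumptions, and on PostgreSQL 15 and later the functions are named pg_backup_start and pg_backup_stop instead:

psql -U postgres -c "SELECT pg_start_backup('nightly_base_backup', true);"
tar -czf /backups/base_backup.tar.gz /var/lib/postgresql/data   # data directory path is an assumption
psql -U postgres -c "SELECT pg_stop_backup();"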

pg_basebackup
Alternatively, the pg_basebackup utility can be used to take a base backup of the running cluster, which together with the archived WAL enables point-in-time recovery.
pg_basebackup -Ft -D /var/lib/postgresql/db_file_backup
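A slightly more complete invocation might look like the one below; the connection details are placeholders, and the role used must have the REPLICATION privilege:

pg_basebackup -h localhost -U replicator -D /var/lib/postgresql/db_file_backup -Ft -z -X stream -P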

PITR pros and cons

Pros:
You can choose the exact point in time to which you want to restore your data.
The database does not need to be taken offline in order to take a backup.

Cons:
Databases can only be restored on the same PostgreSQL version and the same OS/architecture, e.g. a backup taken on 64-bit Linux can only be restored on 64-bit Linux.
No flexibility: single databases and individual objects cannot be backed up with this method, only the entire cluster.

In a different article, we will discuss how to restore a database using PITR.

PostgreSQL file system level backup

A file system level backup is an alternative backup method that directly copies the files and directories PostgreSQL uses to store its data. An example of a file system level backup:
tar -cf backup.tar /pgsql/data
To use a file system level backup, the database server must be shut down, which might not be practical for production environments that require little to no downtime. It must also be noted that a file system level backup has to cover the entire data directory; it cannot back up a specific database or table only, because the data files are not usable without the commit log files (pg_clog/*, named pg_xact/* in PostgreSQL 10 and later), which contain the commit status of all transactions.
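A minimal sketch of the shutdown, copy, and restart cycle, assuming a systemd-managed server and the /pgsql/data directory used above (the service name and paths may differ on your system):

sudo systemctl stop postgresql
tar -cf /backups/pgdata_backup.tar /pgsql/data
sudo systemctl start postgresql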