Point In Time Recovery

Restore a Backup in Quick Start performed default recovery, which is to play all the way to the end of the WAL stream. In the case of a hardware failure this is usually the best choice but for data corruption scenarios (whether machine or human in origin) Point-in-Time Recovery (PITR) is often more appropriate.

Point-in-Time Recovery (PITR) allows the WAL to be played from the last backup to a specified time, transaction id, or recovery point. For common recovery scenarios time-based recovery is arguably the most useful. A typical recovery scenario is to restore a table that was accidentally dropped or data that was accidentally deleted. Recovering a dropped table is more dramatic so that's the example given here but deleted data would be recovered in exactly the same way.

# pg-primary ⇒ Backup the demo cluster and create a table with very important data
sudo -u postgres pgbackrest --stanza=demo --type=diff backup
sudo -u postgres psql -c "begin; \
       create table important_table (message text); \
       insert into important_table values ('Important Data'); \
       commit; \
       select * from important_table;"
    message
----------------
 Important Data
(1 row)

It is important to represent the time as reckoned by PostgreSQL and to include timezone offsets. This reduces the possibility of unintended timezone conversions and an unexpected recovery result.

# pg-primary ⇒ Get the time from PostgreSQL
sudo -u postgres psql -Atc "select current_timestamp"
2018-05-06 15:17:58.783787+00

Now that the time has been recorded the table is dropped. In practice finding the exact time that the table was dropped is a lot harder than in this example. It may not be possible to find the exact time, but some forensic work should be able to get you close.

# pg-primary ⇒ Drop the important table
sudo -u postgres psql -c "begin; \
       drop table important_table; \
       commit; \
       select * from important_table;"
ERROR:  relation "important_table" does not exist
LINE 1: ...le important_table;     commit;     select * from important_...
                                                             ^

Now the restore can be performed with time-based recovery to bring back the missing table.

# pg-primary ⇒ Stop PostgreSQL, restore the demo cluster to 2018-05-06 15:17:58.783787+00, and display recovery.conf
sudo pg_ctlcluster 9.4 demo stop
sudo -u postgres pgbackrest --stanza=demo --delta \
       --type=time "--target=2018-05-06 15:17:58.783787+00" restore
sudo -u postgres cat /var/lib/postgresql/9.4/demo/recovery.conf
restore_command = 'pgbackrest --stanza=demo archive-get %f "%p"'
recovery_target_time = '2018-05-06 15:17:58.783787+00'

The recovery.conf file has been automatically generated by pgBackRest so PostgreSQL can be started immediately. Once PostgreSQL has finished recovery the table will exist again and can be queried.

# pg-primary ⇒ Start PostgreSQL and check that the important table exists
sudo pg_ctlcluster 9.4 demo start
sudo -u postgres psql -c "select * from important_table"
    message
----------------
 Important Data
(1 row)

The PostgreSQL log also contains valuable information. It will indicate the time and transaction where the recovery stopped and also give the time of the last transaction to be applied.

# pg-primary ⇒ Examine the PostgreSQL log output
sudo -u postgres cat /var/log/postgresql/postgresql-9.4-demo.log
LOG:  database system was interrupted; last known up at 2018-05-06 15:17:54 UTC
LOG:  starting point-in-time recovery to 2018-05-06 15:17:58.783787+00
LOG:  restored log file "00000004.history" from archive
LOG:  restored log file "000000040000000000000019" from archive
LOG:  redo starts at 0/19000028
LOG:  consistent recovery state reached at 0/190000F0
LOG:  restored log file "00000004000000000000001A" from archive
LOG:  recovery stopping before commit of transaction 689, time 2018-05-06 15:17:58.990914+00
LOG:  redo done at 0/1A016E48
LOG:  last completed transaction was at log time 2018-05-06 15:17:58.563866+00
LOG:  incomplete startup packet
LOG:  selected new timeline ID: 5
       [filtered 5 lines of output]

This example was rigged to give the correct result. If a backup after the required time is chosen then PostgreSQL will not be able to recover the lost table. PostgreSQL can only play forward, not backward. To demonstrate this the important table must be dropped (again).

# pg-primary ⇒ Drop the important table (again)
sudo -u postgres psql -c "begin; \
       drop table important_table; \
       commit; \
       select * from important_table;"
ERROR:  relation "important_table" does not exist
LINE 1: ...le important_table;     commit;     select * from important_...
                                                             ^

Now take a new backup and attempt recovery from the new backup.

# pg-primary ⇒ Perform a backup then attempt recovery from that backup
sudo -u postgres pgbackrest --stanza=demo --type=incr backup
sudo pg_ctlcluster 9.4 demo stop
sudo -u postgres pgbackrest --stanza=demo --delta \
       --type=time "--target=2018-05-06 15:17:58.783787+00" restore
sudo pg_ctlcluster 9.4 demo start
sudo -u postgres psql -c "select * from important_table"
ERROR:  relation "important_table" does not exist
LINE 1: select * from important_table
                      ^

Looking at the log output it's not obvious that recovery failed to restore the table. The key is to look for the presence of the "recovery stopping before..." and "last completed transaction..." log messages. If they are not present then the recovery to the specified point-in-time was not successful.

pg-primary ⇒ Examine the PostgreSQL log output to discover the recovery was not successful
sudo -u postgres cat /var/log/postgresql/postgresql-9.4-demo.log
LOG:  database system was interrupted; last known up at 2018-05-06 15:18:09 UTC
LOG:  starting point-in-time recovery to 2018-05-06 15:17:58.783787+00
LOG:  restored log file "00000005.history" from archive
LOG:  restored log file "00000005000000000000001B" from archive
LOG:  redo starts at 0/1B000028
LOG:  consistent recovery state reached at 0/1B0000F0
LOG:  redo done at 0/1B0000F0
LOG:  incomplete startup packet
       [filtered 10 lines of output]

Using an earlier backup will allow PostgreSQL to play forward to the correct time. The info command can be used to find the next to last backup.

# pg-primary ⇒ Get backup info for the demo cluster
sudo -u postgres pgbackrest info
stanza: demo
    status: ok

    db (current)
        wal archive min/max (9.4-1): 00000002000000000000000B / 00000005000000000000001B

        full backup: 20180506-151655F
            timestamp start/stop: 2018-05-06 15:16:55 / 2018-05-06 15:17:02
            wal start/stop: 00000002000000000000000B / 00000002000000000000000B
            database size: 19.2MB, backup size: 19.2MB
            repository size: 2.2MB, repository backup size: 2.2MB

        full backup: 20180506-151702F
            timestamp start/stop: 2018-05-06 15:17:02 / 2018-05-06 15:17:08
            wal start/stop: 00000002000000000000000C / 00000002000000000000000C
            database size: 19.2MB, backup size: 19.2MB
            repository size: 2.2MB, repository backup size: 2.2MB

        diff backup: 20180506-151702F_20180506-151724D
            timestamp start/stop: 2018-05-06 15:17:24 / 2018-05-06 15:17:27
            wal start/stop: 000000020000000000000014 / 000000020000000000000014
            database size: 19.2MB, backup size: 8.2KB
            repository size: 2.2MB, repository backup size: 400B
            backup reference list: 20180506-151702F

        incr backup: 20180506-151702F_20180506-151737I
            timestamp start/stop: 2018-05-06 15:17:37 / 2018-05-06 15:17:42
            wal start/stop: 000000030000000000000016 / 000000030000000000000016
            database size: 31.7MB, backup size: 12.6MB
            repository size: 3.7MB, repository backup size: 1.5MB
            backup reference list: 20180506-151702F
        diff backup: 20180506-151702F_20180506-151753D
            timestamp start/stop: 2018-05-06 15:17:53 / 2018-05-06 15:17:58
            wal start/stop: 000000040000000000000019 / 000000040000000000000019
            database size: 25.5MB, backup size: 6.3MB
            repository size: 3MB, repository backup size: 781KB
            backup reference list: 20180506-151702F

        incr backup: 20180506-151702F_20180506-151807I
            timestamp start/stop: 2018-05-06 15:18:07 / 2018-05-06 15:18:11
            wal start/stop: 00000005000000000000001B / 00000005000000000000001B
            database size: 25.5MB, backup size: 1.7MB
            repository size: 3MB, repository backup size: 201.5KB
            backup reference list: 20180506-151702F, 20180506-151702F_20180506-151753D

The default behavior for restore is to use the last backup but an earlier backup can be specified with the --set option.

# pg-primary ⇒ Stop PostgreSQL, restore from the selected backup, and start PostgreSQL
sudo pg_ctlcluster 9.4 demo stop
sudo -u postgres pgbackrest --stanza=demo --delta \
       --type=time "--target=2018-05-06 15:17:58.783787+00" \
       --set=20180506-151702F_20180506-151753D restore
sudo pg_ctlcluster 9.4 demo start
sudo -u postgres psql -c "select * from important_table"
    message
----------------
 Important Data
(1 row)

Now the the log output will contain the expected recovery stopping before... and last completed transaction... messages showing that the recovery was successful.

# pg-primary ⇒ Examine the PostgreSQL log output for log messages indicating success
sudo -u postgres cat /var/log/postgresql/postgresql-9.4-demo.log
LOG:  database system was interrupted; last known up at 2018-05-06 15:17:54 UTC
LOG:  starting point-in-time recovery to 2018-05-06 15:17:58.783787+00
LOG:  restored log file "00000004.history" from archive
LOG:  restored log file "000000040000000000000019" from archive
LOG:  redo starts at 0/19000028
LOG:  consistent recovery state reached at 0/190000F0
LOG:  restored log file "00000004000000000000001A" from archive
LOG:  recovery stopping before commit of transaction 689, time 2018-05-06 15:17:58.990914+00
LOG:  redo done at 0/1A016E48
LOG:  last completed transaction was at log time 2018-05-06 15:17:58.563866+00
LOG:  incomplete startup packet
LOG:  restored log file "00000005.history" from archive
       [filtered 8 lines of output]