
Restoring a PostgreSQL database from a WAL backup with skipping some records
Introductory
PostgreSQL has such an interesting technical solution - before actually starting to change something in the files of the database itself, the DBMS writes the commands already translated into the internal format into a special journal - Write-Ahead Log, and after the transaction is completed successfully, makes a note in this log. This was done to recover from failures, but in the end, the inquiring mind of the developers came to the idea of using this journal for backup and replication. In principle, it is logical that all moves are recorded in it, moreover, you can not only restore data from backup, but also restore the state of the database at a certain point in time, interrupting the playback of WAL-log entries at the right time.
However, let's look at such a scenario - let's say on Monday you made a basic backup and started archiving WAL logs, on Wednesday you executed a delete request with an erroneous mask, and only found it on Friday when the manager announced the disappearance of some record he needed. In this situation, we can only recover from backup to Wednesday, having lost all the work of managers on Thursday and Friday.
A logical question arises: is it possible to play WAL logs from Monday to Friday, while excluding our “erroneous” request?
In a normal situation, I would confine myself to a question on the forum, but I had 2 FreeBSD distributions, 10 tarballs with PostgreSQL source codes of different versions, 10GB of space on the screw, gcc, two relatively unloaded weeks, as well as tequila, rum, a box of beer and fragmentary memories of C syntax. Not that it was a necessary supply for a solution, but since I looked into the source codes, it’s hard to stop ...
So, for the experiments we took FreeBSD 10 and PostgreSQL 9.2.8 from its ports. The client of the corresponding version can be delivered using pkg, nothing needs to be changed in it. I apologize in advance for possible captaincy, but the text was written both for beginners and in order to quickly refresh everything in my head if necessary, so all the teams are painted in detail.
Installation and basic server setup
root@leninzhiv> cd /usr/ports/databases/postgresql92-server
root@leninzhiv> make fetch
root@leninzhiv> make extract
The downloaded source file is deployed to the work folder in the port directory. I honestly did not understand how to rebuild the sources after the changes, there seems to be no make rebuild, make clean, in turn, simply demolishes this folder with all the changes. So I just copied the work folder to my home directory, made changes there, then copied to the port folder and ran make install.
We’re not changing anything yet, just set the postgres:
root@leninzhiv> make install
Create folders for archives:
root@leninzhiv> mkdir -p /usr/db_archive/wal
root@leninzhiv> mkdir -p /usr/db_archive/data
root@leninzhiv> chown -R pgsql:wheel /usr/pg_archive
Postgres requires that the data directory has access only for the user, therefore we change the rights:
root@leninzhiv> chmod 0700 /usr/pg_archive/data
We do a primitive setup. It makes sense here to switch to the pgsql postgres uchetka so that there is less fuss with file permissions.
root@leninzhiv> su - pgsql
pgsql@leninzhiv> initdb -D /usr/local/pgsql/data
Uncomment and edit the WAL log archiving options in /usr/local/pgsql/data/postgresql.conf:
archive_mode = on
wal_level = archive
archive_command = 'test! -f / usr / db_archive / wal /% f && cp% p / usr / db_archive / wal /% f '
(an example is nearby in the comments)
max_wal_senders = 1
V /usr/local/pgsql/data/pg_hba.conf uncomment the line
local replication pgsql trust
Starting the server
pgsql@leninzhiv> /usr/local/etc/rc.d/postgresql start
Making a basic backup
pgsql@leninzhiv> pg_basebackup -D /usr/db_archive/data/
We check that in the folder / usr / db_archive / data / there should be a copy of the data directory, in / usr / db_archive / wal / there should be WAL files of the form approximately 000000010000000000000003
Copy the configuration file for recovery to the backup folder of the data directory
cp /usr/local/share/postgresql/recovery.conf.sample /usr/db_archive/data/recovery.conf
and uncomment it and edit the restore command (an example is also nearby in the comments).
restore_command = 'cp / usr / db_archive / data /% f% p'
We make entries:
pgsql@leninzhiv> psql -U pgsql -d postgres
postgres=# CREATE TABLE z (z_id serial, z_text character(50));
postgres=# INSERT INTO z (z_text) VALUES ('Karlin');
postgres=# INSERT INTO z (z_text) VALUES ('Petrov');
postgres=# INSERT INTO z (z_text) VALUES ('Ivanov');
postgres=# INSERT INTO z (z_text) VALUES ('Kaplan');
postgres=# INSERT INTO z (z_text) VALUES ('Karas');
postgres=# INSERT INTO z (z_text) VALUES ('Bukova');
postgres=# INSERT INTO z (z_text) VALUES ('Sidorova');
postgres=# INSERT INTO z (z_text) VALUES ('Karman');
postgres=# INSERT INTO z (z_text) VALUES ('Nikolaev');
Delete entries:
postgres=# DELETE FROM z WHERE z_text ILIKE 'Ka%';
Change records, make new ones, disco
postgres=# UPDATE z SET z_text='Petrova' WHERE z_text='Sidorova';
postgres=# INSERT INTO z (z_text) VALUES ('Kruglov');
postgres=# UPDATE z SET z_text='Alexeeva' WHERE z_text='Bukova';
postgres=# INSERT INTO z (z_text) VALUES ('Kvadrat');
We find that deleting mask entries was not a good idea, and together with Karlin we removed Kaplan, Karas and Karman.
We stop the server
pgsql@leninzhiv> /usr/local/etc/rc.d/postgresql stop
pgsql@leninzhiv> exit
root@leninzhiv>
and begin to think what to do.
We go to the source
As you remember, after make extract, I copied the work folder from the port directory to my home folder, and made changes to it. Therefore, we go there. If someone can tell how to make changes in the source code in the port folder itself so that everything is properly rebuilt after the changes made to the code, I will be extremely grateful.
At first, I set myself the goal of finding the place where WAL logs are read from the file.
I found a file with WAL-related code by searching for the “WAL” line in the contents of the files in the work / postgresql-9.2.8 / src directory and common sense, it turned out to be the xlog.c file.
I don’t know how to trace C programs, so it’s just At the beginning of each function, I added a record of its name to the file, collected and launched it.
The result is the following in the file:
bool check_wal_buffers(int *newval, void **extra, GucSource source)
void assign_xlog_sync_method(int new_sync_method, void *extra)
Size XLOGShmemSize(void)
static int XLOGChooseNumBuffers(void)
bool check_wal_buffers(int *newval, void **extra, GucSource source)
void XLOGShmemInit(void)
Size XLOGShmemSize(void)
static void ReadControlFile(void)
void StartupXLOG(void)
static void ReadControlFile(void)
static char * str_time(pg_time_t tnow)
static void ValidateXLOGDirectoryStructure(void)
static void readRecoveryCommandFile(void)
static List * readTimeLineHistory(TimeLineID targetTLI)
static bool read_backup_label(XLogRecPtr *checkPointLoc, bool *backupEndRequired, bool *backupFromStandby)
static XLogRecord * ReadCheckpointRecord(XLogRecPtr RecPtr, int whichChkpt)
static XLogRecord * ReadRecord(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt)
static bool XLogPageRead(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt, bool randAccess)
static int XLogFileReadAnyTLI(uint32 log, uint32 seg, int emode, int sources)
...
static XLogRecord * ReadRecord(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt)
static bool XLogPageRead(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt, bool randAccess)
static bool RecordIsValid(XLogRecord *record, XLogRecPtr recptr, int emode)
static bool recoveryStopsHere(XLogRecord *record, bool *includeThis)
static void CheckRecoveryConsistency(void)
static XLogRecord * ReadRecord(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt)
static bool XLogPageRead(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt, bool randAccess)
...
In general, I got the impression that the main action takes place in the ReadRecord -> XLogPageRead -> RecordIsValid -> RecoveryStopsHere -> CheckRecoveryConsistency loop.
A closer acquaintance with the ReadRecord function showed that it returns a record in two places - as return record and as return (XLogRecord *) buffer, we clarify in the above simple way that in the process of recovering from WAL logs, the return goes through return (XLogRecord *) buffer. Perfectly! We write the result to a file.
A structure of type XLogRecord can be viewed in the xlog.h file and it is quite concise:
typedef struct XLogRecord
{
pg_crc32 xl_crc; /* CRC for this record */
XLogRecPtr xl_prev; /* ptr to previous record in log */
TransactionId xl_xid; /* xact id */
uint32 xl_tot_len; /* total len of entire record */
uint32 xl_len; /* total len of rmgr data */
uint8 xl_info; /* flag bits, see below */
RmgrId xl_rmid; /* resource manager for this record */
/* ACTUAL LOG DATA FOLLOWS AT END OF STRUCT */
} XLogRecord;
Well, if we have a length, then we use it to output the contents of the record to a file, before return (XLogRecord *) buffer add:
FILE *pf2 = fopen("/usr/local/pgsql/data/log3.txt", "a"); char *buf_poi = buffer;
for (uint32 i=0; i < record->xl_tot_len; i++) {fputc(*buf_poi, pf2); buf_poi++;}
fprintf(pf2, "\n crc32: %u \n xl_xid=%i \n", record->xl_crc, record->xl_xid);
fclose(pf2);
We demolish the old Postgres, assemble and install a new one:
root@leninzhiv> cd /usr/ports/databases/postgresql92-server
root@leninzhiv> make deinstall
I remind you that we copied the work directory to the home folder and made all the code changes there. Now copy it to the place of the work folder in the port directory.
root@leninzhiv> rm -R /usr/ports/databases/postgresql92-server/work
root@leninzhiv> cp -R ~/work /usr/ports/databases/postgresql92-server/work
root@leninzhiv> make install
We delete the database files and copy the base backup to their place. WAL files themselves will catch up.
root@leninzhiv> su - pgsql
pgsql@leninzhiv> /usr/local/etc/rc.d/postgresql stop
pgsql@leninzhiv> rm -R /usr/local/pgsql/data
pgsql@leninzhiv> cp -R /usr/db_archive/data /usr/local/pgsql/data
pgsql@leninzhiv> /usr/local/etc/rc.d/postgresql start
pgsql@leninzhiv> psql -U pgsql -d postgres
postgres=# select * from z;
postgres=# \q
pgsql@leninzhiv> /usr/local/etc/rc.d/postgresql stop
We look at the contents of the log3.txt file, first we go a lot of large records, apparently the creation of service tables and data, closer to the end we see:
Г#{Ы####РT##к###r###R####
##########0###@###### #e######## ###gNikolaev
crc32: 3682278083
l_xid=1002
W#
М#####U##к###,###
###`#######Т›ЩЌ%ћ######
crc32: 3423214679
xl_xid=1002
r"Х ####xU##л###5########
##########0###@########Я##
crc32: 2698322546
xl_xid=1003
#Щ%2####ЁU##л###5########
##########0###@########Я##
crc32: 841341184
xl_xid=1003
ь#Wз####аU##л###5########
##########0###@########Я##
crc32: 3881244668
xl_xid=1003
Z7#р#####V##л###5########
##########0###@########Я##
crc32: 4028315482
xl_xid=1003
µЄЈђ####PV##л###,###
###`########ЄЩЌ%ћ######
crc32: 2426645173
xl_xid=1003
Уњ-B####€V##м###y###Y###@
##########0###@########I#####
####Ђ#(######gPetrova
crc32: 1110285523
xl_xid=1004
We see that between the familiar names of Nikolaev and Petrov there are 4 similar entries and one unlike, under the same transaction number. Apparently, these are deletion commands, which means that commands such as "delete line 50 in table 64822" are already written to the WAL log. In principle, as expected. We add a check that, with xl_xid = 1003, returns NULL instead of writing.
Again, delete the old Postgres, collect and install a new one, start recovery ...
Deleted records are in place! True, everything that was supposed to happen after the deletion did not happen :( Well, I didn’t succeed in taking it off. In general, it’s understandable, because before playing the recording, integrity checks and all that are done.
So the goal number 2 is to find where the “playback” of the recording is going. A quick search for the use of readRecord in the same file led me to the void StartupXLOG (void) function ... And here I clearly understood that until now I had gone the wrong way, because almost immediately after the second or third appearance of the readRecord call in this function (they are there next) immediately comes the first chic diagnostic piece, and secondly, immediately after the comment “Now apply the WAL record itself” - the command to lose the record RmgrTable [record-> xl_rmid] .rm_redo (EndRecPtr, record);
Change this piece of code to
if (record->xl_xid==1003)
{}
else RmgrTable[record->xl_rmid].rm_redo(EndRecPtr, record);
We reassemble, launch, check again ... Victory! Deleted records in place and changes made after deletion are also in place!
We focus on the terrain
Well, this is undoubtedly good, but we solved the problem with an extremely limited data set, but how to find the necessary record in the logs of the working database?
Let's get back to the chic diagnostic piece mentioned in the StartupXLOG function:
#ifdef WAL_DEBUG
if (XLOG_DEBUG ||
(rmid == RM_XACT_ID && trace_recovery_messages <= DEBUG2) ||
(rmid != RM_XACT_ID && trace_recovery_messages <= DEBUG3))
{
StringInfoData buf;
initStringInfo(&buf);
appendStringInfo(&buf, "REDO @ %X/%X; LSN %X/%X: ",
ReadRecPtr.xlogid, ReadRecPtr.xrecoff,
EndRecPtr.xlogid, EndRecPtr.xrecoff);
xlog_outrec(&buf, record);
appendStringInfo(&buf, " - ");
RmgrTable[record->xl_rmid].rm_desc(&buf,
record->xl_info,
XLogRecGetData(record));
elog(LOG, "%s", buf.data);
pfree(buf.data);
}
#endif
You can simply include the output in the logs by uncommenting #define WAL_DEBUG in pg_config_manual.h and adding wal_debug = on to the postgresql.conf file, but out of habit, I sent the output to a separate file. This piece, as I understand it, displays the description of the command using the rm_desc function (in this case, RmgrTable is an array of functions?), It looks something like this:
REDO @ 0/3015500; LSN 0/3015578: prev 0/30154D0; xid 1002; len 82: Heap - insert: rel 1663/12318/16386; tid 0/9
REDO @ 0/3015578; LSN 0/30155A8: prev 0/3015500; xid 1002; len 12: Transaction - commit: 2014-06-06 08:38:27.537874+00
REDO @ 0/30155A8; LSN 0/30155E0: prev 0/3015578; xid 1003; len 21: Heap - delete: rel 1663/12318/16386; tid 0/1
REDO @ 0/30155E0; LSN 0/3015618: prev 0/30155A8; xid 1003; len 21: Heap - delete: rel 1663/12318/16386; tid 0/4
REDO @ 0/3015618; LSN 0/3015650: prev 0/30155E0; xid 1003; len 21: Heap - delete: rel 1663/12318/16386; tid 0/5
REDO @ 0/3015650; LSN 0/3015688: prev 0/3015618; xid 1003; len 21: Heap - delete: rel 1663/12318/16386; tid 0/8
REDO @ 0/3015688; LSN 0/30156B8: prev 0/3015650; xid 1003; len 12: Transaction - commit: 2014-06-06 08:38:27.54153+00
REDO @ 0/30156B8; LSN 0/3015738: prev 0/3015688; xid 1004; len 89: Heap - hot_update: rel 1663/12318/16386; tid 0/7; new 0/10
This is a piece we already know with transaction number 1003, and from it we can see that yes, these are four delete commands and one transaction confirmation. In the delete commands, we see rel - the identifier of the table in the format "oid namespace / oid database / oid table". The corresponding digits can be obtained by
SELECT oid, spcname FROM pg_catalog.pg_tablespace;
SELECT oid, datname FROM pg_catalog.pg_database;
and, suddenly,
SELECT oid, relname FROM pg_catalog.pg_class;
The second guideline is that there is a time stamp in the description of the transaction. Well, there is no need to explain anything here, if we know when this same crime was commited, then we will find the corresponding records.
Well, as an alternative way, you can return to viewing records in krakozyabra, and navigate through bits of text that were passed as parameters to the INSERT and UPDATE commands, if we remember the queries with which parameters were made shortly before or after the desired "erroneous" request. In the case of UPDATE, however, you can find only those lines that were used as the new value, if the line was used to search for records, then it does not occur in WAL logs.
And finally, I can note that in the PostgreSQL 9.3 counter, the pg_xlogdump utility appeared, which seems to be aimed at solving the problem of providing WAL-log content in a human-readable form. If you are interested in some features then it makes sense to write to the developers.
It is possible that using this method on the archives of a working database will have some pitfalls. For example, how will UPDATEs work if we “skip” deleting part of records in a database that uses frequent evacuation? I did not check. But in any case, in the case it is better to have at least some hope to correct the error than not at all.