Restore individual pages in a database
Gail Shaw 's article “Help, my database is corrupt. Now what? ” , The translation of which I posted last week, aroused, it seems, a certain interest, but, alas, it did not contain“ practice ”. Yes, it says how to save data, but there are no examples.
Initially, I wanted to make another translation of the same author, but, thinking, I decided to write a post “on my own”, as if “based on motives”. The reasons that prompted me to do so, I will describe at the end of the post, in the notes.
Database Recovery in SQL Server
As already mentioned in the previous article, if the clustered index or heap pages are damaged, then the data contained on these pages is lost and the only option for restoring them is directly restoring the database.
SQL Server provides many database recovery features. Firstly, it is the restoration of the entire database - it can take a lot of time (it depends on the size of the database and the speed of the hard drives). Secondly, the restoration of individual file groups, or files, if your database consists of several file groups (or, respectively, files). In this case, it is possible to restore only damaged parts of the database without affecting the rest. These two types of database recovery are used quite often and will not be affected in the future.
Thirdly, in SQL Server 2005 it became possible to restore individual database pages - in this case only the specified pages will be restored from the backup. Such recovery will be especially relevant if DBCC CHECKDB finds several corrupted pages in some huge table “lying” in a hefty file. Due to the fact that not the entire file will be restored, and not even the entire table, but only a few pages - downtime can be significantly reduced.
Requirements and Limitations
Recovery Model and Transaction Log Backup Availability
The most important thing to remember is to restore individual pages, the database must use the full (full) recovery model, or the recovery model with incomplete logging (bulk-logged). If your databases are in a simple recovery model, then you can no longer read further.
The second requirement is that your complete backups and backups of the transaction log must form an inextricable chain of logs ( log chain) If you never run the BACKUP LOG WITH TRUNCATE_ONLY (NO_LOG) command and do not switch to a simple recovery model in order to reduce the transaction log, and you have ALL transaction log backups since the last full backup containing no corrupted pages (including this the most complete backup) - you don’t have to worry about the chain of magazines.
In the recovery model with incomplete logging, theoretically, the recovery of individual pages should work fine if the conditions described above are met and the pages being restored are not changed by operations performed with minimal logging .
SQL Server Editions
Page recovery is possible in any edition of SQL Server, but for Editions Enterprise Edition and Developer Edition it is possible to recover damaged pages on-line, i.e. you can access the database during recovery (and moreover, you can even refer to the table to which the pages currently being restored belong, if these pages themselves are not “affected” - otherwise, the request will fail). For the editions below the Enterprise Edition, page recovery takes place off-line and the database, during recovery, becomes unavailable.
Damaged Page Type
In the event that index or data pages are damaged, their recovery is possible online in the Enterprise Edition.
Pages that attach to critical system tables can be restored, but the database, when restored, will not be available in any edition of SQL Server.
“Placement cards” cannot be restored “separately”. If GAM, SGAM, PFS, ML, DIFF pages are damaged, you need to restore the entire database. The only exceptions are IAM pages. Although they refer to “sitemaps,” they describe only one table, not the entire database, and their recovery is possible.
The database loading page (9th page in the 1st database file) and the file header page (0th page in each file) cannot be restored separately, if they are damaged, the entire database will have to be restored.
Now, finally, we move from theory to practice.
First of all, for training, you need a damaged database.
For experimentation, I will use the AdventureWorks database, which comes with SQL Server. If you did not install it, if you wish, you can download it here . I transfer it to the full recovery model:
I am convinced that there are no errors in it yet:
ALTER DATABASE AdventureWorks SET RECOVERY FULL
and create a full backup:
DBCC CHECKDB('AdventureWorks') WITH NO_INFOMSGS, ALL_ERRORMSGS, DATA_PURITY
BACKUP DATABASE AdventureWorks TO DISK = 'D:\tmp\aw_backups\aw_full_ok1.bak'
In this database, I create a crash table.
We will spoil the varchar type field in order to check what will happen if suddenly SQL Server finds in it not the data that he himself wrote there.
CREATE TABLE crash (txt varchar(1000))
Before you spoil something, you need to fill it with something. I hammer the left data into the created table.
Now I’m backing up the transaction log:
SET NOCOUNT ON DECLARE @i INT SET @i = 1 WHILE @i<100000 BEGIN INSERT INTO crash SELECT REPLICATE('a', 1000) SET @i = @i + 1 END SET NOCOUNT OFF
BACKUP LOG AdventureWorks TO DISK = 'D:\tmp\aw_backups\aw_log_ok1.trn'
Now let's change the data a bit:
So, everything is ready. We disconnect the database and open the mdf-file with FAR (or whatever is more convenient for you), look for the string “zzzzzzz” in it and replace several 'z' with arbitrary characters:
Now that the database is corrupted, connect it. And, yes, I remember that in the previous article it was clearly said that it is not worth disconnecting / attaching a database. But in our case it is absolutely “safe” - the database in “suspect” will not fall.
Looking for errors
So, the damaged database has successfully returned to operation. Run the integrity check again:
As a result, what we were waiting for ( be sure to remember the numbers of the damaged pages! ): In this case, the data itself on the heap (index id = 0) is damaged, so SQL Server will not be able to recover this data. Now we have three options:
DBCC CHECKDB('AdventureWorks') WITH NO_INFOMSGS, ALL_ERRORMSGS, DATA_PURITY
Msg 8928, Level 16, State 1, Line 1
Object ID 1883153754, index ID 0, partition ID 72057594054246400, alloc unit ID 72057594061651968 (type In-row data): Page (1:20455) could not be processed. See other errors for details.
Msg 8939, Level 16, State 98, Line 1
Table error: Object ID 1883153754, index ID 0, partition ID 72057594054246400, alloc unit ID 72057594061651968 (type In-row data), page (1:20455). Test (IS_OFF (BUF_IOERR, pBUF->bstat)) failed. Values are 29493257 and -4.
CHECKDB found 0 allocation errors and 2 consistency errors in table 'crash' (object ID 1883153754).
CHECKDB found 0 allocation errors and 2 consistency errors in database 'AdventureWorks'.
repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (AdventureWorks).
- Deal with data loss and run DBCC CHECKDB ('AdventureWorks', REPAIR_ALLOW_DATA_LOSS)
- Make a backup of the active part of the transaction log and restore the entire database - as a result, there will be no data loss, but it will take a long time
- Make a backup of the active part of the transaction log and restore only one (!), Damaged page
Recover damaged page
First of all, we need to backup the final fragment of the transaction log ( tail backup ). At the same time, if you have Enterprise Edition, you can not add the NORECOVERY parameter, which will put the database in the "restoring" state, since this edition supports on-line page recovery. Moreover, if you have transaction log backups performed on a regular basis so as not to disrupt the log chain, in the Enterprise Edition, you can make a COPY_ONLY backup.
I follow the path of off-line recovery and do:
BACKUP LOG AdventureWorks TO DISK = 'D:\tmp\aw_backups\aw_log_fail3.trn' WITH NORECOVERY
Now, you can repair the damaged page. First of all, we use the full backup (aw_full_ok1.bak):
RESTORE DATABASE AdventureWorks PAGE = '1:20455' FROM DISK = 'D:\tmp\aw_backups\aw_full_ok1.bak' WITH NORECOVERY
As a result, we have:
Please note that it is necessary to use the NORECOVERY option, since we still have to roll back the transaction logs onto it.
RESTORE LOG AdventureWorks FROM DISK = 'D:\tmp\aw_backups\aw_log_ok1.trn' WITH NORECOVERY
RESTORE LOG AdventureWorks FROM DISK = 'D:\tmp\aw_backups\aw_log_fail3.trn' WITH RECOVERY
It seems that everything went well, we launch DBCC CHECKDB and ...
Recovery was successful.
Please note that downtime is reduced due to the fact that we do not restore the entire database from the full backup, but only damaged pages (if I restored the entire backup, the backup would recover in 8.5 seconds, but the larger the database, the there will be a time difference). Lucky with SQL Server Enterprise Edition, using on-line recovery, will also save time on recovery from log backups, and off-line recovery, alas, the entire logs will be processed.
It is also worth adding that in SQL Server 2005, 2008, 2008 R2, restoring a single page is only possible using T-SQL; Denali now has the ability to do this through the GUI.
But what if DBCC CHECKDB?
Just in case, I decided to check what SQL Server will do if I run DBCC CHECKDB with the REPAIR_ALLOW_DATA_LOSS parameter. All the same error:
First, put the database in SINGLE_USER mode:
And then, we start recovery:
ALTER DATABASE AdventureWorks SET SINGLE_USER
Bottom line: Yeah, SQL Server deleted the “damaged” page. We put the database in MULTI_USER mode so that it becomes accessible to everyone and check what is missing: Considering that the page size in SQL Server is 8KB, and for user data it’s a little smaller - everything is natural, the table "lost" 7 records (at the beginning of them was 99999). Since there was no clustered index on this table, the data could be stored in random order, i.e. we could not even find out what data would be lost.
DBCC CHECKDB('AdventureWorks', REPAIR_ALLOW_DATA_LOSS) WITH NO_INFOMSGS, ALL_ERRORMSGS, DATA_PURITY
Repair: The page (1:20455) has been deallocated from object ID 1883153754, index ID 0, partition ID 72057594054246400, alloc unit ID 72057594061651968 (type In-row data).
So why, after all, not a translation?
So why is it still not a translation, but a post “based on motive”. The fact is that, in the public domain, there is no article “Page Restore” by Gail Shaw. There is such a section in the SQL Server MVP book Deep Dives vol.2, which sells for pretty tangible money (but, of course, is easily found on the Internet) and I'm not sure that publishing a translation is um ... right or something.
In general, I read the article, took note of the main points, and then I wrote the text myself and, along the way, conducted an experiment on restoration. I hope this experience was useful to someone.
And, gentlemen, I sincerely hope that if you decide to repeat this experiment, you will be extremely careful (for example, you will not experiment with the main database on the production server). Remember that I do not bear any responsibility for your actions.