SQLite: Creating a Hot Data Backup

    SQLite has a built-in mechanism for backing up a database on the fly. For some reason, many developers are unaware of it. The mechanism is primitive, but it suits many situations. In this article we discuss this built-in backup feature and also propose an architecture for a backup mechanism of your own, or at least point out a direction to move in if you need to arrange complex data replication.



    Generally speaking, start with the simplest option. An SQLite database is a single file (in the default DELETE journal mode). An application can complete all transactions, close all connections to the database, and simply copy the database file to a backup location. If the file is smaller than 100 MB, this takes a couple of seconds on a modern computer. It can even be accelerated: read the file into memory (take a “snapshot”), let work with the database resume, and, in a separate thread, slowly dump the contents to a file on disk. Surprising as it may be, this is enough for many applications.
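The file-copy approach fits in a few lines of Python (the stdlib sqlite3 module); the file and table names here are arbitrary, chosen only for the demonstration:

```python
import os
import shutil
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
db_path = os.path.join(workdir, "app.db")
bak_path = os.path.join(workdir, "app.db.bak")

# Populate the database, then close every connection before copying.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE notes(id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO notes(body) VALUES ('hello')")
conn.commit()
conn.close()  # no open connections: the file on disk is consistent

shutil.copyfile(db_path, bak_path)  # the entire backup

# The copy is a fully working database.
bak = sqlite3.connect(bak_path)
print(bak.execute("SELECT body FROM notes").fetchone()[0])  # hello
bak.close()
```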

    Online Backup API

    However, an in-memory database cannot be copied this way. Enter the Online Backup API, SQLite's API for creating backups on the fly. Everything is arranged quite simply. The sqlite3_backup_init function starts the backup process:

     sqlite3_backup *sqlite3_backup_init(
       sqlite3 *pDest,                        /* destination database connection */
       const char *zDestName,                 /* destination database alias */
       sqlite3 *pSource,                      /* source database connection */
       const char *zSourceName                /* source database alias */
     );
    


    The parameters are the connection objects for the source and destination databases; the alias value is “main” for the main database, “temp” for the temporary database, or the name used in an ATTACH statement. The function returns a backup-management object (if it returns 0, look for the error in the connection to the destination database), which must be passed as the first parameter to the remaining API functions. The backup itself is performed page by page. To copy a portion of nPage pages (or all pages, if nPage = -1), call the sqlite3_backup_step function:

     int sqlite3_backup_step(sqlite3_backup *p, int nPage);
    

    If this function returns SQLITE_DONE, the backup is complete: all pages have been copied. The codes SQLITE_OK, SQLITE_BUSY, and SQLITE_LOCKED mean that copying is not finished but can continue normally: call sqlite3_backup_step() again. Any other return code indicates an error. Once SQLITE_DONE is received, call sqlite3_backup_finish():

     int sqlite3_backup_finish(sqlite3_backup *p);
    

    and sleep peacefully, rejoicing at a successful backup. To track the current state of the backup, use these functions:

     int sqlite3_backup_remaining(sqlite3_backup *p); // pages left to copy
     int sqlite3_backup_pagecount(sqlite3_backup *p); // total number of pages
    

    The complete algorithm for backing up database Src into Dst in chunks of SOME_PAGE_COUNT pages looks something like this in Pascal-style pseudocode:

    	Backup := sqlite3_backup_init(Dst, 'main', Src, 'main');
    	if Backup = nil then Abort;
    	try
    		repeat
    			case sqlite3_backup_step(Backup, SOME_PAGE_COUNT) of
    				SQLITE_DONE: break;
    				SQLITE_OK, SQLITE_BUSY, SQLITE_LOCKED: ; { keep copying }
    				else
    					Abort;
    			end;
    			Write('Remaining: ', sqlite3_backup_remaining(Backup) * 100 div sqlite3_backup_pagecount(Backup), '%');
    			Sleep(SOME_TIME);
    		until false;
    	finally
    		sqlite3_backup_finish(Backup);
    	end;
    
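As it happens, Python's stdlib sqlite3 module exposes this very API as Connection.backup(), where the pages argument corresponds to nPage and the progress callback receives the same remaining/pagecount values. A rough equivalent of the pseudocode above:

```python
import sqlite3

SOME_PAGE_COUNT = 16  # pages copied per sqlite3_backup_step() call

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE Foo(x)")
src.executemany("INSERT INTO Foo(x) VALUES (?)", [(i,) for i in range(1000)])
src.commit()

dst = sqlite3.connect(":memory:")

def progress(status, remaining, total):
    # Called after each step; mirrors sqlite3_backup_remaining/pagecount.
    print(f"Remaining: {remaining * 100 // total}%")

# pages=N maps to sqlite3_backup_step(backup, N); sleep is the pause
# between steps, like Sleep(SOME_TIME) in the pseudocode above.
src.backup(dst, pages=SOME_PAGE_COUNT, progress=progress, sleep=0.01)

print(dst.execute("SELECT COUNT(*) FROM Foo").fetchone()[0])  # 1000
```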

    When this API is used, SQLite does not lock the source database at all: you can not only read but also update the data during the backup. What happens if we copy in batches (nPage > 0, i.e. not all pages in a single call to sqlite3_backup_step()) and the database changes meanwhile? The copy restarts! SQLite transparently begins copying pages from the very beginning of the database. In other words, if the source database changes rapidly, there is a chance the backup will never finish. There is good news, though: if the source database is modified through the same connection that the backup uses (even from another thread), SQLite transparently applies the changes to the destination database as well, and the backup does not restart.

    The advantages of the SQLite Online Backup API: the source database is not locked for reading, and if it is updated through a single connection, writing does not interfere either. But what if the database is very large and is frequently updated from different applications? Then it is time to think about a data replication system. Nothing new will be invented here: with triggers we track which records have changed and keep a history of changes. We rely on the fact that every ordinary (non-WITHOUT ROWID) table has a ROWID column containing a unique record number, and we regularly transfer changes to the other database. Let us show in detail how this is done.

    Simple Data Replication Schema

    So, in the source database, create bookkeeping tables for the records affected by changes:

     CREATE TABLE IF NOT EXISTS system_replicate_table(name TEXT UNIQUE);
     CREATE TABLE IF NOT EXISTS system_replicate_record(name TEXT, id INTEGER, PRIMARY KEY(name,id) );
    

    To enable replication for the Foo table, add it to system_replicate_table:

     INSERT OR IGNORE INTO system_replicate_table(name) VALUES ('Foo');
    

    and create replication triggers for it:

    
    	function CreateTrigger(const Operation, TableName: string): string;
    	begin
    	  result := Format('CREATE TRIGGER IF NOT EXISTS system_trigger_%s_replicate_%s AFTER %s ON %s FOR EACH ROW BEGIN ',
    	    [ TableName, Operation, Operation, TableName ]);
    	  if (Operation = 'INSERT') or (Operation = 'UPDATE') then
    	    result := result + Format('INSERT OR IGNORE INTO system_replicate_record(name, id) VALUES(''%s'', NEW.ROWID); ', [ TableName ]);
    	  if (Operation = 'DELETE') or (Operation = 'UPDATE') then
    	    result := result + Format('INSERT OR IGNORE INTO system_replicate_record(name, id) VALUES(''%s'', OLD.ROWID); ', [ TableName ]);
    	  result := result + ' END;';
    	end;

    	Execute( CreateTrigger('INSERT', 'Foo') );
    	Execute( CreateTrigger('UPDATE', 'Foo') );
    	Execute( CreateTrigger('DELETE', 'Foo') );
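The same trigger generator translates directly to Python. A small self-contained demonstration (stdlib sqlite3, in-memory database):

```python
import sqlite3

def create_trigger(op, table):
    # Builds the same statement as the CreateTrigger function above.
    sql = (f"CREATE TRIGGER IF NOT EXISTS system_trigger_{table}_replicate_{op} "
           f"AFTER {op} ON {table} FOR EACH ROW BEGIN ")
    if op in ("INSERT", "UPDATE"):
        sql += (f"INSERT OR IGNORE INTO system_replicate_record(name, id) "
                f"VALUES('{table}', NEW.ROWID); ")
    if op in ("DELETE", "UPDATE"):
        sql += (f"INSERT OR IGNORE INTO system_replicate_record(name, id) "
                f"VALUES('{table}', OLD.ROWID); ")
    return sql + "END;"

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE IF NOT EXISTS system_replicate_table(name TEXT UNIQUE);
    CREATE TABLE IF NOT EXISTS system_replicate_record(
        name TEXT, id INTEGER, PRIMARY KEY(name, id));
    CREATE TABLE Foo(x TEXT);
    INSERT OR IGNORE INTO system_replicate_table(name) VALUES ('Foo');
""")
for op in ("INSERT", "UPDATE", "DELETE"):
    db.execute(create_trigger(op, "Foo"))

db.execute("INSERT INTO Foo(x) VALUES ('a')")   # the new rowid is logged
db.execute("UPDATE Foo SET x = 'b'")            # same rowid, INSERT OR IGNORE
print(db.execute("SELECT name, id FROM system_replicate_record").fetchall())
# [('Foo', 1)]
```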
    

    The triggers are simple: they add the ROWIDs of the affected rows to system_replicate_record (read about NEW and OLD in SQLite triggers yourself), unless they are already there. So we enable replication on the tables we care about and start working with the source database; the triggers track the data changes. At some point (after a time interval, or once system_replicate_record reaches a certain size) we replicate the data, that is, transfer the changes. How do we replicate the changed records from the Foo table to the destination database? This is the hardest part of replication. We will use the sqlite_master system table, which stores the SQL of all database objects. This SQL is the object-creation statement (for the table Foo it is “CREATE TABLE Foo (...)”).

    The algorithm for copying the Foo table, if it does not yet exist in the Dst database:

    1) Get the SQL table:
     SELECT sql FROM sqlite_master WHERE tbl_name = 'Foo' and type = 'table';
    

    and execute it in the destination database “as is” (simply pass the destination connection to the execute method).
    2) Select all the records and transfer the data (exactly how to transfer it comes a little later):
     SELECT rowid as rowid, * FROM Foo
    

    3) If indexes and triggers must be transferred too, execute in the destination database the SQL obtained as follows (excluding system indexes and triggers):
     SELECT sql FROM sqlite_master 
     WHERE tbl_name = 'Foo' and (type = 'index' or type = 'trigger') 
                     and not name LIKE 'system_%' and not name LIKE 'sqlite_%'
    
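The three steps above can be sketched in Python (stdlib sqlite3); copy_table is a hypothetical helper name, not part of any API:

```python
import sqlite3

def copy_table(src, dst, table):
    # 1) Re-create the table from its CREATE statement in sqlite_master.
    (create_sql,) = src.execute(
        "SELECT sql FROM sqlite_master WHERE tbl_name = ? AND type = 'table'",
        (table,)).fetchone()
    dst.execute(create_sql)
    # 2) Copy the data, preserving each record's ROWID.
    cur = src.execute(f"SELECT rowid AS rowid, * FROM [{table}]")
    cols = ",".join(f"[{d[0]}]" for d in cur.description)
    marks = ",".join("?" for _ in cur.description)
    dst.executemany(f"INSERT INTO [{table}]({cols}) VALUES({marks})", cur)
    # 3) Copy user indexes and triggers, skipping system ones.
    for (sql,) in src.execute(
            "SELECT sql FROM sqlite_master WHERE tbl_name = ? "
            "AND type IN ('index', 'trigger') "
            "AND name NOT LIKE 'system_%' AND name NOT LIKE 'sqlite_%'",
            (table,)):
        dst.execute(sql)

src = sqlite3.connect(":memory:")
src.executescript("""
    CREATE TABLE Foo(x TEXT);
    CREATE INDEX idx_foo_x ON Foo(x);
    INSERT INTO Foo(x) VALUES ('a'), ('b');
""")
dst = sqlite3.connect(":memory:")
copy_table(src, dst, "Foo")
print(dst.execute("SELECT rowid, x FROM Foo ORDER BY rowid").fetchall())
# [(1, 'a'), (2, 'b')]
```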

    The algorithm for replicating the Foo table from database Src to database Dst:

    1) If there is no Foo table in Dst yet, copy it there from Src (see above) and go to step 5.
    2) Otherwise, select the ROWIDs of the affected records:
     SELECT id FROM system_replicate_record WHERE name = 'Foo';
    

    and join them with commas into a long string R, so that R looks like "123,256,334,4700,...".
    (Do not even think of doing this by naive string concatenation! Allocate a buffer and grow it as necessary. Also remember that a ROWID is an 8-byte signed integer.)
    3) Delete these records from the Foo table in the Dst database:
      DELETE FROM [Foo] WHERE rowid IN (<substitute the value of R here>);
    

    4) Select the data from Foo in the Src database and copy it (more on actually copying the data a little later) into the Dst database:
     SELECT rowid as rowid, * FROM [Foo] WHERE rowid IN (SELECT id FROM system_replicate_record WHERE name = 'Foo');
    

    5) Clear the replication log in the Src database:
     DELETE FROM system_replicate_record WHERE name = 'Foo';
    
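A self-contained Python sketch of steps 1-5 (for brevity it inserts the changed rows through parameter placeholders rather than building literal INSERT statements, and the trigger log is filled in by hand; replicate_table is a hypothetical name):

```python
import sqlite3

def replicate_table(src, dst, table):
    # Step 1: if Dst has no such table yet, clone it and take all rows.
    if not dst.execute("SELECT 1 FROM sqlite_master WHERE type = 'table' "
                       "AND name = ?", (table,)).fetchone():
        (create_sql,) = src.execute(
            "SELECT sql FROM sqlite_master WHERE tbl_name = ? AND type = 'table'",
            (table,)).fetchone()
        dst.execute(create_sql)
        changed = src.execute(f"SELECT rowid AS rowid, * FROM [{table}]")
    else:
        # Step 2: gather the ROWIDs logged by the triggers into string R.
        r = ",".join(str(row[0]) for row in src.execute(
            "SELECT id FROM system_replicate_record WHERE name = ?", (table,)))
        if not r:
            return
        # Step 3: drop the affected rows in Dst.
        dst.execute(f"DELETE FROM [{table}] WHERE rowid IN ({r})")
        # Step 4: re-read their current state from Src.
        changed = src.execute(
            f"SELECT rowid AS rowid, * FROM [{table}] WHERE rowid IN ({r})")
    cols = ",".join(f"[{d[0]}]" for d in changed.description)
    marks = ",".join("?" for _ in changed.description)
    dst.executemany(f"INSERT INTO [{table}]({cols}) VALUES({marks})", changed)
    # Step 5: clear the replication log in Src.
    src.execute("DELETE FROM system_replicate_record WHERE name = ?", (table,))
    src.commit(); dst.commit()

src = sqlite3.connect(":memory:")
src.executescript("""
    CREATE TABLE system_replicate_record(name TEXT, id INTEGER,
                                         PRIMARY KEY(name, id));
    CREATE TABLE Foo(x TEXT);
    INSERT INTO Foo(x) VALUES ('a'), ('b');
""")
dst = sqlite3.connect(":memory:")
replicate_table(src, dst, "Foo")          # first pass: full copy

src.execute("UPDATE Foo SET x = 'z' WHERE rowid = 2")
# In real use the triggers would log this; here we log it by hand.
src.execute("INSERT INTO system_replicate_record(name, id) VALUES ('Foo', 2)")
replicate_table(src, dst, "Foo")          # second pass: only rowid 2 moves
print(dst.execute("SELECT rowid, x FROM Foo ORDER BY rowid").fetchall())
# [(1, 'a'), (2, 'z')]
```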

    It remains to understand how to copy the data; this takes a bit more programmatic shamanism. Records are selected with the following query:

     SELECT rowid as rowid, * FROM [Foo]
     [ WHERE ... ]
    

    Only this way can we guarantee that the ROWID is retrieved (and under the name "ROWID"). For each extracted record we form an SQL insert statement (in UTF-8 encoding):
     INSERT INTO [Foo](<names>) VALUES(<values>)
    

    Walk over all the columns of the selected record, appending the column name to the "<names>" part and the value to the "<values>" part, separated by commas. Each column name should be framed by '[' and ']', and each value represented as an SQL literal. As you know, SQLite has the following value types:
      SQLITE_INTEGER = 1;
      SQLITE_FLOAT   = 2;
      SQLITE_TEXT    = 3;
      SQLITE_BLOB    = 4;
      SQLITE_NULL    = 5;
    

    We need to learn how to render each of these as an SQL literal. The SQLITE_NULL literal is "null". An SQLITE_INTEGER literal is the string representation of a (64-bit) integer: 1234567890 becomes "1234567890". An SQLITE_FLOAT literal is the string representation of a real number with a dot separating the integer and fractional parts: 123.456789 becomes "123.456789". To turn a string (SQLITE_TEXT) into a literal, double every single quote in it and frame the string with single quotes: "Hello, Mc'Duck" becomes "'Hello, Mc''Duck'". That leaves BLOB. SQLITE_BLOB (binary data) literals in SQLite have the form "X'A0B1C2...'", where "A0" is the hex code of the first byte, "B1" of the second byte, and so on.
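These rules fit into one small Python function; sql_literal is a hypothetical name chosen for this sketch:

```python
def sql_literal(value):
    # Render a Python value as an SQLite SQL literal, per the rules above.
    if value is None:
        return "null"
    if isinstance(value, bool):
        return str(int(value))                   # booleans store as 0/1
    if isinstance(value, int):
        return str(value)                        # 64-bit integer
    if isinstance(value, float):
        return repr(value)                       # dot as decimal separator
    if isinstance(value, bytes):
        return "X'" + value.hex().upper() + "'"  # blob: X'A0B1C2...'
    # text: double embedded single quotes, wrap in single quotes
    return "'" + str(value).replace("'", "''") + "'"

print(sql_literal("Hello, Mc'Duck"))  # 'Hello, Mc''Duck'
print(sql_literal(b"\xa0\xb1\xc2"))   # X'A0B1C2'
```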

    That's all: we have described the simplest working variant of data replication, in which each record is copied in its entirety. There is room for optimization, of course. It is useful to wrap all changes to the Dst database in a transaction, and when forming the insert statement, the column-name part can be built once and reused.

    The architecture shown does not support schema replication. If you alter the source table by adding columns, its replication will break. Either delete the table in the destination database (so that it is copied anew in full) or complicate the replication with schema synchronization. The same considerations apply to newly created indexes and triggers.

    PS. Use a modern manager to administer the SQLite database.
