servitRM March 22, 2013 at 10:01

New in Caché DBMS 2013.1: Adding and Generating Indexes on Live Classes

Tutorial

Suppose you have a table with a large number of records and you need to add one or more indexes to it with the following conditions:

their generation should be as fast as possible
so that the generation can be done in batches.
For example, if there is a table for 300M records and you can work with it only after hours, then you can break the whole process into three nights for 100M records
the appearance of new indexes and the process of their generation should not interfere with the current work with the class / table

To do this, one could use the already known % BuildIndices () method , but in this case it would not satisfy our conditions.

What is the way out?

Theory

A new % Library.IndexBuilder class has been added to Caché 2013.1, with one but powerful % ConstructIndicesParallel () method .
From the name it already becomes clear that the generation will occur in parallel with the involvement of all processor cores.

So, let's consider the parameters of this method in more detail:

ClassMethod% ConstructIndicesParallel (pTaskId = "", pStartId As% Integer = 0, pEndId As% Integer = -1, pSortBegin As% Integer = 1, pDroneCount As% Integer = 0, pLockFlag As% Integer = 1, pJournalFlag As% Boolean = 1) as% Status

pTaskId - ID of the background process. Leave blank / undefined for interactive call
pStartId - ID from which to start the generation. Default 1
pEndId - ID on which to complete the generation. The default is -1, meaning the last ID in the table
pSortbegin - 1/0 flag that determines whether to use $ SortBegin when generating.
pDroneCount - the number of background processes for generating indexes.
The default is 0. In this case, the code will independently determine the optimal number of processes, based on the number of available cores / processors and the number of processed records.
pLockFlag - a flag defining the behavior of the lock during generation:
- 0 = No lock
- 1 = Extent locking - Gets an exclusive lock on the entire extent during generation
- 2 = Row level locking - Gets a shared lock on each row processed and the index node for the item. When index generation for a particular row is completed, the row is immediately locked.
pJournalFlag - 0/1 flag that defines the use of logging:
1 - index generation will be logged, 0 - will not.

Practice

Now consider an example of applying a new class.

First, create a training class in the USER area , fill it with 1M records with variable-length strings [1-100] and construct the index using the classic % BuildIndices () , so that there is something to compare:

Class demo.test Extends% Persistent
{

Index idxn On n As SQLUPPER (6);

Property n As% String (MAXLEN = 100);

ClassMethod Fill (n As% Integer = 10000000)
{
  set data = $ Replace ($ Justify ("", 100), "", "a")
  set time = $ ZHorolog
  do DISABLE ^% NOJRN
  do ..% KillExtent ()
  set ^ demo.testD = n
  set ^ demo.testD (1) = $ ListBuild ("", $ Extract (data, 1, $ Random (100) +1))
  for i = 2: 1: n set ^ (i ) = $ ListBuild ("", $ Extract (data, 1, $ Random (100) +1))
  do ENABLE ^% NOJRN
  write "insert =", $ ZHorolog-time, "sec." ,!
}

ClassMethod BIndex ()
{
  set time = $ ZHorolog
  do ..% BuildIndices (, 1,1)
  write "reindexing =", $ ZHorolog-time, "sec." ,!
}

}

My results:
USER>do ##class(demo.test).Fill()
вставка= 9.706935 сек.

USER>do ##class(demo.test).BIndex()
переиндексация= 71.966953 сек.

Now we use the new % IndexBuilder class . To do this, follow these steps:

first, clear the index data from the previous test using the % PurgeIndices () method (optional step)
inherit our class from % IndexBuilder
write a list of indices separated by commas in the INDEXBUILDERFILTER parameter .
If this parameter is left blank, then all indexes will be regenerated
make our index invisible to SQL so that the optimizer does not use an index that is not yet ready for operation.
To do this, we use the $ SYSTEM.SQL.SetMapSelectability () method :

ClassMethod SetMapSelectability (pTablename As% Library.String = "", pMapname As% Library.String = "", pValue As% Boolean = "") as% Library.String

Description arguments:
- pTablename - table name
- pMapname - index name
- pValue - 0/1 flag that determines the visibility (1) or invisibility (0) of the index for the SQL optimizer
Note: You can make an index invisible long before it is added to the class.
call the % ConstructIndicesParallel () method
make our index visible to SQL
Profit!

As a result, our class will take the following form:

Class demo.test Extends (% Persistent,% IndexBuilder)
{

Parameter INDEXBUILDERFILTER = "idxn";

Parameter BITMAPCHUNKINMEMORY = 0;

Index idxn On n As SQLUPPER (6);

Property n As% String (MAXLEN = 100);
...
ClassMethod FastBIndex ()
{
  do ..% PurgeIndices ($ ListBuild ("idxn"))
  do $ SYSTEM.SQL.SetMapSelectability ($ classname (), "idxn", $$$ NO)
  do ..% ConstructIndicesParallel (,, , 1
  ,, 2,0) do $ SYSTEM.SQL.SetMapSelectability ($ classname (), "idxn", $$$ YES)
}
}

My results:
USER>do ##class(demo.test).FastBIndex()

Building 157 chunks and will use parallel build algorithm with 4 drone processes.
SortBegin is requested.
Started drone process: 3812
Started drone process: 4284
Started drone process: 7004
Started drone process: 7224
Expected time to complete is 43 secs to build 157 chunks of 64,000 objects using 4 processes.
Waiting for processes to complete....done.
Elapsed time using 4 processes was 34.906643.

As you can see, the speed has doubled.

On your hardware and on your data, the results can be even better.

Even faster?

But is there an opportunity to further accelerate the regeneration of indexes?
If you have plenty of RAM left, then yes.

In the process of generating indices, the constructor for internal needs temporarily forms the so-called bitmap blocks. By default, they are written to private globals, but using the BITMAPCHUNKINMEMORY boolean parameter, you can specify that they form in RAM. To do this, set the parameter to 1.
Note that if RAM is allocated a little and indexes are large, then you may get an error.
By default, BITMAPCHUNKINMEMORY is 0.

Tags:

New in Caché DBMS 2013.1: Adding and Generating Indexes on Live Classes

Theory

Practice

Even faster?

Also popular now: