stalkerg December 4, 2015 at 12:08

Access tables from C extensions for Postgres

Tutorial

Hello!

This time I’m not talking about using Python or another CSS / HTML trick and, alas, not about how I ported Wanger for 5 years, but about one important aspect of writing extensions for the great PostgresSQL DBMS.

In fact, there are already a lot of articles on how to write an extension for Postgres in C (for example, this one ), including in Russian . But, as a rule, they describe fairly simple cases. In these articles and instructions, authors implement functions that receive data as input, process it somehow, and then return a single number, string, or user-defined type. They do not explain what to do if you need to run through C code using a regular table existing in the database or an index.

The tables from C can be accessed through the well-described but slow SPI (Server Programming Interface) , there is also a very complicated way, through buffers, and I will talk about a compromise option. Under the cut, I tried to give code examples with detailed explanations.

The basics

I assume that you already wrote your simple functions, and saw that they are declared in a tricky way:

Datum to_regclass(PG_FUNCTION_ARGS);

and you can’t just call such a function. Let's take a quick look at the to_regclass function example :

Datum my_index_oid_datum = DirectFunctionCall1(to_regclass, CStringGetDatum("my_index"));
Oid my_index_oid = DatumGetObjectId(my_index_oid_datum);

In this code, I call the to_regclass function , using a macro, to convert the name of a database object (index, table, etc.) to its Oid (unique directory number). This function has only one argument, so a macro with the speaking name DirectFunctionCall1 has a one at the end. Such macros up to 9 arguments are declared in the include / fmgr.h file. The arguments themselves are always represented by the generic Datum type , which is why the C string “my_index” is cast to Datum using the CStringGetDatum function . Postgres functions in principle communicate through Datum , so the result of our macro will be a value of type Datum. After that, you need to convert it to the Oid type using the DatumGetObjectId macro . All possible conversion options should be looked here: include / postgres.h.
I’ll also explain one more thing: in C, it is customary to declare variables at the beginning of a block, but for clarity, I declare them where I start using them. In practice, they don’t write like that.

Table access

I’ll explain right away why SPI is slow. The fact is that a query made using SPI goes through all the stages of parsing and planning. In addition, to go in a simple way, where there is no magic, it seems to me uninteresting.
The next thing I want to say about are the names - they are confused in Postgres! Due to the long history of the project, the code has left a lot of strange names for types, methods, and functions.
Before reading further, it is advisable to have a basic understanding of MVCC Postgres. All the examples below work only within the framework of an already created transaction, and if you suddenly crawled to where it is not already there, you will need much more magic.

So, suppose we just want to go over a table that contains two fields: int id and text nickname, and print them to the log. First, we need to open the heap (table) with a certain lock:

RangeVar *table_rv = makeRangeVar("public", "my_table", -1);
Relation table_heap = heap_openrv(table_rv, AccessShareLock);

Instead of the heap_openrv function, you can use heap_open , whose first argument is the Oid of the table (you can get it using the function in the first part of the article). I think the purpose of RangeVar is intuitive, but we ’ll dwell on the locks in more detail. The types of locks are declared in the include / storage / lockdefs.h file with fairly clear comments. You can see this information in the table:

AccessShareLock	SELECT
Rowsharelock	SELECT FOR UPDATE / FOR SHARE
RowExclusiveLock	INSERT, UPDATE, DELETE
ShareUpdateExclusiveLock	VACUUM (non-FULL), ANALYZE, CREATE INDEX CONCURRENTLY
Sharelock	CREATE INDEX (WITHOUT CONCURRENTLY)
ShareRowExclusiveLock	like EXCLUSIVE MODE, but allows ROW SHARE
ExclusiveLock	blocks ROW SHARE / SELECT ... FOR UPDATE
AccessExclusiveLock	ALTER TABLE, DROP TABLE, VACUUM FULL, and unqualified LOCK TABLE

Since we only wanted to go over the plate, that is, execute, SeqScan , select AccessShareLock . After we opened the heap, we need to initialize the process of scanning the table:

HeapScanDesc heapScan = heap_beginscan(sr_plans_heap, SnapshotSelf, 0, (ScanKey) NULL);

As expected, we pass our heap as the first argument to this function, but SnapshotSelf needs clarification. The work of MVCC in Postgres assumes that at any given time there can be several versions of one row of the table, and it is in the snapshot (transaction snapshot) that we can see and which can not. Except SnapshotSelf , i.e. the current snapshot transaction, for example, there is SnapshotAny , substituting which, we could also see all the deleted and changed tuples (table rows). Other views can be found in include / utils / tqual.h. The following arguments for heap_beginscan are the number of search keys (ScanKey) and the keys themselves. Search Keys (ScanKey) are essentially conditions, i.e. what you write in WHERE. To work with heap, search keys are not really needed, because you can always do a condition check yourself in your code. But when searching by index, we will not leave anywhere from their initialization and use.

And now the most important thing is to appear in the cycle:

Datum	values[2];
bool nulls[2];
for (;;)
{
    HeapTuple local_tuple;
    local_tuple = heap_getnext(heapScan, ForwardScanDirection);
    if (local_tuple == NULL)
        break;
    heap_deform_tuple(local_tuple, table_heap->rd_att, values, nulls);
    elog(WARNING,
           "Test id:%i nick:%s",
           DatumGetInt32(values[0]),
           TextDatumGetCString(values[1])
    );
}

In this loop, we call the heap_getnext function , with which we get the next tuple until we return a null pointer. The heap_getnext function receives our HeapScanDesc and scan directions, two will be relevant for us: this is a direct scan - ForwardScanDirection and the opposite - BackwardScanDirection . Now we just have to unpack the tuple and get access to its fields, for this we call heap_deform_tuple , where we pass our tuple, after which its description (which we take from heap) and two arrays (one for values and the other for determining NULLvalues). Further, using the functions already familiar to us, we will transform the elements of the values array (consisting of Datum ) to ordinary C types.

And now do not forget to close our heap (table) scan and close heap itself:

heap_endscan(heapScan);
heap_close(sr_plans_heap, AccessShareLock);

We close the heap with the same type of lock with which we opened it.

Index work

The index search API will be similar to the heap search, but will only require more lines of code to initialize. In the code we will try to display messages only for lines where the first argument gives an answer to the main question of life, the universe and all that. As for heap, first we give a piece of code with all the preparatory work:

RangeVar *table_rv = makeRangeVar("public", "my_table", -1);
Relation table_heap = heap_openrv(table_rv, AccessShareLock);
table_idx_oid = DatumGetObjectId(DirectFunctionCall1(
    to_regclass,
    StringGetDatum("my_table_idx")
));
Relation table_idx_rel = index_open(table_idx_oid, AccessShareLock);
indexScan = index_beginscan(table_heap, table_idx_rel, SnapshotSelf, 1, 0);
ScanKeyData key;
ScanKeyInit(&key, 1, BTEqualStrategyNumber, F_INT4EQ, Int32GetDatum(42));
index_rescan(indexScan, &key, 1, NULL, 0);

So, to search by index, in the same way as in the previous example, we need to open the heap (table) with which this index is associated. Using to_regclass we find the oid for our index my_table_idx, then open it with index_open , finding the Relation we need . After that, we initialize the process of scanning the index_beginscan index , here the main difference from heap_beginscan is that we will have 1 search key (ScanKey).
ScanKeyInit , as the name implies, initializes the key to search. The first argument is the key itself (type ScanKeyData), after we indicate the serial number of the argument by which the search will be carried out (numbering from 1), then the search strategy follows. In fact, it looks like an operator in a condition (the rest of the strategies can be found here include / access / startnum.h), after which we directly indicate the oid of the function that will perform our comparison operation (these oid are declared in the include / utils / fmgroids.h file) . And finally, our last argument is Datum , which should contain the value by which the search should be performed.

Next comes another new index_rescan function , and it serves to start a search by index. Single index_beginscanit’s not enough here. At the input, this function receives a list of keys (we have only one), the number of these keys, followed by the keys for sorting and the number of keys for sorting (they are used for the ORDER BY clause , which is not in this example). It seems that all the preparations have passed, and you can show the main loop, however, it will be very similar to what happened with heap:

for (;;)
{
    HeapTuple local_tuple;
    ItemPointer tid = index_getnext_tid(indexScan, ForwardScanDirection);
    if (tid == NULL)
        break;
    local_tuple = index_fetch_heap(indexScan);
    heap_deform_tuple(local_tuple, table_heap->rd_att, values, nulls);
    elog(WARNING,
           "Test id:%i nick:%s",
           DatumGetInt32(values[0]),
           DatumGetCString(PG_DETOAST_DATUM(values[1]))
    );
}

Since now we are running along the index, rather than the heap itself, we get ItemPointer, a special pointer to the entry in the index, (if you are interested in the details, refer to the relevant documentation www.postgresql.org/docs/9.5/static/storage-page -layout.html or directly to the file include / storage / bufpage.h) using which we still need to get tuple from heap. In this loop, index_getnext_tid is functionally similar to heap_getnext and only index_fetch_heap is added, and the rest is completely similar.

To complete our operation, as you might guess, we will need to close the search by index, the index itself and the open heap:

index_endscan(indexScan);
index_close(table_idx_rel, heap_lock);
heap_close(table_heap, heap_lock);

Data change

So, we learned how to make SeqScan and IndexScan, i.e. search on our plate and even use an index for this, but now how to add something to it? To do this, we will need the simple_heap_insert and index_insert functions .

Before changing something in the table and in the associated indexes, they must be opened with the necessary locks, using the method shown earlier, after which you can insert:

values[0] = Int32GetDatum(42);
values[1] = CStringGetDatum("First q");
tuple = heap_form_tuple(table_heap->rd_att, values, nulls);
simple_heap_insert(table_heap, tuple);
index_insert(
    table_idx_rel,
    values,
    nulls,
    &(tuple->t_self),
    table_heap,
    UNIQUE_CHECK_NO
);

Here we do the reverse operation, i.e. from the arrays of values and nulls we form a tuple,
after which we add it to the heap and then add the corresponding record to the index. After the previous explanations, this code should be clear to you.
To update tuple you need to do the following:

values[0] = Int32GetDatum(42);
replaces[0] = true;
newtuple = heap_modify_tuple(
    local_tuple,
    RelationGetDescr(table_heap),
    values,
    nulls,
    replaces
);
simple_heap_update(table_heap, &newtuple->t_self, newtuple);

We have an array of boolean variables replaces, where information is stored, which field has changed. After which we form a new tuple based on the old one, but with our edits using heap_modify_tuple . And at the very end, we perform the simple_heap_update update itself . Since we have a new tuple, and the old one is marked as deleted, we also need to add an entry to the index for the new tuple in the way that was shown earlier.

Now it will not be difficult to delete tuple with the simple_heap_delete function , and it requires no explanation. It is noteworthy that the corresponding entry in the index does not remove the need, it will happen automatically during the cleaning operation the VACUUM .

Total

We learned how to access the table from C code, including by index. I tried to describe each function and its purpose in as much detail as possible, but if something was not clear, ask in the comments, I will try to answer and supplement the article.

Tags: