Caché Native Access - working with native libraries in Caché


    As you know, Caché is not only a DBMS but also a full-fledged programming language, Caché ObjectScript (COS). Both the DBMS and COS offer rich ways to reach beyond Caché: .NET and Java through the .NET/Java Gateway, relational DBMSs through the SQL Gateway, and web services. Native binary libraries, however, are reached through the Caché Callout Gateway, which is rather peculiar. Read on to see how to make working with native libraries directly from COS radically easier.


    Caché Callout Gateway


    Today, Caché uses the Caché Callout Gateway to work with native code. The name covers several functions grouped under one name, $ZF(). They fall into two groups:
    • $ZF(-1), $ZF(-2). The first group runs system commands and console programs. It is an effective tool, but its drawback is obvious: exposing the full functionality of a library through one or more console programs is difficult.
      An example of using $ZF(-1)
      Create a new folder named "newfolder" in the working directory:
      set name = "newfolder"
      set status = $ZF(-1, "mkdir " _ name)
      

    • $ZF(-3), $ZF(-5), $ZF(). The second group provides access to dynamic and static libraries. This is closer to what we need, but it is not that simple: $ZF() does not work with arbitrary libraries, only with libraries of a special kind, Callout Libraries. A Callout Library differs from a regular library in that its code contains a special symbol table, ZFEntry, which holds a kind of prototype for each exported function. Moreover, the argument types of exported functions are strictly limited: only int and a few kinds of pointers are supported. So to turn an arbitrary library into a Callout Library, you will most likely have to write a wrapper around the entire library, which is inconvenient.
      An example of creating a Callout Library and calling a function from it
      Callout Library, file test.c:
      #define ZF_DLL
      #include <cdzf.h> // cdzf.h is located in the Cache/dev/cpp/include folder
      int
      square(int input, int *output)
      {
        *output = input * input;
        return ZF_SUCCESS; 
      }
      ZFBEGIN // Symbol table
      ZFENTRY("square", "iP", square) // "iP" means square takes two arguments: int and int *
      ZFEND
      

      Compilation (mingw):
      gcc -mdll -fpic test.c -o test.dll
      
      On Linux, you need to replace -mdll with -shared.

      Call square() from Caché:
      USER> write $ZF(-3, "test.dll", "square", 9)
      81
      



    Caché Native Access


    The CNA project was created to remove the limitations of the Callout Gateway and make working with native libraries convenient. The name is a calque of JNA (Java Native Access), a similar project for the Java virtual machine.

    CNA Features:

    • You can call functions from any dynamic (shared) library that is binary compatible with C
    • To call functions, you only need COS code - you don’t need to write anything in C or another machine-compiled language
    • Support for all simple C types, size_t, and pointers
    • Support for structures (including nested structures)
    • Support for Caché threads
    • Supported platforms: Linux (x86-32/64), Windows (x86-32/64)


    Installation


    First, build the C part, which compiles with a single command:
    make libffi && make
    On Windows, you can compile with mingw or download prebuilt binaries. Then import the cna.xml file into any convenient namespace:
    do $system.OBJ.Load("path to cna.xml", "c")


    CNA Example


    The simplest native library available on every system is the C standard library. On Windows it is usually located at C:\Windows\System32\msvcrt.dll; on Linux, at /usr/lib/libc.so. Let's call a function from it, for example strlen, which has this prototype:
    size_t strlen(const char *);


    Class CNA.Strlen Extends %RegisteredObject
    {
      ClassMethod Call(libcnaPath As %String, libcPath As %String, string As %String) As %Integer
      {
        set cna = ##class(CNA.CNA).%New(libcnaPath)      // Create a CNA.CNA object
        do cna.LoadLibrary(libcPath)                     // Load libc into CNA
        set pString = cna.ConvertStringToPointer(string) // Convert the string to C format and keep a pointer to its start
        // Call strlen: pass the function name, the return type,
        // the list of argument types, and then the arguments, comma-separated
        set result = cna.CallFunction("strlen", cna.#SIZET, $lb(cna.#POINTER), pString)
        do cna.FreeLibrary()
        return result
      }
    }
    

    In terminal:
    USER>w ##class(CNA.Strlen).Call("libcna.dll", "C:\Windows\system32\msvcrt.dll", "hello")
    5
    


    Implementation Details


    CNA is a combination of a C library and a Caché class. Most of CNA relies on libffi, a library that provides a low-level foreign function interface (FFI). It lets you forget about the various calling conventions and call functions at run time without knowing their signatures at compile time. But libffi needs the address of a function to call it, and we would like to call functions by name alone. Getting an address from a name requires platform-specific interfaces: POSIX and WinAPI. POSIX provides the dlopen()/dlsym() mechanism for loading a library and looking up a function address; WinAPI provides LoadLibrary() and GetProcAddress(). This is one obstacle to porting CNA to other platforms, although almost all modern systems (except, of course, Windows) support the POSIX standard at least partially.
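
    The name lookup on the POSIX side can be sketched in a few lines of C. This is an illustration only, not CNA's actual source: the helper strlen_by_name and the typedef strlen_fn are names invented here, and passing NULL as the library path (which asks dlopen() to search what is already loaded into the process) is a shortcut to avoid hard-coding a libc path. On older glibc versions the program may need to be linked with -ldl.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical typedef matching the prototype size_t strlen(const char *). */
typedef size_t (*strlen_fn)(const char *);

/*
 * Look up "strlen" by name at run time and call it.
 * lib_path may be NULL, meaning "search the objects already loaded
 * into this process" (an assumption made to keep the sketch portable).
 * Returns (size_t)-1 if the library or the symbol cannot be found.
 */
size_t strlen_by_name(const char *lib_path, const char *text)
{
    void *handle = dlopen(lib_path, RTLD_LAZY);
    if (handle == NULL) {
        return (size_t)-1;
    }

    /* dlsym() returns the address; the caller must know the real signature. */
    strlen_fn fn = (strlen_fn)dlsym(handle, "strlen");
    size_t result = (fn != NULL) ? fn(text) : (size_t)-1;

    dlclose(handle);
    return result;
}
```

    Here the signature is fixed at compile time through the typedef. That is exactly the restriction libffi lifts: it takes the address returned by dlsym() together with a description of the argument types built at run time.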

    libffi is written in C and assembler. It is therefore a native library itself, and Caché can reach it only through the Callout Gateway. So a layer is needed that connects libffi and Caché and is itself a Callout Library, so that it can be accessed from COS. A call therefore travels from COS code to the CNA class, through the Callout Gateway into this C layer, and then through libffi to the target library.


    At this point a data conversion problem arises. When we call a function from COS, the arguments are in the internal Caché format. They have to be passed to the Callout Gateway and then to libffi, and somewhere along the way they must be converted to the C format. But the Callout Gateway supports very few data types, so if we converted the data on the C side, everything would have to be passed as text strings and then parsed, which is inconvenient for many reasons. It was therefore decided to convert the data on the Caché side and pass all arguments as strings containing binary data in the C format.

    Since all C data types except composite ones are numbers, converting data really means converting numbers to binary strings in COS. Caché has some wonderful functions for this that remove the need for direct memory access: $CHAR and $ASCII, which convert an 8-bit number to a character and back. Analogs exist for all the other sizes we need: 16-, 32-, and 64-bit integers and double-precision floating-point numbers. There is one catch: for integers, each of these functions handles either signed or unsigned numbers, but not both. In C, as you know, an integer of any size can be signed or unsigned, so the missing cases have to be implemented by hand.

    Signed numbers in C are represented in two's complement:
    • The first bit holds the sign: 0 for plus, 1 for minus
    • Positive numbers are encoded the same way as unsigned numbers
    • The maximum positive number is 2^(k-1) - 1, where k is the number of bits
    • The code of a negative number x equals the code of the unsigned number 2^k + x

    This representation lets signed numbers share the same addition implementation as unsigned ones, which works thanks to arithmetic overflow.
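
    The rules above can be checked on a small case, k = 8. The sketch below is an illustration with made-up helper names (encode8, decode8, add8); it encodes signed values by the 2^k + x rule and adds the codes as plain unsigned bytes, letting the overflow discard the carry.

```c
#include <assert.h>
#include <stdint.h>

/* Code of a signed value x in k = 8 bits: x itself if non-negative, 2^8 + x otherwise. */
uint8_t encode8(int x)
{
    assert(x >= -128 && x <= 127); /* range of a signed 8-bit number */
    return (uint8_t)(x >= 0 ? x : 256 + x);
}

/* Inverse mapping: codes with the top bit set stand for negative numbers. */
int decode8(uint8_t code)
{
    return code < 128 ? (int)code : (int)code - 256;
}

/* Unsigned addition modulo 2^8: the carry out of the top bit is dropped. */
uint8_t add8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}
```

    For example, encode8(-5) is 251 and encode8(7) is 7; their unsigned sum 258 wraps around to 2, which is the correct signed result of -5 + 7.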

    Consider converting signed 32-bit numbers as an example. If the number is non-negative, simply use the $ZLCHAR function; if it is negative, find the unsigned number with the same binary representation. How to find it follows directly from the definition of the two's complement code above: add the original number to the smallest number that does not fit in 32 bits, 2^32, that is, FFFFFFFF (hex) + 1. The result is this code:

    if (x < 0) {
        set x = $ZLCHAR($ZHEX("FFFFFFFF") + x + 1)
    } else {
        set x = $ZLCHAR(x)
    }
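
    For comparison, the same conversion can be written in C. This is a sketch under assumptions: the function names to_unsigned_code and pack_le32 are invented for illustration, and the byte order (least significant byte first) is assumed to match the 4-byte string that $ZLCHAR produces.

```c
#include <assert.h>
#include <stdint.h>

/* Map a signed 32-bit value to the unsigned value with the same bit pattern. */
uint32_t to_unsigned_code(int64_t x)
{
    assert(x >= INT32_MIN && x <= INT32_MAX);
    /* Same arithmetic as the COS snippet: FFFFFFFF (hex) + x + 1 for negatives. */
    return (uint32_t)(x < 0 ? (int64_t)0xFFFFFFFF + x + 1 : x);
}

/* Write the code as 4 bytes, least significant byte first (assumed order). */
void pack_le32(uint32_t code, unsigned char out[4])
{
    for (int i = 0; i < 4; i++) {
        out[i] = (unsigned char)((code >> (8 * i)) & 0xFF);
    }
}
```

    Packing to_unsigned_code(-2) gives the bytes FE FF FF FF, the same memory image a C int holding -2 has on a little-endian machine.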
    

    The next problem is converting structures, the composite type of the C language. Things would be simple if structures were laid out in memory exactly as they are declared, with all fields following one another. In reality, the address of each field must be a multiple of a special number, the field's alignment. The end of the structure is aligned too, to the largest field alignment. Alignment is needed because most platforms either cannot work with unaligned data or do so rather slowly. On x86, alignment usually equals the size of the field, with one exception: on 32-bit Linux, every field larger than 4 bytes is aligned to exactly 4 bytes. Read more about data alignment in this article.

    Take this structure as an example:
    struct X {
        char a, b; // sizeof(char) == 1
        double c;  // sizeof(double) == 8
        char d;    
    };
    

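    On x86-32, its layout differs between operating systems. On Linux, c is aligned to 4 bytes: a and b occupy offsets 0 and 1, c starts at offset 4, d sits at offset 12, and the whole structure takes 16 bytes. On Windows, c is aligned to 8 bytes: c starts at offset 8, d sits at offset 16, and the whole structure takes 24 bytes.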

    In practice, building this representation is quite simple. Write the fields to memory one after another, but before each write insert padding, an empty gap. The padding is calculated as follows:
    set padding = (alignment - (offset # alignment)) # alignment // offset is the address where the last written field ends
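
    The padding rule can be turned into a small layout calculator. The C sketch below is an illustration, not CNA's code: the names padding_for and layout_struct are hypothetical, only scalar fields are handled, and a field's alignment is assumed to be its size capped by max_align (4 to model 32-bit Linux, 8 to model Windows).

```c
#include <assert.h>
#include <stddef.h>

/* Bytes to skip so that the next field starts at a multiple of alignment. */
size_t padding_for(size_t offset, size_t alignment)
{
    assert(alignment > 0);
    return (alignment - (offset % alignment)) % alignment;
}

/*
 * Lay out scalar fields of the given sizes (each size must be non-zero).
 * Assumption: a field's alignment is min(size, max_align).
 * Stores each field's offset in offsets[] and returns the total size,
 * including the tail padding up to the largest field alignment.
 */
size_t layout_struct(const size_t *sizes, size_t count,
                     size_t max_align, size_t *offsets)
{
    size_t offset = 0;
    size_t struct_align = 1;

    for (size_t i = 0; i < count; i++) {
        size_t align = sizes[i] < max_align ? sizes[i] : max_align;
        if (align > struct_align) {
            struct_align = align;
        }
        offset += padding_for(offset, align); /* gap before the field */
        offsets[i] = offset;
        offset += sizes[i];
    }
    return offset + padding_for(offset, struct_align);
}
```

    Called with the field sizes of struct X, {1, 1, 8, 1}, a max_align of 4 returns a total size of 16 and a max_align of 8 returns 24.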
    


    What is not working yet


    1) Integers in Caché are represented in such a way that exact arithmetic is guaranteed only while the number stays within the range of a 64-bit signed integer. But C has a 64-bit unsigned type (unsigned long long). So a number larger than the maximum 64-bit signed value, 2^63 - 1 (~9 * 10^18), cannot be passed to an external function.

    2) Caché has two types for real numbers: its own decimal type and IEEE 754 double-precision floating-point numbers. So Caché has no analogs of the C types float and long double. CNA can work with these types, but every time a value enters Caché it is converted to double.

    3) On Windows, long double will most likely work incorrectly. Microsoft and the mingw developers have fundamentally different views on what a long double should be. Microsoft treats long double as 8 bytes on both 32- and 64-bit systems; in mingw it is 12 bytes on 32-bit and 16 bytes on 64-bit. Since CNA is compiled with mingw, it is best to forget about long double.

    4) Unions and bit fields in structures are not supported, because libffi does not support them.

    Criticism, comments, suggestions are welcome.

    All source code is available on GitHub under the MIT license.
    github.com/intersystems-ru/cna
