Distributed C++ applications with minimal effort

The purpose of this post is to talk about Ignite C++, the C++ API of the Apache Ignite distributed database, and about its features.

Apache Ignite has been written about on Habr more than once, so some of you probably already have an idea of what it is and why you might need it.

Apache Ignite in brief, for those who are not familiar with it

I will not go into detail about how Apache Ignite came about or how it differs from classic databases. All of these questions have already been covered here, here, and here.

So, Apache Ignite is essentially a fast distributed database optimized for working in RAM. Ignite grew out of an in-memory data grid and, until recently, was positioned as a very fast distributed cache that lives entirely in memory and is built on a distributed hash table. That is why, in addition to storing data, it offers many convenient features for fast distributed processing of that data: MapReduce, atomic data operations, full ACID transactions, SQL queries, so-called Continuous Queries, which let you monitor changes to specific data, and more.

Recently, however, the platform added support for persistent data storage on disk. With that, Apache Ignite gained all the advantages of a full-fledged object-oriented database while keeping the convenience, rich tooling, flexibility, and speed of a data grid.

Some theory

An important detail for understanding how to work with Apache Ignite is that it is written in Java. You may ask: “What difference does it make to me what language the database is written in, if I talk to it through SQL anyway?” There is some truth in this. If you want to use Ignite only as a database, you can take the ODBC or JDBC driver that ships with Ignite, start as many server nodes as you need with the bundled ignite.sh script, configure them with flexible config files, and not worry much about the language: you can work with Ignite from PHP, from Go, or from anything else.

However, the native Ignite interface provides much more than just SQL. To name only the simplest things: fast atomic operations on objects in the database, distributed synchronization primitives, and distributed computing in the cluster on local data, so that you do not need to pull hundreds of megabytes of data to a client for calculations. As you can guess, this part of the API does not work through SQL; it is written for specific general-purpose programming languages.

Naturally, since Ignite is written in Java, the most complete API is implemented in that language. Besides Java, however, there are also API versions for C# (.NET) and C++. These are the so-called "thick" clients: in effect, an Ignite node in a JVM that is launched from C++ or C# and communicated with through JNI. This type of node is needed, among other things, so that the cluster can run distributed computations written in the corresponding languages, C++ and C#.

In addition, there is an open protocol for so-called "thin" clients. These are lightweight libraries in various programming languages that communicate with the cluster over TCP/IP. They use much less memory, start almost instantly, and do not require a JVM on the machine, but they have somewhat worse latency and a less rich API than thick clients. Today there are thin clients for Java, C#, and Node.js; clients for C++, PHP, Python 3, and Go are being actively developed.

In this post, I will look at the thick client API of Ignite for C++, since it is the one that currently provides the most complete API.

Getting started

I will not dwell on installing and configuring the framework itself: the process is routine, not very interesting, and well described, for example, in the official documentation. Let's go straight to the code.

Since Apache Ignite is a distributed platform, the first thing you need to do is start at least one node. This is done very simply with the ignite::Ignition class:

#include <iostream>
#include <ignite/ignition.h>

using namespace ignite;

int main()
{
    IgniteConfiguration cfg;
    // Start a node with the default configuration.
    Ignite node = Ignition::Start(cfg);
    std::cout << "Node started. Press 'Enter' to stop" << std::endl;
    std::cin.get();
    // Stop all nodes in this process; false means running jobs are not cancelled.
    Ignition::StopAll(false);
    std::cout << "Node stopped" << std::endl;
    return 0;
}

Congratulations, you have launched your first Apache Ignite node in C++ with default settings. The Ignite class, in turn, is the main entry point to the entire cluster API.

Working with data

The main component of Ignite C++ that provides an API for working with data is the cache, ignite::cache::Cache<K,V>. The cache provides a basic set of methods for working with data. Since Cache is essentially an interface to a distributed hash table, the basic methods for working with it resemble those of ordinary containers such as map or unordered_map.

#include <cassert>
#include <cstdint>
#include <iostream>
#include <string>
#include <ignite/ignition.h>

using namespace ignite;

struct Person
{
    int32_t age;
    std::string firstName;
    std::string lastName;
};

//...

int main()
{
    IgniteConfiguration cfg;
    Ignite node = Ignition::Start(cfg);
    cache::Cache<int32_t, Person> personCache =
        node.CreateCache<int32_t, Person>("PersonCache");
    Person p1 = { 35, "John", "Smith" };
    personCache.Put(42, p1);
    Person p2 = personCache.Get(42);
    // Person has no operator<< or operator==, so work with the fields directly.
    std::cout << p2.firstName << " " << p2.lastName << ", " << p2.age << std::endl;
    assert(p1.age == p2.age);
    assert(p1.firstName == p2.firstName && p1.lastName == p2.lastName);
    return 0;
}

Looks pretty simple, right? In fact, things get somewhat more complicated once we take a closer look at the limitations of C++.

C++ integration challenges

As I mentioned, Apache Ignite is written entirely in Java, a powerful object-oriented language. Naturally, many features of that language, such as runtime reflection, were actively used to implement Apache Ignite components, for example, to serialize and deserialize objects for storage on disk and transfer over the network.

C++, unlike Java, has no such powerful reflection. In fact, it has none at all yet, unfortunately. In particular, there is no way to find out the list and types of an object's fields, which would allow automatically generating the code needed to serialize and deserialize objects of user-defined types. Therefore, the only option here is to ask the user to explicitly provide the necessary metadata about the user type and how to work with it.

In Ignite C++, this is implemented through specialization of the ignite::binary::BinaryType<T> template. This approach is used in both “thick” and “thin” clients. For the Person class presented above, such a specialization might look like this:

#include <ignite/binary/binary.h>

namespace ignite
{
namespace binary
{
template<>
struct BinaryType<Person>
{
    static int32_t GetTypeId()
    {
        return GetBinaryStringHashCode("Person");
    }

    static void GetTypeName(std::string& name)
    {
        name = "Person";
    }

    static int32_t GetFieldId(const char* name)
    {
        return GetBinaryStringHashCode(name);
    }

    static bool IsNull(const Person& obj)
    {
        return false;
    }

    static void GetNull(Person& dst)
    {
        dst = Person();
    }

    static void Write(BinaryWriter& writer, const Person& obj)
    {
        writer.WriteInt32("age", obj.age);
        writer.WriteString("firstName", obj.firstName);
        writer.WriteString("lastName", obj.lastName);
    }

    static void Read(BinaryReader& reader, Person& dst)
    {
        dst.age = reader.ReadInt32("age");
        dst.firstName = reader.ReadString("firstName");
        dst.lastName = reader.ReadString("lastName");
    }
};
} // namespace binary
} // namespace ignite

As you can see, in addition to the serialization and deserialization methods, BinaryType<Person>::Write and BinaryType<Person>::Read, there are a few other methods here. They are needed to explain to the platform how to work with custom C++ types from other languages, in particular Java. Let's take a closer look at each method:

GetTypeName() - Returns the type name. The type name must be the same on all platforms where the type is used. If you use the type only in Ignite C++, the name can be anything.
GetTypeId() - Returns a cross-platform unique identifier for the type. To work correctly with the type on different platforms, the identifier must be computed the same way everywhere. GetBinaryStringHashCode(TypeName) returns the same type ID that other platforms compute by default, so this implementation lets you work with the type correctly from other platforms.
GetFieldId() - Returns a unique identifier for a field name. Again, for correct cross-platform operation, use GetBinaryStringHashCode().
IsNull() - Checks whether a class instance represents a NULL value. Used to serialize NULL values correctly. Not very useful for instances of the class itself, but extremely convenient if you want to work with smart pointers and define a specialization for, say, BinaryType< std::unique_ptr<Person> >.
GetNull() - Called when deserializing a NULL value. Everything said about IsNull() also applies to GetNull().

SQL

If we draw an analogy with classical databases, a cache is a database schema named after the cache that contains a single table named after the type. In addition to these cache schemas, there is a common schema named PUBLIC, in which you can create and delete any number of tables using standard DDL commands such as CREATE TABLE, DROP TABLE, and so on. It is the PUBLIC schema that people usually connect to via ODBC/JDBC when they want to use Ignite simply as a distributed database.

Ignite supports full-fledged SQL queries, including DML and DDL. There is no support for SQL transactions yet, but the community is actively working on an MVCC implementation that will make transactions possible, and as far as I know, major changes were recently merged into master.

To work with cache data through SQL, you must explicitly specify in the cache configuration which fields of the object will be used in SQL queries. The configuration is written in an XML file, and the path to that file is specified when the node is started:

<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:util="http://www.springframework.org/schema/util"
       xsi:schemaLocation="
    http://www.springframework.org/schema/beans
    http://www.springframework.org/schema/beans/spring-beans.xsd
    http://www.springframework.org/schema/util
    http://www.springframework.org/schema/util/spring-util.xsd">
  <bean id="grid.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="cacheConfiguration">
      <list>
        <bean class="org.apache.ignite.configuration.CacheConfiguration">
          <property name="name" value="PersonCache"/>
          <property name="queryEntities">
            <list>
              <bean class="org.apache.ignite.cache.QueryEntity">
                <property name="keyType" value="java.lang.Integer"/>
                <property name="valueType" value="Person"/>
                <property name="fields">
                  <map>
                    <entry key="age" value="java.lang.Integer"/>
                    <entry key="firstName" value="java.lang.String"/>
                    <entry key="lastName" value="java.lang.String"/>
                  </map>
                </property>
              </bean>
            </list>
          </property>
        </bean>
      </list>
    </property>
  </bean>
</beans>

The config is parsed by the Java engine, so basic types must be given as Java types. Once the configuration file is ready, start the node, get an instance of the cache, and you can start using SQL:

//...

int main()
{
    IgniteConfiguration cfg;
    cfg.springCfgPath = "config.xml";
    Ignite node = Ignition::Start(cfg);
    cache::Cache<int32_t, Person> personCache =
        node.GetCache<int32_t, Person>("PersonCache");
    // Assumes Person has a constructor taking (age, firstName, lastName).
    personCache.Put(1, Person(35, "John", "Smith"));
    personCache.Put(2, Person(31, "Jane", "Doe"));
    personCache.Put(3, Person(12, "Harry", "Potter"));
    personCache.Put(4, Person(12, "Ronald", "Weasley"));
    cache::query::SqlFieldsQuery qry(
        "select firstName, lastName from Person where age = ?");
    qry.AddArgument<int32_t>(12);
    // Run the query against the cache obtained above.
    cache::query::QueryFieldsCursor cursor = personCache.Query(qry);
    while (cursor.HasNext())
    {
        cache::query::QueryFieldsRow row = cursor.GetNext();
        // Columns are read in the order they appear in the select list.
        std::cout << row.GetNext<std::string>() << ", ";
        std::cout << row.GetNext<std::string>() << std::endl;
    }
    return 0;
}

Likewise, you can use insert, update, create table, and other statements. Cross-cache queries are, of course, also supported. In that case, however, the cache name must be given in the query, in quotation marks, as the schema name. For example, instead of

select * from Person inner join Profession

you should write

select * from "PersonCache".Person inner join "ProfessionCache".Profession

And so on

Apache Ignite really has a lot of capabilities, and of course it was impossible to cover them all in one post. The C++ API is under active development, so there will soon be even more interesting things. I may write a few more posts in which I look at some features in more detail.

P.S. I have been an Apache Ignite committer since 2017 and actively develop the C++ API for this product. If you know C++, Java, or .NET reasonably well and would like to take part in developing an open-source product with an active, friendly community, we can always find a couple of interesting tasks for you.
