Apache Ignite 2.5 Release - Memory-Centric Distributed Database and Caching Platform
In May, Apache Ignite 2.5 was released. It contains many changes, the full list of which can be found in the Release Notes. In this article we will look at the key new features that deserve attention.
Apache Ignite is a horizontally scalable platform for transactional data storage and for distributed computing over that data in near real time.
Ignite is used where horizontal scalability and very high processing speed are required. The latter is achieved, among other things, by optimizing the platform to keep data directly in RAM as the primary storage rather than as a cache (In-Memory Computing). Distinctive features of the product include a full-fledged ANSI SQL:1999 query engine, disk storage that extends RAM, a large set of built-in integrations, and Zero-ETL machine learning.
Companies that use Apache Ignite include Veon/Beeline, Sberbank, Huawei, Barclays, Citi, Microsoft, and many others.
New topology option: star around ZooKeeper
One of the major changes in version 2.5 is a new topology option. Previously, Ignite only had a ring topology, which was used to exchange events within the cluster and provided efficient, fast scalability up to roughly 300 nodes.
The new topology is designed for installations of many hundreds or even thousands of nodes.
Now you can choose: keep the old topology, where Ignite alone is enough, or, for large clusters, pair a large Ignite cluster with a small ZooKeeper ensemble through which cluster events are exchanged.
Why is this needed, and how does it help?
A large “ring” has a flaw: because of network exchange and processing overhead, notifying all nodes about a new event can take seconds. This is bad in itself, and since the probability of a node failing due to a transient network problem, hardware issue, or something else grows with cluster size, topology reorganization becomes a routine occurrence, especially on clusters built from cheap commodity hardware. In a large “ring” this adds extra chaos: processing of the event flow is interrupted and has to start over, while information about the new topology is propagated in parallel.
The new “star”, with a ZooKeeper cluster at its center, not only preserves scalability (ZooKeeper can also grow horizontally) but significantly improves it on large installations. By splitting the responsibility for event handling and data handling, a much more compact ZooKeeper cluster can be dedicated to event handling, which reduces the dependence of event propagation time on cluster size.
This does not affect data exchange between Ignite nodes, such as rebalancing, or storing and retrieving data: all of these operations bypass the event topology and go over direct connections between cluster nodes, just as before.
Adding the new topology also does not affect the old one: the ring remains the recommended option, since it is more lightweight, much easier to maintain, and requires no additional components.
But if you have hit the scaling limit of the Ignite “ring”, you no longer need to resort to micro-optimizations, tuning hacks, obscure knowledge, and workarounds. Instead, you have an officially available and supported solution that significantly simplifies deploying and supporting such a large cluster.
Detailed information on the new topology can be found in the documentation.
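For illustration, here is a minimal sketch of switching a node to ZooKeeper-based discovery via ZookeeperDiscoverySpi from the ignite-zookeeper module; the connection string, root path, and timeouts below are placeholder values, not recommendations:

// A node configured with ZooKeeper-based discovery instead of the default ring.
// Requires the ignite-zookeeper module on the classpath; addresses and timeouts are placeholders.
ZookeeperDiscoverySpi zkDiscoverySpi = new ZookeeperDiscoverySpi();
zkDiscoverySpi.setZkConnectionString("zk-host1:2181,zk-host2:2181,zk-host3:2181");
zkDiscoverySpi.setZkRootPath("/ignite-cluster");
zkDiscoverySpi.setSessionTimeout(30_000);
zkDiscoverySpi.setJoinTimeout(10_000);

IgniteConfiguration cfg = new IgniteConfiguration();
cfg.setDiscoverySpi(zkDiscoverySpi);

// The node registers itself in ZooKeeper and receives discovery events through it;
// data exchange still goes over direct connections between nodes.
Ignite ignite = Ignition.start(cfg);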
Thin clients: thin Java client, authentication in thin clients
Version 2.4 introduced support for thin clients beyond JDBC/ODBC: clients that are not full participants of the topology and are much less demanding on resources, but offer reduced functionality in return. The first thin client was the one for .NET.
Starting with version 2.5, a thin client for Java is also available.
This makes it possible to embed Ignite access into resource-sensitive applications, such as applications on end-user devices, without unnecessary layers. Previously, such a task required an intermediate tier that, for example, accepted requests over REST and then used an internal “thick” client to exchange data with the Ignite cluster.
The thin client is available by adding the standard “zero-dependency” ignite-core JAR to your application, for example from the Maven repository.
An example of using the thin client:
ClientConfiguration cfg = new ClientConfiguration().setAddresses("127.0.0.1:10800");

try (IgniteClient igniteClient = Ignition.startClient(cfg)) {
    System.out.println(">>> Thin client put-get example.");

    final String CACHE_NAME = "put-get-example";

    // Create the cache on the cluster if it does not exist yet.
    ClientCache<Integer, Address> cache = igniteClient.getOrCreateCache(CACHE_NAME);
    System.out.format(">>> Created cache [%s].\n", CACHE_NAME);

    Integer key = 1;
    Address val = new Address("Мясницкая, 20", 101000);

    // Store the value and read it back over the thin client protocol.
    cache.put(key, val);
    System.out.format(">>> Saved value [%s] in the cache.\n", val);

    Address cachedVal = cache.get(key);
    System.out.format(">>> Loaded value [%s] from the cache.\n", cachedVal);
} catch (ClientException e) {
    // Handle thin client errors (connection failures, cache operation errors, etc.).
}
Version 2.4 also lacked authentication for thin clients. Starting with version 2.5, thin client connections support both authentication and SSL/TLS encryption:
ClientConfiguration clientCfg = new ClientConfiguration()
    .setAddresses("127.0.0.1:10800");

// Enable SSL/TLS encryption of the thin client connection.
clientCfg
    .setSslMode(SslMode.REQUIRED)
    .setSslClientCertificateKeyStorePath("client.jks")
    .setSslClientCertificateKeyStoreType("JKS")
    .setSslClientCertificateKeyStorePassword("123456")
    .setSslTrustCertificateKeyStorePath("trust.jks")
    .setSslTrustCertificateKeyStoreType("JKS")
    .setSslTrustCertificateKeyStorePassword("123456")
    .setSslKeyAlgorithm("SunX509")
    .setSslTrustAll(false)
    .setSslProtocol(SslProtocol.TLS);

// Configure authentication credentials.
clientCfg
    .setUserName("joe")
    .setUserPassword("passw0rd!");

try (IgniteClient client = Ignition.startClient(clientCfg)) {
    // ... use the authenticated, encrypted client here.
}
catch (ClientAuthenticationException e) {
    // Handle authentication failure.
}
Detailed information can be found in the documentation.
SQL: authentication and user management, fast bulk loading
The distributed ANSI SQL:1999 engine has historically been one of the strengths and key distinguishing features of Apache Ignite. It provides a “single window”: through a Java/.NET/C++ client or a JDBC/ODBC driver you can execute SQL queries over the entire cluster, covering both data in RAM and data in Ignite Native Persistence.
In versions 2.0–2.4, the developers focused not only on improving performance (of which there is never enough), but also on speeding up initial and bulk data loading and on more complete DDL support: operations such as CREATE TABLE, ALTER TABLE, and CREATE INDEX can also be executed consistently across the cluster through the same “single window”.
Version 2.5 takes DDL a step further: SQL authentication has been added, along with the corresponding CREATE USER, ALTER USER, and DROP USER commands.
Whereas previously Ignite SQL was expected to sit in a sterile DMZ with access control handled by the services above it, security can now be strengthened with authentication managed directly via SQL.
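As a rough sketch of what this looks like from an application, assuming a cluster started with persistence and authentication enabled and the default superuser credentials ignite/ignite (the host, user name, and passwords below are placeholders; the standard java.sql Connection, DriverManager, and Statement classes are used):

// Sketch: managing SQL users over the thin JDBC driver; names and passwords are placeholders.
try (Connection conn = DriverManager.getConnection(
         "jdbc:ignite:thin://127.0.0.1/", "ignite", "ignite");
     Statement stmt = conn.createStatement()) {

    // Create a user, change its password, then drop it again.
    stmt.executeUpdate("CREATE USER joe WITH PASSWORD 'passw0rd!'");
    stmt.executeUpdate("ALTER USER joe WITH PASSWORD 'n3wPassw0rd'");
    stmt.executeUpdate("DROP USER joe");
} catch (SQLException e) {
    // Handle connection, authentication, or SQL errors.
}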
As for the speed of loading large amounts of data, version 2.5 brings two new features.
The first is the new COPY SQL command in JDBC, which quickly transfers the contents of a CSV file into a table in bulk mode:
COPY FROM "/path/to/local/file.csv" INTO city (
ID, Name, CountryCode, District, Population) FORMAT CSV
The second is a streaming mode for JDBC, enabled via the new SET command. When data is loaded through standard INSERT statements, it transparently batches and optimizes the delivery of new records to the Ignite cluster. The accumulated operations are flushed to the cluster when a batch fills up or when the JDBC connection is closed.
SET STREAMING ON
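A minimal sketch of how this could be used over the thin JDBC driver (the table, column names, and connection URL are placeholders; the standard java.sql API is assumed):

// Sketch: bulk loading through standard INSERTs with streaming mode switched on.
try (Connection conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1/")) {
    try (Statement stmt = conn.createStatement()) {
        // Turn on streaming for this connection: subsequent INSERTs are batched transparently.
        stmt.executeUpdate("SET STREAMING ON");
    }

    try (PreparedStatement ps =
             conn.prepareStatement("INSERT INTO city (ID, Name) VALUES (?, ?)")) {
        for (int i = 0; i < 1_000_000; i++) {
            ps.setInt(1, i);
            ps.setString(2, "City-" + i);
            ps.executeUpdate();
        }
    }
    // Accumulated batches are flushed as they fill up and, finally, when the connection is closed.
} catch (SQLException e) {
    // Handle SQL errors.
}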
Machine Learning: Genetic Algorithms, Partition Binding
Machine learning is one of the strategic directions of Ignite development. Ignite ML implements the Zero-ETL paradigm: models are trained directly on the data that already resides in the Ignite transactional store, saving significant time and resources that would otherwise be spent on converting the data and shipping it over the network to a separate training system.
In version 2.5, genetic algorithms were added to the set of available tools.
Also, to optimize training and minimize network exchange, support was added for ML computations that know which partitions they run on and can bind additional context to those partitions. Since partitions are the atomic unit of distribution and a single partition is never split across several nodes, this allows the training process to be optimized and gives the algorithm stronger guarantees about data locality.
Detailed information can be found in the documentation.
And much more
Version 2.5 also includes:
- support for executing SQL queries through Spark DataFrames directly in Ignite, without pulling intermediate data up into Spark;
- UPDATE support via LINQ in .NET;
- customizable handling of critical errors (for example, automatic node restart when memory is exhausted);
- transaction labels and additional debugging tools for long-running transactions;
- JMX beans for transaction monitoring;
- the ability to interrupt long-running transactions when the partition topology needs to be rebuilt (an operation otherwise blocked by active transactions);
- tools for offline validation of data and metadata consistency;
- additional validation of SQL query results;
- ODBC failover;
- new metrics: the number of partitions per cache group on a node, the amount of RAM used by a data region on a node, the total WAL size on a node, and low-level metrics of disk storage pages;
- DEB packages in addition to the previously introduced RPM packages.