artemshitov May 29, 2017 at 12:21

Apache Ignite 2.0 - Machine Learning, New Storage Model, DDL

In May, a new major version of Apache Ignite was released. Ignite is a distributed in-memory platform that combines a key-value store with a SQL-99-compatible database, offering full ACID compliance, high availability, and close-to-linear scaling from a few nodes to thousands, which can run on your own hardware or in the cloud. The Apache Ignite core is written in Java, but in addition to the Java ecosystem the platform integrates natively with .NET and C++ applications.

Apache Ignite scales elastically within one or more geo-distributed clusters. It provides flexible sharding and automatic rebalancing when nodes are added or removed dynamically, and gives transparent, fast access to data and computations through its own API or classic SQL.

In version 2.0, a great deal was reworked “under the hood”. This made a number of significant functional changes possible: some of them are already visible, and others will appear in upcoming versions.

Looking ahead: we will hold two events related to Apache Ignite. Details are at the end of the article.

New storage architecture

By default, Apache Ignite works in RAM: it stores data there in distributed form and performs computations there. One of the key innovations of version 2.0 is a completely redesigned memory architecture called Page Memory. And this is very important.

The new approach to data storage is considerably more sophisticated and better thought out than the old one. It avoids problems with memory fragmentation, significantly speeds up SQL, and minimizes the impact of GC pauses on the system. Moreover, the new architecture is designed to work seamlessly with both RAM and disk. This capability is not yet available in version 2.0, but we will soon share more about our development plans in this area.
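To make the idea concrete, here is a minimal sketch of paged off-heap storage. It is a toy illustration under our own assumptions, not Ignite's actual Page Memory implementation: the class name PagedRegionSketch and all of its methods are hypothetical. It shows why fixed-size pages avoid fragmentation (any free page fits any request) and why off-heap data adds little work for the garbage collector (the data lives in a direct buffer outside the Java heap).

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy illustration of paged off-heap memory; NOT Ignite's Page Memory code. */
class PagedRegionSketch {
    private final int pageSize;
    // Direct buffer: the bytes live outside the Java heap, so the GC does not scan or move them.
    private final ByteBuffer region;
    // Identifiers of pages that are currently free.
    private final Deque<Integer> freePages = new ArrayDeque<>();

    PagedRegionSketch(int pageSize, int pageCount) {
        this.pageSize = pageSize;
        this.region = ByteBuffer.allocateDirect(pageSize * pageCount);
        for (int i = 0; i < pageCount; i++)
            freePages.addLast(i);
    }

    /** Takes a free page. All pages have the same size, so any free page fits. */
    int allocatePage() {
        Integer id = freePages.pollFirst();
        if (id == null)
            throw new IllegalStateException("region is full");
        return id;
    }

    /** Returns a page to the free list so that it can be reused as-is. */
    void freePage(int pageId) {
        freePages.addFirst(pageId);
    }

    /** Writes a long at the given offset inside the page. */
    void writeLong(int pageId, int offset, long value) {
        region.putLong(pageId * pageSize + checkOffset(offset), value);
    }

    /** Reads a long from the given offset inside the page. */
    long readLong(int pageId, int offset) {
        return region.getLong(pageId * pageSize + checkOffset(offset));
    }

    int freePageCount() {
        return freePages.size();
    }

    // Ensures that an 8-byte value at this offset stays within one page.
    private int checkOffset(int offset) {
        if (offset < 0 || offset + Long.BYTES > pageSize)
            throw new IndexOutOfBoundsException("offset " + offset + " is outside the page");
        return offset;
    }
}
```

Because a freed page is identical in size to every other page, it can be handed out again immediately, which is the property that keeps such a region from fragmenting over time.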

The new architecture is outlined in the figure below and described in a dedicated section of the documentation.

Machine learning

The goal of Apache Ignite is to build a platform of many closely integrated modules, not just a distributed Data Grid store. With it, developers can solve tasks of varying complexity, from very light (I want a fast distributed cache) to very heavy (I want distributed real-time HTAP computations on big data stored in data centers in different corners of the Earth, and I would like to integrate with Cassandra, Spark, Hadoop, etc.).

Unfortunately, Apache Ignite lacked a set of components for one of the “hottest” areas of modern IT: machine learning. Until now.

Apache Ignite 2.0 adds support for basic machine learning algebra adapted for distributed computing. We understand that the tools we offer so far are very low-level, and we are not going to stop there. In future versions, this algebra will become the foundation on which we build distributed implementations of the core machine learning algorithms: regressions, classification trees, etc.
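As a rough illustration of what “algebra adapted for distributed computing” can mean, the sketch below multiplies a matrix by a vector by splitting the rows into blocks that are computed independently. It uses only the Java standard library and does not use the Ignite ML API; the class BlockedMatVecSketch and its methods are hypothetical, and a parallel stream stands in for cluster nodes.

```java
import java.util.stream.IntStream;

/** Toy sketch of block-wise matrix-vector multiplication; NOT the Ignite ML API. */
class BlockedMatVecSketch {
    /** Multiplies rows [from, to) of m by v. This is the work one node would do. */
    static double[] multiplyBlock(double[][] m, double[] v, int from, int to) {
        double[] out = new double[to - from];
        for (int i = from; i < to; i++) {
            double sum = 0;
            for (int j = 0; j < v.length; j++)
                sum += m[i][j] * v[j];
            out[i - from] = sum;
        }
        return out;
    }

    /** Splits the rows into `parts` blocks, computes them in parallel, and assembles the result. */
    static double[] multiply(double[][] m, double[] v, int parts) {
        int rows = m.length;
        double[] result = new double[rows];
        IntStream.range(0, parts).parallel().forEach(p -> {
            // Row range of block p; the blocks are disjoint and together cover all rows.
            int from = (int) ((long) rows * p / parts);
            int to = (int) ((long) rows * (p + 1) / parts);
            double[] block = multiplyBlock(m, v, from, to);
            // Each block writes to its own slice of the result, so no locking is needed.
            System.arraycopy(block, 0, result, from, block.length);
        });
        return result;
    }
}
```

Each block needs only its own rows and a copy of the vector, so in a cluster the rows could stay on the nodes that store them and only the small per-block results would have to travel over the network.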

In the meantime, you can study the examples on GitHub and get a hands-on feel for the current state of the product.

Data definition language

Starting with this release, Apache Ignite offers initial DDL support in addition to DML. You can now create and, importantly, change indexes using classic SQL syntax, without interrupting the operation of cluster nodes. This is one of the most anticipated features that our users have been asking for. And this is just the beginning! In future releases, more and more DDL operations will appear, including CREATE TABLE, ALTER TABLE, etc. Read more about the current features in the documentation.
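A minimal sketch of runtime index management from Java might look like the following. It assumes that ignite-core and ignite-indexing are on the classpath, that DDL statements are submitted through SqlFieldsQuery like any other SQL, and that CREATE INDEX and DROP INDEX are among the supported statements; check the documentation for the exact syntax available in your version. The DdlSketch and Person classes, the cache name, and the index name are hypothetical.

```java
import java.util.List;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.SqlFieldsQuery;
import org.apache.ignite.cache.query.annotations.QuerySqlField;
import org.apache.ignite.configuration.CacheConfiguration;

/** Sketch only: creates and drops an index on a running node. */
class DdlSketch {
    /** Hypothetical value type; annotated fields are exposed as SQL columns. */
    static class Person {
        @QuerySqlField
        String name;
        @QuerySqlField
        int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    static List<List<?>> run() {
        // Starts a node with the default configuration; it is stopped when the block exits.
        try (Ignite ignite = Ignition.start()) {
            CacheConfiguration<Long, Person> cfg = new CacheConfiguration<>("person");
            cfg.setIndexedTypes(Long.class, Person.class); // registers the Person SQL table
            IgniteCache<Long, Person> cache = ignite.getOrCreateCache(cfg);

            cache.put(1L, new Person("Alice", 30));
            cache.put(2L, new Person("Bob", 25));

            // Add an index while the node is running (assumed DDL syntax).
            cache.query(new SqlFieldsQuery(
                "CREATE INDEX idx_person_name ON Person (name)")).getAll();

            // An ordinary query that can benefit from the new index.
            List<List<?>> rows = cache.query(new SqlFieldsQuery(
                "SELECT name, age FROM Person WHERE name = ?").setArgs("Alice")).getAll();

            // Remove the index again, also without stopping the node.
            cache.query(new SqlFieldsQuery("DROP INDEX idx_person_name")).getAll();

            return rows;
        }
    }
}
```

Calling DdlSketch.run() on a standalone node should return the single row for Alice; the point of the sketch is that the index is added and removed between ordinary cache operations, with no node restart.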

Also among the changes

Ignite.NET: plugin system support;
Ignite.C++: remote execution of C++ code on the cluster, in this version so far only in continuous queries;
integration with Spring Data, which makes Apache Ignite easier to adopt in applications built on this popular framework;
integration with RocketMQ;
support for Hibernate 5 L2 cache;
and much more.

Webinar and Meetup

To mark the release of Apache Ignite 2.0, we plan to hold two events:

- a webinar on June 7, in English, about what is new in version 2.0
- Ignition.meetup, to be held in Moscow in the near future (to be announced separately), where you can exchange experience in Russian, ask questions, and hear real cases of building solutions on the platform
