
Apache Ignite 2.4 Release - Distributed Database and Caching Platform
On March 12, 2018, four months after the previous version, Apache Ignite 2.4 was released. The release brings a number of innovations: Java 9 support, multiple SQL optimizations and improvements, neural network support, a new approach to building the cluster topology when working with disk storage, and much more.
Apache Ignite Database and Caching Platform is a platform for distributed data storage (optimized for active use of RAM) and for distributed computing in near real time.
Ignite is used where large streams of data have to be processed very quickly, workloads that centralized systems cannot keep up with.
Typical uses: a fast distributed cache; a layer that aggregates data from disparate services (for example, for a Customer 360 View); a primary horizontally scalable store (NoSQL or SQL) for operational data; a computing platform; and so on.
Let's look at the main innovations of Ignite 2.4.
Baseline topology
If you have used Apache Ignite together with its native disk storage, you have probably run into:
- the need to explicitly activate the cluster once the minimum required number of nodes has started;
- aggressive rebalancing on topology changes, which can be very painful because of the intensive disk I/O it causes.
Baseline Topology solves these problems by fixing the set of nodes that hold the disk data; this set governs cluster activation, topology behavior, and rebalancing.
Baseline Topology is such an important change in Ignite that we will publish a separate article about it in the near future.
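To give a sense of how this looks in code, here is a minimal sketch (the configuration file name is illustrative, and native persistence is assumed to be configured) of activating the cluster and recording the baseline:
// Start a node with native persistence configured (configuration details omitted).
Ignite ignite = Ignition.start("persistent-config.xml");

// Activate the cluster manually the first time; this fixes the initial baseline.
ignite.cluster().active(true);

// Later, after persistent nodes are added or removed, the baseline can be reset
// to the current topology version, at which point rebalancing starts.
ignite.cluster().setBaselineTopology(ignite.cluster().topologyVersion());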
Thin clients
It is now possible to build thin clients on top of Ignite's own binary protocol.
Previously, the .NET and C++ clients started a full-fledged JVM with Ignite inside themselves to communicate with the cluster. This gave easy and cheap access to the platform's extensive functionality, but the clients themselves were heavyweight.
The new thin clients are self-contained and do not need a JVM. This significantly reduces resource consumption and improves performance, and it is now much easier and cheaper for the community to build clients for a variety of languages, such as Python.
Version 2.4 ships a thin client for .NET:
var cfg = new IgniteClientConfiguration
{
    Host = "127.0.0.1"
};

using (IIgniteClient igniteClient = Ignition.StartClient(cfg))
{
    ICacheClient<int, Organization> cache = igniteClient.GetCache<int, Organization>(CacheName);

    Organization org = new Organization(
        "GridGain",
        new Address("St. Petersburg, Marata St. 69-71, building V", 191119),
        new Email("rusales@gridgain.com"),
        OrganizationType.Private,
        DateTime.Now
    );

    // Put the entry into the cache.
    cache.Put(1, org);

    // Get the entry back, already deserialized to the required type.
    Organization orgFromCache = cache.Get(1);
}
Data Load Optimization
Apache Ignite 2.4 adds tools for optimizing the initial load and the loading of large volumes of data.
You can now temporarily disable the WAL (Write-Ahead Log) for individual tables at runtime. This lets you load data with minimal disk I/O, which has a positive effect on throughput.
When the WAL is re-enabled, a checkpoint is written to disk immediately with the current data from RAM to ensure data safety.
You can disable WAL using SQL:
-- Disable WAL for a table (and the underlying cache).
ALTER TABLE my_table NOLOGGING;
-- Enable it again, likewise for an individual table and cache.
ALTER TABLE my_table LOGGING;
or through the API:
ignite.cluster().isWalEnabled(cacheName); // Check whether WAL is enabled.
ignite.cluster().enableWal(cacheName);    // Enable WAL.
ignite.cluster().disableWal(cacheName);   // Disable WAL.
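Putting it together, a bulk load might look roughly like this (a minimal sketch; the cache name and data are illustrative, and a cache with native persistence is assumed):
String cacheName = "bulkCache";

// Disable WAL before the bulk load to minimize disk I/O.
ignite.cluster().disableWal(cacheName);

// Stream the data in.
try (IgniteDataStreamer<Long, String> streamer = ignite.dataStreamer(cacheName)) {
    for (long i = 0; i < 1_000_000; i++)
        streamer.addData(i, "value-" + i);
}

// Re-enable WAL; a checkpoint is made immediately so the loaded data is persisted.
ignite.cluster().enableWal(cacheName);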
Java 9
Ignite 2.4 adds Java 9 support alongside the existing Java 8 support.
Extended .NET Support
A question we heard often was: "When will Ignite for .NET support .NET Core?" I am pleased to announce that, starting with version 2.4, Ignite.NET supports .NET Core. Moreover, Mono is supported as well.
This makes it possible to build cross-platform .NET applications, extending Ignite's reach into the Linux and Mac worlds.
A separate article will cover the .NET news in more detail: the thin client and the support for .NET Core and Mono.
Numerous SQL optimizations and enhancements
Ignite 2.4 brings many changes that speed up SQL, including multi-threaded index creation, optimized object deserialization and primary-key lookups, support for SQL batching on the cluster side, and much more.
On the DDL side, you can now specify DEFAULT values for columns in tables created with CREATE TABLE, configure how values are inlined into index trees, and perform DROP COLUMN.
An example of creating an index with new attributes:
-- INLINE_SIZE is the maximum size in bytes of values inlined into the index trees;
-- PARALLEL is the number of indexing threads.
CREATE INDEX fast_city_idx ON sales (country, city) INLINE_SIZE 60 PARALLEL 8;
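And a rough sketch of the other new DDL features (table and column names are made up for illustration), executed here from Java through the SQL API, since any cache instance can run DDL statements:
// A helper cache used only to issue SQL; the cache name is illustrative.
CacheConfiguration<Object, Object> ddlCfg = new CacheConfiguration<>("ddlCache");
ddlCfg.setSqlSchema("PUBLIC");
IgniteCache<Object, Object> ddlCache = ignite.getOrCreateCache(ddlCfg);

// DEFAULT value for a column in CREATE TABLE.
ddlCache.query(new SqlFieldsQuery(
    "CREATE TABLE city (id LONG PRIMARY KEY, name VARCHAR, country VARCHAR DEFAULT 'RU')")).getAll();

// Dropping a column.
ddlCache.query(new SqlFieldsQuery("ALTER TABLE city DROP COLUMN country")).getAll();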
Neural Networks and Other Machine Learning Enhancements
Version 2.4 brings neural networks to Apache Ignite.
Their key advantage is highly efficient training and model execution. Because the networks are trained in a distributed fashion and the computation is colocated with the data on the cluster nodes, there is no need for ETL or for lengthy, network-clogging transfers of data to external systems.
// Prepare the sample data.
int samplesCnt = 100000;

// The sample data is the sin^2 function on the interval [0; pi/2].
IgniteSupplier<Double> pointsGen = () -> (Math.random() + 1) / 2 * (Math.PI / 2);
IgniteDoubleFunction<Double> f = x -> Math.sin(x) * Math.sin(x);

IgniteCache<Integer, LabeledVector<Vector, Vector>> cache = LabeledVectorsCache.createNew(ignite);
String cacheName = cache.getName();

// Load the data via IgniteDataStreamer.
try (IgniteDataStreamer<Integer, LabeledVector<Vector, Vector>> streamer =
    ignite.dataStreamer(cacheName)) {
    streamer.perNodeBufferSize(10000);

    for (int i = 0; i < samplesCnt; i++) {
        double x = pointsGen.get();
        double y = f.apply(x);
        streamer.addData(i, new LabeledVector<>(new DenseLocalOnHeapVector(new double[] {x}), new DenseLocalOnHeapVector(new double[] {y})));
    }
}

// Initialize the trainer.
MLPGroupUpdateTrainer<RPropParameterUpdate> trainer = MLPGroupUpdateTrainer.getDefault(ignite).
    withSyncPeriod(3).
    withTolerance(0.0001).
    withMaxGlobalSteps(100).
    withUpdateStrategy(UpdateStrategies.RProp());

// Create the input for the trainer.
MLPArchitecture conf = new MLPArchitecture(1).
    withAddedLayer(10, true, Activators.SIGMOID).
    withAddedLayer(1, true, Activators.SIGMOID);

MLPGroupUpdateTrainerCacheInput trainerInput = new MLPGroupUpdateTrainerCacheInput(conf,
    new RandomInitializer(new Random()), 6, cache, 1000);

// Train and check the results.
MultilayerPerceptron mlp = trainer.train(trainerInput);

int testCnt = 1000;
Matrix test = new DenseLocalOnHeapMatrix(1, testCnt);

for (int i = 0; i < testCnt; i++)
    test.setColumn(i, new double[] {pointsGen.get()});

Vector predicted = mlp.apply(test).getRow(0);
Vector actual = test.copy().map(f).getRow(0);

// Print the predicted and actual values.
Tracer.showAscii(predicted);
Tracer.showAscii(actual);

System.out.println("MSE: " + (predicted.minus(actual).kNorm(2) / predicted.size()));
Other
In addition to these changes, the release also included:
- initial support for Spark DataFrames;
- reduced memory consumption when working with disk storage;
- multiple disk storage optimizations (for example, in WAL handling);
- new monitoring values exposed via JMX (for example, the long-awaited cache memory usage; extended topology information also becomes available for monitoring);
- RPM packages with Ignite (repository).