Measuring Apache Ignite Cache Performance
After taking the first steps in the previous articles of this series on the distributed Apache Ignite Java framework, getting acquainted with the basic principles of building a topology, and even making a starter for Spring Boot, we inevitably arrive at the question of caching, one of Ignite's main functions. First of all, I would like to understand whether it is needed at all, given how many caching libraries Java already has. Neither an implementation of the JCache standard (JSR 107) nor support for distributed caching is particularly surprising these days. So before (or instead of) looking into the functionality of the Apache Ignite cache, I would like to see how fast it is.

For the research I used the cache2k-benchmark suite, which was designed to prove that the cache2k library is the fastest cache; we can verify that claim along the way. This article does not claim to be a comprehensive, or even scientifically rigorous, performance study; let the Apache Ignite developers do that. We only look at the order of magnitude, the main behaviors, and the relative standing in the ranking, which will also include cache2k and a naive cache based on ConcurrentHashMap.
Testing methodology
For the testing methodology I did not reinvent the wheel and took the one described for cache2k. It uses a JMH-based library to compare the performance of a number of typical operations (a toy JMH example follows the list):
- Filling a cache in multiple threads
- Read-only performance
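To give an idea of the style, here is a toy JMH benchmark of my own; neither the class name JmhStyleExample nor the map workload is part of CB. JMH repeatedly calls the annotated method from the configured number of threads and reports throughput.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class JmhStyleExample {

    // Shared state: one map instance per benchmark run, hit concurrently by all threads.
    final ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>();

    @Benchmark
    public Integer putAndGet() {
        int key = ThreadLocalRandom.current().nextInt(100_000);
        map.put(key, key);
        return map.get(key);   // return the value so JMH does not dead-code-eliminate the call
    }
}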
As a reference point the methodology takes the values obtained for a cache implementation based on ConcurrentHashMap, on the assumption that nothing can be faster. Accordingly, in every category the contest is really for second place. cache2k-benchmark (hereinafter CB) implements scenarios for cache2k and several other providers: Caffeine, EhCache, Guava, Infinispan, TCache, as well as a naive implementation on top of ConcurrentHashMap. CB contains other benchmarks as well, but we will limit ourselves to these two.
The measurements were carried out under the following conditions:
- JDK 1.8.0_45
- JMH 1.11.3
- Intel i7-6700 3.40 GHz, 16 GB RAM
- Windows 7 x64
- JVM flags: -server -Xmx2G
- Apache Ignite 1.7.0
The Apache Ignite cache was examined in several modes that differ in topology (it is worth recalling the basic concepts of Apache Ignite topology here) and load distribution (a configuration sketch follows the list):
- Local cache (cacheMode = LOCAL) on the server node;
- Distributed cache on 1 machine (cacheMode = PARTITIONED, FULL_ASYNC), server-server;
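The exact cache configuration of the original setup is not reproduced here; a minimal Java sketch of what the two variants amount to might look like this (the cache name testCache comes from the benchmark code below, the hypothetical class CacheConfigs is mine, and the mode settings follow the list above):

import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.CacheWriteSynchronizationMode;
import org.apache.ignite.configuration.CacheConfiguration;

public class CacheConfigs {

    // Variant 1: a purely local cache living on the node itself.
    static CacheConfiguration<Integer, Integer> localCache() {
        CacheConfiguration<Integer, Integer> cfg = new CacheConfiguration<>("testCache");
        cfg.setCacheMode(CacheMode.LOCAL);
        return cfg;
    }

    // Variant 2: a cache partitioned across server nodes, with writes acknowledged asynchronously.
    static CacheConfiguration<Integer, Integer> partitionedCache() {
        CacheConfiguration<Integer, Integer> cfg = new CacheConfiguration<>("testCache");
        cfg.setCacheMode(CacheMode.PARTITIONED);
        cfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_ASYNC);
        return cfg;
    }
}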
As CB requires, an IgniteCacheFactory class was implemented (the code is available on GitHub, in a fork of CB). The server and the client are created with the following settings:
Server configuration
127.0.0.1:47520..47529
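A rough programmatic sketch of a server node with these parameters is given below; the grid name testGrid, the cache settings sketched above, and the discovery address range 127.0.0.1:47520..47529 are taken from the article, while the rest (including the class name ServerNodeStarter and the local discovery port) is my assumption:

import java.util.Collections;
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

public class ServerNodeStarter {

    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setGridName("testGrid");   // the same grid name is used on the client side

        // Static IP discovery restricted to the local machine and the port range from the article.
        TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
        ipFinder.setAddresses(Collections.singletonList("127.0.0.1:47520..47529"));
        TcpDiscoverySpi discovery = new TcpDiscoverySpi();
        discovery.setIpFinder(ipFinder);
        discovery.setLocalPort(47520);   // bind discovery into the advertised port range (assumption)
        cfg.setDiscoverySpi(discovery);

        // Register the cache with the settings sketched above.
        cfg.setCacheConfiguration(CacheConfigs.partitionedCache());

        Ignite ignite = Ignition.start(cfg);
        System.out.println("Server node started: " + ignite.name());
    }
}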
It is important that the cache settings for the client and server are the same.
The server is started from the command line, outside the test, on the same JVM with the options -Xms1g -Xmx14g -server -XX:+AggressiveOpts -XX:MaxMetaspaceSize=256m, that is, it gets almost all of the machine's memory. Let's start the server and connect to it with Visor (for details see the second article of the series). Using the cache command we verify that the cache exists and is pristine:
CB is hooked up to Ignite through the following class:
Benchmark cache factory
package org.cache2k.benchmark.thirdparty;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.cache2k.benchmark.BenchmarkCache;
import org.cache2k.benchmark.BenchmarkCacheFactory;

public class IgniteCacheFactory extends BenchmarkCacheFactory {

    static final String CACHE_NAME = "testCache";

    static IgniteCache<Integer, Integer> cache;
    static Ignite ignite;

    // Look up the already started node named "testGrid" and lazily obtain the cache from it.
    static synchronized IgniteCache<Integer, Integer> getIgniteCache() {
        if (ignite == null)
            ignite = Ignition.ignite("testGrid");
        if (cache == null)
            cache = ignite.getOrCreateCache(CACHE_NAME);
        return cache;
    }

    @Override
    public BenchmarkCache create(int _maxElements) {
        return new MyBenchmarkCache(getIgniteCache());
    }

    // Adapter exposing IgniteCache through the CB BenchmarkCache interface.
    static class MyBenchmarkCache extends BenchmarkCache {

        IgniteCache<Integer, Integer> cache;

        MyBenchmarkCache(IgniteCache<Integer, Integer> cache) {
            this.cache = cache;
        }

        @Override
        public Integer getIfPresent(final Integer key) {
            return cache.get(key);
        }

        @Override
        public void put(Integer key, Integer value) {
            cache.put(key, value);
        }

        @Override
        public void destroy() {
            cache.destroy();
        }

        @Override
        public int getCacheSize() {
            // Number of entries stored on the local node.
            return cache.localSize();
        }

        @Override
        public String getStatistics() {
            return cache.toString() + ": size=" + cache.size();
        }
    }
}
Here we connect in client mode to our server and take the cache from it. It is important to stop the client at the end of the test, otherwise JMH complains that worker threads are still running when the test finishes; Ignite creates a lot of them for its own needs. Note also that the time to destroy the cache after each iteration counts toward the result. We will consider this a cost of the research method: we look not only at the performance of the cache itself, but also at the cost of administering it. What the client configuration in ignite/ignite-cache.xml amounts to is sketched after the listing below.
Benchmark class
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.TearDown;

// Extends the PopulateParallelOnceBenchmark scenario from the cache2k-benchmark JMH suite.
@State(Scope.Benchmark)
public class IgnitePopulateParallelOnceBenchmark extends PopulateParallelOnceBenchmark {

    Ignite ignite;

    // Start the Ignite node from the XML configuration before the benchmark runs.
    {
        if (ignite == null)
            ignite = Ignition.start("ignite/ignite-cache.xml");
    }

    // Stop the node after the trial, otherwise JMH complains about Ignite's worker threads.
    @TearDown(Level.Trial)
    public void destroy() {
        if (ignite != null) {
            ignite.close();
            ignite = null;
        }
    }
}
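The contents of ignite/ignite-cache.xml are not shown in the article; programmatically, the client side would amount to roughly the same settings as the server plus client mode. A sketch under the same assumptions as above, reusing the hypothetical CacheConfigs class:

import java.util.Collections;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

public class ClientConfigSketch {

    // Roughly what the client XML expresses: the grid name the factory looks up,
    // the same discovery addresses and cache settings as the server, but in client mode.
    static IgniteConfiguration clientConfiguration() {
        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setGridName("testGrid");
        cfg.setClientMode(true);

        TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
        ipFinder.setAddresses(Collections.singletonList("127.0.0.1:47520..47529"));
        TcpDiscoverySpi discovery = new TcpDiscoverySpi();
        discovery.setIpFinder(ipFinder);
        cfg.setDiscoverySpi(discovery);

        cfg.setCacheConfiguration(CacheConfigs.partitionedCache());
        return cfg;
    }
}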
Results
After building the project with mvn clean install, you can run the tests, for example with the command
java -jar benchmarks.jar PopulateParallelOnceBenchmark -jvmArgs "-server -Xmx14G -XX:+UseG1GC -XX:+UseBiasedLocking -XX:+UseCompressedOops" -gc true -f 2 -wi 3 -w 5s -i 3 -r 30s -t 2 -p cacheFactory=org.cache2k.benchmark.thirdparty.IgniteCacheFactory -rf json -rff e:\tmp\1.json
The JMH settings are taken from the original benchmark and are not discussed here. The "-t" parameter sets the number of threads that work with the cache. I specified 14 GB of memory just in case. "-f 2" means that two JVM forks are started to run the test, which sharply narrows the confidence interval (the "error" column in the JMH output).
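The command above runs the PopulateParallelOnceBenchmark scenario, in which every thread puts its share of sequential Integer keys into the cache. A simplified sketch of such a workload, my own illustration rather than the actual CB code (which is driven by JMH), could look like this:

import java.util.concurrent.atomic.AtomicInteger;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;

public class PopulateSketch {

    public static void main(String[] args) throws InterruptedException {
        // A standalone local node just for the illustration; the real benchmark uses the CB/JMH harness.
        Ignite ignite = Ignition.start();
        IgniteCache<Integer, Integer> cache = ignite.getOrCreateCache("testCache");

        int entries = 1_000_000;
        int threads = 4;
        AtomicInteger next = new AtomicInteger();

        long start = System.nanoTime();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                int key;
                // Each worker grabs the next key until the target count is reached.
                while ((key = next.getAndIncrement()) < entries)
                    cache.put(key, key);
            });
            workers[t].start();
        }
        for (Thread w : workers)
            w.join();

        System.out.printf("Cached %d entries in %d ms%n", entries, (System.nanoTime() - start) / 1_000_000);
        ignite.close();
    }
}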
Filling a cache in multiple threads
First, let's run the test for Apache Ignite with cacheMode = LOCAL. Since in this case there is no point in talking to a separate server, the node under test is started in server mode and does not connect to anyone. The benchmark measures the time needed to cache the numbers from 1 to 1, 2, 4, and 8 million. For 1, 4, and 8 threads (I have an 8-core processor) the results are as follows:
We see that while 4 threads are roughly twice as fast as a single thread, adding another 4 threads gives only about a 20% gain; that is, scaling is non-linear. For comparison, let's see what ConcurrentHashMap and cache2k show.
ConcurrentHashMap:
cache2k:
Thus, in local mode Ignite's cache is about 10 times slower on inserts than ConcurrentHashMap and 4-5 times slower than cache2k. Next, let's try to estimate the overhead of partitioning the cache between two server nodes on the same machine (so that the cache is split roughly in half); the Ignite developers have taken steps to keep this overhead from being gigantic, for example by using their own serialization, which they claim is about 20 times faster than standard Java serialization. While the test runs it is worth looking at Visor: now it actually makes sense, since we have a topology:
At the end we see these heartbreaking numbers:
In other words, partitioning the cache did not come cheap: throughput dropped by about a factor of 10. The REPLICATED cache mode, in which the data would be stored on both nodes, was not investigated.
Read-only performance
To avoid complicating the picture with too many parameters, this test was run in 4 threads, with Ignite only in local mode. Here the ReadOnlyBenchmark is used: the cache is pre-filled with 100k entries and values are then read back with randomly chosen keys at different target hit rates. The number of operations per second is measured.
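Conceptually, the hit rate is controlled by drawing keys from a range wider than the set of cached keys. A simplified single-threaded sketch of such a read loop, my own illustration rather than the CB implementation (the class name ReadOnlySketch and the 5-second measurement window are assumptions):

import java.util.concurrent.ThreadLocalRandom;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;

public class ReadOnlySketch {

    public static void main(String[] args) {
        Ignite ignite = Ignition.start();
        IgniteCache<Integer, Integer> cache = ignite.getOrCreateCache("testCache");

        int entries = 100_000;
        double hitRate = 0.5;                      // target share of reads that find a value
        int keySpace = (int) (entries / hitRate);  // draw keys from a wider range to force misses

        // Pre-fill the cache, as ReadOnlyBenchmark does before measuring.
        for (int i = 0; i < entries; i++)
            cache.put(i, i);

        long ops = 0, hits = 0;
        long deadline = System.nanoTime() + 5_000_000_000L;  // measure for about 5 seconds
        while (System.nanoTime() < deadline) {
            int key = ThreadLocalRandom.current().nextInt(keySpace);
            if (cache.get(key) != null)
                hits++;
            ops++;
        }
        System.out.printf("ops=%d, observed hit rate=%.2f%n", ops, (double) hits / ops);
        ignite.close();
    }
}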
Here is the Cache2k / ConcurrentHashMap / Ignite data:
That is, cache2k is 1.5-2.5 times slower than ConcurrentHashMap, and Ignite is another 2-3 times slower still.
Conclusions
So Ignite, to put it mildly, does not stun you with its caching speed. Let me try to answer the likely objections in advance:
- "You just don't know how to cook it; a properly tuned Ignite would do better." Well, anything does better when tuned. The behavior was studied in the default configuration, which in 90% of cases is exactly what ends up in production;
- "Apples and oranges: these are products of different classes, hammering nails with a microscope," and so on. Perhaps it would have been fairer to compare it with something equally heavyweight, such as Infinispan, but nothing impossible was demanded of Ignite in this study;
- "Eliminate the overhead: take the expensive operations of starting the node and creating/destroying the cache out of the measurements, reduce the heartbeat frequency," and so on. But then we would be measuring a spherical horse in a vacuum, wouldn't we?;
- "This product is not meant for local use, it needs enterprise-grade hardware." Possibly, but that would only smear the same overhead across the topology, whereas here we saw it all in one place. During testing neither CPU nor memory usage ever reached 100%;
- "This is simply the nature of the product." You can indeed view the results as quite respectable given Ignite's enormous capabilities: keep in mind that caching happens in other threads, over sockets, and so on; Ignite simply cannot work any other way.
And so on. In general, in my opinion, Ignite should be treated as an architectural framework for distributed applications, not as a source of raw performance. Although it may well be able to speed up something even slower than itself. IMHO, of course.
I invite you to share your own observations about Ignite performance.