How not to break an Apache Ignite cluster from the beginning

Hello! Below is a transcript of a video recording of a speech at the Apache Ignite community’s community meeting in St. Petersburg on June 20. You can download slides by the link .

There is a whole class of problems faced by novice users. They just downloaded Apache Ignite for themselves, run the first two, three, ten times, and come to us with questions that are solved in a similar way. Therefore, I propose to create a checklist that will save you a lot of time and nerves when you make your first applications on Apache Ignite. We'll talk about preparing for launch; how to make the cluster assembled; how to run any calculations in the compute grid; how to prepare the data model and code so that you can write your data to Ignite and then read them successfully. And the main thing: how not to break anything from the very beginning.

Preparing for launch - customize logging

We need logs. If you have ever asked for an Apache Ignite mailing list or a question on StackOverflow, such as “why everything was hanging on me,” most likely you were asked to send all the logs from all the nodes first.

Naturally, Apache Ignite has logging enabled by default. But there are nuances. First of all, stdoutnot so much is written in Apache. By default, it runs in a so-called quiet mode. In stdoutyou will see only the most terrible mistakes, and everything else will be stored in the file, the path to which Apache Ignite displays at the very beginning (by default - ${IGNITE_HOME}/work/log). You do not erase it and keep the logs longer, it can be very useful.

stdout Ignite at default startup

To make it easier to learn about problems, without getting into separate files and not setting up a separate monitoring for Apache Ignite, you can start it in verbose mode with the command

ignite.sh -v

and then the system will start writing about all the events stdoutalong with the rest of the journaling of the application.

Check the logs! Very often they can find solutions to your problems. If the cluster has collapsed, then very often in the log you can see messages like “Increase such and such timeout in such and such configuration. We fell off because of him. He is too small. The network is not good enough. ”

Cluster assembly

Unwelcome guests

The first problem that many face is the uninvited guests in your cluster. Or you yourself turn out to be an uninvited guest: you start up a fresh cluster and suddenly you see that in the very first topology snapshot, instead of one node, you have two servers from the very beginning. How so? You only run one.

A message saying that there are two nodes in a cluster

The fact is that by default Apache Ignite uses Multicast, and at startup it will search for all other Apache Ignite that are on the same subnet, in the same Multicast group. And if it does, it will try to connect. And in case of unsuccessful connection - it will not start at all. Therefore, in the cluster on my working laptop regularly appear extra nodes from the cluster on the laptop colleagues, which of course is not very convenient.

How to protect yourself from this? The easiest way to configure static IP. Instead TcpDiscoveryMulticastIpFinder, which is used by default, is TcpDiscoveryVmIpFinder. There write all the IP and ports to which you connect. It is much more convenient and will protect you from a large number of problems, especially in environments for development and testing.

Too many addresses

Next problem. You have disabled Multicast, start the cluster, in a single config file you have registered a decent amount of IP from different environments. And it so happens that you start the first node in a fresh cluster for 5–10 minutes, although all subsequent ones connect to it in 5–10 seconds.

Take a list of three IP addresses. For each prescribe ranges of 10 ports. There are a total of 30 TCP addresses. Since Apache Ignite should try to connect to an existing cluster before creating a new cluster, it will check each IP in turn. On your laptop, it may not hurt, but in some cloud environments, port scan protection is often enabled. That is, when accessing a closed port on an IP address, you will not receive any response until the timeout has passed. The default is 10 seconds. And if you have 3 addresses of 10 ports, it turns out 3 * 10 * 10 = 300 seconds of waiting - the very 5 minutes to connect.

The solution is obvious: do not prescribe extra ports. If you have three IP, then you hardly need a default range of 10 ports. This is convenient when you are testing something on a local machine and launching 10 nodes. But in real systems, one port is usually enough. Or, disable the port scan protection on the internal network if you have the option.

A third common problem is IPv6. You may see strange network error messages: could not connect, could not send a message, node segmented. This means that you have fallen off the cluster. Very often, such problems are caused by a mixed environment of IPv4 and IPv6. This is not to say that Apache Ignite does not support IPv6, but at the moment there are certain problems.

The simplest solution is to give the Java machine the option.

-Djava.net.preferIPv4Stack=true

Then Java and Apache Ignite will not use IPv6. This solves a significant part of the problems with collapsing clusters.

Preparation of the code base - serialize correctly

The cluster is going, you need to run something in it. One of the most important elements of how your code interacts with Apache Ignite code is Marshaller, or serialization. To write something into memory, in persistence, send over the network, Apache Ignite first serializes your objects. You can see messages that begin with the words: “cannot be written in binary format” or “cannot be serialized using BinaryMarshaller”. There will be only one such warning in the log, but noticeable. This means that you need to tweak your code a little more to make friends with Apache Ignite.

Apache Ignite uses three mechanisms for serialization:

JdkMarshaller - regular Java serialization;
OptimizedMarshaller - slightly optimized Java serialization, but the mechanisms are the same;
BinaryMarshaller- a serialization written specifically for Apache Ignite, used everywhere under its hood. She has a number of advantages. Somewhere we can avoid additional serialization and deserialization, and somewhere we can even get a non-deserialized object in the API, work with it directly in the binary-format as with something like JSON.

BinaryMarshallercan serialize and deserialize your POJOs that have nothing but fields and simple methods. But if you have custom serialization through readObject()and writeObject(), if you use it Externalizable, it BinaryMarshallerwill not cope. He will see that your object cannot be serialized by the usual recording of non-transient fields and will surrender - roll back to OptimizedMarshaller.

To make such objects friends with Apache Ignite, you need to implement an interface Binarylizable. It is very simple.

For example, there is a standard TreeMapfrom Java. It has custom serialization and deserialization via read and write object. She first describes some fields, and then writes OutputStreamthe data itself in length.

Implementation TreeMap.writeObject()

privatevoidwriteObject(java.io.ObjectOutputStream s)throws java.io.IOException {
    // Write out the Comparator and any hidden stuff
    s.defaultWriteObject();
    // Write out size (number of Mappings)
    s.writeInt(size);
    // Write out keys and values (alternating)for (Iterator<Map.Entry<K,V>> i = entrySet().iterator(); i.hasNext(); ) {
        Map.Entry<K,V> e = i.next();
        s.writeObject(e.getKey());
        s.writeObject(e.getValue());
    }
}

writeBinary()and readBinary()from Binarylizablework in exactly the same way: BinaryTreeMapwraps itself in the ordinary TreeMapand writes it in OutputStream. This method is easy to write, and it will pretty much increase performance.

Implementation BinaryTreeMap.writeBinary()

publicvoidwriteBinary(BinaryWriter writer)throws BinaryObjectException {
    BinaryRawWriter rewriter = writer. rewrite ();
    rawWriter.writeObject(map.comparator());
    int size = map.size();
    rawWriter.writeInt(size);
    for (Map.Entry<Object, Object> entry : ((TreeMap<Object, Object>)map).entrySet()) {
        rawWriter.writeObject(entry.getKey());
        rawWriter.writeObject(entry.getValue());
    }
}

Run in Compute Grid

Ignite allows not only to store data, but also to run distributed computing. How do we run some lambda so that it spreads across all servers and runs?
For starters, what is the problem with these code examples?

What is the problem?

Foo foo = …;
Bar bar = ...;
ignite.compute().broadcast(
    () -> doStuffWithFooAndBar(foo, bar)
);

And if so?

Foo foo = …;
Bar bar = ...;
ignite.compute().broadcast(new IgniteRunnable() {
   @Overridepublicvoidrun(){
       doStuffWithFooAndBar(foo, bar);
   }
});

As many who are familiar with the pitfalls of lambda and anonymous classes will guess, the problem is in capturing variables from the outside. For example, we send lambda. It uses a pair of variables that are declared outside the lambda. This means that these variables will travel with it and fly across the entire network to all servers. And then all the same questions arise: are these objects friendly with BinaryMarshaller? What size are they? Do we even want them to be transferred somewhere, or are these objects so large that it is better to transmit some kind of ID and recreate the objects inside the lambda already on the other side?

An anonymous class is even worse. If the lambda can not take this with itself, throw it out, if it is not used, then the anonymous class will take it necessarily, and it usually does not lead to anything good.

The following example. Again, lambda, but which uses the Apache Ignite API a little.

Use Ignite inside compute closure wrong

ignite.compute().broadcast(() -> {
   IgniteCache foo = ignite.cache("foo");
   String sql = "where id = 42";
   SqlQuery qry = new SqlQuery("Foo", sql).setLocal(true);
   return foo.query(qry);
});

In the original version, it takes the cache and locally makes some kind of SQL query in it. This is such a pattern when you need to send a task that works only with local data on remote nodes.

What is the problem? Lambda again captures the link, but now not to the object, but to the local Ignite on the node with which we send it. And it even works, because the Ignite object has a method readResolve()that allows, during deserialization, to replace the Ignite that came through the network to the local one on the node where we sent it. But this also sometimes leads to undesirable consequences.

Basically, you just transmit more data over the network than you would like. If you need to get from some code, the launch of which you do not control, to Apache Ignite or some of its interfaces, the simplest thing is to use the method Ignintion.localIgnite(). You can call it from any thread that was created by Apache Ignite, and get a link to a local object. If you have lambdas, services, whatever, and you understand that you need Ignite here, then I recommend this method.

Use Ignite inside compute closure correctly - through localIgnite()

ignite.compute().broadcast(() -> {
   IgniteCache foo = Ignition.localIgnite().cache("foo");
   String sql = "where id = 42";
   SqlQuery qry = new SqlQuery("Foo", sql).setLocal(true);
   return foo.query(qry);
});

And the last example in this part. In Apache Ignite, there is a Service Grid, with which you can deploy microservices directly in a cluster, and Apache Ignite will help you constantly keep the right number of instances online. Suppose in this service we also need a link to Apache Ignite. How to get it? We could use it localIgnite(), but then this link will have to be manually saved in the field.

Service stores Ignite in the wrong field - accepts it as a constructor argument

MyService s = new MyService(ignite)
ignite.services().deployClusterSingleton("svc", s);
...
publicclassMyServiceimplementsService{
   private Ignite ignite;
   publicMyService(Ignite ignite){
       this.ignite = ignite;
   }
   ...
}

There is a simpler way. We still have full-fledged classes, not lambda, so we can annotate the field as @IgniteInstanceResource. When the service is created, Apache Ignite will put itself there, and it will be possible to use it calmly. I strongly advise you to do this, and not try to pass Apache Ignite and its children to the constructor.

Service uses @IgniteInstanceResource

publicclassMyServiceimplementsService{
   @IgniteInstanceResourceprivate Ignite ignite;
   publicMyService(){ }
   ...
}

Write and read data

Watching the baseline

We now have an Apache Ignite cluster and prepared code.

Let's imagine this scenario:

One REPLICATEDcache - copies of data are available on all nodes;
Native persistence is included - we write to disk.

We start one node. Since native persistence is included, we need to activate the cluster before working with it. Activate. Then we run a few more nodes.
Everything seems to be working: recording and reading are normal. All nodes have copies of data, you can safely stop one node. But if you stop the very first node from which the launch started, then everything breaks down: the data disappears, and the operations cease to take place.

The reason for baseline topology is the set of nodes that store persistence data on themselves. All other nodes will not have persistent data.

This set of nodes is first determined at the time of activation. And those nodes that you added later are no longer included in the number of baseline nodes. That is, the set of baseline topology consists of only one, the very first node, when stopped, everything breaks down. To prevent this from happening, first start all the nodes, and then activate the cluster. If you need to add or remove a node with the command

control.sh --baseline

can see which nodes are listed there. The same script can update the baseline to the current state.

Usage example control.sh

Data collocation

Now we know that the data is saved, we will try to read it. We have SQL support, you can do it SELECT- almost like in Oracle. But at the same time we are able to scale and run on any number of nodes, the data is stored distributed. Let's look at this model:

publicclassPerson{
   @QuerySqlFieldpublic Long id;  
   @QuerySqlFieldpublic Long orgId;
}
publicclassOrganization{
   @QuerySqlFieldprivate Long id;
}

Request

SELECT *
FROM Person as p 
JOINOrganizationas o ON p.orgId = o.id

will not return all data. What's wrong?

The person ( Person) refers to the organization ( Organization) by ID. This is a classic foreign key. But if we try to combine two tables and send such an SQL query, then with several nodes in the cluster, we will not receive all the data.

The fact is that by default SQL JOINworks only within a single node. If SQL constantly went around the cluster to collect data and return the full result, it would be incredibly slow. We would lose all the advantages of a distributed system. Therefore, instead, Apache Ignite looks only at local data.

To get the right results, we need to place the data together (colocation). That is, in order to correctly combine Person and Organization, data from both tables should be stored on the same node.

How to do it? The simplest solution is to declare an affinity key. This is the value that determines on which node, in which partition, in which group of records one or another value will be located. If we declare an organization ID in Personan affinity key, it will mean that people with that organization ID must be on the same node as the organization with the same ID.

If for some reason you cannot do this, there is another, less effective solution - enable distributed joins. This is done through an API, and the procedure depends on whether you are using Java, JDBC, or something else. Then JOINthey will be slower, but they will return the correct results.

Consider how to work with affinity keys. How do we understand that such and such an ID, such a field is suitable for the definition of affinity? If we say that all people with the same orgIdwill be kept together, it means orgIdthat this is one indivisible group. We cannot distribute it across multiple nodes. If 10 organizations are stored in the database, then there will be 10 indivisible groups that can be put on 10 nodes. If there are more nodes in the cluster, then all the “extra” nodes will be left without groups. It is very difficult to determine in runtime, so think about it in advance.

If you have one big organization and 9 small ones, then the size of the groups will be different. But Apache Ignite does not look at the number of records in affinity groups when it distributes them among the nodes. Therefore, he will not put one group on one node, and 9 others on another, in order to even up the distribution. Rather, he will put them 5 and 5, (or 6 and 4, or even 7 and 3).

How to make the data evenly distributed? Let us have

To keys;
And various affinity keys;
P partitions, that is, large groups of data that Apache Ignite will distribute between the nodes;
N nodes.

Then you need to satisfy the condition

K >> A >> P >> N

where >>is "a lot more" and the data will be distributed relatively evenly.

By the way, the default is P = 1024.

A very even distribution you probably will not work. This was in Apache Ignite 1.x to 1.9. It was FairAffinityFunctionnot called and worked very well - it led to too much traffic between nodes. Now the algorithm is called RendezvousAffinityFunction. It does not give an absolutely fair distribution, the error between the nodes will be plus or minus 5-10%.

Checklist for new Apache Ignite users

Configure, read, store logs
Turn off multicast, list only those addresses and ports that you use.
Disable IPv6
Prepare your classes for BinaryMarshaller
Watch your baseline
Configure affinity collocation

Tags: