Akka antipatterns: too many actors
There is little material about Akka on Habr. I decided to translate some of the antipatterns that Manuel describes on his blog. They may well not be obvious to people encountering the framework for the first time.
It occurred to me that I had not yet written about this very frequent anti-pattern. It can often be found in the code of developers who are just starting to work with the actor model.
There are two ways to end up with too many actors:
- designing a system with too many different types of actors, many of which are not needed
- creating a very large number of actors at runtime when this is unnecessary and inefficient.
Let's look at these options in detail.
Too many types of actors
The general idea is something like this: “we have actors, so everything must be an actor”.
The actor model makes it easy to write asynchronous applications. It does this by providing the illusion of synchronous execution of the code inside an actor: there is no need to worry about concurrent access to an actor's state, because only the actor itself can access that state, and messages are processed one at a time.
But not everything needs to be asynchronous. Method calls that are purely CPU-bound (and that are not "blocking" in the sense of hogging the CPU for a long time, the way computing the value of Pi would) should not be made asynchronous.
I often see code with a large number of different actors that interact with each other without doing anything that gains much from asynchronous or concurrent execution. In such projects, the same state has to be held by each of these actors or passed to them in every message.
This approach has two drawbacks:
- You get nothing in terms of performance. On the contrary, there are overhead costs associated with the creation of messages and their transmission.
- With each type of actor and its associated messages, the system becomes more complex to understand and maintain.
Therefore, when designing actor systems, think about what really needs to be asynchronous. Mostly this means:
- calls to external systems (outside of your JVM)
- calls to blocking operations (legacy APIs, heavy computations, ...)
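The distinction in the list above can be sketched in plain Java, without Akka. This is only an illustration under stated assumptions: `AsyncBoundary`, `wordCount`, `fetchDocument` and `countWordsOf` are hypothetical names, and the "external call" is simulated with a short sleep. The cheap CPU-bound step stays an ordinary method call, and only the blocking call is pushed onto a separate thread pool.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class AsyncBoundary {
    // Cheap, purely CPU-bound helper: a plain method call, no messages needed.
    static int wordCount(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Hypothetical stand-in for a call that leaves the JVM (HTTP, database, ...).
    // The sleep only simulates the time spent waiting on the external system.
    static String fetchDocument(String id) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "contents of document " + id;
    }

    // Only the blocking call runs asynchronously, on its own small pool.
    static int countWordsOf(String id) {
        ExecutorService blockingPool = Executors.newFixedThreadPool(2);
        try {
            return CompletableFuture
                    .supplyAsync(() -> fetchDocument(id), blockingPool)
                    .thenApply(AsyncBoundary::wordCount) // synchronous step, no extra hop
                    .join(); // blocking here only to keep the demo short
        } finally {
            blockingPool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(countWordsOf("42")); // prints 4
    }
}
```

In an actor system the same split would apply: the word count would be an ordinary method inside whichever actor needs it, not an actor of its own.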
Too many actors at runtime
The general idea is something like this: “the more actors we have, the faster everything will go.”
And indeed, actors are lightweight, and you can run millions of them on one virtual machine. Yes, you can. But is it necessary?
Just because you can does not mean that you have to.
The short answer: not always. It depends on what you do with the actors.
If you have a lot of long-lived actors in your system, each of which holds a little state and interacts with the others from time to time, you may well end up with a million actors, and this is a legitimate use case that Akka supports very well. For example, you can build a system with a large number of users where each user is represented by an actor. A bare Akka actor takes only about 300 bytes of memory, so it is possible to create millions of them on one machine and leave them running without worrying about anything. And if you end up with so many actors, or actors with so much state, that they no longer fit into the memory of one machine, cluster sharding makes it easy to distribute the actors across several machines.
However, if you have a few types of actors that are involved in computing something (for example, parsing an XML document), creating millions of such actors, whether directly or through a router, is questionable.
A processor has a fixed number of cores (hardware threads), and Akka actors process their messages on an ExecutionContext backed by a thread pool. By default this is the fork-join-executor, based on the ForkJoinPool added in Java 7.
But despite its technical sophistication, ForkJoinPool is not magic that repeals the laws of physics. If you have one million actors, each parsing an XML document (already loaded into memory), and 4 hardware threads, the system will not perform any better than if you had only 4 actors parsing these XML documents (assuming a uniform load). In fact, your system will perform much better with 4 actors, because the overhead of scheduling and memory management will be minimal. By the way, if there are only a few actors in your system, consider a thread pool that tries to reuse the same thread for the same actor, such as Akka's affinity pool.
In general, the system will not work faster if you create a lot of actors.
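To make the point concrete, here is a minimal sketch in plain Java rather than Akka. `FixedWorkers`, `countTags` and `parseAll` are hypothetical names, and counting `<` characters merely stands in for real XML parsing. However many documents are submitted, the work is only ever executed by the fixed number of pool threads, so adding more "workers" than cores buys no extra parallelism.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class FixedWorkers {
    // Hypothetical stand-in for parsing an in-memory XML document:
    // it simply counts '<' characters.
    static int countTags(String xml) {
        int tags = 0;
        for (int i = 0; i < xml.length(); i++) {
            if (xml.charAt(i) == '<') tags++;
        }
        return tags;
    }

    // Processes all documents on a pool with a fixed number of worker threads.
    // threadsSeen must be a thread-safe set; it records which threads did the work.
    static long parseAll(List<String> docs, int workers, Set<String> threadsSeen) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (String doc : docs) {
                results.add(pool.submit(() -> {
                    threadsSeen.add(Thread.currentThread().getName());
                    return countTags(doc);
                }));
            }
            long total = 0;
            for (Future<Integer> result : results) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        Set<String> threadsSeen = ConcurrentHashMap.newKeySet();
        List<String> docs = Collections.nCopies(100_000, "<a><b/></a>");
        long total = parseAll(docs, cores, threadsSeen);
        System.out.println(total + " tags counted by " + threadsSeen.size() + " thread(s)");
    }
}
```

The number of threads reported never exceeds the pool size, regardless of how many documents (or, in Akka terms, actors) are queued behind it.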
Actors without state
Actors are object orientation done right (unlike, say, objects in Java): their state is not visible from the outside, and they communicate through messages. Encapsulation cannot be broken, because there is no way to look into an actor's state while it is running. That is the whole point of actors: they give you the illusion of a safe space in which messages are processed sequentially, one after another, so you can use mutable state inside your actor without worrying about race conditions (well, almost without worrying: the main thing is not to let the state leak).
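The "safe space" idea can be imitated in a few lines of plain Java. This is not Akka's API: `CounterActor` and its methods are hypothetical, and the mailbox is approximated with a single-threaded executor. The sketch only shows why mutable state needs no locks when every message is handled by one thread, one after another.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for an actor; this is NOT Akka's API.
final class CounterActor {
    // One worker thread: messages are handled strictly one after another.
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();
    // Plain mutable state, no locks: only the mailbox thread ever touches it.
    private int count = 0;

    // "Sending a message" only enqueues work; the sender never sees count.
    void tellIncrement() {
        mailbox.execute(() -> count++);
    }

    // Asks for the state through the mailbox as well, then stops the worker.
    int askCountAndStop() {
        try {
            return mailbox.submit(() -> count).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            mailbox.shutdown();
        }
    }

    // Four sender threads each send 1000 messages to the same actor.
    static int demo() {
        CounterActor actor = new CounterActor();
        Thread[] senders = new Thread[4];
        for (int i = 0; i < senders.length; i++) {
            senders[i] = new Thread(() -> {
                for (int n = 0; n < 1000; n++) {
                    actor.tellIncrement();
                }
            });
            senders[i].start();
        }
        try {
            for (Thread sender : senders) {
                sender.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return actor.askCountAndStop();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 4000
    }
}
```

Incrementing a plain `int` field from the four sender threads directly could lose updates. Routing every change through the single mailbox thread is what makes the unsynchronized field safe here.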
That is why using actors that have no state is somewhat strange, to say the least. With the exception of actors that supervise larger parts of the actor system hierarchy (for example, by setting up backoff supervisors), actors are really meant for long-lived computations that hold state. By long-lived I mean that the actor processes several messages over its lifetime and produces different results depending on its state, as opposed to one-off computations. For those, futures are an excellent abstraction: they run code asynchronously, are ideal for network- or disk-bound work (or really CPU-intensive tasks), can be composed, and have a failure handling mechanism.
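As a rough illustration of the one-off case, here is a sketch using Java's `CompletableFuture` as the future abstraction (in Scala code this would be `scala.concurrent.Future`). `OneShot`, `parseAsync` and `sumOrFallback` are hypothetical names. Two stateless computations run asynchronously, are composed, and share a single failure handler, with no actor involved.

```java
import java.util.concurrent.CompletableFuture;

final class OneShot {
    // Hypothetical one-off computation: no state survives between calls.
    static CompletableFuture<Integer> parseAsync(String raw) {
        return CompletableFuture.supplyAsync(() -> Integer.parseInt(raw.trim()));
    }

    // Composes two independent futures and recovers from a failure in either.
    static int sumOrFallback(String a, String b, int fallback) {
        return parseAsync(a)
                .thenCombine(parseAsync(b), Integer::sum) // composition
                .exceptionally(error -> fallback)         // failure handling
                .join(); // blocking here only to keep the demo short
    }

    public static void main(String[] args) {
        System.out.println(sumOrFallback("20", " 22", -1));  // prints 42
        System.out.println(sumOrFallback("20", "oops", -1)); // prints -1
    }
}
```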
In general: do not use actors if you have no state. They are not intended for that.