From the experience of using SObjectizer: actors as finite state machines, is it good or bad?

    Having introduced readers to the SObjectizer framework, its capabilities and features, we can move on to some of the lessons we have learned over more than fourteen years of using SObjectizer in C++ software development. Today we will talk about when agents in the form of state machines are a poor choice and when they are a good one; about why the ability to create a huge number of agents is not so much a solution as a problem in itself; and about how the first point relates to the second...


    So, in the three previous articles (one, two and three), we watched the email_analyzer agent evolve from a very simple class into a fairly complex one. I suspect that many readers who looked at the final version of email_analyzer asked themselves: "This is really complicated; couldn't it be simpler?"


    It turned out to be so complicated because agents are represented as finite state machines. To process an incoming message, a separate method, an event handler, must be written. For an agent to enter a new event handler, the current handler must finish first. So, to send a request and receive a reply, the agent must complete its current handler and thereby give the dispatcher a chance to call the appropriate handler when the reply arrives. That is, instead of:


    void some_agent::some_event() {
      ...
      // Send the request.
      send< request >(receiver, reply_to, params...);
      // And immediately wait for the result.
      auto resp = wait_reply< response >(reply_to);
      ... // Process the reply.
    }
    

    you have to write something like this:


    void some_agent::some_event() {
      ...
      // To receive the result we have to subscribe to it.
      so_subscribe(reply_to).event(&some_agent::on_response);
      // Send the request.
      send< request >(receiver, reply_to, params...);
      // There is no point in staying inside some_event any longer.
      // We have to return control to the dispatcher so that it can
      // call us later, when the reply arrives.
    }
    void some_agent::on_response(const response & resp) {
      ... // Process the reply.
    }
    

    Hence the volume and the complexity of the resulting email_analyzer agent.


    Perhaps there are tricks within this approach that would reduce the amount of typing by 20-30%, but they would not change the situation fundamentally.


    What could significantly improve comprehensibility and compactness is a departure from the callback-based event model towards linear code with synchronous operations. Something like this:


    void email_analyzer(context_t ctx, string email_file, mbox_t reply_to) {
      try {
        // Perform the request synchronously.
        auto raw_content = request_value< load_email_succeed, load_email_request >(
            ctx.environment().create_mbox( "io_agent" ),
            1500ms, // Wait for the result no longer than 1.5s.
            email_file ).content_;
        auto parsed_data = parse_email( raw_content );
        // Start the checker agents, which will send their results
        // to a separate message chain created specifically for that purpose.
        auto check_results = create_mchain( ctx.environment() );
        introduce_child_coop( ctx,
          disp::thread_pool::create_disp_binder( "checkers",
            disp::thread_pool::bind_params_t{} ),
          [&]( coop_t & coop ) {
              coop.make_agent< email_headers_checker >(
                  check_results, parsed_data->headers() );
              coop.make_agent< email_body_checker >(
                  check_results, parsed_data->body() );
              coop.make_agent< email_attach_checker >(
                  check_results, parsed_data->attachments() );
          } );
        // Since all result handlers will be very similar, move their
        // logic into a separate local function.
        auto check_handler = [&]( const auto & result ) {
            if( check_status::safe != result.status )
              throw runtime_error( "check failed" );
          };
        // Wait for the results no longer than 0.75s and stop waiting
        // as soon as at least one result turns out to be negative.
        auto r = receive( from( check_results ).total_time( 750ms ),
          [&]( const email_headers_check_result & msg ) { check_handler( msg ); },
          [&]( const email_body_check_result & msg ) { check_handler( msg ); },
          [&]( const email_attach_check_result & msg ) { check_handler( msg ); } );
        // If not all replies have been collected, the wait has timed out.
        if( 3 != r.handled() )
          throw runtime_error( "check timed out" );
        // If we got this far, the check completed successfully.
        send< check_result >( reply_to, email_file, check_status::safe );
      }
      catch( const exception & ) {
        send< check_result >( reply_to, email_file, check_status::check_failure );
      }
    }
    

    In this case we would get more compact and understandable code, similar to how this problem would be solved in languages such as Erlang or Go.


    Our experience suggests that in situations where an agent performs a linear sequence of operations of the form "send a request, immediately start waiting for the single reply", implementing it as a finite state machine is a losing proposition in terms of code volume and complexity. Instead of simply waiting for a reply and continuing its work right after receiving it, the agent has to finish its current event handler, and all subsequent actions have to be moved into another handler. If an agent performs N consecutive asynchronous operations during its lifetime, it will most likely end up with N + 1 handlers. That is not good, because developing and maintaining such an agent takes a lot of time and effort.
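

    For illustration, here is a minimal sketch of how an agent with just two consecutive asynchronous operations already splinters into three handlers. The message types (start_work, first_request/first_reply, second_request/second_reply), the member mboxes first_service_ and second_service_, and the publish_result helper are all hypothetical, not taken from the email_analyzer example:


    void chained_agent::so_define_agent() {
      // All three handlers have to be subscribed up front.
      so_subscribe_self().event( &chained_agent::on_start_work );
      so_subscribe_self().event( &chained_agent::on_first_reply );
      so_subscribe_self().event( &chained_agent::on_second_reply );
    }
    // Handler 1: initiate the first operation.
    void chained_agent::on_start_work( const start_work & cmd ) {
      so_5::send< first_request >( first_service_, so_direct_mbox(), cmd.data_ );
    }
    // Handler 2: the first reply has arrived, initiate the second operation.
    void chained_agent::on_first_reply( const first_reply & r ) {
      so_5::send< second_request >( second_service_, so_direct_mbox(), r.value_ );
    }
    // Handler 3: the second reply has arrived, the chain is finally complete.
    void chained_agent::on_second_reply( const second_reply & r ) {
      publish_result( r.value_ );
    }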


    The situation is completely different when, at every moment the agent is waiting for something, several different messages may arrive, and the agent has to react to each of them. For example, an agent may be waiting for the result of the current operation while, at the same time, requests arrive asking about the status of that operation or demanding that a new operation be started. In this case, at every waiting point you have to spell out the agent's reaction to every expected message type, and this, too, can quickly turn the agent's code into a voluminous and hard-to-follow tangle.
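

    A minimal sketch of an agent of that kind might look as follows (the message types op_result, status_request, status_reply and new_op_request, as well as the helper functions and fields, are invented for illustration):


    void op_agent::so_define_agent() {
      // While one operation is in progress the agent still has to react
      // to several unrelated kinds of incoming messages.
      so_subscribe_self()
        .event( &op_agent::on_op_result )
        .event( &op_agent::on_status_request )
        .event( &op_agent::on_new_op_request );
    }
    void op_agent::on_op_result( const op_result & r ) {
      finish_current_op( r );      // Hypothetical helper.
      start_next_op_if_any();      // Hypothetical helper.
    }
    void op_agent::on_status_request( const status_request & q ) {
      so_5::send< status_reply >( q.reply_to_, current_status_ );
    }
    void op_agent::on_new_op_request( const new_op_request & cmd ) {
      // Queue the request until the current operation completes.
      pending_ops_.push( cmd );
    }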


    Since SObjectizer currently supports agents only in the form of finite state machines, you need to carefully evaluate how well the logic of your application agents maps onto finite state machines. If it does not map well, SObjectizer may not be the best choice, and it makes sense to look at solutions based on coroutines, for example boost.fiber or Synca (there were interesting articles about the latter on Habr: No. 1 and No. 2).


    So the three previous articles of the mini-series "SObjectizer: from simple to complex" show, on the one hand, the capabilities of SObjectizer, and, on the other hand, let you see where an approach to a problem can go wrong: for example, when you start using agents in the form of finite state machines where agents in the form of coroutines would make more sense.


    But if, for many cases, coroutines are more advantageous than state machines, why doesn't SObjectizer support agents as coroutines? There are several serious reasons for that, both technical and organizational. Probably, if coroutines were part of the C++ language, coroutine agents would already be in SObjectizer. But since coroutines in C++ are currently available only through third-party libraries, and the topic is not the simplest one, we are in no hurry to add this functionality to SObjectizer. Besides, this problem has a completely different side to it. But to talk about it, we have to start from a distance...


    A long time ago, when the first version of SObjectizer appeared, we ourselves made the same mistake as many newcomers who get their hands on an actor-model-based tool for the first time: if you can create an agent for every little thing, then you should. The execution of any task should be represented as an agent, even if that task consists of receiving a single request and sending a single reply. In short, intoxication with new possibilities, which suddenly makes you believe that "there is nothing in the world but agents".


    This resulted in several negative consequences.


    Firstly, the application code turned out to be more voluminous and more complicated than we would have liked. After all, asynchronous messages are prone to being lost, so where a single synchronous call could have been written, there was a lot of ceremony around sending a request message, processing a reply message, and a timeout to diagnose the loss of the request or the reply. When we analyzed the code, it turned out that in roughly half of the cases the message-based interaction was justified, because data was being transferred between different worker threads there. In the remaining places, a bunch of small agents could have been merged into one big one that performed all the operations internally through ordinary synchronous function calls.
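

    A minimal sketch of such a merge might look like this (load_file, parse_content and analyze_content are hypothetical ordinary functions rather than agents, and analyze_request/analyze_reply are hypothetical message types):


    // Instead of several tiny agents exchanging request/reply messages,
    // one agent performs all the steps via plain synchronous calls.
    void merged_analyzer::on_analyze( const analyze_request & cmd ) {
      auto raw = load_file( cmd.file_name_ );
      auto parsed = parse_content( raw );
      auto verdict = analyze_content( parsed );
      so_5::send< analyze_reply >( cmd.reply_to_, cmd.file_name_, verdict );
    }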


    Secondly, it turned out that the behavior of an application built out of agents is much harder to monitor and even harder to predict. A good analogy is watching the flight of a large flock of birds: although the rules of behavior of each individual bird are simple and understandable, the behavior of the whole flock is almost impossible to predict. The same goes for an application in which tens of thousands of agents live at the same time: each of them works in a perfectly understandable way, but the combined effect of their joint work can be unpredictable.


    Another bad thing is the increase in the amount of information needed to understand what is going on inside the application. Take our email_analyzer example. A single analyzer_manager agent can provide information such as the total number of requests waiting in its queue, the total number of live email_analyzer agents, and the minimum, maximum and average time a request waits in the queue (and, similarly, the request processing times). So monitoring the activity of analyzer_manager is not a problem. But collecting, aggregating and processing information from the individual email_analyzer agents is already harder, and it becomes harder still the more such agents there are and the shorter their lifetimes.
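

    As an illustration, the aggregated statistics a manager agent could publish might look something like this (a minimal sketch; the struct and its field names are invented and are not part of SObjectizer):


    // Aggregated view published by a single manager agent, e.g. once per second.
    struct analyzer_manager_stats {
      std::size_t requests_in_queue{};  // Requests still waiting to be processed.
      std::size_t live_analyzers{};     // Number of currently existing email_analyzer agents.
      std::chrono::milliseconds min_wait_time{};
      std::chrono::milliseconds max_wait_time{};
      std::chrono::milliseconds avg_wait_time{};
    };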


    So, the fewer agents live inside the application, the easier it is to monitor them, the easier it is to understand what is happening and how, and the easier it is to predict the application's behavior under particular conditions.


    Thirdly, the unpredictability that occurs from time to time in applications with tens of thousands of agents inside can lead the application to a partially or completely inoperative state.


    A typical case: an application contains a hundred thousand agents. All of them use periodic messages to control the timeouts of their operations. Then, at some point, timeouts fire simultaneously for, say, 20 thousand agents. The message queues on the worker threads swell accordingly. These queues start being drained, each agent receives its message and processes it. But while these 20 thousand messages are being processed, too much time passes and another 20 thousand arrive from the timer, on top of the old messages still sitting in the queues. Naturally, the application does not manage to process everything before yet another 20 thousand messages arrive. And so on. The application seems to be honestly trying to work, but it gradually degrades into complete inoperability.
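

    The feedback loop is easy to see even in a trivial back-of-the-envelope simulation (plain C++, not related to SObjectizer; the burst size and the per-tick processing capacity are made-up numbers):


    #include <cstdio>

    int main() {
      const long burst = 20000;     // Timeout messages arriving on each timer tick.
      const long capacity = 15000;  // Messages the worker threads can handle per tick.
      long backlog = 0;
      for( int tick = 1; tick <= 5; ++tick ) {
        backlog += burst;                                      // A new wave of timeouts arrives.
        backlog -= (backlog < capacity ? backlog : capacity);  // Part of the queue is drained.
        std::printf( "after tick %d: backlog = %ld\n", tick, backlog );
      }
      // The backlog grows by 5000 on every tick (5000, 10000, 15000, ...),
      // i.e. the queues keep swelling even though every single agent works correctly.
    }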


    Having stepped on this rake at the very beginning of using SObjectizer in our projects, we came to the conclusion that the ability to create a million agents is more marketing bullshit than something actually needed in our practice*. And that the approach which became known as SEDA-way allows you to build applications that are much easier to control and that behave much more predictably.


    The essence of using the SEDA approach together with the actor model is that instead of creating actors each of which performs a whole chain of sequential operations, it is better to create one actor per operation and arrange them into a pipeline. For our example with email analysis, instead of email_analyzer agents that sequentially download the email content and then parse and analyze it, we could create several stage agents. One stage agent would manage the request queue. The next stage agent would perform the file-loading operations for the emails. The next stage agent would parse the loaded content. The next one would do the analysis. And so on.
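

    A minimal sketch of one such stage agent might look like this (the loaded_email/parsed_email message types and the next_stage_ mbox are hypothetical and are not taken from the earlier examples):


    // Each stage agent owns one operation and forwards its result to the next stage.
    class parsing_stage final : public so_5::agent_t {
    public:
      parsing_stage( context_t ctx, so_5::mbox_t next_stage )
        : so_5::agent_t{ std::move(ctx) }, next_stage_{ std::move(next_stage) } {}
      void so_define_agent() override {
        so_subscribe_self().event( &parsing_stage::on_loaded_email );
      }
    private:
      const so_5::mbox_t next_stage_;
      void on_loaded_email( const loaded_email & msg ) {
        // Parse one email and push the result further down the pipeline.
        so_5::send< parsed_email >( next_stage_,
            msg.email_file_, parse_email( msg.raw_content_ ) );
      }
    };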


    The crucial point is that in the previous implementations email_analyzer itself initiated all the operations, but only for one specific email. With the SEDA approach we would have one agent per operation, but each agent would perform that operation for several emails at once. By the way, traces of this SEDA approach are visible even in our examples in the form of the IO agent, which is nothing more than a stage agent from SEDA.


    And indeed, when we began to actively use the ideas from SEDA, it turned out that stage agents are quite convenient to implement as finite state machines, because at each particular moment in time they have to expect different incoming messages and react to them depending on their current state. Here, in our opinion, finite state machines are more convenient than coroutines in the long run.
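

    Here is a minimal sketch of a stage agent whose reaction depends on its current state; it uses SObjectizer's state_t mechanism, while the normal/overloaded logic and the message types are invented for illustration:


    class loading_stage final : public so_5::agent_t {
      // In st_normal the stage accepts new work, in st_overloaded it rejects it.
      const state_t st_normal{ this, "normal" };
      const state_t st_overloaded{ this, "overloaded" };
    public:
      loading_stage( context_t ctx ) : so_5::agent_t{ std::move(ctx) } {}
      void so_define_agent() override {
        st_normal
          .event( &loading_stage::on_new_email )
          .event( &loading_stage::on_load_finished );
        st_overloaded
          .event( &loading_stage::on_reject_email )  // Same message, different reaction.
          .event( &loading_stage::on_load_finished );
        this >>= st_normal;
      }
    private:
      void on_new_email( const new_email & cmd ) {
        /* start loading; switch to st_overloaded if too many loads are in flight... */
      }
      void on_reject_email( const new_email & cmd ) { /* reply "busy, try later"... */ }
      void on_load_finished( const load_finished & msg ) {
        // Return to the normal state once the backlog shrinks enough.
        if( /* backlog below threshold */ true )
          this >>= st_normal;
      }
    };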


    By the way, one more point can be mentioned that people getting acquainted with SObjectizer for the first time often notice: the verbosity of agents. Indeed, as a rule, an agent in SObjectizer is a separate C++ class, which has at least a constructor, some fields that must be initialized in that constructor, an overridden so_define_agent() method, and several event handlers as separate methods... Clearly, for simple cases all of this leads to a fair amount of syntactic overhead. For example, in Just::Thread Pro a simple logger actor might look like this:


    ofstream log_file("...");
    actor logger_actor( [&log_file] {
        for(;;) {
          actor::receive().match([&](std::string s) {
              log_file << s << endl;
            } );
        }
      } );
    

    Whereas in SObjectizer, if you use the traditional approach to writing agents, you will need to do something like:


    class logger_actor : public agent_t {
    public:
      logger_actor( context_t ctx, ostream & stream ) : agent_t{ctx}, stream_{stream} {}
      virtual void so_define_agent() override {
        so_subscribe_self().event( &logger_actor::on_message );
      }
    private:
      ostream & stream_;
      void on_message( const std::string & s ) {
        stream_ << s << endl;
      }
    };
    ...
    ofstream log_file("...");
    env.introduce_coop( [&log_file]( coop_t & coop ) {
      coop.make_agent< logger_actor >( log_file );
    } );
    

    Obviously, there is more boilerplate in SObjectizer. The paradox, however, is that if you stick to the SEDA approach, where there are not that many agents but each of them can process different types of messages, the agent code swells quite quickly: partly because of the agents' logic itself (which is, as a rule, more complex), partly because agents get filled with additional things such as logging and monitoring. And then it turns out that when the main application code of an agent runs to several hundred lines or more, the syntactic overhead imposed by SObjectizer is completely insignificant. Moreover, the bigger and more complex the agent, the more advantageous it is to represent it as a separate C++ class. In toy examples this is not visible, but in "combat" code it is felt quite strongly (here, for instance, is a small example of a real agent that is far from the most complex one).


    Thus, on the basis of our practical experience, we came to the conclusion that if you properly combine the actor model with the SEDA approach, representing agents as finite state machines is a perfectly normal solution. Of course, in some places such a solution will lose to coroutines in terms of expressiveness. But on the whole, agents in the form of state machines work more than well and do not create any particular problems. With the exception, perhaps, of comparing different implementations of the actor model on micro-examples.


    At the end of the article we would like to address our readers. We have one more article planned in which we want to touch upon such an important problem of any interaction mechanism based on asynchronous messages as agent overload and, at the same time, show how SObjectizer reacts to errors inside agents. But it would be interesting to hear the opinion of the audience: what you liked, what you did not like, what you would like to learn more about. This will greatly help us both in preparing the next article and in developing SObjectizer itself.




    * We emphasize that we are talking about our own experience. Obviously, other teams solving other problems with the actor model may successfully use a large number of actors in their applications. And there is an exactly opposite opinion on this matter as well.

