Non-blocking TCP server without using undocumented features


    A remarkable article on trapexit, «Building a Non-blocking TCP server using OTP principles», explains how to build a non-blocking TCP server using OTP principles. I think everyone who started learning Erlang came across this article sooner or later. To build the non-blocking TCP server, that article uses undocumented functionality from the prim_inet module.

    I will not philosophize about whether using undocumented features is good or bad. In some workaround ("crutch") solutions it really is necessary, but in production I would prefer to use proven means. Note that even in the article itself the author warns: "Examining prim_inet module reveals an interesting fact that the actual call to inet driver to accept a client socket is asynchronous. While this is a non-documented property, which means that the OTP team is free to change this implementation, we will exploit this functionality in the construction of our server" [1].

    By a non-blocking server we mean that the listening process and FSM should not make any blocking calls and respond quickly to incoming messages (for example, configuration changes, restart, etc.) without causing timeouts [2].

    Regarding the excerpt above: problems may arise with the listening process if it carries additional functional load (for example, if it exposes additional user APIs that need to be called during operation). The FSM, for architectural reasons, should not contain blocking calls at all. Therefore, if the listener's only function is to listen, there is nothing to worry about when its execution is blocked waiting for a connection. If this element of the system needs to be restarted, the supervisor will forcibly stop it after a predetermined timeout and then restart it (correct me if I am wrong). Problems can arise during a hot code upgrade (the author has not checked which pitfalls could be encountered in that case; if you have tried it, please share your experience).
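
    The forced stop mentioned above is controlled by the shutdown field of the supervisor child specification. Below is a minimal sketch of such a specification for the listener. It assumes the listener is started with tcp_listener:start_link(Port), as in the module shown in this article; the helper module name listener_spec, the port 2222 and the 2000 ms timeout are illustrative assumptions, not values taken from the original project.

```erlang
-module(listener_spec).
-export([child_spec/1]).

%% Child specification for the listener (hypothetical helper module).
%% shutdown => 2000: to stop the child, the supervisor sends
%% exit(Pid, shutdown), waits up to 2000 ms, and then kills the child
%% unconditionally with exit(Pid, kill). A child that does not trap exits
%% terminates as soon as the shutdown signal arrives, even while it is
%% blocked in a call.
child_spec(Port) ->
    #{id       => tcp_listener,
      start    => {tcp_listener, start_link, [Port]},
      restart  => permanent,
      shutdown => 2000,          % assumed timeout, in milliseconds
      type     => worker,
      modules  => [tcp_listener]}.
```

    The specification can be validated without starting anything: supervisor:check_childspecs([listener_spec:child_spec(2222)]) returns ok for a well-formed specification.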

    We set ourselves the task of implementing a non-blocking TCP server using only documented methods.

    Server structure

    The first thing that comes to mind is to wait for connections in a separate process. The server structure can then be represented as follows.

    Figure 1

    1. application_master:main_loop/2
    2. application_master:loop_it/4

    When the application starts, the application master is created. Logically it is a single process, but physically two processes are created. The application master is the group leader of all processes in the application.

    3. The supervisor of our TCP server (supervisor)
    4. The listener (gen_server), which spawns the listener process (plain process)
    5. The supervisor of client processes (supervisor)
    6. The listener process (plain process)
    7. Client processes (gen_fsm)
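
    To make element 5 of the structure concrete, here is a minimal sketch of how the client supervisor might look. The function tcp_client_sup:start_child/0 is the one called from the listener code in this article; everything else (restart intensity, the shutdown timeout, and the assumption that tcp_fsm exports start_link/0) is illustrative, and the real module in the article's repository may differ.

```erlang
-module(tcp_client_sup).
-behaviour(supervisor).
-export([start_link/0, start_child/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Start one more client process; called once per accepted connection.
start_child() ->
    supervisor:start_child(?MODULE, []).

init([]) ->
    %% simple_one_for_one: all children are created dynamically
    %% from a single child specification.
    SupFlags = #{strategy  => simple_one_for_one,
                 intensity => 10,      % assumed restart limits
                 period    => 60},
    %% Assumption: tcp_fsm exports start_link/0.
    Client = #{id       => tcp_fsm,
               start    => {tcp_fsm, start_link, []},
               restart  => temporary,  % a closed connection is not restarted
               shutdown => 2000,
               type     => worker,
               modules  => [tcp_fsm]},
    {ok, {SupFlags, [Client]}}.
```

    With simple_one_for_one no client process exists until start_child/0 is called, so the supervisor starts empty and grows by one child per connection.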


    I think it makes no sense to provide the source code for all parts of the system; we will focus only on the tcp_listener module and the process that it launches.

    -module(tcp_listener).
    -behaviour(gen_server).
    -export([start_link/1, accept_func/1]).
    -export([init/1, handle_call/3, handle_cast/2, handle_info/2,
             terminate/2, code_change/3]).
    -define(SERVER, ?MODULE).
    -define(LOGIC_MODULE, tcp_fsm).                                  %% (1)
    -record(state, {                                                 %% (2)
              listener,       %% Listening socket
              module          %% FSM handling module
             }).
    start_link(Port) ->
        gen_server:start_link({local, ?SERVER}, ?MODULE, [Port], []).
    init([Port]) ->
        Options = [{packet, raw}, {active, once}, {reuseaddr, true}],
        case gen_tcp:listen(Port, Options) of
            {ok, LSocket} ->
                %% Create first accepting process
                spawn_link(?MODULE, accept_func, [LSocket]),         %% (3)
                {ok, #state{listener = LSocket, module = ?LOGIC_MODULE}};
            {error, Reason} ->
                error_logger:error_msg("Error: ~p~n", [Reason]),
                {stop, Reason}
        end.
    handle_call(_Request, _From, State) ->
        Reply = ok,
        {reply, Reply, State}.
    handle_cast(_Msg, State) ->
        {noreply, State}.
    handle_info(_Info, State) ->
        {noreply, State}.
    terminate(_Reason, #state{listener = LSocket} = _State) ->
        %% Close the listening socket when the server stops
        gen_tcp:close(LSocket),
        ok.
    code_change(_OldVsn, State, _Extra) ->
        {ok, State}.
    accept_func(LSocket) ->
        {ok, Socket} = gen_tcp:accept(LSocket),                      %% (4)
        error_logger:info_msg("Accept connection: ~p.\n", [Socket]),
        {ok, Pid} = tcp_client_sup:start_child(),                    %% (5)
        ok = gen_tcp:controlling_process(Socket, Pid),               %% (6)
        tcp_fsm:set_socket(Pid, Socket),                             %% (7)
        accept_func(LSocket).                                        %% (8)

    1. A macro naming the module that handles client connections.
    2. A record for storing the state of the gen_server.
    3. Spawn an additional process that will "listen".
    4. Wait for a connection.
    5. Create a gen_fsm (tcp_fsm module) to handle the connection with the client.
    6. Change the process controlling the socket to the process created in step 5.
    7. Pass the socket to the tcp_fsm module.
    8. Start "listening" again.


    (emacs@host)2> make:all(). % compile
    Recompile: tcp_server_sup
    Recompile: tcp_listener
    Recompile: tcp_fsm
    Recompile: tcp_client_sup
    Recompile: erltcps
    (emacs@host)3> code:add_path("../ebin"). % add the path to ebin
    (emacs@host)4> application:load(erltcps). % load the application
    (emacs@host)5> application:start(erltcps). % start the application
    =INFO REPORT==== 22-Jun-2011::13:10:07 ===
    Accept connection: #Port<0.2353>. % a connection arrived
    =INFO REPORT==== 22-Jun-2011::13:10:07 ===
    IP: {127,0,0,1} % IP address
    =INFO REPORT==== 22-Jun-2011::13:10:15 ===
    <<"hello\r\n">> % message received
    =INFO REPORT==== 22-Jun-2011::13:10:23 ===
    {127,0,0,1} Client disconnected. % the client disconnected


    As a result, we built a TCP server framework that is "sort of" non-blocking. In our implementation a dedicated process remains blocked; its only function is to wait for a connection and create a process to handle it. Additional logic can be added to the tcp_listener module itself (for example, starting / stopping the acceptance of connections by stopping the listening process).
    Pros:
    • We did not use undocumented features, which could cost us dearly in production.
    • The only process that remains blocked is the one created specifically for that purpose.

    Cons:
    • In our OTP application there is a process that was not created according to OTP principles.
    • If the listening process (accept_func/1 in the tcp_listener module) crashes, the exit signal propagates and tcp_listener crashes too; the supervisor then restarts tcp_listener, which in turn re-creates the listening process from the accept_func/1 function.

    These two minuses are interconnected, and each of them has a solution. Here are a couple of tasks for readers:
    1. What needs to be done so that tcp_listener does not crash if the listening process (accept_func/1) crashes?
    2. What needs to be added for safer use of plain processes in an OTP application?
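
    One possible direction for the first task, as a hedged sketch rather than the author's solution: the gen_server can trap exits and respawn the accepting process when it receives the 'EXIT' message. The module name tcp_listener_safe, the acceptor/0 inspection function and the placeholder accept loop (which simply closes each accepted socket instead of handing it to tcp_fsm) are assumptions made to keep the example self-contained.

```erlang
-module(tcp_listener_safe).
-behaviour(gen_server).
-export([start_link/1, acceptor/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2, terminate/2]).

-record(state, {listener, acceptor}).

start_link(Port) ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [Port], []).

%% Pid of the current accepting process (added for inspection only).
acceptor() ->
    gen_server:call(?MODULE, acceptor).

init([Port]) ->
    %% Turn exit signals from linked processes into {'EXIT', Pid, Reason}
    %% messages instead of letting them terminate this gen_server.
    process_flag(trap_exit, true),
    Options = [{packet, raw}, {active, false}, {reuseaddr, true}],
    case gen_tcp:listen(Port, Options) of
        {ok, LSocket} ->
            {ok, #state{listener = LSocket,
                        acceptor = spawn_acceptor(LSocket)}};
        {error, Reason} ->
            {stop, Reason}
    end.

handle_call(acceptor, _From, State) ->
    {reply, State#state.acceptor, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% The accepting process died: start a new one instead of crashing.
handle_info({'EXIT', Pid, _Reason},
            #state{acceptor = Pid, listener = LSocket} = State) ->
    {noreply, State#state{acceptor = spawn_acceptor(LSocket)}};
handle_info(_Info, State) ->
    {noreply, State}.

terminate(_Reason, #state{listener = LSocket}) ->
    gen_tcp:close(LSocket),
    ok.

spawn_acceptor(LSocket) ->
    spawn_link(fun() -> accept_loop(LSocket) end).

accept_loop(LSocket) ->
    {ok, Socket} = gen_tcp:accept(LSocket),
    %% Placeholder for steps 5-7 of the article (hand the socket to tcp_fsm).
    gen_tcp:close(Socket),
    accept_loop(LSocket).
```

    After tcp_listener_safe:start_link(0), killing the pid returned by tcp_listener_safe:acceptor() leaves the gen_server alive, and a later call to acceptor() returns a different pid. For the second task, the documented proc_lib module (for example proc_lib:spawn_link/1) is a reasonable place to start looking.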


    The source code for the article can be downloaded from GitHub.

    What to read?

    1. Building a Non-blocking TCP server using OTP principles
    2. Creating a non-blocking TCP server using OTP principles
    3. Erlang questions mailing list ~ prim_inet
    4. Excellent documentation
    5. Erlang applications
