Ogoun March 19, 2013 at 14:51

"Address Already in Use" or how to avoid problems when ending a TCP connection

Translation
Tutorial

Correct shutdown

To close a network connection correctly, both sides must send a packet with the completion flag (FIN), which indicates that the sender will transmit no more data, and each side must acknowledge (ACK) the other side's FIN. A FIN is sent when the application calls close() or shutdown(), or when the process exits. After close() returns, the kernel still has to wait for the other side to acknowledge the FIN. As a result, the process that initiated the close may terminate before the kernel has released the resources associated with the connection and made the port available for binding by another process. An attempt to bind the port during that interval fails with an "Address already in use" error.
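In Python, a failed bind of this kind surfaces as an OSError whose errno is EADDRINUSE. The following is a minimal sketch of detecting it, assuming IPv4 and the loopback address; the helper name port_is_free is hypothetical and not part of any library.

```python
import errno
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return False  # held by a live socket or by a lingering connection
        raise  # any other failure (e.g. permission denied) is a different problem
    finally:
        probe.close()
    return True
```

The answer is only a snapshot: another process can take the port between the check and a later bind, so real code should still handle the error at bind time.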

In the figure:

1. The connection is established (ESTABLISHED).
2. The client initiates the close: it sends a FIN to the server and enters FIN_WAIT_1, waiting for the server's response.
3. The server receives the FIN, sends an acknowledgment (ACK), and enters CLOSE_WAIT, where it stays until the server application calls close().
4. The server sends its own FIN to the client, waits for the client's ACK, and then closes immediately, without a waiting period.
5. The client can receive the server's two packets in either order:

ACK first: the client learns that the server has acknowledged its request to close.
- The client enters FIN_WAIT_2 and waits for the server's FIN.
- The client receives the server's FIN, sends an ACK, waits for a while (TIME_WAIT), and then closes (CLOSED), at which point the kernel releases the resources.

FIN first: the client receives the server's FIN before the server's ACK of the client's own FIN.
1. The client acknowledges the server's FIN and enters CLOSING.
2. The client then receives the server's ACK (sent in step 3, right after the server received the client's FIN), waits for a while (TIME_WAIT), and the kernel releases the resources (CLOSED).

The figure shows every state that can occur during a correct close, depending on the order in which the FIN and ACK packets arrive from the remote side. Note that the side that initiates the close (the left half of the figure) passes through TIME_WAIT, while the other side (the right half) does not wait at all. TIME_WAIT is needed in case the acknowledgment (ACK) you sent never reached the other side, or in case stray packets show up for some other reason. I do not know why TIME_WAIT is not needed on the side that did not initiate the close, but that side definitely does not have to wait. The TIME_WAIT state can hold the port for several minutes after the process has exited. The length of the timeout varies between operating systems.

If both sides send their FIN before receiving the other side's FIN, both of them have to go through TIME_WAIT.
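The close sequence described above can be sketched from the application's point of view. This is an illustration under assumptions, not a definitive implementation: the names close_gracefully and demo are hypothetical, and the demo runs both ends over the loopback interface. Here the client closes first, so the client's port is the one that ends up in TIME_WAIT.

```python
import socket
import threading


def close_gracefully(sock, timeout=5.0):
    """Actively close: send FIN, read until the peer's FIN, then release.

    Returns any bytes the peer sent before its own FIN.
    """
    sock.shutdown(socket.SHUT_WR)  # sends FIN: "I will send no more data"
    sock.settimeout(timeout)
    leftover = bytearray()
    try:
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # an empty read means the peer's FIN arrived
                break
            leftover.extend(chunk)
    except socket.timeout:
        pass  # the peer never closed; stop waiting
    finally:
        sock.close()
    return bytes(leftover)


def demo():
    """Loopback demo: the server replies b"bye" after seeing the client's FIN."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)

    def serve():
        conn, _ = listener.accept()
        while conn.recv(4096):  # drain until the client's FIN (EOF)
            pass
        conn.sendall(b"bye")
        conn.close()  # passive close: sends the server's FIN

    worker = threading.Thread(target=serve)
    worker.start()
    client = socket.create_connection(listener.getsockname())
    client.sendall(b"hello")
    reply = close_gracefully(client)
    worker.join()
    listener.close()
    return reply
```

Calling shutdown() before close() lets the client keep reading after it has announced that it will send nothing more, so data the server sends before its own FIN is not lost.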

Correct disconnection of the listening side

A listening socket can be closed immediately: if there are no incoming connections, its state goes straight to CLOSED. If there are incoming connections, it goes to FIN_WAIT_1 and then to TIME_WAIT.

Note that a clean close cannot be guaranteed on the listening side. Even if you check for pending connections with select() before closing, there is a tiny but real chance that a connection arrives after the call to select() and before the call to close().
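The check with select() before closing could look roughly like this. It is a sketch only: the function name close_listener_if_idle and its wait parameter are hypothetical, and the comment marks the race window that the paragraph above describes.

```python
import select
import socket


def close_listener_if_idle(listener, wait=0.0):
    """Close the listening socket unless a connection is already pending.

    Returns True if the socket was closed, False if a connection was waiting.
    """
    readable, _, _ = select.select([listener], [], [], wait)
    if readable:
        return False  # a client is queued; accept() it and finish it first
    # Race window: a connection can still arrive between select() and close().
    listener.close()
    return True
```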

Unexpected shutdown of the remote side

If the remote application (the server) suddenly terminates, the local side has to initiate the close, and TIME_WAIT is then unavoidable. If the remote side disappears because of a network failure or a reboot of the remote machine (both are rare), the local port remains bound until the TIME_WAIT timeout expires. Worse, some older operating systems do not implement a timeout for the FIN_WAIT_2 state, so a connection can stay in it indefinitely; in that case only a reboot frees the port.

If the local application (the client) crashes while a connection is active, the port stays busy until TIME_WAIT ends. The same is true if the application exits while a connection to the remote side is still pending.

Ways to Avoid Problems

Option SO_REUSEADDR

You can call setsockopt() to set the SO_REUSEADDR option, which lets you bind to a port even if it is still in the TIME_WAIT state (only one process at a time is still allowed to bind to the port). This is the simplest and most effective way to avoid the “address already in use” error.
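A minimal sketch of setting the option with Python's socket module, assuming an IPv4 TCP server; the helper name make_listener and its parameters are hypothetical. The option has to be set before bind().

```python
import socket


def make_listener(port, host="", reuse_addr=True):
    """Create a listening TCP socket, optionally with SO_REUSEADDR set."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if reuse_addr:
        # Set the option before bind(); it is consulted when the port is bound.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        sock.bind((host, port))
        sock.listen(5)
    except OSError:
        sock.close()  # do not leak the descriptor if bind() or listen() fails
        raise
    return sock
```

For example, make_listener(8080) binds to all local addresses on port 8080, while make_listener(8080, "127.0.0.1") accepts connections only from the local machine.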

Oddly enough, though, SO_REUSEADDR can lead to errors that are harder to track down than "address already in use". SO_REUSEADDR lets you use a port that is stuck in TIME_WAIT, but you still cannot use that port to connect to the place it was last connected to.

What?

Suppose I use local port 1010 to connect to port 300 on the server foobar.com and then close the connection locally, so that the port goes into TIME_WAIT. I can reuse port 1010 right away for any connection except one to foobar.com port 300.
This can cause a problem in the following situation. My program searches for a free reserved local port (< 1024) to bind to, in order to connect to a service that requires a reserved port. If I use SO_REUSEADDR, every run of the program on my machine gets the same reserved port, even if it is stuck in TIME_WAIT, and I risk getting «Address already in use» when connecting to the place where the port was last used. In this case, do not use the SO_REUSEADDR option.
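The port search described above might be sketched as follows. The function name connect_from_ports and its parameters are hypothetical. SO_REUSEADDR is deliberately left unset, so a port stuck in TIME_WAIT is rejected at bind() and the loop moves on to the next candidate. Treating EADDRNOTAVAIL like EADDRINUSE is an assumption made here because some systems may report a busy address pair with that code at connect time.

```python
import errno
import socket


def connect_from_ports(remote, local_ports, local_host=""):
    """Connect to `remote` from the first usable port in `local_ports`.

    SO_REUSEADDR is not set, so ports still in TIME_WAIT are skipped.
    """
    busy = (errno.EADDRINUSE, errno.EADDRNOTAVAIL)
    for port in local_ports:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((local_host, port))
            sock.connect(remote)
            return sock
        except OSError as exc:
            sock.close()
            if exc.errno not in busy:
                raise  # refused, unreachable, no permission: not a port problem
    raise OSError(errno.EADDRINUSE, "no usable local port among the candidates")
```

With the addresses from the example above, a call such as connect_from_ports(("foobar.com", 300), range(1023, 511, -1)) would walk down the reserved range; binding to ports below 1024 normally requires elevated privileges.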

Some people avoid SO_REUSEADDR because it has security implications. On some operating systems the option allows different processes to use the same port at the same time. This is a problem because most servers bind to a port without specifying an address, using INADDR_ANY instead (netstat displays such sockets as *.8080). If the server is bound to *.8080, another process run by a different user on the local machine (possibly with bad intentions) can bind to local_machine.8080 and, because its address is more specific, intercept all your connections. The problem arises only on multi-user systems that do not restrict accounts.

Others dislike the fact that the kernel spends resources on hundreds or even thousands of connections in TIME_WAIT. This problem can also be avoided with the approach described below.

Client disconnects first

Looking at the figure above, we see that TIME_WAIT can be avoided when the remote side initiates the close. A server can therefore avoid the problem by letting the client close first. To make this possible, design the application protocol so that the client knows when it should close. The server can close safely once it receives EOF from the client; however, it still has to put a timeout on waiting for that EOF, so that it can shut down correctly even if the client never closes. Waiting a few seconds is almost always enough.
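Waiting for the client's EOF with a timeout can be sketched like this. The function name close_after_peer and the default of three seconds are hypothetical choices for illustration; the function is meant to be called on the server's accepted connection once the protocol exchange is finished.

```python
import socket
import time


def close_after_peer(conn, timeout=3.0):
    """Wait up to `timeout` seconds for the peer's EOF, then close.

    Returns True if the peer closed first, False if we gave up waiting.
    """
    deadline = time.monotonic() + timeout
    peer_closed = False
    try:
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break  # overall deadline reached
            conn.settimeout(remaining)
            if not conn.recv(4096):  # an empty read means the peer's FIN arrived
                peer_closed = True
                break
            # Anything else is late data from the peer; discard it and keep waiting.
    except socket.timeout:
        pass  # the peer stayed silent until the deadline
    except ConnectionResetError:
        peer_closed = True  # the peer aborted instead of closing cleanly
    finally:
        conn.close()
    return peer_closed
```

When the function returns True, the client sent its FIN first, so the TIME_WAIT state lands on the client side. When it returns False, the server ends up closing first after all and its side goes through TIME_WAIT.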

It probably makes more sense to call this concept “the remote side disconnects first”; otherwise it depends on which side we call the client and which the server. If you are developing a system in which several client programs run on one machine and talk to different servers, you will want to shift the responsibility for closing onto the servers in order to save resources on the client machine.

For example, I wrote a script that uses the remote shell (rsh) to talk to all the machines on my network. It works in parallel and keeps several connections open at any time. rsh can use only local ports below 1024. At first I used the “rsh -n” command, which makes the local side close first. After a few runs, all the available ports below 1024 were in TIME_WAIT and the script could not proceed. Removing the -n option makes the remote side close first, which solves the TIME_WAIT problem; however, rsh may then hang waiting for input. And if you close the input locally, the port ends up in TIME_WAIT again. In the end I gave up on rsh and wrote my own implementation in Perl (the current version can be downloaded here).

Timeout reduction

If for some reason none of the above options suits you, you can reduce the TIME_WAIT timeout. Whether this is possible, and how it is done, depends on the operating system. Keep in mind that too short a timeout can have negative consequences, especially on congested networks or when packets are lost.
