SIP client interaction. Part 2



    In a previous article, we looked at the simple interaction of SIP clients without using a proxy server. This interaction is extremely rare in practice, but is great for understanding the basics of SIP.

    Now that we have figured out the basic things, I propose moving on to the real work of the protocol.

    In this article, I plan to cover three topics:
    1. Choosing the transport protocol and locating the proxy;
    2. Working through a proxy;
    3. Registering with the proxy server.

    Transport Protocol Selection and Proxy Search


    Since SIP supports several transport protocols (UDP, TCP, SCTP, TLS), you must somehow determine which protocol to use. There are several ways to do this.

    The first method involves explicitly specifying the transport in the SIP URI (except for TLS). It looks like this:



    If the transport is not explicitly specified, the following algorithm applies:
    1. If the SIP URI contains an IP address, UDP is used for a SIP URI and TCP for a SIPS (Secure SIP) URI.
    2. If there is no IP address but a port is specified, UDP is used for a SIP URI and TCP for a SIPS URI.
    3. If there is neither an IP address nor a port, a NAPTR record is looked up in DNS. Its service field determines the transport: "SIP+D2U" corresponds to UDP, "SIP+D2T" to TCP and "SIP+D2S" to SCTP. The NAPTR record points to the SRV record that is used to find the proxy server. If no NAPTR record is present, an SRV query is made directly.
    4. The result of the SRV query is the host name and port of the proxy server.
    5. If there is no SRV record, an A or AAAA query is made. In this case, UDP is used for a SIP URI and TCP for a SIPS URI.
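
    The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name select_transport and its parameters are made up for this example, no real DNS query is performed (the NAPTR result is passed in as naptr_services), and the case where only SRV records exist is left out. The service values are the ones defined in RFC 3263.

```python
import ipaddress
from typing import Optional, Sequence

# NAPTR service field -> transport (values as defined in RFC 3263).
NAPTR_SERVICES = {"SIP+D2U": "UDP", "SIP+D2T": "TCP", "SIP+D2S": "SCTP"}


def is_ip_address(host: str) -> bool:
    """Return True if host is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def select_transport(scheme: str, host: str, port: Optional[int] = None,
                     transport_param: Optional[str] = None,
                     naptr_services: Sequence[str] = ()) -> str:
    """Pick a transport for a SIP/SIPS URI.

    naptr_services is a hypothetical stand-in for the service fields
    returned by a NAPTR lookup; this sketch never touches DNS.
    """
    default = "TCP" if scheme.lower() == "sips" else "UDP"
    if transport_param:                      # explicit ;transport= in the URI
        return transport_param.upper()
    if is_ip_address(host) or port is not None:   # steps 1 and 2
        return default
    for service in naptr_services:           # step 3: first known service wins
        if service in NAPTR_SERVICES:
            return NAPTR_SERVICES[service]
    return default                           # step 5: A/AAAA fallback


print(select_transport("sip", "domain.ru", naptr_services=["SIP+D2T"]))
```

    The explicit transport parameter is checked first, because the algorithm only applies when the URI does not name a transport itself.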

    In order to better understand, consider an example when we want to contact the sip client: ivan@domain.ru:



    So, we found out the parameters of Ivan's proxy server. Now I propose to consider the use of Proxy as part of the SIP dialogue.

    A remark for those who do not know what NAPTR is: I only learned that this type of DNS record exists while writing this article, so do not despair. A little more about NAPTR here.

    Interaction Using Proxy


    Why do we need a SIP proxy? As I said, in the example from the first part of the article the clients knew each other's IP addresses and could communicate directly. In real life, clients most often receive their addresses dynamically, so it makes no sense to "remember" a particular IP. The first thing that comes to mind is to use DNS A records to find the currently valid address. However, there is a problem: an IP address identifies a specific device, not the user on it. A feature of SIP is that messages are exchanged not at the device-to-device level but at the user-to-user level. At the same time, one user can use several SIP clients at once: on a mobile phone, on a work computer, on a home computer and on a SIP phone. What can be done?

    The SIP protocol offers the following solution: a SIP proxy is set up and each user registers their devices with it (more precisely, users register with the registration server, and the proxy has access to the registration database, but for simplicity we assume that this is the same server). I will show how this is done below. For now, just remember that the proxy knows exactly how to reach each of the user's clients.

    When Peter calls Ivan, the following sequence of actions is performed:
    1. Peter's SIP client determines the address and protocol of Ivan's SIP Proxy (how to do this - see above)
    2. Client sends INVITE request to proxy
    3. The proxy server looks up which devices Ivan has registered and sends the request to all of them
    4. Ivan answers the call on one of the devices, and that device sends 200 OK to the proxy
    5. The proxy forwards the 200 OK to Peter
    6. Peter takes the SIP address of Ivan on that specific device from the Contact field of the 200 OK and sends the ACK directly, bypassing the proxy
    7. All subsequent communication is also direct.
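
    Step 3 of this sequence can be pictured as a simple lookup. The sketch below is only an illustration: LOCATION_DB, targets_for_invite and the device addresses are invented for the example, and a real proxy would query the registration database instead of a hard-coded dictionary.

```python
from typing import Dict, List

# Hypothetical location database: user address -> registered device URIs.
LOCATION_DB: Dict[str, List[str]] = {
    "sip:ivan@domain.ru": [
        "sip:ivan@192.0.2.21:5060",  # made-up address, e.g. a desk phone
        "sip:ivan@192.0.2.22:5060",  # made-up address, e.g. a softphone
    ],
}


def targets_for_invite(request_uri: str) -> List[str]:
    """Return every device URI the proxy would forward the INVITE to."""
    return list(LOCATION_DB.get(request_uri, []))


for target in targets_for_invite("sip:ivan@domain.ru"):
    print("forward INVITE to", target)
```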

    On the diagram, it looks as follows:



    For those who have studied the first part of the article, everything looks pretty familiar; only an intermediate proxy server was added. Accordingly, messaging has changed slightly.

    Before we go into a detailed examination, a small remark. Within SIP, two types of URIs are distinguished. The first is the user URI, also known as the address of record (AOR). A request sent to this address causes the proxy to search its database and forward the request to one or several devices. The second is the device URI (or rather, the URI of the user on a device). The device URI is usually called a contact and is carried in the Contact field of the SIP message. The AOR is carried in the From and To fields.



    Start of conversation

    So, Peter sends an INVITE for Ivan to the proxy server:


    The proxy server forwards the request to all of Ivan's SIP clients. For simplicity, suppose that Ivan uses only one device. So that the SIP client can tell that the request came through the proxy, the server adds its own Via header field:



    Ivan's SIP client sends a 180 Ringing response (Ivan hears the phone ring). In doing so, it adds a tag to the To field and puts its contact in the Contact field. In addition, the received parameter has been added to the first Via field. This parameter shows the address from which Ivan's client received the request (i.e., the address of the proxy server as Ivan sees it). Knowing it can be useful when troubleshooting:



    The proxy, in turn, forwards the response to Peter's client, removing its own Via first:
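
    The Via handling on both paths can be sketched as follows. This is not the message from the original example but a minimal illustration: headers are modeled as a list of (name, value) pairs, the helper names add_via and strip_top_via are invented here, and the addresses and branch values are made up.

```python
from typing import List, Tuple

Header = Tuple[str, str]  # (name, value)


def add_via(headers: List[Header], proxy_via: str) -> List[Header]:
    """Request path: the proxy puts its own Via above the existing ones."""
    return [("Via", proxy_via)] + headers


def strip_top_via(headers: List[Header]) -> List[Header]:
    """Response path: drop the topmost Via (the proxy's own), keep the rest."""
    for i, (name, _) in enumerate(headers):
        if name.lower() == "via":
            return headers[:i] + headers[i + 1:]
    return headers


# Hypothetical header values for illustration only.
request = [
    ("Via", "SIP/2.0/UDP 192.0.2.11;branch=z9hG4bK-peter"),
    ("To", "<sip:ivan@domain.ru>"),
]
forwarded = add_via(request, "SIP/2.0/UDP proxy.domain.ru;branch=z9hG4bK-proxy")
print(forwarded[0])
print(strip_top_via(forwarded) == request)
```

    Because every hop adds its Via on the way in and removes it on the way back, a response retraces the path of the request in reverse.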



    After sending 180 Ringing, as soon as Ivan picks up the phone, Ivan's client sends a 200 OK response to the proxy:



    The proxy forwards this response to Peter, again removing its own Via:



    Now for the most interesting part. Peter's client sends the ACK message directly to Ivan's client, bypassing the proxy. Moreover, even if Ivan were using several SIP clients at once, the ACK would arrive at exactly the right one. What makes this possible?

    The 200 OK is sent by the client on which Ivan picked up the phone, and its Contact field contains a URI that corresponds to the user Ivan on that specific device. Peter's client therefore sends the ACK to exactly this device, after which the proxy is no longer needed:
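
    As an illustration of how a client might pick the ACK destination out of a 200 OK, here is a minimal sketch. The function contact_uri and the sample response are made up for this example; it ignores details such as the compact header form and multiple Contact values.

```python
import re
from typing import Optional


def contact_uri(message: str) -> Optional[str]:
    """Return the URI from the first Contact header of a SIP message, if any."""
    for line in message.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "contact":
            match = re.search(r"<([^>]+)>", value)
            # Fall back to the bare value when the URI is not in angle brackets.
            return match.group(1) if match else value.split(";")[0].strip()
    return None


# Hypothetical 200 OK, shortened to the fields that matter here.
response = (
    "SIP/2.0 200 OK\r\n"
    "To: <sip:ivan@domain.ru>;tag=abc\r\n"
    "Contact: <sip:ivan@192.0.2.21:5060>\r\n"
    "\r\n"
)
print(contact_uri(response))  # sip:ivan@192.0.2.21:5060
```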



    All other messages, including media traffic, bypass Proxy.

    End of conversation

    At the end of the conversation, Ivan’s client sends BYE directly to Peter’s client:



    Peter sends back a confirmation:


    Everything here is the same as in the first part of the article.

    So, we have examined the interaction of SIP clients through a proxy server. Only one question is left: how did the proxy find out the addresses of Ivan's clients? Through the registration procedure, which I will describe below.

    SIP registration


    Registration is as follows:



    Let's take a closer look at each of the messages. Ivan sends a REGISTER request to the server (for simplicity, we assume that the registration server role runs on proxy.domain.ru). The most important thing in this request is the Contact field. This is Ivan's address on a specific device:



    In response, the server sends 401 Unauthorized, that is, a request to authenticate. The most important field in the response is WWW-Authenticate. It is not hard to guess that realm is the domain and that algorithm indicates which hash algorithm will be used. The interesting part is the nonce field:



    Nonce is short for "number used once." It is a one-time random sequence that Ivan's client combines with the password string; the client then computes an MD5 hash of the resulting string and puts the result into the Authorization field of a new request (in fact, everything is somewhat more complicated, but for simplicity we will assume that this is all). The response parameter is used for this.

    Why is the nonce needed? If the client computed MD5 from the password alone, without a nonce, the hash would be the same every time. An attacker could intercept such a hash and use it to authenticate. This would be as unsafe as transmitting the password in clear text.

    With a nonce, MD5 is computed from a new string every time, so the result is different every time. Therefore, even after intercepting the hash, an attacker most likely will not be able to reuse it.
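
    The "somewhat more complicated" computation can be sketched as follows. This is the basic digest scheme from RFC 2617 without the optional qop extension; the function names and all credential values are made up for the example.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the hexadecimal MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username: str, realm: str, password: str,
                    method: str, uri: str, nonce: str) -> str:
    """Digest response without qop: MD5(HA1:nonce:HA2)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # who is asking
    ha2 = md5_hex(f"{method}:{uri}")                 # what is being asked
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


# Hypothetical credentials; only the nonce differs between the two calls.
first = digest_response("ivan", "domain.ru", "secret",
                        "REGISTER", "sip:domain.ru", "nonce-1")
second = digest_response("ivan", "domain.ru", "secret",
                         "REGISTER", "sip:domain.ru", "nonce-2")
print(first != second)  # True: a new nonce yields a new response value
```

    The server performs the same computation with its own copy of the password and compares the two values, so the password itself never travels over the network.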

    By the way, note that the CSeq of the new registration request is one higher:



    The server also combines the nonce with Ivan's password and computes an MD5 hash. It then compares its hash with the one received from Ivan. If they match, the server sends 200 OK. Notice that the expires parameter has been added to the Contact field. In this case, the registration will be kept in the server database for 3600 seconds, or one hour:



    If Ivan wants to renew the registration, then he must send another REGISTER within this hour.

    What if Ivan uses several devices with SIP support at once? Everything is very simple - you need to send a registration request from each of them.

    After the corresponding entry appears in the database of this registration server, the Proxy server will be able to redirect requests to Ivan's SIP clients.
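
    The registration database can be pictured as a table of bindings with expiry times. The sketch below is a toy in-memory model, not how any particular server stores its data: the Registrar class, its method names and the device addresses are invented for the example, and the now parameter exists only to make the behavior easy to follow.

```python
import time
from typing import Dict, List, Optional


class Registrar:
    """Toy registrar: user address -> {contact URI: expiry timestamp}."""

    def __init__(self) -> None:
        self.bindings: Dict[str, Dict[str, float]] = {}

    def register(self, aor: str, contact: str, expires: int = 3600,
                 now: Optional[float] = None) -> None:
        """Add or refresh a binding; a repeated REGISTER extends its lifetime."""
        now = time.time() if now is None else now
        self.bindings.setdefault(aor, {})[contact] = now + expires

    def contacts(self, aor: str, now: Optional[float] = None) -> List[str]:
        """Return the contacts whose registration has not expired yet."""
        now = time.time() if now is None else now
        return [c for c, until in self.bindings.get(aor, {}).items()
                if until > now]


registrar = Registrar()
# Two devices of the same user register at different times (made-up addresses).
registrar.register("sip:ivan@domain.ru", "sip:ivan@192.0.2.21", now=0)
registrar.register("sip:ivan@domain.ru", "sip:ivan@192.0.2.22", now=1800)
print(registrar.contacts("sip:ivan@domain.ru", now=4000))
```

    At time 4000 only the second device is returned, because the first registration expired after 3600 seconds and was not renewed.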

    Bonus for those who are interested


    You may have noticed that, in response to a registration request, the server sends a response containing a To-tag:



    Clearly, when a dialog is being established, this tag helps to avoid processing the same message twice. There is a rule for this: if a message does not contain a To-tag and the UAS has already received a message with the same CSeq, From-tag and Call-ID, the message is discarded. But why do we need a To-tag if we are not establishing a dialog with the registration server? The best answer I could find is that RFC 3261 says that a 200 OK response to a request without a To-tag should contain a To-tag. That is, it serves no particular purpose here, but it is the convention.
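
    The rule described above can be modeled in a few lines. This is a simplified sketch of the rule as stated in this article, with an invented DuplicateFilter class; a real SIP stack handles retransmissions and merged requests in more detail.

```python
from typing import Optional, Set, Tuple

Key = Tuple[str, str, str]  # (Call-ID, From-tag, CSeq)


class DuplicateFilter:
    """Remembers requests that arrived without a To-tag."""

    def __init__(self) -> None:
        self.seen: Set[Key] = set()

    def should_discard(self, call_id: str, from_tag: str, cseq: str,
                       to_tag: Optional[str]) -> bool:
        """Return True for a request without a To-tag that repeats a known
        (Call-ID, From-tag, CSeq) combination."""
        if to_tag:  # request inside a dialog: the rule does not apply
            return False
        key = (call_id, from_tag, cseq)
        if key in self.seen:
            return True
        self.seen.add(key)
        return False


# Hypothetical identifiers for illustration only.
dup_filter = DuplicateFilter()
print(dup_filter.should_discard("call-1", "from-a", "1 INVITE", None))  # False
print(dup_filter.should_discard("call-1", "from-a", "1 INVITE", None))  # True
```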

    I hope that, after reading this article, the operation of the SIP protocol has become clearer to you. I would be glad to hear your comments.
