Protecting calls from eavesdropping: building secure SIP telephony with your own hands

Hello, Habr!
This time I want to talk about encryption technologies for VoIP calls: what protection the different approaches provide, and how to set up voice communication that is as secure as possible against eavesdropping, with technological security guarantees.
In this article I will try to explain the features of SIP/TLS, SRTP and ZRTP, and demonstrate specific usage patterns using our service as an example.

A bit of theory

Any VoIP call consists of two main components: the exchange of signaling information, and the transmission of media streams (voice and/or video) between users.
In the first stage, the signaling exchange, clients agree on the parameters of the call being established, either directly or through a server. If the connection goes through a server, the server uses the signaling information to authorize the client, determine who is calling whom, and perform routing and switching. Through the signaling protocol the clients and the server also agree on the media codecs and the encryption method, and exchange the IP addresses and port numbers on which media is expected to arrive. This is done over protocols such as SIP, XMPP and others.
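To make the signaling stage more concrete: in SIP, these media parameters travel as an SDP body (RFC 4566) inside the INVITE. The sketch below builds a minimal audio offer. The address 192.0.2.10, port 40000 and the dynamic payload type 96 for SILK are illustrative assumptions, not values used by the service.

```python
def build_sdp_offer(ip: str, audio_port: int, session_id: int = 1) -> str:
    """Build a minimal SDP audio offer (illustrative values, no encryption)."""
    lines = [
        "v=0",                                  # SDP version
        f"o=- {session_id} 1 IN IP4 {ip}",      # origin: user, session id, session version
        "s=-",                                  # session name (unused)
        f"c=IN IP4 {ip}",                       # address where the peer should send RTP
        "t=0 0",                                # unbounded session time
        f"m=audio {audio_port} RTP/AVP 18 96",  # media port and payload types, in preference order
        "a=rtpmap:18 G729/8000",                # static payload type 18 is G.729
        "a=rtpmap:96 SILK/16000",               # dynamic payload type; this mapping is an assumption
    ]
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(build_sdp_offer("192.0.2.10", 40000))
```

Unless the signaling itself is protected, everything in this body, including the IP address and port, crosses the network in the clear.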
The "conversation" itself, that is, the exchange of voice data between clients, usually takes place over RTP. The data is transmitted in the format the clients and the server agreed on at the signaling stage. Voice can be exchanged either directly between clients or through an intermediary server. In the latter case, the server can help clients traverse NAT and select codecs.
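The RTP packets that carry the voice begin with a 12-byte fixed header (RFC 3550). A small parser shows what an observer on the path sees: in plain RTP the payload after this header is the raw codec output, while SRTP encrypts the payload but leaves the header readable. The sample packet values are made up for the demo.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the 12-byte fixed RTP header (RFC 3550, section 5.1)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte RTP fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,            # 2 for current RTP
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,     # the payload type agreed in signaling
        sequence=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


if __name__ == "__main__":
    # Made-up packet: version 2, payload type 18 (G.729), seq 1, timestamp 160.
    sample = struct.pack("!BBHII", 0x80, 18, 1, 160, 0x1234ABCD) + bytes(20)
    hdr = parse_rtp_header(sample)
    print(hdr.version, hdr.payload_type, hdr.sequence)  # 2 18 1
    # In plain RTP, sample[12:] would be readable codec data.
```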

So what is an encrypted VoIP call? From here on we will talk about SIP as the most popular protocol.
As we have already seen, a call consists of a signaling part and a media part, each of which can be encrypted separately using dedicated protocol mechanisms. SIP/TLS is used to encrypt the signaling, and the ZRTP and SRTP protocols are used to encrypt the voice.

SIP/TLS is, roughly speaking, the SIP counterpart of HTTPS. It lets the client verify that it is talking to the right server, provided the client trusts the certificate the server presents. You can read more on Wikipedia.
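As a rough illustration of what "the client trusts the server's certificate" means in practice, here is a sketch of a client-side TLS setup using Python's standard ssl module. Port 5061 is the standard port for SIP over TLS (the service described below listens on 443 instead), and the host name in the usage note is hypothetical. The sketch only opens the TLS channel; it does not speak SIP.

```python
import socket
import ssl


def make_sip_tls_context() -> ssl.SSLContext:
    """Client context that verifies the server certificate and host name."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)  # loads the system CA store
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def connect_sips(host: str, port: int = 5061) -> ssl.SSLSocket:
    """Open a verified TLS connection to a SIP server (no SIP is exchanged here)."""
    ctx = make_sip_tls_context()
    raw = socket.create_connection((host, port), timeout=10)
    try:
        # Raises ssl.SSLCertVerificationError if the certificate is untrusted
        # or was not issued for `host`.
        return ctx.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise
```

Usage would look like `conn = connect_sips("sip.example.com", 443)`. If a server uses a private CA, you would add it with `ctx.load_verify_locations(cafile=...)` rather than turning verification off, because disabling verification removes exactly the protection SIP/TLS is meant to give.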

SRTP and ZRTP are two different ways to encrypt RTP streams. The fundamental difference between them is where the key exchange happens. For SRTP, the keys are exchanged in signaling, during the first stage of call setup. For ZRTP, the keys are negotiated right at the start of the RTP packet exchange (in the second, "media" part), using a dedicated protocol based on the Diffie-Hellman key agreement method.
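The core idea behind the ZRTP key exchange can be shown with textbook finite-field Diffie-Hellman: each side sends only a public value, yet both arrive at the same secret. The parameters here (the Mersenne prime 2**127 - 1 and base 3) are toy values chosen for brevity and are not secure; real ZRTP negotiates key agreement types such as DH3k or EC25 and derives its session keys from the result through a further KDF.

```python
import secrets
from typing import Tuple

# Toy public parameters shared by both sides. NOT secure: real ZRTP uses
# negotiated key agreement such as DH3k or EC25 (elliptic curve P-256).
P = 2**127 - 1  # a Mersenne prime, used here only for brevity
G = 3


def make_keypair() -> Tuple[int, int]:
    """Return (private, public); only the public value is sent to the peer."""
    private = secrets.randbelow(P - 3) + 2  # uniform in [2, P-2]
    return private, pow(G, private, P)


def shared_secret(my_private: int, peer_public: int) -> int:
    """(g^b)^a mod p == (g^a)^b mod p, so both sides compute the same value."""
    return pow(peer_public, my_private, P)


if __name__ == "__main__":
    a_priv, a_pub = make_keypair()
    b_priv, b_pub = make_keypair()
    # Only a_pub and b_pub would cross the network.
    print(shared_secret(a_priv, b_pub) == shared_secret(b_priv, a_pub))  # True
```

With proper parameters, a passive listener who sees both public values cannot compute the secret. An active attacker could still substitute its own values toward each side, which is why ZRTP adds the SAS comparison described in the call schemes.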
It is important to note that with SRTP, reliable call encryption requires using SIP/TLS and SRTP together; otherwise an attacker can easily obtain the keys, which travel over unencrypted SIP, and listen to the conversation. This does not matter for ZRTP: the RTP stream is securely encrypted whether or not the signaling is encrypted. Moreover, the protocol can detect a "man in the middle" (including the service's own servers!) between the talking clients. This lets you be sure that the conversation cannot be overheard, at least by someone listening on the network or the transmission medium.
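To see why SRTP without SIP/TLS is weak: when SRTP keys are exchanged with SDES (RFC 4568), the master key and salt are written base64-encoded into an a=crypto line of the SDP. Anyone who can read the unencrypted SIP message can decode them, as this sketch does. The message and key bytes are fabricated for the example, and the 16/14-byte split assumes the AES_CM_128_HMAC_SHA1_80 suite.

```python
import base64
import re
from typing import List, Tuple

# a=crypto:<tag> <crypto-suite> inline:<base64(master key || master salt)>[|lifetime][|MKI:length]
CRYPTO_RE = re.compile(r"^a=crypto:(\d+) (\S+) inline:([A-Za-z0-9+/=]+)", re.MULTILINE)


def extract_sdes_keys(sip_message: str) -> List[Tuple[int, str, bytes, bytes]]:
    """Return (tag, suite, master_key, master_salt) for every a=crypto line found."""
    found = []
    for tag, suite, b64 in CRYPTO_RE.findall(sip_message):
        material = base64.b64decode(b64)
        # For AES_CM_128_HMAC_SHA1_80: 16-byte master key followed by 14-byte salt.
        found.append((int(tag), suite, material[:16], material[16:]))
    return found


if __name__ == "__main__":
    fake_material = bytes(range(30))  # fabricated key material for the demo
    sdp = (
        "v=0\r\n"
        "m=audio 40000 RTP/SAVP 18\r\n"
        "a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:"
        + base64.b64encode(fake_material).decode()
        + "|2^20|1:32\r\n"
    )
    print(extract_sdes_keys(sdp))
```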

Connection scheme for SIP clients with various encryption settings (call diagram).

The following schemes for setting up an encrypted call can be distinguished:

  1. Both users use SIP/TLS and SRTP. In this case, the key exchange for media encryption takes place over the secured signaling protocol. This scheme assumes trust in the server involved in the connection. Outsiders cannot access either the signaling information or the voice data. The drawback is that the protocol (and hence the client) gives the user no indication, and no assurance, that the other user also has an encrypted connection to the server.

  2. Both users use ZRTP, and the voice passes through the server. In this case, the ZRTP protocol identifies the server as a Trusted MitM (man in the middle). The keys are exchanged over RTP using an algorithm based on the Diffie-Hellman method, so a passive eavesdropper cannot recover them. If secure SIP/TLS is also used, outsiders cannot access either the signaling information or the voice. As in the first option, trust in the switching server is assumed, but unlike it, reliable voice encryption does not require secure SIP/TLS. Also unlike the first option, each user can see that the conversation is encrypted up to the server on both sides, and that both are connected to the same (trusted) server.

  3. Both users use ZRTP, but media flows directly between the clients. Since the key exchange happens directly between the clients, even the server that set up the call cannot listen to the conversation. Both clients indicate that a secure direct session has been established. You can verify this by comparing the SAS (Short Authentication String) values: they will be the same on both sides. If you also want to hide signaling information from outsiders, use SIP/TLS. This is the most secure option, but in this case the server cannot perform many functions it provides in the other scenarios, such as recording the conversation directly or transcoding voice for clients with different audio codec settings.

  4. One user uses the first scheme described above, and the other uses the second. In this case, trust in the server is again required. Signaling is encrypted with SIP/TLS. For the ZRTP user, the protocol reports that an encrypted connection has been established up to the server (End at MitM). Whether the other side uses encryption at the protocol level remains unknown.
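The SAS comparison from the third scheme can be illustrated in code. Following my reading of RFC 6189, the SAS is derived from the session's shared secret s0 with the ZRTP KDF (an HMAC over a counter, a label and a context), and the 4-character "B32" form encodes the leftmost 20 bits using ZRTP's base32 alphabet. The s0 and KDF context values below are placeholders, and HMAC-SHA-256 is assumed as the negotiated hash. If a man in the middle sits between the clients, each side ends up with a different s0, so the displayed strings would almost certainly differ.

```python
import hashlib
import hmac

# Base32 alphabet used by ZRTP for the 4-character "B32" SAS rendering.
ZRTP_B32 = "ybndrfg8ejkmcpqxot1uwisza345h769"


def zrtp_kdf(ki: bytes, label: bytes, context: bytes, length_bits: int = 256) -> bytes:
    """KDF(KI, Label, Context, L) = HMAC(KI, i || Label || 0x00 || Context || L), with i = 1."""
    msg = (1).to_bytes(4, "big") + label + b"\x00" + context + length_bits.to_bytes(4, "big")
    return hmac.new(ki, msg, hashlib.sha256).digest()[: length_bits // 8]


def sas_b32(s0: bytes, kdf_context: bytes) -> str:
    """Render the SAS as 4 base32 characters from the leftmost 20 bits of sasvalue."""
    sashash = zrtp_kdf(s0, b"SAS", kdf_context)
    sasvalue = int.from_bytes(sashash[:4], "big")  # leftmost 32 bits
    top20 = sasvalue >> 12                          # keep the leftmost 20 bits
    return "".join(ZRTP_B32[(top20 >> shift) & 0x1F] for shift in (15, 10, 5, 0))


if __name__ == "__main__":
    context = b"placeholder-ZIDi-ZIDr-total_hash"  # hypothetical KDF_Context
    alice = sas_b32(b"shared-secret-s0", context)
    bob = sas_b32(b"shared-secret-s0", context)
    print(alice, bob, alice == bob)  # same s0 on both ends -> same SAS
```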

That's it for the theory; let's move on to practice! We will set up our own SIP server, create SIP users, install SIP clients and learn how to make encrypted calls using the free cloud telephony service.

Server setup

Setting up domains
First you need to create your own server. To do this, go to the service website, complete a simple registration and open the settings interface.

First, go to the "Internal network -> Domains" section and create your own domain so that you are not limited in your choice of SIP user names. You can park your own domain or create a personal subdomain in one of the service's zones.
Next, in the "Internal network -> Sip Users" section, create SIP users and configure some parameters of their clients. SIP user names can be arbitrary, but since numbers are more convenient to type on softphones and hardware phones, we will use numeric identifiers. I created 1000, 1001, 1002 and 1003. After creating a SIP identifier, remember to click the "Save" button. If no required fields are left empty, the system will not complain and will display a change log with the status "Done."

Next, configure the codecs and encryption methods. To do this, click the gear icon to the left of the SIP identifier. I plan to use a SIP client (CSipSimple) on a smartphone with ZRTP encryption, so on the "Basic" tab I select the G729 and SILK codecs, and on the "Protection" tab the ZRTP method.

SIP user settings
You can choose other parameters. What matters is that the settings of the SIP account in the service interface match the settings in the SIP client. This is needed for calls to work correctly between clients with different codec and encryption settings. Don't forget to save the configuration.
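The requirement that the account settings on the server and in the client match can be expressed as a simple check. The MediaProfile type and the profile_mismatches function below are hypothetical helpers for illustration; they are not part of the service or of CSipSimple.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class MediaProfile:
    """Hypothetical summary of one side's media settings."""
    codecs: FrozenSet[str]
    encryption: str  # e.g. "ZRTP", "SRTP" or "none"


def profile_mismatches(server: MediaProfile, client: MediaProfile) -> List[str]:
    """List differences that could break calls; an empty list means the settings match."""
    problems = []
    if not server.codecs & client.codecs:
        problems.append("no codec in common")
    if server.codecs != client.codecs:
        diff = sorted(server.codecs ^ client.codecs)  # codecs enabled on only one side
        problems.append("codec lists differ: " + ", ".join(diff))
    if server.encryption != client.encryption:
        problems.append(f"encryption differs: server={server.encryption}, client={client.encryption}")
    return problems


if __name__ == "__main__":
    account = MediaProfile(frozenset({"G729", "SILK"}), "ZRTP")  # as configured on the server
    phone = MediaProfile(frozenset({"SILK", "G729"}), "ZRTP")    # as configured in the client
    print(profile_mismatches(account, phone))  # []
```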

In general, this is enough for the simplest configuration. You can configure SIP clients and call between them by dialing 1000, 1001, 1002 and 1003. If you wish, you can add a shared SIP gateway for calls to the telephone network and configure the corresponding call routing. However, that is a somewhat different way of using the service, and it calls for a different kind of security measure than encrypting traffic to the gateway.

Let's move on to setting up SIP clients.

As I said, I plan to use CSipSimple on Android smartphones. First, install the client from the standard Play Market, or download it from the developer's website; the developer, by the way, publishes the client's source code, which in some cases can matter a great deal. You need to install the client itself plus the codecs. I installed "CSipSimple", "Codec Pack for CSipSimple" and "G729 codec for CSipSimple". The last one is paid and optional: the free SILK and OPUS codecs provide decent call quality over 3G networks.

Run CSipSimple and open the account configuration. Select the "Basic" wizard and fill it in with the data from the web interface.
Next, in the general CSipSimple settings under "Media -> Audio codecs", select your preferred codecs. For calls over 3G I recommend SILK, OPUS, iLBC and G729. Since the settings in the server interface and in the client must match, and I chose SILK and G729 on the server, in CSipSimple's list of audio codecs I check only these two and uncheck the rest.
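Why unchecking codecs matters: in the SDP offer/answer model (RFC 3264) the answering side has to choose from the codecs that were offered, and a common practice is to take the first offered codec it also supports. The pick_codec helper below is a simplified, hypothetical model of that choice, not CSipSimple's actual logic.

```python
from typing import List, Optional


def pick_codec(offered: List[str], supported: List[str]) -> Optional[str]:
    """Return the first offered codec that the answering side also supports."""
    supported_upper = {codec.upper() for codec in supported}
    for codec in offered:  # offer order expresses the offerer's preference
        if codec.upper() in supported_upper:
            return codec
    return None  # no common codec: the two sides cannot agree on audio


if __name__ == "__main__":
    print(pick_codec(["OPUS", "SILK", "G729"], ["G729", "SILK"]))  # SILK
    print(pick_codec(["PCMU"], ["G729", "SILK"]))                  # None
```

With the client limited to SILK and G729 as above, an offer that lists neither of them leaves no common codec, so the two sides cannot agree on audio.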
In the client's "Network -> Secure protocol" section, select the desired encryption settings. I enable only ZRTP and leave the rest off. If you want, you can also use SIP/TLS; keep in mind that the server expects TLS connections on port 443. This is done specifically for overly clever mobile operators that block the standard VoIP ports.
Also bear in mind that SRTP and ZRTP are not always compatible, so it is highly advisable to enable only one of them in the client.

Making calls using ZRTP

Once all the settings are done, let's make a few calls to demonstrate how CSipSimple behaves in calls between users with different security settings.

Right after completing the steps above, a SIP call from user 1001 to user 1000 looks like this.
CSipSimple shows that a MitM server, to which both clients are connected, is involved in the call. EC25 means that Diffie-Hellman key agreement over a 256-bit elliptic curve is used. AES-256 is the symmetric encryption algorithm in use. The "ZRTP - Verified" status means that the user has verified the SAS string.

Now change the media transfer mode for both clients in the ppbbxx settings. Setting direct media = yes lets voice flow directly between the clients. In this case both parties see the same SAS strings, and the Twofish-256 symmetric cipher is used. ZRTP in this mode places much higher compatibility demands on the clients and is less reliable in terms of establishing a connection, since the server does not relay the media. Make sure all clients use the same audio codecs and that NAT traversal works correctly.

If SIP user 1001 does not use encryption while 1000 uses ZRTP, the client of user 1000 will show that voice is encrypted only up to the server (End at MitM).


Communication fully protected from eavesdropping can be organized, and it is not difficult to do. The most suitable way is to use the SIP protocol together with the ZRTP media encryption method. The service lets you put various protected communication schemes into practice, including ones in which conversations cannot be decrypted on the switch. The SIP client CSipSimple is an open-source project with a sufficient set of features to serve as a secure client.
