SIP phone on STM32F7-Discovery


    Some time ago we wrote about how we managed to run a SIP phone on the STM32F4-Discovery (with 1 MB of ROM and 192 KB of RAM) based on Embox. That version, it must be said, was minimal: it connected two phones directly, without a server, and transmitted voice in only one direction. So we decided to run a more complete phone, with calls going through a server and voice transmitted in both directions, while still fitting into the smallest possible amount of memory.

    For the phone we chose the simple_pjsua application from the PJSIP library. It is a minimal application that can register on a server and receive and answer calls. Below I will first describe how to run it on the STM32F7-Discovery.

    How to run

    1. Configure Embox:
      make confload-platform/pjsip/stm32f7cube
    2. In the conf/mods.config file, set the required SIP account:

      include platform.pjsip.cmd.simple_pjsua_imported(

      where server is the SIP server, and username and password are the user name and password of the account.
    3. Build Embox with the make command. Flashing the firmware is described on our wiki and in the article.
    4. Run the command “simple_pjsua_imported” in the Embox console:

      00:00:12.870    pjsua_acc.c  ....SIP outbound status for acc 0 is not active
      00:00:12.884    pjsua_acc.c registration success, status=200 (Registration succes
      00:00:12.911    pjsua_acc.c  ....Keep-alive timer started for acc 0, destination:, interval:15s

    5. Finally, plug speakers or headphones into the audio output and speak into the two small MEMS microphones next to the display. We call from Linux using the simple_pjsua or pjsua application. Any other softphone, such as Linphone, will work too.

    All of this is described on our wiki.

    How we got here

    Initially there was the question of choosing a hardware platform. Since it was clear that the STM32F4-Discovery did not have enough memory, we chose the STM32F7-Discovery. It has 1 MB of flash and 256 KB of RAM (plus 64 KB of special fast memory, which we will also use). That is still not much for calls through a server, but we decided to try to fit.

    We roughly divided the task into several stages:

    • Run PJSIP on QEMU. It was convenient for debugging, plus we already had support for the AC97 codec there.
    • Voice recording and playback on QEMU and STM32.
    • Porting the simple_pjsua application from PJSIP. It allows you to register on the SIP server and call.
    • Deploy our own Asterisk server and test against it, then try external ones.

    Sound in Embox works through PortAudio, which PJSIP also uses. The first problems appeared on QEMU: WAV files played fine at 44100 Hz, but at 8000 Hz something clearly went wrong. The cause turned out to be the sample rate setting: the hardware default was 44100 Hz, and our software never changed it.

    Here it is probably worth explaining briefly how sound is played at all. The sound card is given a pointer to a region of memory from which to play, or into which to record, at a preset sample rate. When the buffer ends, an interrupt is generated and execution continues with the next buffer. The catch is that each buffer must be filled ahead of time, while the previous one is still playing. We will run into this problem again on the STM32F7.
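
    The buffer hand-off described above can be sketched as a two-buffer ("ping-pong") scheme. This is only an illustration, not Embox or PortAudio code: the names pingpong, pingpong_fill and pingpong_on_buffer_done are hypothetical, the buffer size is deliberately tiny, and the "interrupt" is an ordinary function call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SAMPLES 4            /* tiny on purpose; real buffers are larger */

/* Two buffers: the "hardware" plays one while software fills the other. */
struct pingpong {
    int16_t  buf[2][BUF_SAMPLES];
    int      playing;            /* index of the buffer being played */
    int      filled[2];          /* 1 if the buffer holds fresh data */
    unsigned underruns;          /* times playback started on stale data */
};

static void pingpong_init(struct pingpong *pp) {
    assert(pp != NULL);
    memset(pp, 0, sizeof(*pp));
}

/* Application side: fill the buffer that is NOT being played.
 * Returns 0 on success, -1 if that buffer is still waiting to be played. */
static int pingpong_fill(struct pingpong *pp, const int16_t *src) {
    int idle = 1 - pp->playing;
    if (pp->filled[idle])
        return -1;
    memcpy(pp->buf[idle], src, sizeof(pp->buf[idle]));
    pp->filled[idle] = 1;
    return 0;
}

/* Stands in for the "buffer finished" interrupt handler: the buffer that
 * was just played becomes free, and playback moves to the other one. */
static void pingpong_on_buffer_done(struct pingpong *pp) {
    pp->filled[pp->playing] = 0;
    pp->playing = 1 - pp->playing;
    if (!pp->filled[pp->playing])
        pp->underruns++;         /* software was too slow to refill */
}
```

    If pingpong_on_buffer_done fires twice without a pingpong_fill in between, the underrun counter goes up, which is how "not filling the buffer in time" would show up in this model.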

    Next, we rented a server and deployed Asterisk on it. Since there was a lot to debug and we did not want to keep talking into the microphone, we had to automate playback and recording. To do this, we patched simple_pjsua so that files could be substituted for audio devices. In PJSIP this is quite simple, since it has the concept of a port, which can be either a device or a file, and these ports can be flexibly connected to other ports. You can see the code in our pjsip repository.

    The resulting scheme was as follows. On the Asterisk server I created two accounts, one for Linux and one for Embox. Then the simple_pjsua_imported command is run on Embox, Embox registers on the server, and we call Embox from Linux. Once connected, we check on the Asterisk server that the call is established. After a while we should hear the sound from Linux on Embox, while on Linux we save to a file the audio being played from Embox.
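
    The idea of a port that can be either a device or a file can be illustrated with a small interface of function pointers. This is a generic sketch and not the PJSIP API: struct port, struct mem_port and pull_frame are hypothetical names, and a memory buffer stands in for a WAV file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A source "port": anything that can produce the next frame of samples. */
struct port {
    /* Fills out[0..n) and returns the number of samples produced. */
    size_t (*get_frame)(struct port *self, int16_t *out, size_t n);
};

/* A port backed by a memory buffer, standing in for a WAV file. */
struct mem_port {
    struct port    base;   /* must be first so the cast below is valid */
    const int16_t *data;
    size_t         len;
    size_t         pos;
};

static size_t mem_get_frame(struct port *self, int16_t *out, size_t n) {
    struct mem_port *mp = (struct mem_port *)self;
    size_t left = mp->len - mp->pos;
    size_t take = n < left ? n : left;
    for (size_t i = 0; i < take; i++)
        out[i] = mp->data[mp->pos + i];
    mp->pos += take;
    return take;
}

static void mem_port_init(struct mem_port *mp, const int16_t *data, size_t len) {
    mp->base.get_frame = mem_get_frame;
    mp->data = data;
    mp->len = len;
    mp->pos = 0;
}

/* The consumer only sees struct port, so a port backed by a microphone
 * driver could be swapped in without changing this function. */
static size_t pull_frame(struct port *src, int16_t *out, size_t n) {
    assert(src != NULL && src->get_frame != NULL);
    return src->get_frame(src, out, n);
}
```

    The code that pulls frames does not need to know whether they come from a file or from hardware, which is what makes automated tests with prerecorded audio possible.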

    Once it worked on QEMU, we moved on to porting to the STM32F7-Discovery. The first problem was that the image did not fit in 1 MB of ROM without the compiler's size optimization, “-Os”, so we enabled it. Next, a patch disabled C++ support, since it is needed only for pjsua, and we use simple_pjsua.

    Once simple_pjsua fit, we decided there was now a real chance of running it. But first we had to sort out voice recording and playback. The question was where to record to. We chose external memory, SDRAM (128 Mbit). You can try it yourself:

    Record a stereo WAV at 16000 Hz with a duration of 10 seconds:

    record -r 16000 -c 2 -d 10000 -m C0000000

    Play it back:

    play -m C0000000
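
    A quick calculation shows why such a recording has to go to external memory. The sketch below assumes 16-bit samples and that the -d value is in milliseconds; both are assumptions inferred from the description, not confirmed details of the record command. The helper pcm_bytes is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of raw PCM needed for a recording of the given length. */
static uint32_t pcm_bytes(uint32_t rate_hz, uint32_t channels,
                          uint32_t bytes_per_sample, uint32_t duration_ms) {
    assert(channels > 0 && bytes_per_sample > 0);
    /* 64-bit intermediate keeps rate * duration from overflowing */
    uint64_t frames = (uint64_t)rate_hz * duration_ms / 1000u;
    return (uint32_t)(frames * channels * bytes_per_sample);
}
```

    Under these assumptions, pcm_bytes(16000, 2, 2, 10000) gives 640000 bytes, which is more than twice the 256 KB of internal RAM.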

    There were two problems. The first was with the codec, the WM8994. It has the concept of slots, of which there are four. By default, if this is not configured, audio is played in all four slots. As a result, at 16000 Hz we got 8000 Hz, and at 8000 Hz playback simply did not work. Once only slots 0 and 2 were selected, everything worked as it should. The other problem was the audio interface in STM32Cube, where the audio output works through SAI (Serial Audio Interface) synchronously with the audio input (we did not dig into the details, but it appears they share a common clock, and when the audio output is initialized the audio input is somehow tied to it). In other words, they cannot be started separately, so we did the following: audio input and audio output both run all the time, interrupts included.

    Next, we found that recorded voice was very quiet. The reason is that the MEMS microphones on the STM32F7-Discovery for some reason work poorly at sample rates below 16000 Hz. So we always set 16000 Hz, even when 8000 Hz is requested. This, in turn, required adding software conversion from one sample rate to the other.
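
    A minimal sketch of such a conversion between 8000 Hz and 16000 Hz is shown below. It is not the code used in Embox: it assumes 16-bit mono samples, uses linear interpolation for upsampling and pair averaging for downsampling, and the function names are hypothetical. A production resampler would normally use a proper low-pass filter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 8000 Hz -> 16000 Hz: emit each input sample followed by the midpoint
 * between it and the next one. out must have room for 2 * n samples. */
static void upsample_x2(const int16_t *in, size_t n, int16_t *out) {
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        int32_t cur  = in[i];
        int32_t next = (i + 1 < n) ? in[i + 1] : cur;  /* hold last sample */
        out[2 * i]     = (int16_t)cur;
        out[2 * i + 1] = (int16_t)((cur + next) / 2);
    }
}

/* 16000 Hz -> 8000 Hz: average each pair of samples (a crude low-pass).
 * n must be even; out must have room for n / 2 samples. */
static void downsample_x2(const int16_t *in, size_t n, int16_t *out) {
    assert(in != NULL && out != NULL && n % 2 == 0);
    for (size_t i = 0; i < n / 2; i++) {
        int32_t sum = (int32_t)in[2 * i] + in[2 * i + 1];
        out[i] = (int16_t)(sum / 2);
    }
}
```

    With this approach the hardware can always run at 16000 Hz, while the 8000 Hz stream from the call is converted in software on the way in and out.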

    Then we had to increase the size of the heap, which is located in RAM. By our calculations, PJSIP needed about 190 KB, and we had only about 100 KB left. Here we had to use a little external memory, SDRAM (about 128 KB).

    After all these changes, I saw the first packets between Linux and Embox, and I heard sound! But the sound was awful, nothing like on QEMU, and nothing could be made out. So we started thinking about what the cause might be. Debugging showed that Embox simply could not fill and drain the audio buffers in time. While PJSIP processed one frame, two buffer-completion interrupts occurred, which is too many. The first idea for speeding things up was compiler optimization, but it was already enabled in PJSIP. The second was the hardware floating point unit, which we described in the article. But in practice the FPU did not give a significant speedup. The next step was thread prioritization. Embox has different scheduling strategies, so I enabled one that supports priorities and gave audio the highest possible priority. That did not help either.

    The next idea was that, since we work with external memory, it would be good to move the structures that are accessed most often into faster memory. I did a preliminary analysis of when and for what simple_pjsua allocates memory. It turned out that of the 190 KB, the first 90 KB are allocated for PJSIP's internal needs and are not accessed very often. Then, on an incoming call, the pjsua_call_answer function is called, in which buffers for incoming and outgoing frames are allocated. That was about another 100 KB. So we did the following. Before the call, data is placed in external memory. As soon as the call comes in, we immediately swap the heap for another one located in RAM. This way, all the “hot” data moved to faster and more predictable memory.
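
    The heap swap can be sketched with two bump allocators and a pointer to whichever one is currently active. This is an illustration under stated assumptions, not the Embox heap implementation: the arena sizes are toy values, the arrays slow_mem and fast_mem merely stand in for SDRAM and internal RAM, and heap_alloc, heap_switch and in_arena are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A trivial bump allocator over a fixed memory region. */
struct arena {
    uint8_t *base;
    size_t   size;
    size_t   used;
};

static uint8_t slow_mem[1024];   /* stands in for external SDRAM */
static uint8_t fast_mem[1024];   /* stands in for internal RAM */

static struct arena slow_heap = { slow_mem, sizeof(slow_mem), 0 };
static struct arena fast_heap = { fast_mem, sizeof(fast_mem), 0 };

/* All allocations go to whichever arena is active at the moment. */
static struct arena *active_heap = &slow_heap;

static void *heap_alloc(size_t n) {
    struct arena *a = active_heap;
    size_t aligned = (n + 7u) & ~(size_t)7u;   /* round size up to 8 bytes */
    if (aligned < n || aligned > a->size - a->used)
        return NULL;                           /* overflow or out of space */
    void *p = a->base + a->used;
    a->used += aligned;
    return p;
}

/* Called at the moment the "hot" allocations are about to start,
 * for example right before answering a call. */
static void heap_switch(struct arena *next) {
    assert(next != NULL);
    active_heap = next;
}

/* Helper for checking which arena a pointer came from. */
static int in_arena(const struct arena *a, const void *p) {
    uintptr_t v  = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)a->base;
    return v >= lo && v < lo + a->size;
}
```

    Everything allocated before heap_switch(&fast_heap) stays in the slow arena, and everything allocated after it lands in the fast one, which mirrors the split between rarely used setup data and per-call frame buffers.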

    Together, all of this allowed us to run simple_pjsua and make calls through our own server, and then through other servers.


    In the end, we managed to run simple_pjsua with voice transmitted in both directions through a server. The need for an extra 128 KB of SDRAM can be avoided by using a slightly more powerful Cortex-M7 (for example, the STM32F769NI with 512 KB of RAM), but we have not given up hope of fitting into 256 KB :) We will be glad if someone finds this interesting, and even more so if someone tries it. All the sources, as usual, are in our repository.
