
Calls in Odnoklassniki iPhone Application
On Friday, we launched a new version of our iPhone application. In this post we would like to share our experience of developing this kind of service.
The “Calls” service, video chat on the Odnoklassniki website, is implemented in Flash. But not all of our users access Odnoklassniki from a computer or laptop, so to expand the video chat audience we decided to support it on smartphones as well.
This article discusses our experience of implementing video calls in an iOS application.
Scheme of the application:

Audio / Video Encoders and Decoders
Since C/C++ code can be compiled for the iPhone, there were no problems choosing codecs: you can take any open-source codec, provided that its license allows use in commercial applications. We took Speex for audio, and for video we took the H.263 codec from the Android OS source code (we had to modify it a bit for compatibility with Flash).
Video capture module
Starting with version 4.0, iOS has an API for capturing raw video (AVCaptureSession). But this API has its limitations: in particular, you cannot set an arbitrary resolution (only several fixed presets are supported), and you cannot change the orientation of the video when the device is rotated. To get around these restrictions, we capture video frames larger than we need and then crop and rotate them according to the device orientation. This consumes quite a lot of CPU and produces an undesirable zoom effect, but all that remains is to thank Apple for such an API.
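A minimal sketch of this kind of capture setup, using a fixed preset and a sample-buffer delegate. It is written against the modern Swift AVFoundation API purely for illustration (the original app predates Swift), and the cropping/rotation step is only indicated by a comment:

```swift
import AVFoundation

// Illustrative sketch only: a capture session with one of the fixed presets,
// delivering raw YUV frames to a delegate for cropping/rotation and encoding.
final class VideoCapture: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    private let session = AVCaptureSession()
    private let queue = DispatchQueue(label: "video.capture")

    func start(with camera: AVCaptureDevice) throws {
        session.sessionPreset = .vga640x480          // one of the fixed presets
        session.addInput(try AVCaptureDeviceInput(device: camera))

        let output = AVCaptureVideoDataOutput()
        // Ask for a YUV (bi-planar) pixel format, close to what the encoder expects.
        output.videoSettings = [
            kCVPixelBufferPixelFormatTypeKey as String:
                kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange
        ]
        output.setSampleBufferDelegate(self, queue: queue)
        session.addOutput(output)
        session.startRunning()
    }

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        // Here the frame would be cropped and rotated to match the device
        // orientation and handed to the H.263 encoder.
    }
}
```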
The video is captured in the codec's native format, YUV. This raises the problem of rendering frames in order to show your own picture on the screen. We solved it as well, but more on that later, in the section “Display video on screen”.
Also worth mentioning is the camera selection algorithm: if the device has a front camera, the application selects it; if there is no front camera, video is captured from the rear camera; if there is no camera at all (for example, on the iPad), the “Enable Video” button is not displayed.
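The camera selection rule above can be expressed roughly as follows (a sketch using the current AVCaptureDevice discovery API, not the original code):

```swift
import AVFoundation

// Prefer the front camera, fall back to the rear one, otherwise report that
// video is unavailable (so the "Enable Video" button can be hidden).
func selectCamera() -> AVCaptureDevice? {
    let discovery = AVCaptureDevice.DiscoverySession(
        deviceTypes: [.builtInWideAngleCamera],
        mediaType: .video,
        position: .unspecified)
    let cameras = discovery.devices
    return cameras.first { $0.position == .front }
        ?? cameras.first { $0.position == .back }
}

let camera = selectCamera()
let videoAvailable = (camera != nil)   // hide the "Enable Video" button when false
```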
Display video on screen
The video decoder, like the camera capture module, outputs frames in YUV format. To display a frame on the device's screen, it has to be converted to RGB and scaled to the desired size. Since the main processor is busy with other tasks at this time, we offloaded these operations to the GPU using OpenGL ES 2.0. A frame in YUV format is loaded into an OpenGL texture, after which a fragment shader recalculates the color components for each pixel. To support older devices without OpenGL ES 2.0 (older than the iPhone 3GS), the application has a code branch that performs all calculations without shaders. On such devices, the lack of processing power has to be compensated for by lowering the video frame rate.
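For illustration, here is what such a YUV-to-RGB fragment shader typically looks like, assuming the luma plane is uploaded as a luminance texture and the chroma plane as a luminance-alpha texture; the coefficients are the standard BT.601 ones, and the original shader may differ in details:

```swift
// A minimal sketch of the kind of fragment shader used for YUV -> RGB
// conversion on the GPU (GLSL source kept in a Swift string).
let yuvToRgbFragmentShader = """
precision mediump float;
varying vec2 v_texCoord;
uniform sampler2D u_textureY;   // luma plane (GL_LUMINANCE)
uniform sampler2D u_textureUV;  // interleaved chroma plane (GL_LUMINANCE_ALPHA)

void main() {
    float y = texture2D(u_textureY, v_texCoord).r;
    vec2 uv = texture2D(u_textureUV, v_texCoord).ra - vec2(0.5, 0.5);
    float r = y + 1.402 * uv.y;
    float g = y - 0.344 * uv.x - 0.714 * uv.y;
    float b = y + 1.772 * uv.x;
    gl_FragColor = vec4(r, g, b, 1.0);
}
"""
```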
Capture and play sound
To capture sound from the microphone and play it through the speaker, the AudioSession API is used; this is the lowest-level audio API available to an iOS application. The sound that comes from the server is decoded and placed into a buffer that smooths out network jitter. The depth of this buffer varies with connection quality: the worse the network, the deeper the buffer. For the user this means that on a bad network the audio delay increases.
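A purely illustrative sketch of such an adaptive jitter buffer; the article does not give the actual depth limits or thresholds, so the numbers below are invented:

```swift
// Illustrative adaptive jitter buffer: the target depth (in frames) grows when
// observed inter-arrival jitter grows, and shrinks when the network calms down.
final class JitterBuffer {
    private var frames: [Data] = []
    private(set) var targetDepth = 3          // frames (~60 ms at 20 ms per frame)

    func push(_ frame: Data, observedJitterMs: Double) {
        frames.append(frame)
        if observedJitterMs > 80 {
            targetDepth = min(targetDepth + 1, 15)   // worse network -> deeper buffer
        } else if observedJitterMs < 20 {
            targetDepth = max(targetDepth - 1, 2)    // better network -> lower latency
        }
    }

    func pop() -> Data? {
        // Do not start draining until the buffer has filled to the target depth.
        guard frames.count >= targetDepth else { return nil }
        return frames.removeFirst()
    }
}
```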
The iPhone has several routes for audio playback: the earpiece (telephone speaker), the loudspeaker at the bottom of the device, or a connected headset. If a headset is not connected, the loudspeaker route is selected by default. The earpiece is used when the proximity sensor is triggered.
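Sketched with the modern AVAudioSession API (the original app used the older C-based Audio Session functions), the routing rule might look like this:

```swift
import AVFoundation

// Route selection as described above: loudspeaker by default, the default route
// (headset or earpiece) when a headset is connected or the proximity sensor fires.
func configureAudioRoute(proximityTriggered: Bool) throws {
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(.playAndRecord, mode: .voiceChat, options: [])

    let headsetConnected = session.currentRoute.outputs.contains {
        $0.portType == .headphones || $0.portType == .bluetoothHFP
    }

    if headsetConnected || proximityTriggered {
        try session.overrideOutputAudioPort(.none)      // headset or earpiece
    } else {
        try session.overrideOutputAudioPort(.speaker)   // loudspeaker
    }
}
```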
Multitasking
Starting with version 4, iOS supports multitasking. A video chat call can continue even when our application is in the background. To free up as many resources as possible when going to the background, the application stops receiving and sending video and does not update the user interface until it becomes active again. Audio, however, is still captured and played, and a red bar with the word “odnoklassniki” is shown at the top of the screen, indicating that the application is listening to the microphone. Tapping this bar returns the application to the foreground.
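Roughly, the background handling looks like the sketch below, assuming the “audio” background mode is declared in Info.plist (UIBackgroundModes); CallController and its methods are hypothetical names used only for this example:

```swift
import UIKit

// With the "audio" background mode declared, an active audio session keeps the
// call alive in the background; video and UI updates are simply paused.
final class CallController {
    static let shared = CallController()
    func setVideoEnabled(_ enabled: Bool) { /* start/stop video capture and decoding */ }
    func setUIUpdatesEnabled(_ enabled: Bool) { /* pause/resume rendering */ }
}

final class AppDelegate: UIResponder, UIApplicationDelegate {
    func applicationDidEnterBackground(_ application: UIApplication) {
        // Audio keeps running; everything else is suspended to save resources.
        CallController.shared.setVideoEnabled(false)
        CallController.shared.setUIUpdatesEnabled(false)
    }

    func applicationDidBecomeActive(_ application: UIApplication) {
        CallController.shared.setUIUpdatesEnabled(true)
        CallController.shared.setVideoEnabled(true)
    }
}
```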
Using the proximity sensor
The proximity sensor is mounted at the top of the front of the phone and detects when an object is close to the device. The idea is to determine when the user holds the phone to their ear. During a video call, the application monitors the state of this sensor. When the sensor fires, the application turns off the screen, stops receiving and sending video, and switches the sound to the earpiece. This is convenient for users who simply want to talk “the old way”, as on a phone.
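Proximity monitoring itself is a few lines of UIKit; the reactions described above are only indicated by comments in this sketch:

```swift
import UIKit

// Enable proximity monitoring for the duration of the call and react to
// state changes (iOS itself turns the screen off while the sensor is covered).
func startProximityMonitoring() {
    UIDevice.current.isProximityMonitoringEnabled = true
    NotificationCenter.default.addObserver(
        forName: UIDevice.proximityStateDidChangeNotification,
        object: nil, queue: .main) { _ in
            let nearEar = UIDevice.current.proximityState
            // nearEar == true: stop video and route audio to the earpiece;
            // nearEar == false: restore video and the loudspeaker route.
            print("proximity state:", nearEar)
    }
}
```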
Adapting to connection quality
Video calls in the application can be used both on Wi-Fi and on 3G/EDGE networks. During a call, the state of the network connection is constantly monitored, and if the quality of the connection deteriorates, the frame rate and video quality are reduced.
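The article does not describe the exact metrics or thresholds, so the following is only an illustrative rule that lowers frame rate and encoder quality as packet loss grows:

```swift
// Illustrative adaptation rule: pick frame rate and an abstract encoder quality
// level from a measured packet-loss percentage. All numbers are invented.
struct VideoSettings {
    var frameRate: Int      // frames per second
    var quality: Int        // abstract encoder quality level
}

func adaptVideo(for packetLossPercent: Double) -> VideoSettings {
    switch packetLossPercent {
    case ..<2.0:  return VideoSettings(frameRate: 15, quality: 8)  // good network
    case ..<10.0: return VideoSettings(frameRate: 10, quality: 5)  // degraded
    default:      return VideoSettings(frameRate: 5,  quality: 2)  // poor network
    }
}
```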
What we would like to improve
- Make it possible to use the rest of the Odnoklassniki application during a call.
- Automatic reconnection when the connection drops (for example, when switching between Wi-Fi and 3G networks).
- Make more extensive use of ARM-specific code optimizations to reduce CPU load and battery drain during a call.
We will be glad to hear your comments and recommendations on improving the service.
The development team of the Calls service.