erlyvideo August 4, 2017 at 18:01

Rusty IP Camera: Rust Firmware

Before the Mirai botnet, only the especially curious knew what was inside an ordinary IP camera. In most cases it is plain Linux, often with a default root password or with none at all: we have one such camera in our office, with firmware from December 2016 and passwordless root telnet.

But what comes next, what software runs on this Linux? There are some great articles by datacompboy about hunting for a bug that isn't there, and there is other scattered information, but in general the situation is this: the IP camera runs a specially patched kernel, and a special library gives a program access to the hardware that produces compressed video frames.

The sad reality is that this software is very often far from well written. Suffice it to say that most outdoor cameras suffer badly from the long distance to the server, because the authors of their firmware have mastered the art of losing data over TCP.
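
We do not go into what exactly those firmwares get wrong, so the following is only a guess at one common way to lose data over TCP: ignoring a short write. When the link is slow and the send buffer is nearly full, a socket may accept only part of the buffer. The sketch below uses a made-up `SlowSink` in place of a real socket and compares a sender that ignores the returned byte count with one that uses `write_all`. All names here are hypothetical.

```rust
use std::io::{self, Write};

/// Stand-in for a congested TCP socket: accepts at most `limit` bytes per
/// `write` call, the way a socket with a nearly full send buffer can.
/// (Hypothetical test double, not part of any camera SDK.)
pub struct SlowSink {
    pub received: Vec<u8>,
    pub limit: usize,
}

impl Write for SlowSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = buf.len().min(self.limit);
        self.received.extend_from_slice(&buf[..n]);
        Ok(n) // may be fewer bytes than the caller asked for
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Buggy: ignores the byte count returned by `write`, so the tail of the
/// frame is silently dropped whenever the sink takes only part of it.
pub fn send_frame_lossy<W: Write>(out: &mut W, frame: &[u8]) -> io::Result<()> {
    let _ = out.write(frame)?;
    Ok(())
}

/// Correct: `write_all` keeps calling `write` until every byte is accepted.
pub fn send_frame<W: Write>(out: &mut W, frame: &[u8]) -> io::Result<()> {
    out.write_all(frame)
}
```

With a 1000-byte frame and a sink that takes 300 bytes per call, `send_frame_lossy` delivers only the first 300 bytes, while `send_frame` delivers all of them.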

We decided to fix this with our own firmware, and bet on Rust.

Working conditions

There are just a couple of things to do: figure out the SDK, write code that sets up the hardware, take H264 frames and send them to the network. Mere trifles, especially considering how easy it is to deploy to an IP camera and debug it all. And one remaining trifle: we decided to write this code in Rust.

Rust was chosen as an experiment for its remarkable property: compile-time guarantees of memory safety combined with the absence of a runtime. This means we can expect control over memory allocation, which matters a lot when resources are this tight.

Why not Go, Erlang or some Java / C#? Because an IP camera has 8 megabytes of flash and 128 megabytes of memory, half of which is taken away from the kernel for video needs. Cameras differ, of course, but manufacturers always try to get by with the minimum so as not to raise the cost unnecessarily. On one camera we saw 64 megabytes of flash, where there is of course room to spread out, but really tiny flash chips are common too.

So this is the usual picture on a cheap camera that costs 3000 rubles:

# free
             total       used       free     shared    buffers     cached
Mem:         60128      17376      42752          0       2708       4416
-/+ buffers/cache:      10252      49876
Swap:            0          0          0
# cat /proc/cpuinfo 
Processor	: ARM926EJ-S rev 5 (v5l)
BogoMIPS	: 218.72
Features	: swp half thumb fastmult edsp java 
CPU implementer	: 0x41
CPU architecture: 5TEJ
CPU variant	: 0x0
CPU part	: 0x926
CPU revision	: 5
Hardware	: hi3518
Revision	: 0000
Serial		: 0000000000000000
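
The `-/+ buffers/cache` row in the `free` output above is plain arithmetic on the `Mem:` row, and it is the row that matters when estimating how much memory a streamer can really use. A small sketch to check the numbers (the `MemRow` type and its method names are made up for illustration):

```rust
/// Values from the `Mem:` row of `free`, in kilobytes.
pub struct MemRow {
    pub total: u64,
    pub used: u64,
    pub free: u64,
    pub buffers: u64,
    pub cached: u64,
}

impl MemRow {
    /// Memory actually held by programs: `used` minus reclaimable caches.
    pub fn used_by_programs(&self) -> u64 {
        self.used - self.buffers - self.cached
    }

    /// Memory a program could still get: `free` plus reclaimable caches.
    pub fn available(&self) -> u64 {
        self.free + self.buffers + self.cached
    }
}

/// The numbers printed by the camera in the listing above.
pub fn camera_row() -> MemRow {
    MemRow {
        total: 60128,
        used: 17376,
        free: 42752,
        buffers: 2708,
        cached: 4416,
    }
}
```

For this camera `used_by_programs()` gives 10252 KB and `available()` gives 49876 KB, matching the second row of the listing. The total of 60128 KB means Linux sees a little under 59 MB.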

Under such conditions, poorly written software starts to suffer badly at just 3-4 connections. The golden rule when working with IP cameras is to avoid opening more than one connection (or two, one per quality level). This is not only because of the narrow channel to the camera, but also because a fourth client often makes viewing impossible for the first three. Looking ahead, I'll say that we have no problems with 50 clients.

How the camera is built

Before moving on, I'll say a little about the design of the camera we are working with at this stage.

An SPI flash chip is soldered into the camera. It is the same kind of flash chip that some locker writes itself into when it flashes the BIOS. The contents of this SPI flash can be read by attaching a clip to the chip, and can be written (if you're lucky); the processor reads data from it into memory and executes it. Sometimes the flash is NAND rather than SPI, and then everything is more complicated: you can't just attach a clip, and you have to be more careful.

At the very beginning of the flash sits uboot. This bootloader is used in almost all embedded devices: not just cameras, but also routers and phones. That is, there are most likely more copies of it in the world than copies of Windows.

Uboot is open source, but it stores data specific to a particular piece of hardware. If you copy the flash contents from a camera made by XM to a camera made by Hikvision, there is a good chance that even uboot will not start.

That is, already at this stage there is the exciting process of maintaining a register of known cameras and keeping track of them, which is greatly helped by our neighbors' amazing ability to send exactly what you ordered. A good example is a recent story from one of our clients (the largest national operator in the country), who signed a 3-year contract for the supply of cameras of a specific model with specific characteristics, and a week later received cameras of a different model with completely different characteristics.

But that's fine, all of this is a solved problem, so we move on.

Then comes the Linux kernel. It would be too simple if one kernel could be built for all possible cameras and the rest handled by loading modules. That is not possible, so different chipset versions need different kernels: 2.x.y in some places, 3.x.y in others. Why? Because closed-source modules are loaded into the kernel. In some cases you can find a workaround, but unifying everything still will not work.

After that comes the usual, ordinary buildroot. Everything here is as you would expect.

Next, you need to run tricky scripts that configure the hardware over i2c (and possibly in other ways), load the right modules and start the specially written software.

Video capture

Video capture involves a lot of hardware setup. If you read the onvif specification and the manual for the IP camera SDK, you will see much in common: the software interface reflects the general structure of most hardware, which is as follows. Video is taken from the sensor, processed a little, then pushed into the encoders (hardware ones, of course), after which software can pick up ready-made H264 NAL units from a specific place in memory. For the basic scenario, all that remains is to add user management, settings and some kind of network protocol. A full-fledged camera also needs support for all sorts of mass-configuration mechanisms (discovery, onvif, psia, etc.) and analytics.
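
To make "H264 NAL units" a bit more concrete, here is a minimal sketch of what the streamer has to deal with once the encoder has produced data. It assumes the data arrives as an Annex B byte stream, where units are separated by `00 00 01` or `00 00 00 01` start codes; the SDK may well hand units over already separated, in which case only `nal_type` is relevant. The function names are ours, for illustration only.

```rust
/// Split an H.264 Annex B byte stream into NAL units (start codes removed).
/// A start code is 00 00 01, optionally preceded by one more 00 byte.
pub fn split_nal_units(stream: &[u8]) -> Vec<&[u8]> {
    // Find the position of every 3-byte start code 00 00 01.
    let mut starts = Vec::new();
    let mut i = 0;
    while i + 3 <= stream.len() {
        if stream[i] == 0 && stream[i + 1] == 0 && stream[i + 2] == 1 {
            starts.push(i);
            i += 3;
        } else {
            i += 1;
        }
    }

    let mut units = Vec::new();
    for (k, &pos) in starts.iter().enumerate() {
        let begin = pos + 3;
        let mut end = starts.get(k + 1).copied().unwrap_or(stream.len());
        // Zero bytes right before the next start code belong to a 4-byte
        // start code or are padding: a NAL unit never ends with 0x00.
        while end > begin && stream[end - 1] == 0 {
            end -= 1;
        }
        if end > begin {
            units.push(&stream[begin..end]);
        }
    }
    units
}

/// NAL unit type: the low 5 bits of the first byte
/// (7 = SPS, 8 = PPS, 5 = IDR slice, 1 = non-IDR slice).
pub fn nal_type(unit: &[u8]) -> Option<u8> {
    unit.first().map(|b| b & 0x1F)
}
```

A network protocol such as RTSP/RTP then needs the type of each unit: for example, a new client usually has to receive the SPS and PPS before it can decode anything.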

And what about Rust

The streamer is the part that is rusty. It is a whole pile of unsafe code autogenerated from the SDK code with bindgen, a patched libc binding (we will try to get the patch upstream), and RTSP implemented on top of tokio. It is even already possible to watch video from the camera in a normal browser, which is an unattainable luxury for Chinese cameras that invariably require installing ActiveX.
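
A rough sketch of how unsafe generated bindings are usually kept in check: the raw calls live in one module, and the rest of the program only sees a safe wrapper. Everything in `sdk_sys` below is hypothetical (the real SDK names and signatures are not shown in this article), and the function body is a mock standing in for the vendor library so that the example runs on its own.

```rust
use std::os::raw::c_int;

/// Shape of a binding as bindgen might generate it. HYPOTHETICAL: not the
/// real SDK API. The function body is a mock of the vendor library.
mod sdk_sys {
    use std::os::raw::c_int;

    #[repr(C)]
    pub struct SdkFrame {
        pub data: *const u8,
        pub len: u32,
    }

    static FAKE_NAL: [u8; 4] = [0x65, 0x01, 0x02, 0x03];

    /// Returns 0 and fills `out` on success, a negative code on failure.
    pub unsafe extern "C" fn sdk_get_frame(channel: c_int, out: *mut SdkFrame) -> c_int {
        if channel != 0 || out.is_null() {
            return -1;
        }
        unsafe {
            (*out).data = FAKE_NAL.as_ptr();
            (*out).len = FAKE_NAL.len() as u32;
        }
        0
    }
}

/// Error code reported by the (mock) SDK.
#[derive(Debug, PartialEq)]
pub struct SdkError(pub c_int);

/// Safe wrapper: the only place that touches raw pointers. It copies the
/// frame out of SDK memory so callers get an owned, bounds-checked Vec.
pub fn get_frame(channel: i32) -> Result<Vec<u8>, SdkError> {
    let mut raw = sdk_sys::SdkFrame { data: std::ptr::null(), len: 0 };
    // SAFETY: `raw` is a valid, writable SdkFrame for the whole call.
    let rc = unsafe { sdk_sys::sdk_get_frame(channel as c_int, &mut raw) };
    if rc != 0 {
        return Err(SdkError(rc));
    }
    if raw.data.is_null() {
        return Err(SdkError(-1));
    }
    // SAFETY: on success the mock SDK guarantees `data` points to `len`
    // readable bytes that stay valid while we copy them.
    let bytes = unsafe { std::slice::from_raw_parts(raw.data, raw.len as usize) };
    Ok(bytes.to_vec())
}
```

Copying every frame is the simplest choice, not necessarily the right one on a device with this little memory; a real wrapper might instead borrow the SDK buffer and release it when the borrow ends.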

The structure feels very unusual after Erlang: there are no processes and messages, there are channels instead, and with them everything works a little differently. As I wrote above, modern code with the right organization makes it possible to serve video not to 2-3 clients but to more than 50 without any drop in performance.
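
One possible shape of such an organization, as a sketch only: each client gets its own bounded channel, the frame is stored once and shared, and the capture side never blocks on a slow client. To keep the example free of external crates it uses `std::sync::mpsc` rather than tokio channels, so it illustrates the idea and is not the firmware's actual code. `Broadcaster` and its methods are invented names.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};
use std::sync::Arc;

/// Hands each encoded frame to every connected client.
pub struct Broadcaster {
    clients: Vec<SyncSender<Arc<[u8]>>>,
    queue_len: usize,
}

impl Broadcaster {
    /// `queue_len` is how many frames may wait in each client's queue.
    pub fn new(queue_len: usize) -> Self {
        Broadcaster { clients: Vec::new(), queue_len }
    }

    /// Register a client; its network task reads from the returned receiver.
    pub fn subscribe(&mut self) -> Receiver<Arc<[u8]>> {
        let (tx, rx) = sync_channel(self.queue_len);
        self.clients.push(tx);
        rx
    }

    /// Offer a frame to every client without ever blocking.
    /// A full queue means that client skips this frame; a closed queue
    /// means the client has gone and it is removed.
    /// Returns how many clients received the frame.
    pub fn publish(&mut self, frame: &[u8]) -> usize {
        // One copy of the data, shared by reference count among all clients.
        let shared: Arc<[u8]> = Arc::from(frame);
        let mut delivered = 0;
        self.clients.retain(|tx| match tx.try_send(Arc::clone(&shared)) {
            Ok(()) => {
                delivered += 1;
                true
            }
            Err(TrySendError::Full(_)) => true,
            Err(TrySendError::Disconnected(_)) => false,
        });
        delivered
    }

    /// Number of clients still registered.
    pub fn client_count(&self) -> usize {
        self.clients.len()
    }
}
```

The point of the design is that memory use per frame does not grow with the number of viewers, and a client on a bad link only loses its own frames instead of stalling everyone else.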

An important point: during development there has not been a single segfault so far. There is a persistent feeling that Rust makes you write the way good gray-haired engineers who have seen it all would write anyway. So for now, I like everything.

We plan to finish the work on the basic scenario during August, so there is a question for the audience in the poll below. And feel free to ask any questions you have.


What would be more interesting to read about next?

44.4% (123 votes): what can an IP camera do and how does the SDK work?
84.1% (233 votes): how is this embedded work in Rust done?
