Elvish March 25, 2013 at 09:03

Splitting a continuous data stream into structural units

Quite often, blocks of data have to be transferred over a continuous stream. The main question then is how to separate one block from another; whether to transmit the data in binary or as text is secondary. Add to this the need to keep working despite minor distortions (losses, garbage, errors in the communicating nodes) and the need to use the transmission channel efficiently. On top of that, the task has to be solved on a simple microcontroller with limited resources.

Such tasks arise, for example, in telemetry transmission and remote equipment control. There is usually a simple microcontroller on one side and a computer on the other. They may communicate over good old RS232. It can also be more complicated: for example, the microcontroller's UART output is converted to 802.11b, the radio signal travels to a radio mast, and the data reaches the server over Ethernet.

If my reinvented wheel on this subject interests you, read on.

First of all, we will determine the requirements:

A channel can be created at any time.
Both the controller and the computer can be connected to the channel at any time, including in the middle of data transfers.
If data is corrupted in the channel, the corresponding data block should be discarded.
One channel can have several devices.
Data blocks can contain any sequence of bytes and be of arbitrary length.
The resources allocated to support the protocol are severely limited.

The result is an implementation that remotely resembles UDP.

Let us consider several common methods, using as an example the transfer of three "voltage" numbers as one block and two "temperature" numbers as another block.

A common solution is to convert all the data to text and separate the blocks (packets) with a line feed.

It may look like this:

V 1231, 2400, -231
V 1333, 2100, -232
T 36, -40
This uses sprintf(buf, "V %d, %d", ...).

Transmission problem: sprintf uses a noticeable amount of stack and takes a fairly long time to run.

Reception problem: on the controller, converting a character sequence such as "-232" to an int requires additional resources. There is no type checking, and a huge number of error-checking conditions are needed.
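
To give a feel for the reception cost described above, here is a minimal sketch of parsing one such line in C. The function name parse_text_packet and its return convention are assumptions made for illustration; they do not come from any library mentioned in this article.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a line such as "V 1231, 2400, -231" into at most max_vals integers.
 * Returns the number of values parsed, or -1 on any malformed input. */
int parse_text_packet(const char *line, char *type, int *vals, int max_vals)
{
    if (line == NULL || line[0] == '\0' || line[1] != ' ')
        return -1;                      /* missing type letter or separator */
    *type = line[0];
    const char *p = line + 2;
    int count = 0;

    for (;;) {
        char *end;
        errno = 0;
        long v = strtol(p, &end, 10);   /* skips leading spaces itself */
        if (end == p)
            return -1;                  /* no digits where a number was expected */
        if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
            return -1;                  /* value does not fit in an int */
        if (count >= max_vals)
            return -1;                  /* more fields than the caller expects */
        vals[count++] = (int)v;
        p = end;
        while (*p == ' ')
            p++;
        if (*p == '\0' || *p == '\n' || *p == '\r')
            return count;               /* clean end of line */
        if (*p != ',')
            return -1;                  /* garbage between fields */
        p++;                            /* skip the comma */
    }
}
```

Even this small parser needs five separate error branches, and it still knows nothing about which fields a 'V' or a 'T' packet is supposed to carry.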

On the plus side, a person can read the transmitted parameters with the naked eye.

If the project keeps growing on top of this protocol, after a while it becomes impossible to maintain, and even the human readability disappears.

The problem can be partially solved by transmitting hexadecimal digits instead of decimal ones. This simplifies processing but does not eliminate the other problems.

To improve control, you can wrap the transmitted data in XML:

1231 2400 -231

Or you can wrap it in JSON:

{
	"Voltages": [1231, 2400, -231]
}

The protocol becomes self-documenting. But there is still no type checking at compile time, the amount of extra transmitted data becomes excessive, and the problem of parsing numbers and text with limited resources remains.

If one of the nodes transmits continuously, the second node may connect in the middle of the transmitted data. In most cases a freshly powered-on node cannot reliably determine the start of a packet, so it either has to wait for a line feed or rely on luck. In addition, nothing checks the packet for correctness (for example, a nearby thunderstorm may cause one bit to be transmitted incorrectly). These shortcomings can be addressed with an approach close to the NMEA protocol:

$GNVLT,1231,2400,-231*71

Here the packet always starts with $ and ends with an asterisk followed by the CRC (0x71) of the packet data. This solves the problem of corrupted packets (the classic CRC here is very simple: an XOR of the characters between $ and the asterisk), but the problems of documenting the protocol and checking packet types remain.
So with a text stream there is a lot of overhead, little type checking, and documentation is difficult.
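
The XOR check used in the NMEA-style sentence above can be sketched in a few lines of C. The function name nmea_checksum is an assumption for illustration; the calculation itself is the usual XOR of every character between $ and the asterisk.

```c
#include <stddef.h>
#include <stdint.h>

/* XOR of every character after the leading '$', stopping at '*' or at the
 * end of the string. Returns 0 if the string does not start with '$'. */
uint8_t nmea_checksum(const char *sentence)
{
    uint8_t sum = 0;
    if (sentence == NULL || sentence[0] != '$')
        return 0;
    for (const char *p = sentence + 1; *p != '\0' && *p != '*'; p++)
        sum ^= (uint8_t)*p;
    return sum;
}
```

For the sentence $GNVLT,1231,2400,-231*71 this function returns 0x71, which matches the two hexadecimal digits after the asterisk.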

Now consider binary data transfer. Sets of bytes will be transmitted, so the structures have to be defined:

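#include <stdint.h>

/* Voltage block: three 16-bit values, 6 bytes */
typedef struct {
	int16_t supplyIn;
	int16_t supplyOut;
	int16_t groundPotential;
} PackVoltages;

/* Temperature block: two 8-bit values, 2 bytes */
typedef struct {
	int8_t internal;
	int8_t outside;
} PackTemperature;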

We have 6 bytes for voltages and 2 bytes for temperatures.

You can combine these structures into one, add 5 more bytes for future protocol extensions, and store the type of the transmitted data in the structure:

typedef struct{
	char type;
	union {
		PackVoltages voltages;
		PackTemperature temperatures;
	}Data;
	char rezerv[5];
}FullPack;

In my build this gives a packet of 16 bytes (because of non-obvious alignment), not the 12 it might seem; the exact size depends on the compiler and target, so be careful with alignment even though it is not a problem in this example. You can enable compiler options for dense byte packing, but then another problem may arise: some processors (for instance, some ARM cores) cannot read unaligned data, and this error is not easy for a beginner Jedi to track down.
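
Since the padded size depends on the compiler and target, it is safer to measure it than to guess. The sketch below repeats the structures from the text and adds two helper functions; the names fullpack_padding and fullpack_data_offset are assumptions introduced only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { int16_t supplyIn, supplyOut, groundPotential; } PackVoltages;
typedef struct { int8_t internal, outside; } PackTemperature;

typedef struct {
    char type;
    union {
        PackVoltages voltages;
        PackTemperature temperatures;
    } Data;
    char rezerv[5];
} FullPack;

/* Sum of the field sizes, ignoring any padding the compiler inserts. */
enum { FULLPACK_FIELD_BYTES = 1 + sizeof(PackVoltages) + 5 };

/* Bytes of padding the compiler added on this particular target. */
size_t fullpack_padding(void)
{
    return sizeof(FullPack) - FULLPACK_FIELD_BYTES;
}

/* Offset of the union; greater than 1 when padding follows `type`. */
size_t fullpack_data_offset(void)
{
    return offsetof(FullPack, Data);
}
```

Printing sizeof(FullPack) and these two values on both ends of the link is a quick way to check that the microcontroller and the computer agree on the layout before any data is exchanged.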

From then on, 16 bytes are always sent to the channel. The receiving side waits for the next 16 bytes and processes them as the next packet. It is easy to see that there is no CRC. Let us add one:

typedef struct{
	char type;
	union {
		PackVoltages voltages;
		PackTemperature temperatures;
	}Data;
	char rezerv[5];
	char CRC;
}FullPack;

The packet size remains 16 bytes. When sending data, we have to fill in the packet type, calculate the CRC and store it in the packet, and then send the resulting packet.
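
The sending steps just described (set the type, compute the CRC, store it) might look like the sketch below. The article does not say which CRC algorithm is meant, so a plain XOR over the bytes is assumed here, and the helper names fullpack_init, fullpack_seal and fullpack_check are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { int16_t supplyIn, supplyOut, groundPotential; } PackVoltages;
typedef struct { int8_t internal, outside; } PackTemperature;

typedef struct {
    char type;
    union {
        PackVoltages voltages;
        PackTemperature temperatures;
    } Data;
    char rezerv[5];
    char CRC;
} FullPack;

/* Assumed checksum: XOR of all bytes in the buffer. */
static uint8_t xor_bytes(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint8_t sum = 0;
    while (len--)
        sum ^= *p++;
    return sum;
}

/* Zero the whole packet, padding included, and set its type. */
void fullpack_init(FullPack *p, char type)
{
    memset(p, 0, sizeof *p);
    p->type = type;
}

/* Store the checksum of every byte that precedes the CRC field. */
void fullpack_seal(FullPack *p)
{
    p->CRC = (char)xor_bytes(p, offsetof(FullPack, CRC));
}

/* Returns 1 if the stored checksum matches the packet contents. */
int fullpack_check(const FullPack *p)
{
    return (uint8_t)p->CRC == xor_bytes(p, offsetof(FullPack, CRC));
}
```

The caller fills the Data member between fullpack_init and fullpack_seal. Zeroing the structure first keeps the padding bytes deterministic, because they are sent over the wire together with the fields.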

The advantages of this approach: packet handling gains type checking at compile time, there are no unnecessary conversions between numbers and text, and the received portion of data is always the same size, so memory can conveniently be allocated in advance.
There are many drawbacks. The transmission overhead ranges from small to tens of times the payload when the lengths of the transmitted data differ significantly. This synchronization method suits client-server interaction over a TCP channel, where nothing is lost along the way and the beginning of the packet is always known. If a node connects to the channel after initialization, the first few bytes may not reach it. All received packets are then shifted by that number of bytes and the data in them is naturally wrong. It is good if the CRC rejects them, but then there is no communication with the node at all, and with a probability of 1/256 a "broken" packet slips through. You can try to solve this by transmitting a signature byte marking the "beginning" of the packet, but since we transmit binary data, the same byte can also occur in the data itself, so the start of a packet cannot always be determined reliably. Another issue is variable alignment: the packet header needs one byte, while the data often contains 32-bit numbers, which leads to periodic data shifts of 0-2 bytes. A further annoyance is that the CRC has to be calculated "manually" when sending each type of packet.

Another option is similar to the previous one: to reduce the overhead, the actual length of the packet is transmitted in its first bytes. The problem with this approach is that the packet must be fully assembled in advance (i.e. fit in memory) and only then transmitted. This can be difficult on a resource-limited microcontroller, especially for large packets. In addition, the transfer cannot begin until the library has received the entire packet, which can hurt the bandwidth and latency of the channel. The other advantages and disadvantages are the same as for the previous option.
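
A minimal sketch of such length-prefixed framing is shown below, assuming a single length byte (so at most 255 bytes of payload) and no CRC. The names lp_encode and lp_decode are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write [len][payload...] into out.
 * Returns the number of bytes written, or 0 if the frame does not fit. */
size_t lp_encode(const void *payload, uint8_t len, uint8_t *out, size_t cap)
{
    if ((size_t)len + 1 > cap)
        return 0;
    out[0] = len;                       /* the length must be known up front */
    memcpy(out + 1, payload, len);
    return (size_t)len + 1;
}

/* If `in` holds a complete frame, point *payload at its data and return the
 * payload length; return -1 if more bytes are still needed. */
int lp_decode(const uint8_t *in, size_t n, const uint8_t **payload)
{
    if (n < 1 || (size_t)in[0] + 1 > n)
        return -1;
    *payload = in + 1;
    return in[0];
}
```

The first line of lp_encode shows the limitation mentioned above: the sender has to know the full length, and therefore hold the full payload, before the first byte goes out.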

The last method considered is the one used in my library. Its historical name is the bin protocol or BIN protocol.
When sending binary data, packets can be separated by a specially reserved byte. If this byte occurs in the data, it is replaced with a different sequence of bytes, and the receiver performs the reverse procedure. This method is called "byte stuffing" (thanks to Flexz for the name).
CRC calculation is likewise delegated to the packet-sending routine.

For the conversions to work, three bytes must be reserved. It is best to choose values that occur rarely in the data. The byte replacement rules are:
<Separator> = <Separator> <Separator>
<Final> = <Separator> <Additional>
<Additional> = <Additional>

This shows that the <Final> byte never appears inside the transmitted data.
A packet can be formed like this:
<Separator> Protocol_data <Final>
If you want increased reliability, you can form the packet like this:
<Final> <Separator> Protocol_data <Separator> <Final>

The second form raises the rejection rate of damaged small packets by about 15%, but adds roughly the same 10-15% of overhead, so it is not considered further.

Thus, even if we connect at an arbitrary moment, it is enough to wait for the <Separator> byte to start receiving a packet. Only after the <Final> byte has been received is the packet checked for correctness and passed on for processing.
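
The receiving side described above can be sketched as a small state machine that is fed one byte at a time. This is not the library's code: the three reserved values, the buffer size and the names Receiver and rx_feed are all assumptions chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reserved values; the real library may use different ones. */
enum { SEPARATOR = 0xF0, FINAL = 0xF1, ADDITIONAL = 0xF2 };
enum { RX_CAPACITY = 32 };

typedef enum { RX_IDLE, RX_BODY, RX_ESCAPE } RxState;

typedef struct {
    RxState state;
    size_t len;
    uint8_t buf[RX_CAPACITY];
} Receiver;

static void rx_store(Receiver *rx, uint8_t b)
{
    if (rx->len < RX_CAPACITY) {
        rx->buf[rx->len++] = b;
        rx->state = RX_BODY;
    } else {
        rx->state = RX_IDLE;            /* oversized packet: drop and resync */
    }
}

/* Feed one received byte. Returns 1 when rx->buf[0..rx->len) holds a complete
 * unstuffed packet body whose CRC still has to be checked, otherwise 0. */
int rx_feed(Receiver *rx, uint8_t b)
{
    switch (rx->state) {
    case RX_IDLE:                       /* ignore everything until a start */
        if (b == SEPARATOR) { rx->len = 0; rx->state = RX_BODY; }
        return 0;
    case RX_BODY:
        if (b == FINAL) { rx->state = RX_IDLE; return 1; }
        if (b == SEPARATOR) { rx->state = RX_ESCAPE; return 0; }
        rx_store(rx, b);
        return 0;
    case RX_ESCAPE:
        if (b == SEPARATOR) rx_store(rx, SEPARATOR);
        else if (b == ADDITIONAL) rx_store(rx, FINAL);
        else if (b == FINAL) rx->state = RX_IDLE;   /* corrupted escape */
        else { rx->len = 0; rx_store(rx, b); }      /* a new packet started */
        return 0;
    }
    return 0;
}
```

A receiver that joins in the middle of a packet simply stays in RX_IDLE until the next separator arrives. If that separator happens to be half of an escaped pair, the resulting garbage body is left for the CRC check to reject.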

Now let us see what "Protocol_data" consists of:

Header: 1 byte for the packet type, 1 byte for the destination address
Data: the data itself, processed according to the rules above
CRC: 1 byte, also processed according to the rules above
That is, you can send an arbitrary number of bytes; they are wrapped in packet start and end markers, and the packet type, destination address and CRC are added to them.
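
Putting the pieces together, a sender for this layout could look like the sketch below. It is an illustration under stated assumptions rather than the library's implementation: the reserved byte values are made up, the CRC is assumed to be an XOR over the type, address and data bytes, and the names Writer and frame_build are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reserved values; the real library may use different ones. */
enum { SEPARATOR = 0xF0, FINAL = 0xF1, ADDITIONAL = 0xF2 };

typedef struct {
    uint8_t *buf;
    size_t cap;
    size_t len;
    int overflow;
} Writer;

static void put_raw(Writer *w, uint8_t b)
{
    if (w->len < w->cap)
        w->buf[w->len++] = b;
    else
        w->overflow = 1;
}

/* Apply the replacement rules to one byte of data or CRC. */
static void put_stuffed(Writer *w, uint8_t b)
{
    if (b == SEPARATOR) {
        put_raw(w, SEPARATOR);
        put_raw(w, SEPARATOR);
    } else if (b == FINAL) {
        put_raw(w, SEPARATOR);
        put_raw(w, ADDITIONAL);
    } else {
        put_raw(w, b);
    }
}

/* Build <Separator> type addr data CRC <Final> in out.
 * Returns the frame length, or 0 if it does not fit into cap bytes.
 * type and addr must not be equal to any reserved value. */
size_t frame_build(uint8_t type, uint8_t addr, const void *data, size_t n,
                   uint8_t *out, size_t cap)
{
    Writer w = { out, cap, 0, 0 };
    const uint8_t *p = data;
    uint8_t crc = (uint8_t)(type ^ addr);   /* assumed XOR checksum */

    put_raw(&w, SEPARATOR);
    put_raw(&w, type);
    put_raw(&w, addr);
    for (size_t i = 0; i < n; i++) {
        crc ^= p[i];
        put_stuffed(&w, p[i]);
    }
    put_stuffed(&w, crc);
    put_raw(&w, FINAL);
    return w.overflow ? 0 : w.len;
}
```

Under these assumptions, a temperature packet with the values 36 and -40, type 'T' and address 1 becomes the seven bytes F0 54 01 24 D8 A9 F1. The bytes are written out one at a time, so on a microcontroller put_raw could push each byte straight to the UART instead of a buffer.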

For our case, this comes down to the definition of the data structures:

typedef struct {
    int16_t supplyIn;
    int16_t supplyOut;
    int16_t groundPotential;
}PackVoltages;
typedef struct {
    int8_t internal;
    int8_t outside;
}PackTemperature;

Plus the knowledge that the first structure is denoted by the character 'V' and the second by 'T'. The data is sent through a function with three parameters: the packet type, the address of the start of the transmitted data, and the length of the transmitted data.
BP_SendMyPack('T', &packTemperature, sizeof(PackTemperature));

And the transmission channel will have this sequence:

On small packets the overhead is rather large, but as the packet size grows it becomes negligible.
Gray indicates the actual information being transmitted, white indicates the necessary minimum of supporting information, and yellow indicates the overhead of my bin protocol.

The advantages of this approach: low overhead; connecting at an arbitrary moment is not a problem; a good level of data abstraction, since the same functions can be used to send and receive on different devices, programs and protocols. Good type checking at compile time can be achieved. The data itself can be aligned properly in the program, and this adds no extra bytes when packets are sent.

Disadvantages: the packet type and packet address cannot be equal to the special characters and can take values from 1 to 254. There is only one CRC byte, so a broken packet is missed with a probability of 1/256.

When parameters are passed in binary form between different architectures, byte order must be taken into account. If it differs, conversion functions that reverse the byte order must be used.
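
Such conversion functions are short. The sketch below shows one possible form for 16-bit and 32-bit values; the names swap16 and swap32 are assumptions for illustration and are not taken from the library.

```c
#include <stdint.h>

/* Reverse the byte order of a 16-bit value. */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Reverse the byte order of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}
```

A signed field such as supplyIn can be converted by casting it to uint16_t, swapping, and casting back to int16_t on the receiving side.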

As a working illustration of the protocol, a small Qt program is attached. At startup, the program opens a TCP socket and launches a second copy of itself with the parameters for connecting to that socket. The result is two almost identical instances of the program connected through a TCP socket. If needed, you can use the keys below to run the server or the client separately.

Available keys:
-dedicated - creates a server and prints the connection parameters to the console.
-child - connects using the specified attributes:
-A: 11.22.33.44 - IP address to connect to (default localhost)
-P: 12345 - connection port
Without keys - starts the server and the client and connects them.
The program translates user actions into the socket through the binary protocol, and also listens to the return channel and acts on the data received.
In the program, holding any mouse button over the black background draws a circle that grows until the button is released. Clicking the rainbow strip at the top changes the current color. It is a lot of fun to draw with a kid from two different computers :)

Notes on working with the binary protocol.
The MainWindow class gathers all the interaction between the protocol and the TCP connection (anything else could be used instead of TCP).

The MainWindow::MainWindow constructor calls the private function initBinProtocol(), which initializes the protocol. In the same place, the address of the byte output function globalSendCharToExternal() is passed to the protocol. Then ReadFromParent() is installed as the handler for the signal that characters have arrived on the TCP socket; it passes all received bytes to the protocol handler one character at a time.

Once the TCP connection is established, another handler, ReadFromChild(), is attached to the connected session; it likewise passes all received bytes to the protocol handler.

The PackTypes.h file contains all the types of transmitted packets. It is effectively the protocol description. The TPackAllTypes type was introduced for convenient processing on a computer; it does not have to be used on a microcontroller.

The PaintBox class contains the actual work with the protocol. Packets from the other instance of the program are checked every 50 ms on a timer. If desired, processing can instead be triggered when the last byte of a complete packet is received.

Events are sent to the protocol through the BP_SendMyPack() function when mouse buttons are pressed and released, and when the "clear" button is pressed. First a structure is filled with the binary parameter values, then it is transmitted. The clear command needs no data, so only the command byte is transmitted.

The PaintBox::timerCheckPacks() function periodically checks the protocol buffer for received commands (BP_IsPackReceived()) and executes them.

The Types.h file contains definitions of equivalent basic types for cross-compilation with different compilers for different platforms; you may need to edit it for your case.

In general, the code is documented, so it should not be difficult to figure out.

Links to the library on GitHub:

git: git@github.com:Elvish/microkern.git
http: https://github.com/Elvish/microkern

PS. How do you split streams into pieces? If you have questions, I will try to answer. If the topic is of interest, in the next article I can present a simplified shell with auto-completion for the simplest controllers (an indispensable thing for debugging almost any small device).

UPD
Another splitting method, from Flexz:
This method is used, for example, in the Modbus-RTU protocol. Packets are separated by intervals of "silence": the line is held in the idle state for the time needed to transmit several characters. In this case there is no need to process every byte to undo byte stuffing, and a DMA receiver can be used, if one is available of course.
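
As a rough sketch of this idea, the function below decides whether the pause between two received bytes counts as a packet boundary. It assumes the 3.5 character times of silence that Modbus-RTU uses and timestamps taken when each byte has been fully received; the name is_frame_gap and the microsecond units are assumptions for illustration.

```c
#include <stdint.h>

/* Returns 1 if the line was idle for at least 3.5 character times between
 * two bytes. Both timestamps mark the moment a byte was fully received. */
int is_frame_gap(uint32_t prev_done_us, uint32_t now_done_us,
                 uint32_t char_time_us)
{
    uint32_t delta = now_done_us - prev_done_us;    /* safe across timer wrap */
    if (delta <= char_time_us)
        return 0;                                   /* bytes arrived back to back */
    uint32_t silence = delta - char_time_us;        /* remove the byte's own duration */
    return 2u * (uint64_t)silence >= 7u * (uint64_t)char_time_us;
}
```

With a character time of 1000 microseconds, a byte that completes 4500 microseconds after the previous one starts a new packet, while one that completes after 4400 microseconds still belongs to the current packet.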

From the author: in my opinion, separation by time is feasible on lightly loaded lines or with very accurate clocks. It does not work where the physical data transfer is hard to control (for example, when relaying over WiFi): packets can be merged with each other or split up arbitrarily. Even some USB-RS232 adapters are guilty of this. They group data into chunks of 8 bytes, and as a result not every piece of hardware works through such adapters.

Using two different bytes for the beginning and the end of a packet makes the protocol more robust under heavy losses in the channel. The protocol was originally developed when transmission went through ancient radio modems on mobile equipment in conditions of very poor visibility. With a single byte, the protocol often went wrong when it received the beginning of one packet, then a transmission failure, then the end of another packet.
