abondarev February 22, 2019 at 16:34

RTOS or not RTOS, that is the question

I was prompted to write this article by a long thread of comments (unfortunately, I can't call it a discussion) on my recent article “The diverse world of embedded systems and the place of Embox in it”. In several places I was reproached for confusing RTOS with embedded OS and for denying that LynxOS, QNX and VxWorks are RTOSes, although in my opinion, of course, I did no such thing. Several times I suggested that the author of these comments write an article stating his own vision of the concept of a “real-time operating system”, but for some reason he refused. So I will present my vision of this term, and let's discuss what can be called an RTOS and what cannot. After all, this question is often asked about Embox.

The term RTOS (real-time operating system) belongs to the field of marketing!

I thought of starting in a scientific way, by introducing terms, but decided that this provocative thesis would work better. So why do I argue that the term RTOS (real-time operating system) belongs to marketing or, more precisely, is a marketing (advertising) slogan? It's simple. When a product is made, it needs to be sold. But if you try to sell just an operating system, difficulties arise. This is where market positioning comes to the rescue. For example, you could say “we are faster than them.”

Interesting fact:

QNX was originally called Quick Unix (Qunix): "The OS was originally called Qunix, "Quick UNIX", until they received a polite letter from AT&T's lawyers asking that they change the name."

But a claim like that can get you taken to court! And there you will need to provide evidence. But how do you prove that you are faster in all cases? This is not a foot race. Then again, such not entirely fair competition falls somewhat outside normal market positioning.

Normal positioning involves drawing a portrait of the consumer, identifying the product properties that are most in demand, and focusing on those properties. Or else you create these needs in the user yourself. For example: our processor runs at such-and-such a frequency, the smartphone has an N-core processor, and so on.

Since the classics of the genre have already established that operating systems are needed not only for user-facing systems but also for automatically controlling various technological processes, and have even introduced the concept of a real-time operating system, it would be a sin not to use it. In introducing the concept of real-time systems, the classics spoke of a certain critical time threshold. That is, unlike conventional systems, where a user who is waiting for something can simply keep waiting, a hard real-time system (one controlling a turbine, for example) cannot wait, otherwise something bad may happen.

Thus, you can tell the user that, unlike conventional operating systems, yours guarantees the system's response time. Naturally, the smaller that time, the more technological processes the system can control. And so a lot of tables appear in which assorted RTOS timing figures are presented as the key parameter.

How are these figures obtained? The process is described honestly! We take such-and-such a model task and such-and-such a hardware platform, and apply such-and-such a stimulus. Marketing, of course, may simplify this and just state “a response time of 1 μs”. But let's leave that aside and assume that everything is described honestly.

But excuse me, what if there is another task? 10 tasks, 100 tasks? What if a drunk programmer disables interrupts? What if the programmer assigned task priorities incorrectly?

There was a case when Embox was being tested for real-time behavior. We sat and thought about how to prove that it is a real-time operating system. There is a laboratory, and there is a customer who wants this to be so. We find out that, for the customer, real time means a system response time of 1 μs. I ask whether the following experiment would count as evidence:

We take a certain hardware platform
Apply a signal to one of the GPIO inputs
Catch the event in software
Drive a signal on a GPIO output in software
Measure with an oscilloscope; the response time is the difference between the input and output edges.

The customer confirms that this is exactly what is needed. I ask a clarifying question: we are building a model system, so may we leave it unloaded by other tasks? That is, is it acceptable that the system does nothing but this simple test task? The customer replied that this is what the test tasks require. You can probably guess that the system passed the tests! Naturally, the characteristics were confirmed, that is, the stimulus was repeated.
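
The arithmetic behind such an experiment can be sketched in a few lines of Python. This is only an illustration: the function names, the microsecond timestamps and the 1 μs deadline in the usage note are assumptions for the example, not Embox code and not the laboratory's actual procedure.

```python
def response_times(input_edges_us, output_edges_us):
    """Pair every input edge with the first output edge that follows it.

    Both arguments are lists of edge timestamps in microseconds, as they
    might be read off an oscilloscope capture (hypothetical data format).
    Returns the list of measured response times.
    """
    latencies = []
    outputs = iter(sorted(output_edges_us))
    out = next(outputs, None)
    for t_in in sorted(input_edges_us):
        # Skip output edges that happened before this input edge.
        while out is not None and out < t_in:
            out = next(outputs, None)
        if out is None:
            break  # no reaction was recorded for the remaining inputs
        latencies.append(out - t_in)
        out = next(outputs, None)  # each output edge answers one input edge
    return latencies


def meets_deadline(latencies, deadline_us):
    """True only if there are measurements and the worst one is in time."""
    return bool(latencies) and max(latencies) <= deadline_us
```

For example, `meets_deadline(response_times([0.0, 100.0], [0.5, 100.75]), 1.0)` holds, but note what it proves: only that the worst observed reaction on an otherwise idle system was within the limit. It says nothing about 10 or 100 other tasks.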

This section is by no means written to belittle any OS or any developers, but solely to show how incomplete the picture is. I am not claiming that the characteristics of some operating systems disqualify them as RTOSes, only that marketers make use of this term. I have also seen other tests, where the choice of an operating system was commissioned from an independent laboratory based on the requirements of the task. There was a complex set of model tasks; network interaction was examined, as well as how the parameters change when the system is loaded and how it behaves in various emergency situations.

Definition of the term “real-time operating system”

Now I will introduce the term “real-time operating system”. No, I won't. The fact is that there are a great many definitions of this term. Take just the comments on the original article:

In real-time systems, a person is generally superfluous and, accordingly, the speed of a real-time system should be compared with the processes that it controls, whether it be an autonomous car or a process control system at a factory.

A real-time system / RTOS is purely a ranking by the predictability of response to critical events.

An RTOS is an OS in which the correctness of a task is determined not only by its logical correctness but also by the time it takes to complete the task.

Set the criterion for a context switch of any task to 1 μs on a 100 MHz processor with a floating-point coprocessor, with a determinism of 0.1 μs, and everything will fall into place.
You will clearly see what is an RTOS and what is not.

And I can't ignore the opinion, which I mentioned in my article, that was voiced at one of the OSDAY conferences:

A system can be considered a hard real-time system if it has no places where loops with an unknown number of iterations run with interrupts disabled.

But maybe these are all just particulars and, as suggested in the comments, we should simply use the classics and not reinvent the wheel. I will quote the classic in question (Andrew Tanenbaum, in case someone has not guessed):

“Another type of operating system is the real-time system. These systems are characterized by having time as a key parameter. For example, in industrial process control systems, real-time computers have to collect data about the production process and use it to control machines in the factory. Often there are hard deadlines that must be met. For example, if a car is moving down an assembly line, certain actions must take place at certain instants of time. If a welding robot welds too early or too late, the car will be ruined. If the action absolutely must occur at a certain moment (or within a certain range), we have a hard real-time system. Many of these are found in industrial process control, avionics, military, and similar application areas. These systems must provide absolute guarantees that a certain action will occur by a certain time.

Another kind of real-time system is a soft real-time system, in which missing an occasional deadline, while not desirable, is acceptable and does not cause any permanent damage. Digital audio or multimedia systems fall in this category. Digital telephones are also soft real-time systems.

Since meeting strict deadlines is crucial in real-time systems, sometimes the operating system is simply a library linked in with the application programs, with everything tightly coupled and no protection between parts of the system. An example of this type of real-time system is e-Cos.

The categories of handhelds, embedded systems, and real-time systems overlap significantly. Nearly all of them have at least some soft real-time aspects. The embedded and real-time systems run only software put in by the system designers; users cannot add their own software, which makes protection easier. The handhelds and embedded systems are intended for consumers, whereas real-time systems are more for industrial usage. Nevertheless, they have a certain amount in common. ”

But all that follows from this description is that such systems can be used where the absence of a reaction within a given period can lead to disastrous consequences. And that, in order to meet the key parameter (not exceeding the reaction time), the OS can be a library, eCos being an example.

About soft and hard real-time

I deliberately ignored the division into soft and hard, since any modern general-purpose OS can be considered a soft real-time system; Windows, for example, plays multimedia files perfectly well. I understand that the passage was more about all kinds of DSPs, that is, signal processing. But if we consider that part too, we will never finish. So from here on we mean only systems where the time limit must not be violated, that is, hard real-time.

How to achieve real-time characteristics

I could not give a strict definition (if someone is ready to give one, write in the comments). But a couple of properties are visible in all of the definitions above: time and predictability. And if you express time as a facet of predictability (the weight of an arc when moving from one state to another), then only predictability remains!

Let's think about how to achieve this.

The obvious step is to remove everything unnecessary from a critical system. A general-purpose system is unlikely to be stable. Even Comrade Tanenbaum said as much when he talked about eCos.

Another approach that increases the predictability of the system, again proposed by Tanenbaum, is the use of special (simple) algorithms for resource scheduling, primarily of processor time, that is, special task schedulers. He suggested several approaches to scheduling, but I would like to focus first on the static table-driven one.

The developer must ensure that every task manages to finish within its time slice. To do this, it is proposed to analyze each critical task statically and determine its threshold values. This approach is built into the ARINC-653 standard. It is a standard for on-board systems and, as you understand, if something on an aircraft does not finish in time, a catastrophe can happen.
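
To make the table-driven idea concrete, here is a minimal sketch of the kind of offline check a developer might run over such a table. Everything in it is an assumption for illustration: the `Window` type, the millisecond units and the notion of a repeating "major frame" are simplified, the worst-case execution times are taken as given, and none of this is the ARINC-653 API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """One slot of a static schedule table (hypothetical format)."""
    task: str
    start_ms: int
    length_ms: int


def check_table(windows, wcet_ms, major_frame_ms):
    """Return a list of problems found in a static schedule table.

    windows        -- the slots that repeat every major frame
    wcet_ms        -- task name -> worst-case execution time, assumed to
                      come from static analysis of the task
    major_frame_ms -- length of the repeating frame
    """
    problems = []
    prev_end = 0
    for w in sorted(windows, key=lambda w: w.start_ms):
        end = w.start_ms + w.length_ms
        if w.start_ms < prev_end:
            problems.append(f"{w.task}: window overlaps an earlier one")
        if end > major_frame_ms:
            problems.append(f"{w.task}: window runs past the major frame")
        if wcet_ms[w.task] > w.length_ms:
            problems.append(f"{w.task}: WCET does not fit into its window")
        prev_end = max(prev_end, end)
    return problems
```

An empty result means only that the table is consistent with the numbers it was given. Whether those worst-case times are right is exactly the developer's responsibility.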

The next approach is also static scheduling, but based on priorities. That is, the developer must again analyze all situations and, having assigned priorities to all tasks in the system, ensure that critical tasks complete within the given time.
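
One common way to do such an analysis for fixed priorities is the classic response-time iteration, sketched below. This is a textbook technique that I am substituting for illustration, not something taken from Tanenbaum's text or from Embox, and it rests on strong assumptions: independent periodic tasks, a single processor, preemptive scheduling, deadlines equal to periods, and no blocking or switching overhead.

```python
def response_time(tasks, index):
    """Worst-case response time of tasks[index] under fixed priorities.

    tasks -- list of (wcet, period) pairs in integer time units,
             sorted from highest to lowest priority
    Returns the response time, or None if the task can miss its
    deadline (assumed here to be equal to its period).
    """
    wcet, period = tasks[index]
    r = wcet
    while True:
        # Interference: every higher-priority task may be released
        # ceil(r / period_j) times while this task is still running.
        interference = sum(
            -(-r // period_j) * wcet_j  # integer ceiling division
            for wcet_j, period_j in tasks[:index]
        )
        r_next = wcet + interference
        if r_next > period:
            return None
        if r_next == r:
            return r  # fixed point reached
        r = r_next


def schedulable(tasks):
    """True if every task provably finishes before its next release."""
    return all(response_time(tasks, i) is not None for i in range(len(tasks)))
```

For the task set `[(1, 4), (2, 6), (3, 12)]` the iteration converges for every task, so the set passes the check. Increase the first task's execution time to 2 and the second's to 3, and the second task can already miss its deadline.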

I don't want to go on, because there is the original! It is, of course, written better than I could manage, and besides, I could again be accused of distorting the facts. I cited these particular approaches to show that, in any case, responsibility for ensuring the system's characteristics lies with the developer of the final system. The operating system should only provide the appropriate capabilities.

Continuing the discussion of methods for increasing predictability, I want to quote another comment:

“You can achieve real time on a Raspberry, not with an RTOS, but with a small state machine that fits into its cache.”

Here I want to draw attention to the following points:

increased predictability (real-time properties) due to the exclusion of any RTOS from the system
representation of a program by a state machine
the dependence of real-time systems not only on the properties of software, but also on hardware.

I think everyone agrees that fewer lines of code mean less unpredictability. Although, as always, I disagree, but more on that later.

That the hardware has an influence is most likely not in doubt either. In particular, when it was said that there should be no loops with an arbitrary number of iterations while interrupts are disabled, it was also claimed that on some Cortex-M parts the RTOS in question never disables interrupts at all. This is a bit sly, because there the interrupt controller itself masks interrupts of equal or lower priority, but the influence of the hardware is obvious. And of course the presence of caches and address translation (or rather, page misses) adds to the uncertainty. I especially want to draw attention to the fact that nobody can guarantee one hundred percent that the hardware works correctly. Say a wire has come loose: how will having an RTOS help to avoid a catastrophic outcome?

As for representing the program as a state machine, I would like to look at it from a less obvious side: namely, that a program can be analyzed for predictability. And since we are talking about all conditions, it should be analyzed, statically, for all possible situations. Since functional programming languages are much better suited to static analysis, one can either develop the program in some special language, or add the use of special languages alongside it. The first approach is used, for example, in the verified seL4 kernel. An example of the second approach is the same ARINC-653 standard, with its mandatory specification of requirements in XML.

There are other methods that increase predictability or, if you like, other factors affecting the predictability of a system. I gave a talk on this topic at one of the OSDay conferences. In addition to those already listed, I singled out an architectural approach: it is well known that, for example, a microkernel architecture can increase the predictability of a system. But the same talk also highlighted a somewhat less obvious, organizational approach. This is exactly the point on which I disagreed that doing without an RTOS increases predictability. If you think about it, the very concept of an operating system has significantly reduced the number of errors through code reuse. That is, unless your code really fits into a single switch/case, it is better to use ready-made modules. After all, the metric “number of errors per 1000 lines of code” still applies, and no matter how well debugged your new code is, it contains errors.

Does an RTOS exist at all?

Having ended the previous section with the statement that there are errors in any code, I want to put forward one more provocative thesis. Does an RTOS exist at all?

Let's figure it out. Discussing real-time systems with a friend, we went so far as to agree that a real-time operating system (we are talking about hard real-time) can hardly exist. He proposed representing the entire system as a state machine, or graph, annotated with the maximum transition time from one state to another. The system can then be considered stable if it is proved that, for all input and internal states, there is an arc leading to a given state within a time limit. As you understand, this is feasible only for a very small system, exactly the state machine mentioned in the comment above, and in the modern world few people need such a system.
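
For a system small enough to be written out as such a graph, the check itself is easy to sketch. The code below is my own reading of the idea, not my friend's formulation: it treats the maximum transition times as arc weights and asks whether every state has some path to the given state whose total weight stays within the limit. The state names and numbers in the usage note are made up.

```python
import heapq


def time_to_target(transitions, target):
    """Least total (maximum-transition) time from each state to target.

    transitions -- dict mapping (src, dst) to the maximum time of that arc
    Returns a dict holding only the states that can reach the target.
    """
    # Walk the graph backwards from the target (Dijkstra's algorithm).
    reverse = {}
    for (src, dst), t in transitions.items():
        reverse.setdefault(dst, []).append((src, t))
    best = {target: 0}
    heap = [(0, target)]
    while heap:
        d, state = heapq.heappop(heap)
        if d > best[state]:
            continue  # stale heap entry
        for src, t in reverse.get(state, []):
            if d + t < best.get(src, float("inf")):
                best[src] = d + t
                heapq.heappush(heap, (d + t, src))
    return best


def is_stable(states, transitions, target, limit):
    """True if every state can reach the target within the time limit."""
    best = time_to_target(transitions, target)
    return all(s in best and best[s] <= limit for s in states)
```

With arcs `idle -> measure` (2), `measure -> act` (3), `act -> safe` (1) and `idle -> safe` (10), every state reaches `safe` within 6 time units, but not within 5. Writing out such a graph for anything larger than a toy is where the approach stops being practical.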

But we have no doubt that real-time systems exist. And RTOSes too, of course. If this were not so, ~~then the first woodpecker to fly by would destroy civilization,~~ then there would be no avionics, astronautics, robotics, industrial process control systems and much more.

How do we get out of this situation? Quite simply: although in general the problem is most likely unsolvable, for a specific task one can introduce restrictions that make it solvable, with some reasonable probability of error.

For example, standards are introduced: real-time POSIX, ARINC-653, ITRON. In effect, each of these standards singles out a class of tasks that can be solved if you adhere to it. Alternatively, independent laboratories study whether the properties of a particular OS are suitable for solving the target problem.

So is Embox an RTOS or not?

In my opinion, the only way to answer such a question, for Embox or for any other OS, is to ask in return: “What do you mean?” More precisely: “What do you mean by real time?” If you are interested in interrupt handling time and in whether the interrupt handler can be called directly, that is one thing. If you need greater reliability, so that the system works perhaps slowly but is much less likely to fail, that is another. Compliance with some standard is a third, and verification is a fourth. It is no coincidence that the great classic Andrew Tanenbaum, although he proposed methods for increasing predictability and used the very concept of a real-time system, refrained from any strict definitions.

P.S. No RTOS was harmed in the writing of this article.
