Processes and threads in depth. An overview of threading models

    Hello, dear readers. In this article we will look at the threading models implemented in modern operating systems (preemptive and cooperative threads). We will also briefly review how threads and synchronization primitives are exposed in the Win32 API and in Posix Threads. Although scripting languages are more popular on Habré, the basics are something everyone should know ;)


    Threads, processes, contexts ...


    System call (syscall). You will run into this term quite often in this article, and despite how imposing it sounds, its definition is quite simple :) A system call is the act of invoking a kernel function from a user application. Kernel mode is code that runs in protection ring zero of the processor (ring0) with maximum privileges. User mode is code that runs in protection ring three (ring3) with reduced privileges. If code running in ring3 uses one of the forbidden instructions (for example rdmsr/wrmsr, in/out, an attempt to read the cr3 or cr4 register, etc.), a hardware exception is raised and the user process whose code the processor was executing is, in most cases, terminated. A system call transfers control from user mode to kernel mode via the syscall/sysenter instructions, int 2Eh on Win2k, int 80h on Linux, and so on.
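    To make this concrete, here is a minimal sketch (assuming Linux with glibc) that issues a system call explicitly through the syscall() wrapper; the kernel-mode transition that an ordinary getpid() call hides is spelled out here:

    /* A minimal sketch: invoking a system call explicitly through the glibc
       syscall() wrapper. Assumes Linux with glibc. */
    #define _GNU_SOURCE
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <stdio.h>

    int main(void)
    {
        /* On x86 the user-to-kernel transition behind this call happens
           via syscall/sysenter or int 80h. */
        long pid = syscall(SYS_getpid);
        printf("my pid is %ld\n", pid);
        return 0;
    }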

    So what is a thread? A thread is an operating-system entity: the execution of a sequence of instructions, that is, of program code, on a processor. The general purpose of threads is to run two or more different tasks on a processor in parallel. As you might guess, threads were the first step toward a multitasking OS. The OS scheduler, guided by thread priorities, distributes time slices among the different threads and dispatches them for execution.

    Alongside the thread there is also the entity called a process. A process is nothing more than an abstraction that encapsulates all of a program's resources (open files, memory-mapped files ...) and their descriptors, its threads, and so on. Every process has at least one thread. Every process also has its own virtual address space and execution context, and the threads of one process share that address space.

    Each thread, like each process, has its own context. The context is a structure in which the following elements are stored (a rough sketch follows the list):
    • the CPU registers;
    • a pointer to the thread's / process's stack.
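    Purely as an illustration (this is not the layout of any real OS structure), the saved context of a thread on x86 might look roughly like this:

    /* Illustrative sketch only, not a real OS structure: roughly what a
       scheduler has to save and restore for each thread on x86. */
    typedef struct thread_context {
        unsigned long eax, ebx, ecx, edx;  /* general-purpose registers */
        unsigned long esi, edi, ebp;
        unsigned long eip;                 /* instruction pointer */
        unsigned long eflags;              /* flags register */
        unsigned long esp;                 /* pointer to the thread stack */
    } thread_context_t;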

    It should also be noted that when a thread makes a system call and switches from user mode to kernel mode, its stack is switched to the kernel stack. When execution switches from a thread of one process to a thread of another, the OS also updates certain processor registers responsible for the virtual memory machinery (for example CR3), since different processes have different virtual address spaces. I deliberately do not go deeper into the kernel-mode side here, since such details are specific to each particular OS.

    In general, the following recommendations apply:
    • If your task requires heavy parallelism, use threads within one process rather than several processes. This is because a process context switch is much slower than a thread context switch.
    • When using threads, try not to overuse synchronization primitives that require kernel system calls (such as mutexes). Switching to kernel mode is an expensive operation!
    • If you write code that runs in ring0 (for example a driver), try to get by without extra threads, since a thread context switch is an expensive operation there as well.

    A fiber is a lightweight thread that is scheduled in user mode. Fibers require considerably fewer resources and in some cases allow you to minimize the number of system calls and thus increase performance. Typically fibers execute in the context of the thread that created them and require only the processor registers to be saved when they are switched. For whatever reason, fibers never became truly popular: they were implemented at one time in several BSD systems but were eventually removed. The Win32 API also implements a fiber mechanism, but it is used mainly to ease porting of software written for other operating systems. Note that fiber switching is handled either by a process-level scheduler or by the application itself, in other words, manually :)
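    To give a feel for the API, here is a minimal sketch using the Win32 fiber functions (ConvertThreadToFiber, CreateFiber, SwitchToFiber); note that every switch is done by hand:

    /* A minimal sketch of Win32 fibers: two fibers yield to each other
       explicitly, no scheduler is involved. */
    #include <windows.h>
    #include <stdio.h>

    LPVOID g_mainFiber;

    VOID CALLBACK FiberProc(LPVOID param)
    {
        printf("hello from fiber %s\n", (const char *)param);
        SwitchToFiber(g_mainFiber);   /* cooperative: yield back explicitly */
    }

    int main(void)
    {
        /* The current thread must first become a fiber itself. */
        g_mainFiber = ConvertThreadToFiber(NULL);
        LPVOID fiber = CreateFiber(0, FiberProc, (LPVOID)"A");
        SwitchToFiber(fiber);         /* runs FiberProc until it switches back */
        DeleteFiber(fiber);
        printf("back in the main fiber\n");
        return 0;
    }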

    Thread classification


    Since how to classify threads is a debatable question, I propose to classify them as follows:
    • By mapping onto kernel threads: 1:1, N:M, N:1.
    • By multitasking model: preemptive multitasking, cooperative multitasking.
    • By implementation level: kernel mode, user mode, hybrid implementation.


    Classification of threads by mapping onto kernel threads



    As I already mentioned, threads can be created not only in kernel mode but also in user mode. There can be several thread schedulers in an OS:
    • The central kernel-mode scheduler of the OS, which distributes time among all threads in the system.
    • A thread-library scheduler. A user-mode thread library may have its own scheduler that distributes time among the threads of various user-mode processes.
    • A per-process thread scheduler. The fibers we just looked at are dispatched this way. For example, every Mac OS X process written with the Carbon library gets its own Thread Manager.

    So. The 1:1 model is the simplest one. Under it, every thread created in any process is managed directly by the OS kernel scheduler; that is, there is a one-to-one mapping between user-process threads and kernel threads. This model has been used in Linux since the 2.6 kernel, as well as in Windows.

    The N:M model maps N threads of user processes onto M kernel-mode threads. Simply put, we get a kind of hybrid system in which some threads are dispatched by the OS scheduler and most of them by a process thread scheduler or a thread library. An example is IBM's NGPT library (mentioned below). This model is quite difficult to implement, but it can deliver better performance, since a significant number of system calls can be avoided.

    The N:1 model. As you have probably guessed, many threads of a user process are mapped onto a single OS kernel thread. Fibers are one example.

    Classification of threads by multitasking model


    Back in the DOS days, when single-tasking operating systems stopped satisfying users, programmers and architects set out to build multitasking ones. The simplest solution was this: let the threads take turns, each running until it voluntarily yields control, and divide the processor among all of them in that way. Thus the concept of cooperative multitasking appeared: threads execute one after another, and no thread can preempt the currently running one. This very simple and straightforward approach was used in all versions of Mac OS up to Mac OS X, and in Windows versions prior to Windows 95 and Windows NT. Win32 still uses cooperative scheduling to run 16-bit applications, and for compatibility reasons the Thread Manager in Carbon applications on Mac OS X schedules its threads cooperatively as well.

    Over time, however, cooperative multitasking showed its limitations. The volumes of data stored on hard drives grew, and data transfer speeds in networks grew too. It became clear that some threads should have a higher priority, such as threads servicing device interrupts, handling synchronous I/O operations, and so on. At this point every thread and process in the system acquired a property called priority. (You can read more about thread and process priorities in the Win32 API in Jeffrey Richter's book, so I won't dwell on them here ;)) Thus a thread with a higher priority can preempt a thread with a lower priority. This principle became the basis of preemptive multitasking. All modern operating systems now use this approach, with the exception of user-mode fiber implementations.

    Classification of threads by level of implementation


    As we have already discussed, the thread scheduler can be implemented at different levels. So:
    1. Threads implemented at the kernel level. Simply put, this is the classic 1:1 model. This category includes:
      • Win32 threads.
      • The Linux implementation of Posix Threads, the Native Posix Threads Library (NPTL). The point is that up to kernel version 2.6, pthreads on Linux was implemented entirely by a user-space library (LinuxThreads). LinuxThreads achieved a 1:1 mapping as follows: when a new thread was created, the library issued a clone system call and created a new process that nevertheless shared a single address space with its parent. This caused many problems: for example, the threads had different process identifiers, which contradicted parts of the Posix standard concerning scheduling, signals, and synchronization primitives. The thread preemption model also misbehaved in many cases, so it was decided to move pthread support onto the kernel's shoulders. Two efforts were made in this direction, by IBM and by Red Hat. IBM's implementation (NGPT) never gained traction and was not included in any distribution, because IBM suspended further development and support of the library. NPTL was later merged into the glibc library.
      • Lightweight Kernel Threads (LWKT), for example in DragonFlyBSD. What distinguishes them from other kernel-mode threads is that lightweight kernel threads can preempt other kernel threads. DragonFlyBSD has many kernel threads, for example a hardware interrupt service thread, a software interrupt service thread, and so on. All of them run with a fixed priority, and LWKTs can preempt them. These are rather specific matters one could talk about endlessly, but I will give two more examples. In Windows, kernel-mode work is executed either in the context of the thread that initiated the system call / I/O operation, or in the context of a thread of the System process. Mac OS X has an even more interesting scheme: the kernel has only the notion of a task, and all kernel operations are performed in the context of kernel_task. Handling a hardware interrupt, for example, happens in the context of the thread of the driver that services that interrupt.
    2. Threads implemented in user mode. Since system calls and context switches are rather expensive operations, the idea of implementing thread support in user mode has been in the air for a long time. Many attempts have been made, but the technique never became widespread:
      • GNU Portable Threads, a user-mode implementation of Posix Threads. Its main advantage is high portability; in other words, it is easy to port to other operating systems. The preemption problem is solved in this library very simply: its threads are never preempted :) And of course true multiprocessing is out of the question. This library implements the N:1 model.
      • Carbon Threads, which I have mentioned more than once, and RealBasic Threads.
    3. Hybrid implementation. An attempt to combine the advantages of the first and second approaches, but as a rule such hybrids end up with far more drawbacks than benefits. One example is the Posix Threads implementation in NetBSD based on the N:M model, which was later replaced by a 1:1 implementation. For more details see Scheduler Activations: Effective Kernel Support for the User-Level Management of Parallelism.


    Win32 API Threads


    If you are not tired yet, I offer a small overview of the thread and synchronization APIs in Win32. If you are already familiar with the material, feel free to skip this section ;)

    Threads in Win32 are created with the CreateThread function, which is passed a pointer to a function (let's call it the thread function) that will be executed in the newly created thread. A thread is considered finished when its thread function returns. If you want to terminate a thread forcibly, you can use the TerminateThread function, but do not abuse it! This function "kills" the thread and does not always do so correctly. The ExitThread function is called implicitly when the thread function returns, or you can call it yourself. Its main job is to free the thread's stack and release the kernel structures that serve the thread.
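    A minimal sketch of what this looks like in code (error handling omitted for brevity):

    /* A small sketch of thread creation in Win32. */
    #include <windows.h>
    #include <stdio.h>

    DWORD WINAPI ThreadProc(LPVOID param)
    {
        printf("worker thread got %d\n", *(int *)param);
        return 0;   /* returning from the thread function implicitly calls ExitThread */
    }

    int main(void)
    {
        int arg = 42;
        HANDLE h = CreateThread(NULL, 0, ThreadProc, &arg, 0, NULL);
        WaitForSingleObject(h, INFINITE);  /* wait for the thread to finish */
        CloseHandle(h);                    /* release the kernel object */
        return 0;
    }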

    A thread in Win32 can be in a suspended state. You can "put a thread to sleep" by calling SuspendThread and "wake" it by calling ResumeThread; you can also create a thread already suspended by passing the CREATE_SUSPENDED flag to CreateThread. Do not be surprised if you do not find similar functionality in cross-platform libraries such as boost::threads and Qt: it is quite simple, pthreads just does not support it.
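    And a small sketch of a thread born suspended and woken up later:

    /* Sketch: a thread created with CREATE_SUSPENDED starts running
       only after ResumeThread is called. */
    #include <windows.h>
    #include <stdio.h>

    DWORD WINAPI Worker(LPVOID param)
    {
        printf("worker is finally running\n");
        return 0;
    }

    int main(void)
    {
        HANDLE h = CreateThread(NULL, 0, Worker, NULL, CREATE_SUSPENDED, NULL);
        /* ... prepare data while the thread is still asleep ... */
        ResumeThread(h);                   /* only now does the thread start */
        WaitForSingleObject(h, INFINITE);
        CloseHandle(h);
        return 0;
    }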

    Win32 offers two kinds of synchronization tools: those implemented at the user level and those implemented at the kernel level. The first kind is the critical section; the second includes the mutex, the event, and the semaphore.

    Critical sections are a lightweight synchronization mechanism that works at the level of the user process and, in the uncontended case, does not use heavyweight system calls. It is based on interlocked operations and spin locks. A thread that wants to protect certain data from race conditions calls EnterCriticalSection / TryEnterCriticalSection. If the critical section is free, the thread takes it; if not, the thread blocks (i.e. it does not execute and does not consume processor time) until the section is released by another thread calling LeaveCriticalSection. These functions are atomic, so you need not worry about the integrity of your data ;)
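    A short sketch of a critical section guarding a shared counter (error handling omitted):

    /* Sketch: protecting a shared counter with a Win32 critical section. */
    #include <windows.h>
    #include <stdio.h>

    CRITICAL_SECTION g_cs;
    long g_counter = 0;

    DWORD WINAPI Worker(LPVOID param)
    {
        for (int i = 0; i < 100000; ++i) {
            EnterCriticalSection(&g_cs);   /* blocks if another thread holds it */
            ++g_counter;
            LeaveCriticalSection(&g_cs);
        }
        return 0;
    }

    int main(void)
    {
        InitializeCriticalSection(&g_cs);
        HANDLE h[2];
        h[0] = CreateThread(NULL, 0, Worker, NULL, 0, NULL);
        h[1] = CreateThread(NULL, 0, Worker, NULL, 0, NULL);
        WaitForMultipleObjects(2, h, TRUE, INFINITE);
        printf("counter = %ld\n", g_counter);
        CloseHandle(h[0]);
        CloseHandle(h[1]);
        DeleteCriticalSection(&g_cs);
        return 0;
    }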

    Quite a lot has already been written about mutexes, events, and semaphores, so I will not dwell on them in detail. Note that all these mechanisms share some common traits (a short sketch with a named mutex follows the list):
    • They are backed by kernel primitives, i.e. system calls, which does hurt performance.
    • They can be named or unnamed, i.e. each such synchronization object can be given a name.
    • They work at the system level, not at the process level, and so can serve as an inter-process communication (IPC) mechanism.
    • A single family of functions is used to wait on and acquire them: WaitForSingleObject / WaitForMultipleObjects.
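    As promised, a small sketch with a named mutex (the name "MyAppMutex" is just an example); a second process that opens a mutex with the same name would synchronize with this one:

    /* Sketch: a named mutex acquired with WaitForSingleObject. */
    #include <windows.h>

    int main(void)
    {
        HANDLE hMutex = CreateMutexA(NULL, FALSE, "MyAppMutex");
        WaitForSingleObject(hMutex, INFINITE);  /* acquire: a kernel-mode wait */
        /* ... work on data shared between processes ... */
        ReleaseMutex(hMutex);
        CloseHandle(hMutex);
        return 0;
    }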


    Posix Threads or pthreads


    It is hard to find a *nix-like operating system that does not implement this standard. It is worth noting that pthreads is also used in various real-time operating systems (RTOS), so the requirements on this library (or rather, standard) are stricter. For example, a pthread cannot be suspended from another thread. There are also no events in pthreads, but there is a much more powerful tool, condition variables, which more than covers all the necessary needs.

    Now for the differences. For example, a thread in pthreads can be cancelled: it is terminated by a pthread_cancel call when it reaches a cancellation point, for example while waiting on a condition variable or inside pthread_join (a call that blocks the calling thread until the thread it is joining finishes executing), and so on. There are separate calls for working with mutexes and semaphores, such as pthread_mutex_lock / pthread_mutex_unlock, etc.
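    A minimal pthreads sketch with thread creation, a mutex-protected counter, and pthread_join:

    /* Sketch: two pthreads increment a counter under a mutex. */
    #include <pthread.h>
    #include <stdio.h>

    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    long counter = 0;

    void *worker(void *arg)
    {
        for (int i = 0; i < 100000; ++i) {
            pthread_mutex_lock(&lock);
            ++counter;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);   /* block until the thread finishes */
        pthread_join(t2, NULL);
        printf("counter = %ld\n", counter);
        return 0;
    }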

    Condition variables (cv) are usually used together with mutexes in more complex cases. Where a mutex simply blocks a thread until another thread releases it, a cv lets a thread block itself until some condition it is waiting for becomes true. For example, the cv mechanism makes it possible to emulate events in the pthreads world. The pthread_cond_wait call waits until the thread is notified that a particular event has occurred; pthread_cond_signal notifies one thread waiting on the cv; pthread_cond_broadcast notifies all threads that have called pthread_cond_wait on it.
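    And a short sketch of emulating a one-shot event with a mutex and a condition variable; the waiter rechecks the flag in a loop to guard against spurious wakeups:

    /* Sketch: emulating an "event" with a mutex plus a condition variable. */
    #include <pthread.h>
    #include <stdio.h>

    pthread_mutex_t m  = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;
    int event_happened = 0;

    void *waiter(void *arg)
    {
        pthread_mutex_lock(&m);
        while (!event_happened)          /* the flag, not the cv, holds the state */
            pthread_cond_wait(&cv, &m);  /* atomically unlocks m and sleeps */
        pthread_mutex_unlock(&m);
        printf("event received\n");
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, waiter, NULL);

        pthread_mutex_lock(&m);
        event_happened = 1;
        pthread_cond_signal(&cv);        /* wake one waiter; _broadcast wakes all */
        pthread_mutex_unlock(&m);

        pthread_join(t, NULL);
        return 0;
    }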

    A parting word


    That's all for today, otherwise there would be too much information at once. For those interested, there are a few useful links and books below ;) Also, let me know whether articles on this topic interest you.

    Windows 2000 for Professionals - Jeffrey Richter
    GNU Portable Threads
    Scheduler Activations: Effective Kernel Support for the User-Level Management of Parallelism

    UPD: supplemented the article with some information about kernel mode and user mode.
    UPD2: fixed annoying typos and mistakes. Thanks to the commenters ;)
