Overview of Threading Models

Original author: Justin Tulloss
  • Translation

Many people do not understand how multithreading is implemented in various programming languages. Now that multi-core processors are everywhere, this knowledge is very useful.
Here is a short review.

The Beginning (C and native threads)

The first model we will look at is standard OS threads. Every modern OS supports them, although the APIs differ. Essentially, a thread is an independent flow of instruction execution that runs on a processor, is scheduled by the OS scheduler, and can block. Threads are created inside a process and share its resources. This means, for example, that memory and file descriptors are common to all threads of a process. This approach is known as native threads.

On Linux, these threads are available through the pthread library. The BSDs also support pthreads. Windows threads work a little differently, but the basic principle is the same.
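To make the point about shared resources concrete, here is a minimal sketch in Python rather than C, chosen only to keep the examples in this article in one language. It assumes CPython, where threading.Thread is backed by a native OS thread. The name run_workers and its parameters are hypothetical and exist only for this illustration.

```python
import threading

def run_workers(num_threads=4, increments=10_000):
    """Start several native threads that all update one shared counter."""
    counter = {"value": 0}   # lives in process memory, visible to every thread
    lock = threading.Lock()  # shared state needs explicit synchronization

    def worker():
        for _ in range(increments):
            with lock:       # only one thread updates the counter at a time
                counter["value"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()             # wait for every thread to finish
    return counter["value"]

print(run_workers())  # 40000
```

Every thread sees the same counter because threads of one process share memory, which is also why the lock is needed.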

Java and Green Threads

When Java appeared, it brought with it another type of multithreading called green threads. Green threads are essentially a simulation of threads. The Java virtual machine takes care of switching between the green threads, while the virtual machine itself runs as a single OS thread. This has several advantages. Creating an OS thread is relatively expensive on most POSIX systems, and switching between native threads is much slower than switching between green threads.

All this means that in some situations green threads are much more advantageous than native threads. A system can support many more green threads than OS threads. For example, it is much more practical to start a new green thread for each new HTTP connection to a web server than to create a new native thread.

However, there are also disadvantages. The biggest one is that two green threads cannot execute at the same time. Since there is only one native thread, it is the only thing the OS scheduler ever runs. Even if you have several processors and several green threads, only one processor can be running a green thread at any moment, because from the point of view of the OS scheduler the whole thing looks like a single thread.
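The idea of many simulated threads living inside one OS thread can be sketched with Python generators. This is only a cooperative simplification under my own assumptions, not how the JVM implemented green threads; the names green_thread and run_round_robin are hypothetical.

```python
from collections import deque

def green_thread(name, steps, log):
    """A cooperative task: do one step of work, then yield control."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # voluntarily hand control back to the scheduler

def run_round_robin(tasks):
    """Run generator-based tasks in turn until all of them are finished."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # resume the task until its next yield
            ready.append(task)  # not finished yet, so queue it again
        except StopIteration:
            pass                # the task is done, drop it

log = []
run_round_robin([green_thread("a", 2, log), green_thread("b", 3, log)])
print(log)  # ['a:0', 'b:0', 'a:1', 'b:1', 'b:2']
```

The tasks interleave, but everything runs in one OS thread, so no two of them ever execute at the same instant.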

Starting with version 1.2, Java supports native threads, and since then they have been used by default.

Python

Python is one of my favorite scripting languages, and it was one of the first to offer threading support. Python includes a module for working with native threads, so it would seem able to take full advantage of true multithreading. But there is one problem.

Python uses the Global Interpreter Lock (GIL). This lock is needed so that threads cannot corrupt the global state of the interpreter. As a result, two Python statements cannot execute at the same time. The GIL is released approximately every 100 instructions, and at that moment another thread may acquire the lock and continue its execution.

At first, this may seem like a serious flaw, but in practice the problem is not so great. Any thread that blocks will typically release the GIL. C extensions also release it when they are not interacting with the Python/C API, so intensive computations can be moved to C without blocking the running Python threads. The only situation where the GIL really presents a problem is when pure-Python threads try to take advantage of a multi-core machine.
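The claim that a blocked thread releases the GIL can be checked with a small timing sketch. It assumes CPython, where time.sleep gives up the GIL while waiting; the function name timed_sleepers is hypothetical. If the lock were held during the wait, four threads would need roughly four times the delay.

```python
import threading
import time

def timed_sleepers(num_threads=4, delay=0.2):
    """Return the wall-clock time for several threads that each block once."""
    def sleeper():
        time.sleep(delay)  # a blocking call; the GIL is released while waiting

    threads = [threading.Thread(target=sleeper) for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

elapsed = timed_sleepers()
# The waits overlap, so the total is close to one delay, not four.
print(f"4 threads, 0.2 s each: {elapsed:.2f} s total")
```

Replacing the sleep with a pure-Python CPU-bound loop would be expected to remove this overlap, which is the multi-core limitation described above.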

Stackless Python is a version of Python that adds "tasklets" (in effect, green threads). Inspired by them, the greenlet module was created, which is compatible with the de facto standard implementation, CPython.

Ruby

The Ruby threading model is constantly changing. Initially, Ruby only supported its own version of green threads. This works well in many scenarios, but it cannot take advantage of multiple processors.

JRuby mapped Ruby threads onto standard Java threads, which, as explained above, are native threads. That created problems. Ruby threads do not need to be synchronized with each other: each thread is guaranteed that no other thread will access a shared resource while it is using it. JRuby broke this guarantee, because native threads are switched preemptively, so any thread can access a shared resource at any time.

Because of this inconsistency, and because the developers of C Ruby wanted native threads, it was decided that Ruby would switch to them in version 2.0. Ruby 1.9 included a new interpreter that added support for fibers, which, as far as I know, are a more efficient version of green threads.

In short, the Ruby threading model is a poorly documented mess.

Perl

Perl offers an interesting model that Mozilla has borrowed for SpiderMonkey, if I'm not mistaken. Instead of using a global interpreter lock as in Python, Perl made the global state thread-local and in effect launches a new interpreter for each new thread. This allows you to use real native threads. There are a couple of snags, though.

First, you must explicitly mark the variables that should be available to other threads. This is what happens when everything becomes local to the thread: values have to be synchronized for threads to communicate.

Second, creating a new thread becomes a very expensive operation. The interpreter is a big thing, and copying it repeatedly eats up a lot of resources.

Erlang, JavaScript, C# and so on

There are many other models that are used from time to time. For example, Erlang uses a shared-nothing architecture that encourages the use of lightweight user-level processes instead of threads. Such an architecture is great for parallel programming, since it eliminates all the headaches of synchronization, and the processes are so lightweight that you can create a very large number of them.
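The shared-nothing style can be imitated in Python with queues. This is only a sketch of the message-passing idea: Python threads are not Erlang processes, and the names doubler and double_all, as well as the use of None as a stop signal, are assumptions made for this illustration.

```python
import queue
import threading

def doubler(inbox, outbox):
    """A worker that touches no shared variables; it only receives and sends messages."""
    while True:
        msg = inbox.get()
        if msg is None:      # hypothetical stop signal
            break
        outbox.put(msg * 2)

def double_all(values):
    """Send each value to the worker as a message and collect the replies."""
    inbox, outbox = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=doubler, args=(inbox, outbox))
    worker.start()
    for v in values:
        inbox.put(v)
    inbox.put(None)          # ask the worker to stop
    worker.join()
    return [outbox.get() for _ in values]

print(double_all([1, 2, 3]))  # [2, 4, 6]
```

The worker and its caller never touch the same variable directly, so the user code needs no locks; all coordination happens through the messages.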

JavaScript is not usually thought of as a language that supports threads, but it needs them too. The JavaScript threading model is very similar to the one used by Perl.

C# uses native threads.

From the translator: please direct any annoyance at the article's superficiality (of which I am aware myself) to the author. I simply translated it to the best of my modest abilities. ;) I would welcome clarifications and additions in the comments.

From the translator, 2: based on the comments, I corrected a couple of phrases. Sorry, author! :)
