awolf August 20, 2013 at 10:54

How GIL works in Ruby. Part 2

Translation

Last time I suggested looking into the MRI source code to understand how the GIL is implemented and to answer the remaining questions. That is what we will do today.

The draft version of this article was full of snippets of C code, but the essence got lost in the details. The final version contains almost no code, and for those who like to dig into the source, I have left links to the functions I mention.

Previously

After the first part, two questions remained:

Does the GIL make array << nil an atomic operation?
Does the GIL make Ruby code thread-safe?

You can answer the first question by looking at the implementation, so let's start with it.

Last time we looked at the following code:

array = []
5.times.map do
  Thread.new do
    1000.times do
      array << nil
    end
  end
end.each(&:join)
puts array.size

If we assume the array is thread-safe, it is logical to expect an array with five thousand elements as the result. Since the array is actually not thread-safe, running this code on JRuby or Rubinius produces a different result than expected (an array with fewer than five thousand elements).

MRI gives the expected result, but is that by accident or by design? Let's start our research with a small piece of Ruby code.

Thread.new do
  array << nil
end

Where to start

To understand what is going on in this piece of code, we need to look at how MRI creates a new thread, mainly the code in the thread*.c files.

The first thing the implementation of Thread.new does is create a new native thread that will back the Ruby thread. After that, the function thread_start_func_2 is executed. Let's take a look at it without going too deep into the details.

Not all of its code matters to us right now, so I will point out only the parts that are of interest. At the beginning of the function, the new thread acquires the GIL, first waiting for it to be released if necessary. Somewhere in the middle of the function, the block passed to Thread.new is executed. At the end, the lock is released and the native thread terminates.

In our case, the new thread is created from the main thread, so we can assume that the main thread holds the GIL at that moment. Before proceeding, the new thread must wait for the main thread to release the lock.

Let's see what happens when the new thread tries to acquire the GIL.

static void
gvl_acquire_common(rb_vm_t *vm)
{
  if (vm->gvl.acquired) {
    vm->gvl.waiting++;
    if (vm->gvl.waiting == 1) {
      rb_thread_wakeup_timer_thread_low();
    }
    while (vm->gvl.acquired) {
      native_cond_wait(&vm->gvl.cond, &vm->gvl.lock);
    }

This is part of the function gvl_acquire_common, which is called when a new thread tries to acquire the GIL.

First of all, it checks whether the lock is already held. If it is, the waiting attribute is incremented. In our case it becomes 1. The next line checks whether waiting equals 1. It does, so the following line wakes up the timer thread. The timer thread keeps MRI's threads moving, preventing a situation in which one of them holds the GIL indefinitely. But before describing the timer thread, let's finish with the GIL.

I have already mentioned several times that every thread in MRI is backed by a native thread. That is true, but such a scheme suggests that MRI threads run in parallel, just as native threads do. The GIL prevents this. Let's extend the scheme and bring it closer to reality.

To use its native thread, a Ruby thread must first acquire the GIL. The GIL acts as an intermediary between Ruby threads and their native threads, significantly limiting parallelism. In the previous diagram, Ruby threads could use native threads in parallel. The second diagram is closer to reality for MRI: only one thread can hold the GIL at any given moment, so parallel execution of Ruby code is completely ruled out.

For the MRI development team, the GIL protects the internal state of the system. Thanks to the GIL, internal data structures do not require locks. If two threads cannot modify shared data at the same time, a race condition is impossible.

For you as a developer, the above means that parallelism in MRI is very limited.
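The acquisition logic described above can be modeled in plain Ruby. This is only a toy sketch, not MRI's implementation: ToyGvl and all of its methods are hypothetical names, the timer thread is reduced to a counter, and the bookkeeping after the wait loop is an assumption, since the excerpt stops before that point.

```ruby
# Toy model of the acquisition logic; ToyGvl and its methods are
# hypothetical names for illustration, not MRI's implementation.
class ToyGvl
  attr_reader :timer_wakeups

  def initialize
    @lock = Mutex.new               # plays the role of vm->gvl.lock
    @cond = ConditionVariable.new   # plays the role of vm->gvl.cond
    @acquired = false
    @waiting = 0
    @timer_wakeups = 0
  end

  # Number of threads currently waiting for the toy lock.
  def waiting
    @lock.synchronize { @waiting }
  end

  def acquire
    @lock.synchronize do
      if @acquired
        @waiting += 1
        # The first waiter wakes the timer thread; here we only count it.
        @timer_wakeups += 1 if @waiting == 1
        # Sleep until the current holder releases the lock.
        @cond.wait(@lock) while @acquired
        # Assumption: bookkeeping after the loop, not shown in the excerpt.
        @waiting -= 1
      end
      @acquired = true
    end
  end

  def release
    @lock.synchronize do
      @acquired = false
      @cond.signal
    end
  end
end

gvl = ToyGvl.new
order = []

gvl.acquire                          # the "main thread" holds the lock
worker = Thread.new do
  gvl.acquire                        # blocks until the main thread releases
  order << :worker
  gvl.release
end

Thread.pass until gvl.waiting == 1   # wait until the worker is really waiting
order << :main
gvl.release
worker.join

p order              # => [:main, :worker]
p gvl.timer_wakeups  # => 1
```

The worker cannot record anything until the main thread releases the toy lock, which mirrors how a new MRI thread waits for the GIL before running its block.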

Timer thread

As I said before, the timer thread prevents the GIL from being held by one thread indefinitely. The timer thread is a native thread used for MRI's internal needs; it has no corresponding Ruby thread. It is started when the interpreter starts, in the function rb_thread_create_timer_thread.

When the MRI has just started and only the main thread is running, the timer thread is sleeping. But as soon as some thread begins to wait for the release of the GIL, the timer thread wakes up.

This diagram further illustrates how GIL is implemented in MRI. The thread on the right has just started and, since it is the only one waiting for the GIL to be released, the timer thread wakes up.

Every 100 ms, the timer thread sets an interrupt flag on the thread that currently holds the GIL, using the macro RUBY_VM_SET_TIMER_INTERRUPT. These details are important for understanding whether the expression array << nil is atomic.

This is similar to the concept of time slicing in an OS, if you are familiar with it.

Setting the flag does not interrupt the thread immediately (if it did, it would be safe to say that the expression array << nil is not atomic).

Interrupt flag handling

At the bottom of the file vm_eval.c is the code that handles a method call in Ruby. It sets up the environment for the call and invokes the required function. At the end of the function vm_call0_body, just before the method returns, the interrupt flag is checked.

If the thread's interrupt flag is set, execution is paused before the value is returned. Before executing any other Ruby code, the current thread releases the GIL and calls the function sched_yield. sched_yield is a system function that asks the OS scheduler to resume the next thread in the queue. After that, the interrupted thread tries to acquire the GIL again, first waiting for another thread to release it.

Here is the answer to the first question: array << nil is an atomic operation. Thanks to the GIL, all Ruby methods implemented exclusively in C are atomic.
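The idea that the flag is only honored at a safe point can be sketched in Ruby. This is a single-threaded toy model under stated assumptions: ToyThreadState, set_timer_interrupt and call_c_method are hypothetical names, and the actual GIL release and sched_yield call are replaced by a counter.

```ruby
# Toy model: a flag set by a "timer" is honored only at a safe point,
# just before a method returns. All names here are hypothetical.
class ToyThreadState
  attr_reader :yields

  def initialize
    @interrupt = false
    @yields = 0
  end

  # What the timer thread would do every time slice.
  def set_timer_interrupt
    @interrupt = true
  end

  # Stand-in for invoking a method implemented in C.
  def call_c_method
    result = yield     # the method body runs to completion; the flag is ignored
    check_interrupt    # the flag is examined only just before returning
    result
  end

  private

  def check_interrupt
    return unless @interrupt
    @interrupt = false
    # Real MRI would release the GIL and call sched_yield here;
    # this sketch only counts how often that would happen.
    @yields += 1
  end
end

state = ToyThreadState.new
array = []

state.set_timer_interrupt              # the flag is set before the call...
state.call_c_method { array << nil }   # ...but the append still completes first

p array.size     # => 1
p state.yields   # => 1
```

Because the check happens after the body has finished, the append is never observed half-done, which is the property the article relies on.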

That is, this code:

array = []
5.times.map do
  Thread.new do
    1000.times do
      array << nil
    end
  end
end.each(&:join)
puts array.size

is guaranteed to give the expected result when run on MRI (we are only talking about the predictability of the array's length; there are no guarantees about the order of the elements - translator's note).

But keep in mind that this guarantee does not follow from the Ruby code itself. If you run this code on another implementation that does not have a GIL, it will produce unpredictable results. It is useful to know what the GIL provides, but writing code that relies on the GIL is not a good idea. By doing so, you put yourself in a situation similar to vendor lock-in.

The GIL does not provide a public API. There is no documentation or specification for it. One day the MRI development team may change the behavior of the GIL or get rid of it completely. That is why writing code that depends on the GIL in its current implementation is not a good idea.
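One way to avoid depending on the GIL is to make the synchronization explicit. The following is a minimal sketch of the same example using Mutex from Ruby's core library; the variable names are chosen for illustration only.

```ruby
# Portable variant of the example: a Mutex makes the append safe on any
# Ruby implementation, without relying on the GIL.
array = []
lock  = Mutex.new

threads = 5.times.map do
  Thread.new do
    1000.times do
      # Only one thread at a time may execute this block.
      lock.synchronize { array << nil }
    end
  end
end
threads.each(&:join)

puts array.size   # prints 5000
```

With the explicit lock, the expected size follows from the Ruby code itself rather than from a property of one interpreter.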

What about the methods implemented in Ruby?

So we know that array << nil is an atomic operation. This expression calls a single method, Array#<<, which receives a constant as its argument and is implemented in C. A context switch, if one occurs, does not violate data integrity: in any case this method releases the GIL only just before it returns.

What about something like this?

array << User.find(1)

Before the method Array#<< can be called, the value of its argument has to be computed, that is, User.find(1) has to be called. As you probably know, User.find(1) in turn calls a lot of methods written in Ruby.

But the GIL makes only methods implemented in C atomic. There are no guarantees for methods written in Ruby.

Is the call to Array#<< still atomic in the new example? Yes, but do not forget that the right-hand side expression still has to be evaluated. In other words, first the method User.find(1), which is not atomic, must be called, and only then is the value it returns passed to Array#<<.
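The order of evaluation can be made visible with a small trace. In this sketch TracedArray, find_user and EVENTS are hypothetical stand-ins (find_user plays the role of User.find); it only shows that the argument is fully evaluated before the append begins.

```ruby
# Records the order in which things happen.
EVENTS = []

# Hypothetical Array subclass that logs every append before performing it.
class TracedArray < Array
  def <<(item)
    EVENTS << :append
    super
  end
end

# Hypothetical stand-in for User.find: an ordinary method written in Ruby.
def find_user(id)
  EVENTS << :find_user
  { id: id }
end

array = TracedArray.new
array << find_user(1)

p EVENTS   # => [:find_user, :append]
```

The line array << find_user(1) therefore behaves like two separate steps, and a context switch can happen anywhere inside the first one.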

What does all this mean to me?

In the first part of the article, we saw what could happen if a context switch occurs somewhere in the middle of a function. The GIL prevents such situations: even if a context switch occurs, other threads cannot continue execution, because they are forced to wait until the GIL is released. All this holds only if the method is implemented in C, does not call into Ruby code, and does not release the GIL itself (translator's note: the comments on the original article give an example: adding an element to a Hash, although implemented in C, is not atomic, because it calls into Ruby code to obtain the element's hash).

The GIL makes race conditions impossible inside the MRI implementation, but it does not make Ruby code thread-safe. You could say that the GIL is just a feature of MRI designed to protect the internal state of the interpreter.
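A compound operation shows why atomic C methods are not enough. In the hedged sketch below, each of include? and << may be atomic on MRI, but the combination "check, then append" is not, so it is wrapped in a Mutex; the names seen and lock are illustrative.

```ruby
# "Check, then append" consists of two method calls. Another thread could
# run between them, so the pair is protected by an explicit lock.
seen = []
lock = Mutex.new

threads = 5.times.map do
  Thread.new do
    100.times do |i|
      lock.synchronize do
        # Without the lock, two threads could both pass the include? check
        # and both append the same value.
        seen << i unless seen.include?(i)
      end
    end
  end
end
threads.each(&:join)

puts seen.size   # prints 100
```

Each value from 0 to 99 ends up in the array exactly once, regardless of which Ruby implementation runs the code.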

The translator will be happy to hear comments and constructive criticism.
