How to get over the addiction to synchronicity

Original author: Eric Florenzano
  • Translation
Asynchronous programming beats synchronous programming in both memory consumption and performance, and we have known this for years. Yet if you look at Django or Ruby on Rails, perhaps the two most prominent web frameworks to appear in the last few years, both are built in a synchronous style. Why, even in 2010, do we still write programs that rely on synchronous programming?

The reason we are stuck with synchronous programming is twofold. First, writing code directly against asynchronous primitives is inconvenient. Second, popular and/or mainstream languages lack the built-in constructs needed for the less direct approaches to asynchronous programming.

Asynchronous programming is too complicated



Let's first look at the direct implementation: an event loop. In this approach we have a single process running a tight infinite loop. Functionality is achieved by completing many small tasks quickly inside this loop. One task might read a few bytes from a socket, another might write a few bytes to a file, and another might do some computation, for example XOR-ing the data buffered from the first socket.

The most important property of this loop is that one and only one task runs at any given time. This means you have to break your logic into small pieces that execute sequentially. And if one of the functions blocks, it stalls the whole loop, and nothing else can run in the meantime.
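
A minimal sketch of such a loop, in Python with made-up task names, might look like this (each "task" does one small unit of work per call, and the loop runs exactly one task at a time):

```python
import collections

def run_event_loop(tasks):
    # Each task is a callable that does a small unit of work and
    # returns True while it has work left, False when finished.
    queue = collections.deque(tasks)
    while queue:
        task = queue.popleft()   # one and only one task runs at a time
        if task():               # do one small piece of work...
            queue.append(task)   # ...and re-queue the task if unfinished

# Hypothetical tasks for illustration: each call does one small step.
def make_countdown(name, steps, log):
    state = {'left': steps}
    def step():
        state['left'] -= 1
        log.append((name, state['left']))
        return state['left'] > 0
    return step

log = []
run_event_loop([make_countdown('a', 2, log), make_countdown('b', 3, log)])
# The tasks interleave: a, b, a, b, b
```

Note that if any single `step` blocked (say, on a slow read), nothing else in the loop would run until it returned, which is exactly the hazard described above.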

We have some really good frameworks designed to make working with event loops easier. In Python there is Twisted and, more recently, Tornado. Ruby has EventMachine, Perl has POE. These frameworks help in two ways: they provide constructs that make the event loop easier to work with (such as Deferreds or Promises), and they provide asynchronous implementations of common tasks, for example HTTP or DNS clients.

But these frameworks are not good enough, for two reasons. First, they force us to change our coding style. Imagine what rendering a simple blog page with comments would look like. Here is a small piece of JavaScript showing how it might work in a synchronous framework:

function handleBlogPostRequest(request, response, postSlug) {
    var db = new DBClient();
    var post = db.getBlogPost(postSlug);
    var comments = db.getComments(post.id);
    var html = template.render('blog/post.html',
        {'post': post, 'comments': comments});
    response.write(html);
    response.close();
}


And now a piece of code demonstrating how the same thing might look in an asynchronous framework. Note a few things right away: the code is deliberately structured so that four levels of nesting are not required. We also wrote the callbacks inside handleBlogPostRequest to take advantage of closures, giving them access to the request and response objects, the template context, and the database client. Avoiding nesting and managing closures are things we have to think about while writing such code, yet neither even comes up in the synchronous version.

function handleBlogPostRequest(request, response, postSlug) {
    var context = {};
    var db = new DBClient();
    function pageRendered(html) {
        response.write(html);
        response.close();
    }
    function gotComments(comments) {
        context['comments'] = comments;
        template.render('blog/post.html', context).addCallback(pageRendered);
    }
    function gotBlogPost(post) {
        context['post'] = post;
        db.getComments(post.id).addCallback(gotComments);
    }
    db.getBlogPost(postSlug).addCallback(gotBlogPost);
}


By the way, I chose JavaScript to make the point. People are very excited about node.js right now, and it is a very cool framework, but it does not hide the complexity inherent in asynchrony. It only hides some implementation details of the event loop.

The second reason these frameworks are not good enough is that not all I/O can be handled properly at the framework level, and in those cases you have to resort to hacks. For example, MySQL does not provide an asynchronous driver, so most well-known frameworks fall back to threads to make this communication work out of the box.
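
The thread hack looks roughly like this. This is a stdlib sketch, not any framework's actual API (Twisted's real helper for this pattern is deferToThread); here a plain ThreadPoolExecutor stands in for the framework's thread pool, and query_mysql is a made-up blocking call:

```python
from concurrent.futures import ThreadPoolExecutor

def query_mysql(sql):
    # Stand-in for a blocking MySQL driver call.
    return [('row-for', sql)]

executor = ThreadPoolExecutor(max_workers=4)

def async_query(sql, callback):
    # Run the blocking call in a worker thread so the event-loop
    # thread is never blocked; invoke the callback with the result.
    future = executor.submit(query_mysql, sql)
    future.add_done_callback(lambda f: callback(f.result()))

results = []
async_query('SELECT 1', results.append)
executor.shutdown(wait=True)  # wait for the worker threads to finish
```

The blocking call still blocks, of course; it just blocks a worker thread instead of the event loop, which is why this counts as a hack rather than real asynchronous I/O.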

The resulting awkward APIs, the added complexity, and the simple fact that most developers will not change their coding style lead to the conclusion that this type of framework is not the final answer to the problem (granting that you can do real work today with these techniques, as many programmers already do). Which raises the question: what other options for asynchronous programming do we have? Coroutines and lightweight processes, which brings us to the next big problem.

Languages do not support the lighter asynchronous paradigms



There are several language constructs that, properly implemented in modern languages, could pave the way for alternative ways of writing asynchronous code while avoiding the flaws of the event loop. These constructs are coroutines and lightweight processes.

A coroutine is a function that can suspend and resume execution at specific, programmatically defined points. This simple concept allows code that looks blocking to be non-blocking. At a few critical points in your I/O library, the low-level I/O functions can decide to "cooperate": one coroutine suspends its execution while another resumes, and so on.
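
In Python, this suspend-and-resume behavior is exactly what generators provide: yield marks the programmatically defined point where execution pauses and can later be resumed. A tiny illustration:

```python
def coroutine():
    yield 'paused at step 1'   # execution stops here...
    yield 'paused at step 2'   # ...and can later resume from here

co = coroutine()
first = next(co)   # runs up to the first yield, then suspends
second = next(co)  # resumes and runs to the second yield
```

Between the two next() calls the function is simply frozen in place, local state and all, which is the property an I/O library can exploit to switch between in-flight operations.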

Here is an example (in Python, but I think the idea is clear):

from urllib2 import urlopen  # Python 2; in Python 3: from urllib.request import urlopen

def download_pages():
    google = urlopen('http://www.google.com/').read()
    yahoo = urlopen('http://www.yahoo.com/').read()


Normally this works as follows: a new socket is opened, it connects to Google, an HTTP request is sent, and the full response is read, buffered, and assigned to the google variable. Then the same happens for the yahoo variable.

Now imagine that the underlying socket implementation was built from cooperating coroutines. As before, a socket is opened, the connection to Google is established, and the request is sent. But this time, right after sending the request, the urlopen call suspends its execution.

While it is suspended (having returned no value yet), execution continues from the next line. The same thing happens on the Yahoo line: as soon as its request is sent, it suspends too. But by now there is something to cooperate with: for example, some data may be ready to read from the Google socket, so execution resumes there. It reads some data from the Google socket and then suspends its execution again.

Execution jumps back and forth between the two coroutines until one of them completes. Say the Yahoo download finishes while Google's has not. In that case the Google coroutine keeps reading its socket to completion, because there are no other coroutines left to cooperate with. Once it finally completes, the function returns the entire buffer.

Then the Yahoo line returns all of its data.

We kept the style of our blocking code yet got asynchronous execution. Best of all, we preserved the original program order: the google variable is assigned first, then yahoo. In truth, somewhere underneath there is a clever event loop deciding who runs next, but the use of coroutines hides it from us.
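
That hidden "clever event loop" can be sketched in a few lines of modern Python: each fake download is a generator that yields after every chunk it "reads", and a round-robin scheduler resumes whichever coroutines still have work. The fetch function and its chunk counts are invented for illustration, not real socket code:

```python
def fetch(name, chunks, trace):
    # Fake download: 'read' one chunk, then yield control to the scheduler.
    buf = []
    for i in range(chunks):
        buf.append(f'{name}-chunk{i}')
        trace.append(name)
        yield            # suspend here; the scheduler resumes us later
    return ''.join(buf)  # delivered to the scheduler via StopIteration.value

def run_all(coros):
    # Round-robin scheduler: resume each unfinished coroutine in turn.
    results, pending = {}, dict(coros)
    while pending:
        for name, coro in list(pending.items()):
            try:
                next(coro)
            except StopIteration as stop:
                results[name] = stop.value
                del pending[name]
    return results

trace = []
results = run_all({'google': fetch('google', 3, trace),
                   'yahoo': fetch('yahoo', 2, trace)})
# Execution interleaves: google, yahoo, google, yahoo, google
```

When the shorter "yahoo" download finishes, the scheduler keeps driving "google" alone until it, too, completes, mirroring the behavior described above.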

Languages like PHP, Python, Ruby, and Perl simply do not have built-in coroutines fast enough to perform such a transformation behind the scenes. So what about lightweight processes?

Lightweight processes are what Erlang uses as its primary concurrency primitive. These processes are implemented almost entirely inside the Erlang VM. Each one carries roughly 300 words of overhead, its scheduling is handled by the VM, and no state is shared between processes. In practice you barely need to think about the cost of spawning a process; it is almost free. The catch is that processes can interact only through message passing.
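
Erlang's share-nothing, message-passing model can be imitated, far less cheaply, in Python using threads and queues. This is a sketch of the programming model only (the Process class and echo_server are invented here); real lightweight processes need VM support to be as cheap as Erlang's:

```python
import threading
import queue

class Process:
    # Toy Erlang-style process: no shared state, a private mailbox,
    # and communication only by sending messages.
    def __init__(self, behavior):
        self.mailbox = queue.Queue()
        self.thread = threading.Thread(target=behavior, args=(self,))
        self.thread.start()

    def send(self, msg):
        self.mailbox.put(msg)

    def receive(self):
        return self.mailbox.get()  # blocks until a message arrives

def echo_server(self):
    # Receive (reply_queue, payload) pairs; answer by message, never
    # by touching shared memory.
    while True:
        msg = self.receive()
        if msg == 'stop':
            break
        reply_queue, payload = msg
        reply_queue.put(('echo', payload))

replies = queue.Queue()
server = Process(echo_server)
server.send((replies, 'hello'))
reply = replies.get()   # blocks until the server replies
server.send('stop')
server.thread.join()
```

The API mirrors the Erlang discipline (spawn, send, receive), but each Process here costs a full OS thread, which is precisely the overhead the Erlang VM avoids.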

Implementing lightweight processes at the virtual-machine level avoids the excessive memory consumption, context switching, and relatively slow inter-process communication that come with operating-system processes. The VM has full access to the memory stack of every process and can freely move or resize those processes and their stacks, something the OS simply cannot do.

With this model of lightweight processes, we can return to the familiar approach of using separate processes for all our asynchronous needs. The question becomes: can the concept of a lightweight process be implemented in languages other than Erlang? My answer is: "I don't know." In my opinion, Erlang relies on particular features of the language (such as its immutable data structures; translator's note: Erlang variables are single-assignment) for its implementation of lightweight processes.

Where to go from here



The key point is that event-loop frameworks require developers to think about their code in terms of callbacks and asynchrony. After ten years, we still see that most developers confronted with this demand simply ignore it. They keep using the comfortable blocking methodologies of the past.

We should pay attention to alternative constructs such as coroutines and lightweight processes, which could make asynchronous programming as simple as synchronous programming. Only then will we be able to get over our addiction to synchronicity.

Translator's note: meanwhile, coroutines are already in active use, at least in Python.
