Synchrony is a myth

Original author: Terry Crowley (translation)

Hello everyone!

Today we have a long text without pictures (slightly abridged from the original) that examines the thesis in the title in detail. Microsoft veteran Terry Crowley describes the essence of asynchronous programming and explains why this approach is far more realistic and practical than a synchronous, sequential one.

Synchrony is a myth. Nothing happens instantly. Everything takes time.
Some characteristics of computing systems and programming environments are fundamentally rooted in the fact that computation takes place in a three-dimensional physical world and is constrained by the speed of light and the laws of thermodynamics.

This grounding in the physical world means that some design aspects remain relevant even as new technologies arrive, offering new capabilities and new levels of performance. They stay valid because they are not merely "choices made in the design" but fundamental realities of the physical universe.

The distinction between synchrony and asynchrony in languages and systems is exactly such a design aspect with deep physical foundations. Most programmers start out with programs and languages where synchronous execution is assumed. In fact, it is so natural that nobody mentions it explicitly. "Synchronous" in this context means that the computation happens immediately, as a series of consecutive steps, and nothing else happens until it completes. I execute "c = a + b" or "x = f(y)", and nothing else happens until that instruction finishes.

Of course, nothing happens instantly in the physical universe. Every process involves some delay: traversing the memory hierarchy, executing a processor cycle, reading from a disk drive, or connecting to another node over the network, which adds further delay while the data is transferred. All of this is a fundamental consequence of the speed of light and of signals propagating in three dimensions.

Everything is a little late; everything takes time. When we describe some process as synchronous, we are essentially saying that we are going to ignore this delay and treat the computation as instantaneous. In fact, computer systems often contain serious infrastructure that keeps the underlying hardware fully busy while presenting a programming interface that makes what happens on it look synchronous.

The idea that synchrony is provided by a special mechanism and comes at a cost may seem backwards to a programmer used to thinking of asynchrony as the thing that requires extra management. In fact, that is exactly what happens when an asynchronous interface is provided: the true underlying asynchrony is exposed to the programmer a little more directly, and they must handle it by hand rather than relying on the system to do so automatically. Exposing asynchrony imposes extra work on the programmer, but it also lets them allocate the costs and trade-offs inherent in the problem domain more intelligently, rather than leaving that to a system that would have to balance those costs and trade-offs on its own.

For example, the processor and memory system come with substantial infrastructure for reading and writing memory across its hierarchy. An L1 cache reference may take a few nanoseconds, while a reference that must go all the way through L2, L3 and main memory may take hundreds of nanoseconds. If the processor simply waited for each memory reference to resolve, it would sit idle a significant percentage of the time.

Serious mechanisms are used to optimize this: pipelining that looks ahead in the instruction stream, multiple memory fetches in flight at once with out-of-order data delivery, branch prediction that keeps optimizing the program even as it jumps to other regions of memory, and careful control of memory barriers so that this whole elaborate machinery still presents a consistent memory model to higher-level programming environments. All of it exists to maximize hardware utilization, hide those 10-100 nanosecond delays in the memory hierarchy, and present a system that appears to execute synchronously.

It is far from always clear how effective these optimizations are for a particular piece of code, and answering that question often requires quite specialized performance-analysis tools. Such analysis tends to be reserved for a small amount of especially valuable code (for example, the recalculation engine in Excel, certain compression paths in the kernel, or cryptographic code paths).

Operations with much larger latencies, such as reading data from a spinning disk, require different mechanisms. In such cases, when a disk read is requested of the OS, it switches entirely to another thread or process while the synchronous request remains pending. The high cost of switching and of supporting this mechanism is acceptable here because the latency being hidden is measured in milliseconds rather than nanoseconds. Note that these costs are not just the thread switches themselves: they include all the memory and resources that sit idle until the operation completes. All of these costs are paid to present an ostensibly synchronous interface.
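
As a minimal sketch of the difference (in Node.js terms; the file name config.json is just an example), compare a blocking read, which parks the calling thread and everything it holds until the disk responds, with its asynchronous counterpart:

```typescript
import { readFileSync } from "fs";
import { readFile } from "fs/promises";

// Synchronous read: the calling thread blocks, and its memory and
// resources sit idle, until the disk operation completes.
const configSync = readFileSync("config.json", "utf8");

// Asynchronous read: the request is issued and the thread is free to do
// other work; the result is consumed when the promise resolves.
async function loadConfig(): Promise<string> {
  return readFile("config.json", "utf8");
}
```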

There are a number of fundamental reasons why it may be necessary to expose the real underlying asynchrony in a system, and why an asynchronous interface to some component, layer or application may be preferable even given the extra complexity that must then be managed directly.

Parallelism. If the underlying resource supports true parallelism, an asynchronous interface lets the client more naturally issue and manage several requests at once, making fuller use of the resource.
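
A sketch of what this looks like in code, assuming a hypothetical asynchronous fetchUser API: the client launches all requests at once, and the underlying resource is free to service them in parallel.

```typescript
interface User { id: number; name: string; }

// Hypothetical asynchronous API for some resource that supports
// true parallelism (a service, a database, a disk array, ...).
declare function fetchUser(id: number): Promise<User>;

async function loadUsers(ids: number[]): Promise<User[]> {
  // All requests are in flight at once; total time is roughly the
  // slowest single request, not the sum of all of them.
  return Promise.all(ids.map((id) => fetchUser(id)));
}
```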

Pipelining. A common way to reduce effective latency at an interface is to keep several requests outstanding at any moment (how much this actually helps depends on where the delays come from). If the system supports pipelining, effective latency can be reduced by a factor equal to the number of outstanding requests. A given request may still take 10 ms to complete, but if ten requests are in the pipeline, a response can arrive every millisecond. Total throughput is a function of the available pipelining, not just the end-to-end latency of a single request. A synchronous interface that issues one request and waits for the response will always deliver worse overall latency.
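
As an illustration (not from the original article), here is a sketch of a pipelined client that keeps a fixed window of requests outstanding:

```typescript
// Run `requests` with up to `window` of them in flight at any moment,
// so responses arrive continuously instead of one per round trip.
async function pipelined<T>(
  requests: (() => Promise<T>)[],
  window: number
): Promise<T[]> {
  const results: T[] = new Array(requests.length);
  let next = 0;

  async function worker(): Promise<void> {
    while (next < requests.length) {
      const i = next++; // claim the next request in the queue
      results[i] = await requests[i]();
    }
  }

  // `window` workers pull from the shared queue concurrently.
  await Promise.all(Array.from({ length: window }, worker));
  return results;
}
```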

Batching (local or remote). An asynchronous interface lends itself more naturally to batching requests, whether locally or at the remote resource (note that "remote" here can mean a disk controller on the other end of an I/O interface). The application already has to cope with the response arriving after some delay, since it will not interrupt its current processing. That processing may itself generate additional requests, which can naturally be combined into a batch.

Local batching can make transferring a series of requests more efficient, or even compress or eliminate duplicate requests right on the local machine. Batching at the remote resource can enable serious optimization by giving it a whole set of requests to consider at once. The classic example is a disk controller reordering reads and writes to take advantage of the disk head's position over the spinning platter and minimize seek time. Any block-level storage interface can gain significant performance by combining a series of reads and writes that fall within the same block.

Naturally, local batching can be implemented behind a synchronous interface, but this either requires substantially "hiding the truth" or making batching an explicit interface feature, which can make the client considerably more complex. The classic example of "hiding the truth" is buffered I/O. The application calls "write(byte)" and the interface returns success, but the actual write (and the knowledge of whether it succeeded) does not happen until the buffer fills up or is explicitly flushed, which also happens when the file is closed. Many applications can ignore these details; confusion arises only when an application needs to guarantee a particular ordering of operations, and thus needs a true picture of what is happening at the underlying levels.
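
A sketch of what that "hiding the truth" looks like, with a hypothetical device callback standing in for the real (and slower) underlying write:

```typescript
// Buffered-I/O style local batching: write(byte) reports success
// immediately, but bytes only reach the underlying device when the
// buffer fills or flush() is called.
class BufferedWriter {
  private buffer: number[] = [];

  constructor(
    private capacity: number,
    private device: (chunk: number[]) => Promise<void> // hypothetical
  ) {}

  async write(byte: number): Promise<void> {
    this.buffer.push(byte); // "success" before the data is durable
    if (this.buffer.length >= this.capacity) {
      await this.flush();
    }
  }

  // Only here does the batched request actually reach the device.
  async flush(): Promise<void> {
    if (this.buffer.length === 0) return;
    const chunk = this.buffer;
    this.buffer = [];
    await this.device(chunk);
  }
}
```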

Unblocking / Decoupling. One of the most common uses of asynchrony in graphical user interfaces is keeping the main UI thread from blocking so that the user can continue interacting with the application. The delays of long-running operations (network communication, for example) cannot be hidden behind a synchronous interface. In this case the UI thread must explicitly manage such asynchronous operations and cope with the added complexity they bring into the program.

The user interface is exactly the kind of component that must keep responding to new requests and therefore cannot rely on some standard delay-hiding mechanism to simplify the programmer's life.
A web server component that accepts new socket connections will, as a rule, very quickly hand each connection off to another asynchronous component that handles communication on that socket, and then return to accepting new requests.

In synchronous designs, components are usually tightly coupled to their processing model.
Asynchronous interaction is a mechanism often used to loosen that coupling.

Reducing and managing costs. As noted above, any mechanism for hiding asynchrony involves allocating resources and paying costs. For a particular application those costs may be unacceptable, and its designer must find a way to manage the natural asynchrony directly.

The history of web servers is an interesting example. Early web servers (built on Unix) typically forked a separate process to handle each incoming request. That process could then read from and write to the connection in an essentially synchronous way. The design evolved, and costs fell when threads were used instead of processes, but the overall synchronous execution model was retained. Modern designs recognize that the focus should not be the computational model but, above all, the associated I/O: reading and writing while talking to the database, the file system or the network as the response is produced. These designs typically use work queues and asynchronous, event-driven processing.

The success of NodeJS on the backend is explained not only by its support among the many JavaScript developers who grew up building client-side web interfaces. NodeJS, like browser scripting, puts great emphasis on asynchronous design, which fits typical server workloads well: managing server resources is primarily about I/O, not computation.
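
A minimal sketch of this style in NodeJS terms (the database lookup is simulated): a single thread accepts connections, and every slow step is asynchronous, so the server keeps accepting new requests while earlier ones wait on their I/O.

```typescript
import { createServer } from "http";

// Hypothetical stand-in for real asynchronous I/O against a database.
function queryDatabase(key: string): Promise<string> {
  return new Promise((resolve) =>
    setTimeout(() => resolve(`value for ${key}`), 50)
  );
}

createServer(async (req, res) => {
  // The handler yields here; other connections are serviced meanwhile.
  const value = await queryDatabase(req.url ?? "/");
  res.end(value);
}).listen(8080);
```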

There is another interesting aspect here: these trade-offs are more explicit, and more adjustable by the application developer, under the asynchronous approach. In the memory-hierarchy example, the effective latency of a memory reference (measured in processor cycles) has grown dramatically over the decades. Processor designers keep adding cache levels and further mechanisms that stretch the memory model the processor presents, all so that the appearance of synchronous execution can be maintained.

Context switching at synchronous I/O boundaries is another example where the actual trade-offs have shifted dramatically over time. Processor speeds have grown much faster than latencies have fallen, which means an application now forgoes far more computation while sitting blocked, waiting for I/O to complete. The same shift in relative costs pushed OS designers toward memory-management schemes much more reminiscent of early process swapping (where the entire process image is loaded into memory before the process starts) than of demand paging. It is simply too hard to hide the delays that can occur at every page boundary.

Other topics

Cancellation

Cancellation is a complex topic. Historically, synchronous systems have coped poorly with cancellation, and some did not support it at all. Cancellation essentially had to be designed "out of band" and required a separate thread to invoke it. Asynchronous models handle cancellation more naturally; in particular, there is the trivial approach of simply ignoring whatever response eventually comes back (if one comes back at all). Cancellation grows in importance as latency variability increases and as real-world error rates rise, which makes it a pretty good historical lens on how our network environments have evolved.
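
A sketch of that trivial approach, with a hypothetical search API: each new request bumps a generation counter, and any response belonging to a superseded request is simply ignored.

```typescript
// Hypothetical asynchronous search API.
declare function search(query: string): Promise<string[]>;

let generation = 0;

async function searchLatest(
  query: string,
  show: (results: string[]) => void
): Promise<void> {
  const myGeneration = ++generation; // newer calls supersede older ones
  const results = await search(query);
  if (myGeneration === generation) {
    show(results); // only the most recent request displays its results
  }
  // Otherwise the response is simply dropped: trivial cancellation.
}
```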

Throttling / Resource Management

A synchronous design inherently imposes a degree of throttling, preventing the application from issuing further requests until the current one completes. An asynchronous design provides no such throttling for free, so it sometimes has to be implemented explicitly. This post describes the situation with Word Web App as an example, where the move from a synchronous to an asynchronous design caused serious resource-management problems. An application using a synchronous interface may well not realize that throttling is implicitly built into its code. Removing that implicit throttling means you can (or must) manage resources more explicitly.
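
A sketch of making that throttling explicit again: a small counting semaphore that caps the number of asynchronous operations in flight (all names here are illustrative).

```typescript
// Counting semaphore: at most `available` operations run concurrently.
class Semaphore {
  private waiters: (() => void)[] = [];

  constructor(private available: number) {}

  async acquire(): Promise<void> {
    if (this.available > 0) {
      this.available--;
      return;
    }
    // No slot free: wait until release() hands one over.
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) next();        // hand the slot directly to a waiter
    else this.available++;   // or return it to the pool
  }
}

// Wrap any async operation so it respects the concurrency cap.
async function throttled<T>(
  sem: Semaphore,
  op: () => Promise<T>
): Promise<T> {
  await sem.acquire();
  try {
    return await op();
  } finally {
    sem.release();
  }
}
```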

I ran into this at the very start of my career, when we ported a text editor from Sun's synchronous graphics API to X Windows. Under the Sun API, a drawing operation happened synchronously, so the client did not regain control until it had completed. Under X Windows, a graphics request was dispatched asynchronously over a network connection and then executed by a display server (which could be on the same machine or a different one).

To keep interactive performance good, our application would do a bounded amount of rendering (ensuring, say, that the line containing the cursor was updated and drawn) and then check whether there was more keyboard input to read. If there was, it would abandon the current screen update (which would become stale anyway once the queued input was processed), read and process that input, and then redraw the screen with the latest changes. This scheme was well tuned to a synchronous graphics API. An asynchronous interface could accept rendering requests faster than they could be executed, so the screen constantly lagged behind user input. It turned into a real nightmare when interactively resizing images, since the relative cost of issuing a new request was incomparably lower than the cost of executing it. The UI thrashed, performing a whole series of unnecessary redraws for every mouse movement.

The problem has not lost its relevance today, more than 30 years later (the Facebook iPhone app apparently suffers from exactly the same problem in certain scenarios). An alternative scheme is to write dedicated screen-update code that knows how often the screen can actually be updated, make it the driver, and have it explicitly call back to the client to draw a given area, rather than leaving the client in the driving role. For this to work efficiently the client must adhere to tight timing constraints, which is not always practical.
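
A sketch of the coalescing side of that scheme: however many times invalidation is requested per turn of the event loop, only one repaint is performed (render is a hypothetical repaint routine; in a browser, requestAnimationFrame would tie this to the display's refresh rate instead of setTimeout).

```typescript
declare function render(): void; // hypothetical repaint of the dirty area

let repaintScheduled = false;

function invalidate(): void {
  if (repaintScheduled) return; // collapse a burst of requests into one
  repaintScheduled = true;
  setTimeout(() => {            // defer until pending events drain
    repaintScheduled = false;
    render();                   // one redraw covers all invalidations
  }, 0);
}
```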

Complexity

This whole topic ultimately revolves around the relative complexity of building applications on asynchronous designs. In one of my first lectures at a high-level internal Microsoft technical seminar, I argued that asynchronous APIs were a key ingredient for building applications that would not hang the way early PC programs so often did. Butler Lampson, Turing Award winner and one of the founding figures of the PC, was sitting in the gallery, and he shot back: "Yes, but they won't work either!" Over the following years I talked with Butler at length, and he remained deeply concerned about how to manage asynchrony at scale.

There are two key problems in asynchronous design. The first is how to express that a computation should resume when an asynchronous response arrives. In particular: can we find a way of structuring this that preserves the information hiding needed to build complex applications out of independent components? Explicit event-driven state machines are a common approach. Language constructs such as async/await or promises address the same problem. Cruder approaches, such as raw callbacks, are very common in JavaScript code, and they suffer from a particular failure mode: the long-lived state of the program gets buried in opaque callbacks that are often poorly managed or not manageable at all. Async/await lets you write an asynchronous computation as if it were sequential, synchronous code. Under the hood it all turns into chains of closures and continuation functions. The approach also provides a standard packaging for asynchronous computation, avoiding the ad hoc and inconsistent techniques seen in raw callback-based code.
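
A sketch of the contrast, using a hypothetical asynchronous lookup in both styles:

```typescript
// Hypothetical asynchronous lookup, callback style.
declare function readEntryCb(
  key: string,
  cb: (err: Error | null, value?: string) => void
): void;

// Callback style: the program's continuation is buried in nested
// callbacks, and the long-lived state hides inside closures.
function lookupTwiceCb(key: string, done: (result: string) => void): void {
  readEntryCb(key, (err, value) => {
    if (err || value === undefined) return done("missing");
    readEntryCb(value, (err2, final) => {
      if (err2 || final === undefined) return done("missing");
      done(final);
    });
  });
}

// The same chain with async/await reads as straight-line code; under
// the hood the compiler builds the chain of continuations.
declare function readEntry(key: string): Promise<string>;

async function lookupTwice(key: string): Promise<string> {
  try {
    const value = await readEntry(key);
    return await readEntry(value);
  } catch {
    return "missing";
  }
}
```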

But neither approach solves the second key problem, which essentially comes down to isolating intermediate state. In a synchronous design, any intermediate state used in the computation disappears: once the synchronous call returns, it is no longer accessible. In an asynchronous design, the same problem arises as in multithreading, where different threads can operate on shared mutable data (here be dragons!). The intermediate state can potentially be exposed, exponentially increasing the actual number of program states we have to reason about.

In fact, state leaks in both directions. Internal state may become visible to the rest of the program, and changes to external state may become visible to the asynchronous computation mid-flight. Mechanisms like async/await, which deliberately flatten the code so that it reads as sequential and synchronous, do not clarify these risks; they merely mask them.
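
A small sketch of the hazard (the save function is hypothetical): the await point is exactly where the rest of the program can observe or mutate the shared intermediate state.

```typescript
declare function save(items: string[]): Promise<void>; // hypothetical async I/O

const pending: string[] = [];

// Hazardous: `pending` is shared, and other code runs during the await.
async function flushPending(): Promise<void> {
  await save(pending);  // items pushed while saving...
  pending.length = 0;   // ...are silently discarded here
}

// Isolated: take the items out first, so anything added during the
// await belongs to the next flush instead of being lost.
async function flushPendingIsolated(): Promise<void> {
  const batch = pending.splice(0, pending.length); // private copy
  await save(batch);
}
```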

As in design generally, when complexity must be managed, a strategy of isolation comes to the fore, especially isolation of data. One of the challenges in graphical applications is that the program may genuinely need to show the user these intermediate states. A common requirement is to show the progress of some computation, or to interrupt and then restart a computation affected by user actions (page layout in Word and recalculation in Excel are classic examples of both). In many cases I encountered over years of practice, a great deal of complexity was added by trying to decide how to present these intermediate program states to the user, rather than just putting up a wait cursor.
For especially long-running operations, or ones with a high risk of failure, the user really does want to understand what exactly the program is doing.
Whatever strategy we choose, it always pays to apply it in a disciplined, uniform way; over time such a strategy will certainly pay off. Improvised approaches quickly collapse into unbelievable complexity.

Conclusions

The world is asynchronous. Synchrony is a myth, and maintaining the illusion can cost you dearly. Recognizing the asynchrony often lets us model more faithfully what is happening deeper in the system and manage all the resources and costs consciously. The difficulty lies in the growth in the number of program states we must reason about; to control that complexity, we must isolate state and limit how many states are visible to any particular piece of code.
