vintage September 1, 2014 at 01:34

Atom - the minimal building block of a reactive application

Hello, my name is Dmitry Karlovsky and I am... a client-side developer. I have 8 years of experience supporting a wide variety of sites and web applications: from little-known online stores to giants like Yandex. All this time I have not only been cranking out production code, but also sharpening my axe to stay at the cutting edge of technology. And now that you know I'm not just some random guy off the street, let me tell you about an architectural technique that I have been using for the past year.

This article introduces the reader to the "atom" abstraction, designed to automate the tracking of dependencies between variables and to update their values efficiently. Atoms can be implemented in any language, but the examples in this article are in JavaScript.

Caution: reading may cause a dislocated brain, an outbreak of holy wars, and sleepless nights of refactoring.

From simple to complex

This chapter briefly demonstrates the typical evolution of a fairly simple application, gradually leading the reader to the concept of an atom.

To start with, let's imagine a simple task: show a welcome message to the user. This is not difficult to implement:

	this.$foo = {}
	$foo.message = 'Hello, user!'
	$foo.sayHello = function(){
		document.body.innerHTML = $foo.message
	}
	$foo.start = function(){
		$foo.sayHello()
	}

But it would be nice to address the user by name. Let's say the user name is stored in localStorage; then the implementation becomes a little more complicated:

	this.$foo = {}
	$foo.userName = localStorage.userName
	$foo.message = 'Hello, ' + $foo.userName + '!'
	$foo.sayHello = function(){
		document.body.innerHTML = $foo.message
	}
	$foo.start = function(){
		$foo.sayHello()
	}

But wait: userName and message are calculated during initialization. What if the name has already changed by the time sayHello is called? We would greet the user by the old name, which is not very good. So let's rewrite the code so that the message is calculated only when it actually needs to be shown:

	this.$foo = {}
	$foo.userName = function( ){
		return localStorage.userName
	}
	$foo.message = function( ){
		return 'Hello, ' + $foo.userName() + '!'
	}
	$foo.sayHello = function(){
		document.body.innerHTML = $foo.message()
	}
	$foo.start = function(){
		$foo.sayHello()
	}

Note that we had to change the interface of the message and userName fields - now they do not store the values themselves, but the functions that return them.

Thesis 1: In order not to doom yourself and other developers to tedious refactoring when changing the interface, try to immediately use an interface that will allow you to freely change the internal implementation.

We could hide the function call using Object.defineProperty:

	this.$foo = {}
	Object.defineProperty( $foo, "userName", {
		get: function( ){
			return localStorage.userName
		} 
	})
	Object.defineProperty( $foo, "message", {
		get: function( ){
			return 'Hello, ' + $foo.userName + '!'
		} 
	})
	$foo.sayHello = function(){
		document.body.innerHTML = $foo.message
	}
	$foo.start = function(){
		$foo.sayHello()
	}

But I would recommend an explicit function call for the following reasons:

* IE8 supports Object.defineProperty only for DOM nodes.
* Functions can be chained: $foo.title('Hello!').userName('Anonymous').
* A function can be passed somewhere as a callback: $foo.userName.bind($foo) - in this case the whole property is passed (both getter and setter).
* A function can store various additional information in its own fields: from a global identifier to validation parameters.
* Accessing a nonexistent property as a function throws an exception instead of silently returning undefined.
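
The points from this list can be illustrated with a small sketch. The helper makeProp is hypothetical and is not part of the article's code; it only shows one possible shape of a property implemented as an explicit function.

```javascript
// Hypothetical helper (an assumption, not from the article): builds a
// combined getter/setter function. Called without arguments it returns
// the value; called with one argument it stores it and returns the owner.
function makeProp(owner, name, initial) {
	var value = initial
	var prop = function (next) {
		if (arguments.length === 0) return value
		value = next
		return owner // returning the owner makes chaining possible
	}
	// The function object itself can carry additional information.
	prop.id = name
	return prop
}

var $foo = {}
$foo.title = makeProp($foo, '$foo.title', '')
$foo.userName = makeProp($foo, '$foo.userName', '')

// Chained setters
$foo.title('Hello!').userName('Anonymous')

// The whole property (getter and setter) travels as a single callback
var userNameCallback = $foo.userName.bind($foo)
userNameCallback('Guest')

console.log($foo.title(), $foo.userName(), $foo.userName.id)
```

Calling a property that was never defined, for example $foo.missing(), throws a TypeError here, whereas reading $foo.missing would quietly give undefined.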

But what if the user name changes after we have shown the message? Ideally, we should track that moment and redraw the message:

	this.$foo = {}
	$foo.userName = function( ){
		return localStorage.userName
	}
	$foo.message = function( ){
		return 'Hello, ' + $foo.userName() + '!'
	}
	$foo._sayHello_listening = false
	$foo.sayHello = function(){
		if( !$foo._sayHello_listening ){
			window.addEventListener( 'storage', function( event ){
				if( event.key === 'userName' ) $foo.sayHello()
			}, false )
			$foo._sayHello_listening = true
		}
		document.body.innerHTML = $foo.message()
	}
	$foo.start = function(){
		$foo.sayHello()
	}

And here we have committed a terrible sin: the implementation of the sayHello method suddenly knows about the internal implementation of the userName property (it knows where the value comes from). Note that in these examples they sit next to each other only for clarity. In a real application such methods will live in different objects, the code will be in different files, and it will be maintained by different people. Therefore this code should be rewritten so that one property can subscribe to changes of another through its public interface. To avoid overcomplicating the code, we will use the pub/sub implementation from jQuery:

	this.$foo = {}
	$foo.bus = $({})
	$foo._userName_listening = false
	$foo.userName = function( ){
		if( !this._userName_listening ){
			window.addEventListener( 'storage', function( event ){
				if( event.key !== 'userName' ) return
				$foo.bus.trigger( 'changed:$foo.userName' )
			}, false )
			this._userName_listening = true
		}
		return localStorage.userName
	}
	$foo._message_listening = false
	$foo.message = function( ){
		if( !this._message_listening ){
			$foo.bus.on( 'changed:$foo.userName', function( ){
				$foo.bus.trigger( 'changed:$foo.message' )
			} )
			this._message_listening = true
		}
		return 'Hello, ' + $foo.userName() + '!'
	}
	$foo._sayHello_listening = false
	$foo.sayHello = function(){
		if( !this._sayHello_listening ){
			$foo.bus.on( 'changed:$foo.message', function( ){
				$foo.sayHello()
			} )
			this._sayHello_listening = true
		}
		document.body.innerHTML = $foo.message()
	}
	$foo.start = function(){
		$foo.sayHello()
	}

In this case, communication between properties goes through a single bus, $foo.bus, but it could just as well be a scattering of individual EventEmitters. The scheme stays the same in principle: if one property depends on another, it must subscribe to its changes somewhere, and when it changes itself, it must send a notification about its own change. In addition, this code makes no provision at all for unsubscribing when tracking a property's value is no longer required.

Let's introduce a showName property whose state determines whether we show the user name in the greeting. The catch in this fairly typical problem statement is that if showName = 'false', then the message text does not depend on the value of userName, and therefore we should not subscribe to that property. Moreover, if we are already subscribed to it because showName was 'true' earlier, then we need to unsubscribe from userName after receiving showName = 'false'. And so that life does not seem like paradise, we add one more requirement: values obtained from localStorage must be cached so that we do not touch it needlessly. An implementation by analogy with the previous code would be too bulky for this article, so we will use a somewhat more compact pseudocode:

	property $foo.userName :
		subscribe to localStorage
		return string from localStorage
	property $foo.showName :
		subscribe to localStorage
		return boolean from localStorage
	property $foo.message :
		subscribe to $foo.showName
		switch
			test $foo.showName
			when true
				subscribe to $foo.userName
				return string from $foo.userName
			when false
				unsubscribe from $foo.userName
				return string
	property $foo.sayHello :
		subscribe to $foo.message
		put to dom string from $foo.message
	function start : call $foo.sayHello

Here the duplication of information is striking: next to actually getting a property's value we have to subscribe to its changes, and when it turns out that the value of some property is no longer needed, we have to unsubscribe from its changes. This is very important, because if you do not unsubscribe in time from properties that no longer affect the result, then as the application grows more complex and the amount of data increases, overall performance will degrade more and more.

Thesis 2: Unsubscribe from non-influencing dependencies in time, otherwise sooner or later the application will start to slow down.

The architecture described above is called event-driven. And this is its least terrible variant: in the more common case, subscribing, unsubscribing, and several ways of calculating values are scattered across different places in the project. Event-driven architecture is very fragile, because you have to manually keep subscriptions and unsubscriptions in sync, and humans are lazy and not very attentive creatures. Therefore the best solution is to minimize the human factor by hiding the event distribution mechanism from the programmer, letting them concentrate on describing how some data is derived from other data.

Let's simplify the code by leaving only the minimum required dependency information:

	property $foo.userName : return string from localStorage
	property $foo.showName : return boolean from localStorage
	property $foo.message :
		switch
			test $foo.showName
			when true return string from $foo.userName
			when false return string
	property $foo.sayHello : put to dom string from $foo.message
	function start : call $foo.sayHello

Attentive readers have most likely noticed that, once the manual subscribing and unsubscribing is gone, the property descriptions are so-called "pure functions". Indeed, we have arrived at FRP (Functional Reactive Paradigm). Let's look at each term in more detail:

Functional - each variable is described as a function that generates its value based on the values of other variables.
Reactive - changing one variable automatically updates the values of all variables that depend on it.
Paradigm - the programmer needs to rewire their thinking a bit to understand and accept the principles of building an application.

As you can see, everything described above revolves around variables and the dependencies between them. Let's call such FRP variables “atoms” and state their main properties:

1. An atom stores exactly one value in itself. This value can be either a primitive or any object including an exception object.
2. An atom stores a function for calculating a value based on other atoms through an arbitrary number of intermediate functions. Accesses to other atoms during its execution are tracked, so that the atom always has up-to-date information about which atoms affect its state and which atoms depend on it.
3. When the value of an atom changes, the atoms that depend on it should be updated in cascade.
4. Exceptions should not violate the consistency of the application state.
5. The atom should easily integrate with the imperative environment.
6. Since almost every memory slot is wrapped in an atom, the implementation of atoms should be as fast and compact as possible.

Problems in the implementation of atoms

1. Keeping dependencies up to date

If calculating the value of one atom required the value of another, then we can state with confidence that the first depends on the second. If it was not required, then there is no direct dependency, although an indirect one is possible. But it is both necessary and sufficient to track only direct dependencies.

This is implemented very simply: when the calculation of one atom starts, a global variable remembers that this atom is the current one, and when the value of another atom is requested, then in addition to returning the value, the two atoms are linked to each other. That is, besides the slot for the value itself, each atom needs two sets: leading atoms (masters) and driven atoms (slaves).

Unlinking is somewhat more complicated: when the calculation starts, the set of leading atoms must be replaced with an empty one, and after the calculation completes, the resulting set must be compared with the previous one and the atoms that are missing from the new set unlinked.

Automatic dependency tracking works similarly in KnockOutJS and MeteorJS.
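
The linking and unlinking described above can be sketched roughly as follows. This is an illustration under assumptions, not the code of any real library: the names Atom, currentAtom, masters and slaves simply follow the wording of this article, and invalidation in set is deliberately naive (only direct slaves are marked as outdated).

```javascript
// Illustrative sketch only; not the $jin.atom implementation.
var currentAtom = null // the atom whose value is being calculated right now

function Atom(pull) {
	this.pull = pull         // function that calculates the value
	this.value = undefined
	this.actual = false      // is the cached value up to date?
	this.masters = new Set() // atoms this atom depends on
	this.slaves = new Set()  // atoms that depend on this atom
}

Atom.prototype.get = function () {
	// Link the atom being calculated (if any) with this one.
	if (currentAtom) {
		currentAtom.masters.add(this)
		this.slaves.add(currentAtom)
	}
	if (!this.actual) this.update()
	return this.value
}

Atom.prototype.update = function () {
	var oldMasters = this.masters
	this.masters = new Set() // collect dependencies from scratch
	var parent = currentAtom
	currentAtom = this
	try {
		this.value = this.pull()
	} finally {
		currentAtom = parent
	}
	this.actual = true
	// Unlink the masters that were not accessed this time.
	var self = this
	oldMasters.forEach(function (master) {
		if (!self.masters.has(master)) master.slaves.delete(self)
	})
}

// Naive invalidation, enough for this demo: mark direct slaves as outdated.
Atom.prototype.set = function (value) {
	this.value = value
	this.actual = true
	this.slaves.forEach(function (slave) { slave.actual = false })
}

var showName = new Atom(function () { return true })
var userName = new Atom(function () { return 'Anonymous' })
var message = new Atom(function () {
	return showName.get() ? 'Hello, ' + userName.get() + '!' : 'Hello!'
})

console.log(message.get()) // 'Hello, Anonymous!'
showName.set(false)
console.log(message.get()) // 'Hello!'
console.log(userName.slaves.size) // 0: message was unlinked from userName
```

Note that message never mentions subscriptions: the link to userName appears and disappears purely as a side effect of which atoms were read during the calculation.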

But how do atoms know when to re-run the calculation? More on that below.

2. Cascading, consistent value updates

What could be simpler, it would seem? Immediately after a value changes, we walk over the dependent atoms and trigger their update. This is exactly what KnockOutJS does, and that is why it slows down during mass updates. If one atom (A) depends on, say, two others (B, C), and we change their values one after another, then the value of A is calculated twice. Now imagine that it depends not on two but on two thousand atoms, and each calculation takes at least 10 milliseconds.

While KnockOutJS developers plug the bottlenecks with throttling and debouncing, MeteorJS developers approached the problem more systematically: they made the recalculation of dependent atoms deferred instead of immediate. In the case described above, atom A recalculates its value exactly once, at the end of the current event handler, that is, after all the changes we made to atoms B, C, and any others.

But this is actually not a complete solution: the problem resurfaces when the depth of atom dependencies becomes greater than 2. Here is a simple example: atom A depends on atoms B and C, and C in turn depends on D. If we change atoms B and D one after another, then atoms A and C are scheduled for recalculation, and if atom C changes its value, a deferred recalculation of A is started once again. Usually this is not fatal for speed, but if calculating A is a fairly long operation, then doubling it can backfire in the most unexpected place in the application.

Having understood the problem, it is easy to come up with a solution: when linking atoms, it is enough to remember the maximum depth among the leading atoms plus one, and when iterating over the atoms scheduled for update, to update the atoms with the smaller depth first. This simple technique guarantees that by the time an atom's value is recalculated, all atoms it directly depends on have up-to-date values.
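
Here is a rough sketch of such depth-ordered updating for the A, B, C, D example. To keep it short, masters are listed explicitly instead of being tracked automatically, and flush is called by hand where a real implementation would defer it to the end of the event handler. The names Atom, scheduled and flush are assumptions made for this illustration.

```javascript
// Illustrative sketch: masters are passed explicitly to keep it short.
function Atom(name, masters, pull) {
	this.name = name
	this.masters = masters
	this.pull = pull
	this.slaves = []
	this.calls = 0 // how many times pull was executed
	// depth = maximum depth among the masters plus one (0 for sources)
	this.depth = masters.reduce(function (max, m) {
		return Math.max(max, m.depth + 1)
	}, 0)
	var self = this
	masters.forEach(function (m) { m.slaves.push(self) })
	this.value = this.compute()
}

Atom.prototype.compute = function () {
	this.calls++
	var args = this.masters.map(function (m) { return m.value })
	return this.pull.apply(null, args)
}

var scheduled = new Set() // atoms waiting for recalculation

// Change a source atom and defer the update of its slaves.
Atom.prototype.set = function (value) {
	if (value === this.value) return
	this.value = value
	this.slaves.forEach(function (s) { scheduled.add(s) })
}

// Meant to run once at the end of the current event handler.
function flush() {
	while (scheduled.size) {
		// Pick the scheduled atom with the smallest depth.
		var next = null
		scheduled.forEach(function (atom) {
			if (!next || atom.depth < next.depth) next = atom
		})
		scheduled.delete(next)
		var value = next.compute()
		if (value === next.value) continue // unchanged: the cascade stops here
		next.value = value
		next.slaves.forEach(function (s) { scheduled.add(s) })
	}
}

var B = new Atom('B', [], function () { return 1 })
var D = new Atom('D', [], function () { return 1 })
var C = new Atom('C', [D], function (d) { return d * 10 })
var A = new Atom('A', [B, C], function (b, c) { return b + c })

B.set(2)
D.set(2)
flush()
console.log(A.value, A.calls) // 22 2 (one initial calculation plus one update)
```

Because C (depth 1) is always processed before A (depth 2), A is recalculated once with the final values of both B and C.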

3. Exception handling

Imagine this situation: atoms B and C depend on A. Atom B started calculating its value and accessed A. A also started calculating its value, but at that moment an exception occurred. It could be a bug in the code or a lack of data, it does not matter. What matters is that atom A must remember this exception and still let it propagate further, so that B can also remember or handle it. Why is this so important? Because when C starts calculating its value and accesses A, what happens for it should be the same as for B: accessing A raises an exception that can be caught and handled, or left alone, in which case it should be caught by the library implementing atoms and stored in the atom being calculated. If atoms did not remember exceptions, then every access to them would re-run the same code, inevitably leading to the same exception. This is needless CPU overhead, so it is better to cache exceptions like regular values.
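
A minimal sketch of this behavior, assuming a simplified Atom without dependency tracking or invalidation (both are left out here on purpose): the exception is stored next to the value and rethrown on every access, so the failing code runs only once.

```javascript
// Illustrative sketch: an atom that caches a thrown exception
// exactly like a regular value. Dependency tracking is omitted.
function Atom(pull) {
	this.pull = pull
	this.actual = false
	this.value = undefined
	this.error = null
	this.calls = 0 // how many times pull was executed
}

Atom.prototype.get = function () {
	if (!this.actual) {
		this.calls++
		try {
			this.value = this.pull()
			this.error = null
		} catch (error) {
			this.value = undefined
			this.error = error // remember the exception...
		}
		this.actual = true
	}
	if (this.error) throw this.error // ...and let it propagate further
	return this.value
}

var A = new Atom(function () { throw new Error('no data') })
// B does not handle the error, so it ends up stored in B as well.
var B = new Atom(function () { return A.get() + 1 })
// C handles the error and falls back to a default value.
var C = new Atom(function () {
	try { return A.get() + 1 } catch (error) { return 0 }
})

try { B.get() } catch (error) { console.log(error.message) } // no data
console.log(C.get()) // 0
console.log(A.calls) // 1: the failing code ran only once
```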

Another, even more important, point is that during a cascading update of atoms their values are calculated in the opposite direction. For example, atom A depends on B, which depends on C, which in turn depends on D. During initialization, A starts calculating its value and accesses B, which accesses C, which accesses D. But state updates go in the reverse order: D, then C, then B, and finally A.

Later, someone changes the value of atom D. It notifies atom C that its value is no longer up to date. Atom C then recalculates its value and, if it differs from the previous one, notifies atom B, which in the same way notifies A. If at any of these moments we fail to catch an exception and as a result do not notify the dependent atoms, the application ends up in an inconsistent state: one half of the application contains new data, another half contains old data but is sure that it is new, and the third half has crashed altogether and cannot get up, waiting for the data to change.

4. Cyclic dependencies

The presence of cyclic dependencies indicates a logical error in the program. However, the program should not freeze or spin in an endless loop of pending calculations. Instead, the atom must detect that its value was required to calculate its value and raise an exception.

Detection is simple: when the calculation starts, the atom remembers that it is being calculated, and when someone accesses its value, it checks whether it is in the calculating state and, if so, throws an exception.
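
The check can be sketched in a few lines. This simplified Atom (an assumption made for the illustration) has no caching at all, so that only the "being calculated" flag remains visible.

```javascript
// Illustrative sketch: only the cycle check, no caching or tracking.
function Atom(name, pull) {
	this.name = name
	this.pull = pull
	this.pulling = false // true while the value is being calculated
}

Atom.prototype.get = function () {
	if (this.pulling) {
		throw new Error('Cyclic dependency at atom ' + this.name)
	}
	this.pulling = true
	try {
		return this.pull()
	} finally {
		this.pulling = false // reset the flag even if pull throws
	}
}

// X needs Y, and Y needs X: a logical error in the program.
var X = new Atom('X', function () { return Y.get() + 1 })
var Y = new Atom('Y', function () { return X.get() + 1 })

var cycleMessage
try { X.get() } catch (error) { cycleMessage = error.message }
console.log(cycleMessage) // Cyclic dependency at atom X
```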

5. Asynchrony

Asynchronous code is always a problem because it turns logic into spaghetti whose twists are hard to follow and easy to get wrong. When developing in JavaScript, you constantly have to balance between simple, clear synchronous code and asynchronous calls. The main problem with asynchrony is that it leaks through interfaces like a monad: you cannot write a synchronous implementation of module A and later quietly make it asynchronous without module B, which uses it, noticing. To make such a change you have to change the logic of module B, then of modules C and D that depend on it, and so on. Asynchrony is like a virus that breaks through all our abstractions and exposes the internal implementation.

But atoms solve this problem easily, even though they were not designed with it in mind: it all comes down to reactivity. When one atom accesses another, the latter can give some answer right away and meanwhile start an asynchronous task; when the task completes, it updates its value, and the rest of the application is updated according to the received data. The immediate answer can be of several kinds:

a) Return some default value. To the dependent atoms it looks like “there was one value and suddenly it changed”, but they cannot tell whether they were shown actual data or not. And it is often necessary to know this, for example, to tell the user that their data has not disappeared and is about to be loaded.

b) Return a locally cached version. This option suits data that changes relatively rarely. Take the user name from the beginning of this article: it is no big deal if the user changed their name between application launches and therefore sees the previous name for a short while, since in return they can start working with the application almost immediately. Of course, this approach is not suitable for critical data, especially over a poor connection, when the update can take a very long time.

c) Honestly admit that there is no data and return a special value meaning the absence of data. In JavaScript this would be undefined. But then the dependent code must handle this value correctly everywhere, which means a fairly large amount of repetitive code in which it is easy to make a mistake. So if you choose this path, be prepared for a Null Pointer Exception every now and then.

d) After starting the asynchronous task, throw a special exception which, as described above, cascades through all dependent atoms up to the atoms where it can be handled. For example, the atom responsible for displaying the list of users can catch the exception and, instead of failing silently, show the user a “loading” or “loading error” message. That is, starting from some distant atom, the exceptional situation becomes a perfectly regular one. The advantage of this method is that the absence of data has to be handled only in a relatively small number of places in the code, not everywhere along the way to those places. But it is important to remember that when an atom depends on several others, the calculation stops at the first one that throws, and the rest never learn that their data is also needed, even though all the leading atoms could have fetched their data in a single request. Fortunately, such cases are easily detected by an excessive number of requests to the server and are fixed without much trouble by placing try-catch in the right spots.
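
Option (d) can be sketched as follows. Everything here is an assumption made for the illustration: Wait is a hypothetical marker exception, request is a fake asynchronous API that stores its callback so the response can be delivered by hand, '/users' is a made-up address, and the users object stands in for a real atom.

```javascript
// Hypothetical marker exception meaning "the data is being loaded".
function Wait(message) { this.message = message }

// Fake asynchronous API for the demo: it stores the callback so that the
// response can be delivered by hand instead of over the network.
var pendingRequests = []
function request(url, callback) { pendingRequests.push(callback) }

// Stand-in for a source atom that loads its value asynchronously.
var users = {
	value: undefined,
	loading: false,
	onChange: null,
	get: function () {
		if (this.value !== undefined) return this.value
		if (!this.loading) {
			this.loading = true
			var self = this
			request('/users', function (data) {
				self.loading = false
				self.value = data
				if (self.onChange) self.onChange() // reactive update
			})
		}
		throw new Wait('loading...')
	}
}

// The distant "atom" where the exceptional situation becomes a regular one.
var output = ''
function render() {
	try {
		output = 'Users: ' + users.get().join(', ')
	} catch (error) {
		if (!(error instanceof Wait)) throw error // real errors go further
		output = error.message
	}
}
users.onChange = render

render()
console.log(output) // loading...
pendingRequests.shift()(['Alice', 'Bob']) // the response arrives
console.log(output) // Users: Alice, Bob
```

Only render knows that data may be missing; any code between users and render would stay ordinary synchronous code.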

6. Integration with imperative code

No matter how beautiful FRP looks, it is not a silver bullet and does not solve all problems. Besides, there are lots of imperative libraries and native APIs that you need to get along with. Therefore, first of all, it should be possible to execute arbitrary code when the value of an atom changes. It should also be possible to change an atom's value directly rather than through its built-in function, as described above. However, it is better to keep the use of these features to a minimum, since they demand extra care and accuracy in implementation.

Atoms can be roughly divided into 3 main types according to how they are used:

a) State sources - atoms that cannot calculate their value. They change it only when someone explicitly puts a value into them.
b) Intermediate - atoms with lazy state. When accessed, they calculate their state based on other atoms. But when they have no slaves left, they unsubscribe from their masters. They can then either stay in memory in case they are needed again, or be deleted so as not to take up resources.
c) State outputs - atoms that imperatively reflect changes of their state to the outside.

7. Memory consumption

Here things are not so rosy: even if the value fits into 1 bit, each atom carries tens of bytes of additional information.

This is what a typical atom stores:

a) the value itself
b) the exception object (can share a slot with the value, but not in all languages)
c) the current state: outdated, update scheduled, calculating, up to date, error...
d) the set of leading atoms
e) the set of driven atoms
f) the maximum depth
g) the number of driven atoms
h) an identifier (for languages where atoms cannot be identified by pointer)
i) a bundle of functions that implement the desired behavior

Quite a hefty price for an accurate dependency graph. There are several strategies for saving memory:

a) Reduce accuracy by storing several values in one atom. As a result there will be some false notifications when neighboring data has changed, and to find that out the value calculation has to be started. You get an analogue of $digest from AngularJS, but only within a single atom and only when something in the atom has really changed, not just “could have”.

b) Reduce the number of intermediate atoms. Not every intermediate value is worth caching. In many cases ordinary functions will do, provided they execute quickly enough and the leading atoms do not change too often.

c) Reduce the number of source atoms. Instead of several sources you can have one, as in point (a), but access it not directly but through intermediate atoms that take data from the source and check only the part they need. It may seem that this only makes things worse: we replace, say, 10 sources with 1 source + 10 intermediate atoms. But intermediate atoms can self-destruct when their data is not needed without losing the data stored in the source, which allows these atoms to be restored quickly if necessary.

d) Minimize the number of links. There are cases where introducing a few intermediate atoms that calculate their values from several leading atoms reduces the total number of links.

e) Put the functions that define behavior not into the atoms themselves but into subclasses. If you have a group of atoms that behave the same way, it is best to create a subclass for them that defines the behavior, and to store only the differing data in instances.

Epilogue

To summarize, let's systematize the interfaces of atoms in the form of the following diagram:

A - some atom. The inner circle is its state, and the outer circle is the boundary of its interface.
S is the driven part of the application; it may contain not only atoms but anything that somehow depends on the current atom.
M is the leading part, on which the value depends.

Arrows indicate data flow. Their thick ends mark the initiator of the interaction. Interfaces whose behavior can be customized with user-defined functions are marked in red.

And now more about the interfaces:
get - a request for the value. If the value is not up to date, pull is launched to update it.
pull - the function that calculates the value of the atom. When the atom decides to update its value, it calls this function. It can also start an asynchronous request that later puts the value into the atom via the push interface.
push - sets a new value of the atom; the value is not stored as is but is first passed through the merge interface.
merge - merges the new value with the current one. Normalization, dirty checking, and validation can happen here.
notify - notifies slaves about a change.
fail - notifies slaves about an error.
set - through this interface a slave atom can offer the master a new value; if the new value after merge differs from the current one, put is called.
put - defaults to push, but its purpose is to offer the new state to the leading atom.
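
To make part of this list more tangible, here is a rough sketch of how get, pull, push, merge and notify might fit together. It is only an assumption about one possible shape, not the API of $jin.atom; fail, set and put are left out, and notify is reduced to a single user-defined callback instead of a list of slaves.

```javascript
// Illustrative sketch of the get / pull / push / merge / notify interplay.
function Atom(config) {
	this.pull = config.pull || function () { return undefined }
	this.merge = config.merge || function (next, prev) { return next }
	this.notify = config.notify || function () {}
	this.value = undefined
	this.actual = false
}

// get: return the value, pulling it first if it is not up to date
Atom.prototype.get = function () {
	if (!this.actual) this.push(this.pull())
	return this.value
}

// push: pass the new value through merge, store it, notify on change
Atom.prototype.push = function (next) {
	var prev = this.value
	var merged = this.merge(next, prev)
	this.actual = true
	if (merged === prev) return // nothing changed, nobody is notified
	this.value = merged
	this.notify(merged, prev)
}

var log = []
var userName = new Atom({
	pull: function () { return '  Anonymous ' },
	merge: function (next) { return String(next).trim() }, // normalization
	notify: function (next, prev) { log.push(prev + ' -> ' + next) }
})

userName.get()          // pulls, merges and stores 'Anonymous'
userName.push('Guest ') // stored as 'Guest', a notification is sent
userName.push(' Guest') // same value after merge, no notification
console.log(userName.get(), log.length) // Guest 2
```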

That's all for now; the article has already turned out rather long and mostly theoretical. The sequel will have more practice, using the JavaScript library "$jin.atom". You will learn how it works and which optimizations it uses in addition to those described above. And of course there will be practical examples.

While waiting for the sequel, I suggest you try implementing atoms yourself. It will be interesting to compare our solutions afterwards.