The amazing country of Oz, or how to receive data using send

Quite a long time ago, collecting information on parallel programming tools, I came across the elegant (in other words, it is difficult to describe the sensations) language Oz http://www.mozart-oz.org . The language then seemed to me worthy of introducing it to the Habra community. And so, I had time and reasons to do it.

Oz is a multi-paradigm programming language. The set of basic abstractions in the language is unusual and allows, for example, writing a procedure sending information sendso that with its help one can also receive data. And like without any trick:

send(socket; buffer; flag) = (if (flag == RECV) (recv(socket; buffer)) or (realsend(socket; buffer))).

It is precisely the fact that sending and receiving data is carried out by the same sequence of operations of the Oz virtual machine. Naturally, this is achieved through special abstractions for working with data and with parallel processes. This text is devoted to the description of these abstractions, because, in my opinion, they make it possible to feel the features of Oz quite well. Of course, Oz is more than what is stated below, but, it seems to me, the secret of the cunning send is the material suitable for the first acquaintance with this language and for enjoying it.

0. Introduction

You can start with the text of the program that implements the construct Portthrough which data is transmitted using the send procedure. An example of use is included in the same programs Port. Source code (proper name in this text):

 
	declare Port in 
	local PortTag NewPort IsPort Send in 
		PortTag = {NewName} 
		fun {NewPort FS} 
			local SC in 
				C = {NewCell S} 
				FS = !! S 
				{NewChunk port (PortTag: C)} 
			end 
		end 
		fun {IsPort? P} 
			{Chunk.hasFeature P PortTag} 
		end 
		proc {Send PM} 
			local Ms Mr in 
				{Exchange P.PortTag Ms Mr} 
				Ms = M | !! Mr
			end 
		end 
		Port = port 
		( 
			new: NewPort 
			is: IsPort 
			send: Send 
		) 
	end 
	declare NewQueueServer in 
	fun {NewQueueServer} 
		local Given GivePort Taken TakePort Join in 
			GivePort = {Port.new Given} 
			TakePort = {Port.new Taken} 
			proc {Join Xs Ys} 
				local Xr Yr XY in 
					Xs = X | Xr
					Ys = Y | Yr
					X = Y 
					{Join Xr Yr} 
				end 
			end 
			thread {Join Given Taken} end 
			queue 
			( 
				put: proc {$ X} {Port.send GivePort X} end 
				get: proc {$ X} {Port.send TakePort X} end 
			) 
		end 
	end 
	declare Q in 
	thread Q = {NewQueueServer} end 
	{Q.put 1} 
	{Browse {Q.get $}} 
	{Browse {Q.get $}} 
	{Q.put 2}

As you can see, there is no catch. Both queueserver's functions - putand get- are implemented using a procedure Sendwhose semantics correspond to its name. This code could be shorter - syntax sugar is enough in Oz, which is not used above in order to reduce the amount of necessary explanations.

To understand how this code works, you need to deal with (1) the model of Oz program execution: variables, threads, alignment operator ( =); (2) procedures; (3) cells (cells), areas of memory (chunks) and future (futures).

1. Variables, threads, operator `=`

As said, Oz is a multi-paradigm language. On Wikipedia, it is described as functional, logical, imperative, object-oriented, distributed, parallel (in the sense of concurrent) and as a language for programming with restrictions. But it is based on another paradigm - programming with data flows (dataflow programming). Oz implements this paradigm quite close to the original (used in CPU architectures) concept of computing with data streams, and does it as follows.

1.1. All calculations in Oz occur with data contained in the store. At the same time, the storage is quite abstract (internal mechanics are hidden from the programmer, and the information in the store can be controlled only through a certain interface) and distributed (all participants of Oz-computing can work with store, including those scattered across the network). Store can store information in three different ways. (1) In the form of free or related logical variables. (2) In the form of cells, which are more likely not memory cells, but named mutable pointers to variables. (3) In the form of procedures that are named closures in Oz (naturally, they are first-order objects, that is, they can be explicitly manipulated in the program).

1.2. The calculation occurs when the thread is executed. At the same time, the thread in Oz is dataflow. That is, a certain sequence of statements (statements) executed one after another (all as in imperative languages), but with one caveat: an operator is executed only if all the variables used in it are connected. Otherwise, the thread pauses and waits for the moment of binding the necessary variables. Actually, the execution of a certain operator upon the readiness of the data used in it is the main method of dateflow calculations.

Creating threads in Oz is very simple, with the help of a design thread ... end. Making it happen infork()-style. That is, the generated thread has access to all the variables (namely the variables) that were available to the parent thread at the time the operation was performed thread ... end.

1.3. Variables in Oz are logical.

An Oz-style boolean is a once-bound variable that can be equated with another variable.

In general, this definition (translation from the manual) says little about variables in Oz. They allow you to do a little more with yourself. And they are significantly different from the usual c / javascript / haskell variables, which are either named memory cells (c / js) or named values (haskell). Probably the best idea of the variables in Oz will be given a description of some technical details.

1.3.1. The life path of a variable in Oz begins with the fact that it is declared using the local ... in ... endor constructs declare ... in .... The difference between these constructions is that it local ...allows you to declare variables only for use in operations that are inside in ... end, and declarevariables declared with the help will be available everywhere after the corresponding one in. Naturally, declareit can be used to define variables only in the global scope of the module.

The declaration of the variable is that a new node is created in the store, to which the declared variable refers. And the value is written to the nodeunknown, indicating that the variable is in no way associated with data. If a thread calls such a variable for data, it will be suspended until the corresponding node is associated with the data. For example (code-1):

 
	local X in 
		thread {Browse 2 * X} end 
		X = 5 
	end

Browse- This is one of the standard Oz procedures that prints its argument. When declared Xin this example, a variable will be created and the corresponding new node in the store will be created. When it Browsecalls through Xto this node, the execution of the corresponding thread will be suspended until a Xvalue appears in the corresponding node (here the value in the node itself and the binding of the variable to the node can change).

1.3.2. Values in nodes are placed in the process of unification (unification) or (another name) incremental recount (incremental tell). Unification for a pair of expressions is done by the =equality operator. In the same process, the binding of variables to nodes can also change. Unification algorithm for a pair of expressions (specifically for expressions) in the codeE1 = E2It works like this (recursive procedure with parsing options for what E1 and E2 can be).

U.1. The results of calculations E1 and E2 are the values of atomic types Oz (atomic are indivisible types: integers, floating-point numbers, literals - strings, for example). In this case, if these values are equal, the operator is executed, and if the inequality is met, an exception is thrown (exceptions in Oz are fairly standard and they are processed using the construction try ... catch ... finally ... end; it finallyis not necessary).

U.2. The result of the calculation E1is a variable, and the result of the calculation E2is a value of a certain type (symmetric to this situation is the one in which the calculation E1leads to a value of a certain type, and E2- to a variable). It can look like this:

 
	X = (Y + Z) * 5

In this case, everything works as follows.

U.2.1. The E1variable being set refers to an unknown node. If so, then the value obtained by the calculation is written to this node E2, as a result of which the E1variable turns out to be related.

U.2.2. If the E1-variable is already associated with the atomic value, then the value that is in the corresponding node is compared with the E2-value, and if the types or the values themselves do not match , an exception is thrown. Otherwise, the statement ends.

It is by this rule that the variable Xfrom code-1 is bound . After it is associated with the value (5), the operator, waiting for this binding, is executed inside the called procedureBrowse. And here it should be clear that the result would be exactly the same if code-1 looked like this:

 
	local X in 
		thread {Browse 2 * X} end 
		5 = X 
	end

In U.2. another option is possible when the E1-variable is associated with a value of a composite type, but it will be considered later, in clause U.4 ...

U.3. As E1well, they E2set some variable. There are also possible options.

U.3.1. One of the variables is unbound, or both variables turn out to be unbound (that is, they refer to nodes with a value of unknown). In this case, the node of one of the unrelated variables or the unknown node of the one that is not connected is discarded. Then, the variable previously referenced to this node changes so that it begins to refer to the same node as the other variable.

It should already be clear here why, by running code-2:

 
	local XYZ in 
		thread {Browse X + Y + Z} end 
		X = Y 
	% Z = X + Y 
		X = 10 
		X + Y = Z 
	end

Oz will give a value of 40. And why does Oz 'freeze' if you remove the comment (starts with %).

U.3.2. Both variables refer to nodes in which values are written. In this case, the behavior is as follows. If these are values of different types, or if they are different atomic values of the same type, an exception will be thrown. If these are equal values of some atomic type, then the operator will complete its execution. But nodes can also store values of composite types.

The main composite type in Oz is a record. Syntactically, the entries look something like this:

 
	label (feature0: field0 feature2: field2 ... featureN: fieldN)

Approximately, because the above is a closed record, with a fixed number of fields (field), and records are also open. Record fields can be accessed through a dot, by the name of the corresponding property (feature). The number of fields in a structure is called its arity. More specific example:

 
	U = habrauser (nick: 'mikhanoid' karma: 10 strength: 10) 
	K = U.karma

Subtlety: fields in a value of type record themselves are Oz variables. In the example above, when creating a value with the record type and the label habrauser, three variables are created that were unified according to the described algorithm with the values of atomic types: 'mikhanoid', 10 and 10. But instead of such values, variable fields can be unified with any expressions. At this point, it should be clear how this code works:

 
	local UKS in 
(*) U = habrauser (nick: 'mikhanoid' karma: K strength: S) 
		K = s 
		10 = K 
	end

In line (*), an unbound variable is Uunified with a value of the record type according to U.2.1 rule.

Oz records have an important role - they turn the store into a digraph structure: if a node stores a value of the type record, then with its variable fields it points to other nodes, that is, fields are something like arcs marked with property names. And the described algorithm, thanks to U.4., Can be considered as a graph merge algorithm.

U.4. There may be combined such cases: (1) both expression E1and E2are the values of the record type, (2) both expressions define variables that link to the nodes that store the record (3), one of E1,E2sets the value of the record type, and another sets the variable associated with the node in which the record is stored. In all these cases, if the entries have (a) different labels, or (b) different arity, or © different sets of properties, an exception is thrown. If the records coincide in points (a), (b) and ©, then the pairs of variables from different records are unified with the same property names.

At this point, it should already be clear why, as a result of the work of such code:

 
	local WXYZ in 
		W = x 
		Z = y 
		f (a: 10 b: X) = f (a: Y b: 20) 
		{Browse W + Z} 
	end

the value 30 will be displayed.

Another interesting example is the following code:

 
	local Z in 
		Z = f (Z 20) 
		{Browse Z} 
	end

Here fis a variant of the record in which the properties of the fields automatically get integer names from 1 to its arity. In Oz, such records are called tuples and are analogues of compound term in logical programming. In the code above, the following happens. (1) a value of type record is created with two fields, that is, two variables are created. (2) the field with the property is 1unified with an unrelated variable Z, the second field with an atomic value of 20. As a result, the value fpoints to an unknown node, which the variable points to Zand a node that stores the value 20. (3) to an unknown node, pointed Zto by a value of type record, the second element of which (like the variable Z) points to this node. Nothing special, just got a loop in the store graph.BrowseIt prints it like this: R10 = f(R10 20).

Lists in Oz are a kind of tuple. They are organized standardly for functional and logical programming: a list is a tuple, the first element is the head of which is a variable, and the second is a list that makes up the tail. Labeling head from the tail list, you can use the symbol |: Head | Tail.

2. Procedures

Oz allows you to abstract sequences of statements using procedures. Procedures in Oz are values (first-class objects), so they can be freely 'assigned' to variables. In more detail. Oz stores closures that implement procedures in the store as some code. Each such closure procedure gets a unique name for the entire store. Names in Oz are literals - atomic type values. After creating the procedure code and assigning it a unique name, the name is written to the node to which some variable refers. After that, the procedure can be called through this variable.

That is why the source code NewPort, IsPort, Send, ... be declared as variables by using localor declare.

The basic design defining the procedure isproc:

 
	local ... P ... in 
		... 
		proc {P X1 ... XN} S1 ... SM end 
		... 
	end

X1, ..., XNare formal arguments. S1, ..., SMis the sequence of operators that implement the procedure. Procedures are called using curly braces:

 
	{P A1 ... AN}

A1, ..., ANare the actual parameters: expressions, variables, and so on. The call occurs with the declaration of new variables corresponding to formal parameters, which are unified with arguments, among which there may be unrelated variables. Therefore, the procedure can both receive and return data through any parameter. Therefore, to increase the readability of the code, variables that the programmer intends to use only to return values can be marked with a prefix ?, which is just a comment.

The variable associated with the node storing the name of the procedure can also be the result of evaluating the expression, as here, for example:

 
	{Q.put 1}

Oz has the ability to work with anonymous procedures, but it is implemented in an unusual way - through the nesting mechanism. In a sequence of operators enclosed in {...}one of the positions, a symbol can occupy a $label of attachment. When Oz encounters such labels inside curly braces, executing some operator, it (1) automatically creates new variables, in quantity {... $ ...}, in a new local scope, then, in this scope (2) makes calls {... $ ...}, substituting a new variable for the marker, created for this call, and (3) inserts these variables into place {... $ ...}in the executable statement. for example

 
	local P in 
		proc {PX? Y} Y = X + 10 end 
		{Browse {P 20 $} + {P 30 $} + 40} 
	end

will be executed as

 
	local P in 
		proc {PXY} Y = X + 10 end 
		local X1 X2 in 
			{P 20 X1} 
			{P 30 X2} 
			{Browse X1 + X2 + 40} 
		end 
	end

The nesting mark can also be used in the first place inside {...}, but only during the procedure definition. But the attachment mechanism will work in exactly the same way. Codes:

 
	local P in 
		local P1 in 
			proc {P1 X1 ... XN} S1 ... SM end 
			P = P1 
		end 
	end

and

 
	local P in 
		P = proc {$ X1 ... XN} S1 ... SM end 
	end

are equivalent.

Oz functions are also procedures, and are some simplification of the syntax for cases where it is known that the procedure must definitely return some value. Then you can save on the description of one formal parameter, namely: fun {F X1 ... XN} S1 ... SM endthe same as proc {F X1 ... XN Y} S1 ... Y = SM end. A call {F A1 ... AN}, as a function, Oz automatically turns into a procedure call {F A1 ... AN $}

For example.

 
	fun {FX} 
		local K in 
			K = X / 20 
			{Browse K} 
		end 
		local L in 
			L = 30 
			X == L 
		end 
	end

meets the definition

 
	proc {FXY} 
		local K in 
			K = X / 20 
			{Browse K} 
		end 
		Y = local L in 
			L = 30 
			X == L 
		end 
	end

The value of the block local ... in ... endis the value of the last operator in it. That is, in this example, it Fwill be the result of its call to have an answer to the question: is the first and only argument of function 30 equal.

3. Cells, memory areas (chunk) and future (future)

To work with data in the store, Oz supports a few more abstractions.

3.1. Storage areas are some of the entries in the store. But unlike ordinary records, they are identified using unique names (just like procedures) and do not allow you to determine your arity. That is, their components can be accessed only by property names (feature). If these names are hidden from some parts of the code, then in these areas access to the elements of the corresponding memory area will be impossible. Areas are created by calling a function {NewChunk Record}. The call creates a memory area with a unique name, and returns that name. The name can be written to the node referenced by some variable. Then through this variable and the operator .you can access the fields of the memory area:

 
	local XR in 
		R = f (a: 1 b: 2 c: 3) 
		X = {NewChunk R} 
		{Browse Xc} 
	end

3.2. Cells in Oz are designed to work with states. A cell, as a procedure and memory area, in a store is defined by some unique name. Like a variable, it is a pointer to some node, but unlike the variable, the nodes that the cell points to can be explicitly and repeatedly set. The cell defines the following interface.

C = {NewCell E}- creates a cell that points to the node that will result from the unification of the value of the expression Eand the variable - the formal argument of the procedure NewCell. The new unique name of the newly created cell is returned and assigned to the variable C. Link to the node from the cell, of course, is taken into account when collecting garbage. {IsCell C}answers the question: is the variable associated Cwith the cell name.

@C- this expression as a result has a variable pointing to the node to which the cell with the name stored in the variable refers C. C := Echanges the pointer in the cell whose name is stored in C, so that it points to the node obtained as a result of unification of some unrelated nameless variable and the result of the expression E.

{Exchange C E1 E2}- for one inextricable, atomic operation unifies @Cwith E1and, then, performs C := E2.

A cell, for example, can be used as a counter:

 
	local C in 
		C = {NewCell 0} 
		{Exchange CX thread X + 1 end} 
	end

It cannot be ruled out here thread ... end. And this works for the reason that for each running thread Oz automatically creates an unbound variable that is unified with the last of the thread body operators. And this variable in this example is the last argument Exchange.

3.3. Future ones were first proposed in Smalltalk-72, and then the idea of their application was developed in MultiLISP (1985). Future is a fairly common tool: they are in Java, accessible through java.util.cuncurrent.Future; their support is planned in C ++ 0x. They are supported in Oz, but in an unusual form.

The future for some expression Eis an object associated with asynchronous computing E. The future allows some operations with yet undetermined results.E- for example, use it in function calls, write to lists, pass it to other asynchronous calculations, and so on. In the case when it is precisely the value of the expression Ethat is required to continue the work , the appeal to its future will suspend execution until it Eis calculated.

In Oz, regular variables have a similar property: some of the actions with them can be performed without waiting until the values appear in the nodes to which they refer. Many of the examples already given illustrate this. However, variables in Oz are bidirectional channels of information exchange: standing to the left and right of the =expression are symmetric for the unification algorithm. Therefore, in the code:

 
	local X in 
		thread X = 5 end 
		thread X = 7 end 
		X = 3 
		{Browse X} 
	end

none of the threads stand out as producing data. An exception in the process of unification can cause any thread. The future allows us to establish some order in this situation. In Oz, the future is a read-only link to a node, that is, a somewhat limited version of a variable. If the future participates in the unification process, then this process is suspended until the variable corresponding to the future is connected. The future for a variable Xin Oz is shaped by the operator !!X. The order in the example above can be entered, for example, as follows:

 
	local XY in 
		Y = !! X 
		thread Y = 5 end 
		thread X = 7 end 
		Y = 3 
		{Browse Y} 
	end

here the value will always be formed only in the second thread.

4. All together

It is convenient to bring the code from the introduction again so that you do not need to scroll this text to the beginning:

 
00 declare Port in 
01	 
02 local PortTag NewPort IsPort Send in 
03 PortTag = {NewName} 
04	 
05 fun {NewPort FS} 
06 local SC in 
07 C = {NewCell S} 
08 FS = !! S 
09 {NewChunk port (PortTag: C)} 
10 end 
11 end 
12	 
13 fun {IsPort? P} 
14 {Chunk.hasFeature P PortTag} 
15 end 
sixteen	 
17 proc {Send PM} 
18 local Ms Mr in 
19 {Exchange P.PortTag Ms Mr} 
20 Ms = M | !! Mr
21 end 
22 end 
23	 
24 Port = port 
25 ( 
26 new: NewPort 
27 is: IsPort 
28 send: Send 
29) 
30 end 
31	 
32 declare NewQueueServer in 
33	 
34 fun {NewQueueServer} 
35 local Given GivePort Taken TakePort Join in 
36 GivePort = {Port.new Given} 
37 TakePort = {Port.new Taken} 
38	 
39 proc {Join Xs Ys} 
40 local Xr Yr XY in 
41 Xs = X | Xr
42 Ys = Y | Yr
43 X = Y 
44 {Join Xr Yr} 
45 end 
46 end 
47	 
48 thread {Join Given Taken} end 
49		 
50 queue 
51 ( 
52 put: proc {$ X} {Port.send GivePort X} end 
53 get: proc {$ X} {Port.send TakePort X} end 
54) 
55 end 
56 end 
57	 
58 declare Q in 
59	 
60 thread Q = {NewQueueServer} end 
61	 
62 {Q.put 1} 
63 {Browse {Q.get $}} 
64 {Browse {Q.get $}} 
65 {Q.put 2}

It makes no sense to describe the entire program line by line (which rarely gives an idea of how even non-parallel programs work), but a few points need to be clarified.

4.1. Creation Port.

Portrepresents a record, fields are associated with a set of procedures. This is nothing more than a convenient grouping of several procedures. The port itself is a memory area containing a cell under the name, whose uniqueness is guaranteed by the function NewName. The variable PortNameis visible only in the local unit (02-30), so access to the components of the memory area can only be obtained from the procedures IsPort, NewPort, Send. Outside this block is about variablePortNamenothing is known, therefore access to the field of the memory area storing the state of the port is unlikely (of course, the name can be guessed, but such an mechanism is quite enough for encapsulation).

Attention should be NewPortpaid to the function (05-11). In addition to returning the port memory region itself, the argument, through its first argument, also returns the future variable that initializes the cell. This is used to initialize Givenand Taken(36-37).

4.2. Procedure Send(17-22).

It should be noted that the cell in the port memory area always points to an unknown node. And Sendat the next call, the first thing it guarantees is that, linking to the previous node is entered into a variable Ms, and in its place by placing a link to the node of an unbound variable Mr. Further inSendthe value of the variable is generated Ms, which is obtained during unification with the list tuple, in which the sent message (which may be unrelated) is in the first place, and the future is in the second Mr(again: the list is just a tuple, which can be in the second place anything). The subsequent call Sendwill do the same with the variable Mr, and this process will gradually turn the variable that was used to initialize the cell to NewPort, and whose future was returned outside, into a list. In other words, it Sendreally sends messages in a queue.

4.3. Procedure Join(39-46).

The most important thing here is what this procedure is applied to, working in a separate thread (48). It applies toGivenand Takenwhich, as a result of calls, Sendgradually turn into lists. Valid parameters Joinare always future, therefore, the unification operators (41-42) are executed only after binding Xsand Ysthat occurs in Send, turning these variables into links to view tuples Message | SomeFuture . After that, the Messagecomponents of the incoming ( Given) and outgoing ( Taken) flows are unified, and Joincalled for future tails of lists (A bit like quantum mechanics, isn’t it? Einstein probably did not have enough knowledge about parallel programming to build a model of the Universe with hidden variables).

4.4. Eventually.

Any set to stream GivenusingSendthe bound variable will be unified in the thread (48) with the corresponding unbound variable set to the stream Taken. Of course, the situation is completely symmetrical, and the names putand getare used for convenience. But the queueserver is still asynchronous, which is useful.

5. Other interesting features of Oz

Oz offers among its other features (and many of them are vast), it also offers an interesting parallel programming model. Which differs from the classic one adopted in Prolog, for example, in that it allows the programmer to define his own procedures for searching and enumerating options. And, of course, the logical programming model adopted by Oz is parallel.

Tags: