NeoCode March 31, 2016 at 15:43

Tuples in programming languages. Part 2

In the previous part, I looked at tuple implementations in various programming languages (moreover, I looked at compiled non-scripting languages with static typing and classic C-like syntax, which surprised some readers). In this part, I propose to go beyond the existing and to do essentially the design of a programming language. The initial data is the same: a compiled non-scripting language with static typing and a C-like syntax, including an imperative paradigm (although not limited to it, of course).

In this part, we will try to dream and experiment - but what can be done with tuples at all? How to get the most out of them? How to use them to make a programming language more powerful and expressive, how to arouse admiration of true Code Hackers and at the same time not to confuse ordinary programmers too much? What unexpected possibilities appear in the language, if correctly and competently extrapolating the semantics of tuples in different directions, and what difficulties arise?

So, if you like blurring and holivars on the topic of programming language design, then please, under cat.

SYNTAX
First, let's determine the syntax. You can consider various options for designing tuples in the code for a long time - in curly brackets, in round brackets, generally without brackets ... for a number of reasons, I like the curly braces option and take it as a basis (at least until there is a real need for a different syntax). So, a tuple is a sequence of names or expressions, separated by commas and enclosed in braces. As a unified initialization in C ++.

{1,2,3}
{i, j, k}

Perhaps they have the right to exist and other options, but for our purposes this will be enough for now.

I would also like to note the simple idea of creating tuples from ranges and repetitions. Ranges - a way to specify tuples for enumerated elements (which include integers and enumeration elements)

{1..10}

Repetitions - a method taken from Assembler that allows filling a tuple with the same value

{10 dup 'A'}

In this article I will not use these methods, and I just mentioned them just like that - as a beautiful idea.

GENERAL IDEAS
In the first part, I mentioned a certain idea that caused some misunderstanding - that the tuple was "not quite a type." If you open the same Wikipedia, then it clearly says

In some programming languages, such as Python or ML, a tuple is a special data type.
In programming languages with static typing, a tuple differs from a list in that the elements of a tuple can belong to different types and the set of such types is predefined by the type of the tuple, which means that the size of the tuple is also determined. On the other hand, collections (lists, arrays) have a restriction on the type of stored items, but do not have a length limit.

However, what did I mean when I said that it was “not quite a type”? Sometimes (and quite often) I want to have some kind of language construct, which would allow locally grouping arbitrary objects at the compilation stage. A group alias that would simply be the equivalent of its constituent parts without introducing any additional entities. In fact, this is just a way to look at tuples from a slightly different perspective. For example, multiple return from functions. It can be completely considered as the return of one object of the “tuple” type; but on the other hand, the assignment operation for multiple returns is just one of many operations; on the example of Rust, we saw the possibility of another operation on tuples - comparison on equality. But what if you allow all operations to be performed on tuples? Various questions immediately arise - what kind of operations can these be, can they be performed on tuples of different lengths, over a tuple and a single value? You can also consider the semantics of transferring tuples to functions (perhaps, for example, “expanding” a tuple when passing to a function, or performing a function on all elements of a tuple, etc.). Of course, we will need convenient means of creating and decomposing tuples, access to tuple elements. Maybe something else? ..

MULTIPLE RETURN FROM FUNCTIONS AND TYPE VOID
Let's start with the simplest, and already implemented in many languages - with multiple return from functions. A function can return multiple values, and we call it returning a tuple of several values - just like a function can take multiple arguments. The following generalization suggests itself: a function that returns a single value returns a tuple of a single value. At least, it can be taken as a wish that tuples from one value are freely converted to these same values, and vice versa.

Further extrapolation leads us to the void type. As you know, this is a special type used in most programming languages to indicate that a function does not return a result. Creating objects of this type is prohibited. It is not at all difficult to guess that void is not a full-fledged type, but a designation for a tuple of zero length. Creating objects of this type in the usual sense is really impossible; but if our compiler is able to work with tuples in a more advanced way, then nothing prevents us from introducing a "pseudo-object" of type void in the form of an empty tuple {} and doing something with it. The question is what? And in what cases can it be useful? For now, just note this and move on to the next extrapolations.

MULTIPLE OPERATIONS
We looked at multiple return from functions and the associated multiple assignment. But assignment is just one of many possible operations. By analogy with assignment (from which it all started), we will try to build other operations on tuples:

{x,y,z} = foo(); // обычный множественный возврат, считаем что foo() возвращает 3 значения
{x,y,z} += foo(); // а если присваивание, совмещенное с операцией?
{x,y,z} ++; // или унарная операция
{x,y,z} = {1,2,3} + {a,b,c}; // или полноценная бинарная операция и присваивание

So far it looks good, although perhaps the idea of how such expressions with a large number of operations will look a little alarming. It is important that each object in the expression is a tuple of the same number of elements . Deviation from this rule gives rise to various ambiguities, which should be avoided when designing a programming language. Nevertheless, they must be considered (and, if possible, resolved).

GROUP OPERATIONS
Besides multiple operations, group operations on a tuple and a single value still seem quite tempting. In fact, this is the first deviation from the rule of the same number of elements in each operand of an expression. But I want it to turn out beautifully, so we try:

{i, j, k} = 0; // обнуляем все три переменные... все просто, прозрачно и очень удобно, разве тут могут быть неопределенности?
++{i, j, k}; // тоже очень неплохо смотрится...
{i, j, k} += 100; // и это тоже
rec.{x,y,z}.At(i).flag = true; // и даже так

The general form of such a binary operation is “tuple operator value” , or “value operator tuple” (most operations are commutative); And under the form "value operator tuple" is the assignment of one variable to an entire tuple, and in particular - the result of a multiple return function.

x = {1, 2, 3}; // чему должен быть равен x после присваивания ?
x = foo(); // если foo() возвращает несколько значений, то чему равен x ?

In a situation with assigning a multiple return from a function, obviously you want the first return value to be written in "x", and the next ones ignored. Here is such an unexpected ambiguity. But somehow it is necessary to solve it - I really want to have such a syntax. It could be useful not only for group assignment, but also in a number of interesting cases. For example, indexing an array and even accessing a structure field by name are also operations.

arr[ {1,2,3} ] = { 10, 20, 30 }; // групповой доступ к массиву
obj.{x,y,z} = {10,20,30};        // и даже к полям структуры... 
{obj1, obj2, obj3}.x = { 10, 20, 30 };    // и наоборот... 
{arr1, arr2, arr3} [ 0 ] = {10, 20, 30 }; // и с массивами аналогично

Despite the apparent obviousness of the expressions, formally arr [{1,2,3}] is the form “value operator tuple”, where “value” is arr, “tuple” is {1,2,3}, and “operator” - square brackets. Unlike assigning a tuple to a single value, no questions arise here - the result of such an operation must be a tuple {arr [1], arr [2], arr [3]}. But for the compiler, that assignment, that indexing, are just binary operations. This means that an expression of the form x = {1,2,3} must be expanded in {x = 1, x = 2, x = 3}, that is, the variable x is sequentially assigned all the values of the tuple, and the result is also a tuple (by the way, if such a programming language was in reality - it would be an interesting backfill question for any interviews: what will its elements be equal to? {1,2,3} or {3,3,3}?)
Thus, the question of the proper organization of operations on the tuple and the only element remains open.

BINARY OPERATIONS ABOUT TARGETS WITH DIFFERENT SIZES
Let us consider a more general case — the sizes of tuples — the operands of the binary operation do not coincide. What to do in this case? As always, the simplest solution is to disable :) But let's try to figure it out ... Go and Rust have a special meta variable "_" (underscore), which is used when some element of the tuple to the right of the assignment operator is not needed. In our syntax, this will look like this:

{x, _ ,z} = foo();
{x, _, z} = {1, 2, 3};

The operation with the second component of the tuple is simply ignored. The use of the meta variable "_" in Go and Rust is mandatory for multiple assignments if some results of the returned tuple are not needed. Thus, in these languages, the requirement of mandatory matching of the sizes of tuples remains. But in these languages there are no other multiple operations besides assignment; when you try to extrapolate the meta variable "_" to other operations, you get so interesting results that they should be considered in a separate chapter.

Let's try to consider the general case: what to do if such an expression is written (@ is a generalized binary operation)

{a, b} @ {x, y, z}

What are the options?

Perform operations on the size of a tuple with a smaller size. That is, the result is {a @ x, b @ y}. It contradicts group operations (if single values are considered equivalent to tuples with one element) - in a group operation, a single operand must interact with each element of the tuple.
Perform operations on the size of a tuple with a large size. Moreover, elements of a larger tuple, on which there were not enough elements of a smaller tuple, remain unchanged. {a @ x, b @ y, z}. Also contrary to group operations for the same reason.
Perform operations on the size of a tuple with a large size. When the smaller tuple ends, cyclically move to its beginning. {a @ x, b @ y, a @ z}. This corresponds to group operations, but is abstruse.
Perform operations on the size of the tuple with a larger size. When the smaller tuple is finished, perform operations on the last element of the smaller tuple. {a @ x, b @ y, b @ z}. Also corresponds to group operations, and also abstruse.
Perform operations in pairs “each element of one tuple with each element of another tuple” (“Cartesian product”). {a @ x, a @ y, a @ z, b @ x, b @ y, b @ z}. Also corresponds to group operations, and also abstruse. And by the way, the question arises of how to group pairs - by the first tuple or by the second, or in some other way.

Perhaps there are other options. But now one thing is clear: if you give this opportunity to the programmer, then an indication of how tuples interact should be done explicitly . In this case, this is almost a prerequisite for maintaining clarity in the code. For this, some keywords covering this expression may be used. Until we concretize the list of such words (we can first assume that each of the options may have its own keyword)

ACCESS TO ELEMENTS
Now we should turn to another important possibility - access to the elements of the tuple.
Traditionally, square brackets are used for indexing, and a period is used for access by name; although the Swift option with an index through a dot is not so bad, it is ambiguous when using named numerical constants instead of numbers, and is generally unusual; I prefer to use square brackets for access by index (importantly - by constant index) and a point for access by name (if any)

{10,20,30}[1];      // 20
{x:10,y:20,z:30}.z; // 30

It seems that everything is simple? Actually no longer. Since we introduced multiple and group operations on tuples, such as indexing “[]” and referring to named members of the structure “.”, When using these operations on tuples whose elements are complex objects (for which indexing or “Dot”) - it is not clear what to do: access the element of the tuple or perform a group operation on all elements of the tuple?

{arr1, arr2, arr3} [1]     // "arr2" или "{arr1[1], arr2[1], arr3[1]} " ?
{x:obj1, y:obj2, z:obj3}.z // "obj3" или "{obj1.z, obj2.z, obj3.z}" ?

Another interesting aspect is obtaining (constructing) tuples. By convention, taken at the beginning of the article, we use curly braces to easily construct a tuple object. However, in some cases it may be necessary to construct a tuple from another tuple by deleting or adding elements. Syntactically, this can be done using "multiple indexing", applying essentially the same rules as for arrays or structures.
To obtain a tuple, multiple indexing or ranges could be used:

{a,b,c,d,e}[1,4,0]  // {b,e,a}
{a,b,c,d,e}[1..4]   // {b,c,d,e}
{a,b,c,d,e}[1..3,0] // {b,c,d,a}

(the indexing operation itself contains brackets, therefore, inside curly brackets one more curly can not be written)
If the elements have a composite object names, you can call by names:

obj.{a, d, e} // перечисление полей
obj.{b .. f}  // диапазон - по порядку объявления полей

An interesting corollary is that since tuples are indexed by constant indices, the idea is to cast the field names to constant indices (and possibly vice versa).

Thus, there are at least two “special” operations, “dot” and “square brackets” that can act on the tuple itself as an integral object. The rest of the operations for the tuple are somehow undefined, although it can be assumed that we need, for example, the concatenation of tuples - flat gluing of two or more tuples into one long one. Therefore, the question arises: is it necessary to somehow select access operations directly to the elements of a tuple? Or is it more correct to highlight operations on each element of a tuple?

FROM OPERATIONS TO FUNCTIONS
Any operation is equivalent to some function. For example, the unary operation of bit inversion ~ x can be represented as neg (x), and binary addition x + y as sum (x, y);
therefore, considering operations on tuples as multiple operations, the question arises - what if the function call is involved in such an expression?
To start, the unary operation:

~{1,2,3};     // в форме операции
neg({1,2,3}); // в форме функции

by analogy with "group assignment", we must expand the tuple as follows:

{neg(1), neg(2), neg(3)}

at first glance this seems quite logical; the function itself takes a single value and returns a single value; must return a single value so that a tuple can be composed of them.
Probably, functions with two arguments could be expanded similarly. for instance

{1,2,3} + {4,5,6}
sum( {1,2,3}, {4,5,6} )

{ sum(1,4), sum(2,5), sum(3,6) }

Although it must be admitted that the syntax of an implicit multiple function call for tuples is too implicit, and just like with operations, some explicit way of specifying such a call suggests itself (although the possibility of such a call is not bad in itself).

On the other hand, recall the syntax of Go, D, C∀: there the tuple passed to the function is expanded inside the argument list, replacing the corresponding number of arguments. And in general, this is also very logical and expected - but again incompatible with the “group” semantics! Is it possible to somehow resolve this contradiction? And we have not yet considered complex options when the dimensions of the argument tuples do not match, when the tuples and single values are mixed, when we want to get the Cartesian product from the results of the operation on the tuple elements, etc.

The solution, which seems pretty good, came strangely enough from C ++. There is such an opportunity as templates with a variable number of parameters, and the syntax with ellipsis is used to transfer a package of parameters (by the way, also a tuple) to another template. The ellipsis visually marks that this argument is “revealed” in this context (and this is very important for the perception of the code!). It is important that this is immediately visible in the code. The only thing that is not visible is how many arguments it reveals. But if you want to specify in detail - nothing prevents access to individual elements of the tuple

foo(t..., 10, 20);             // кратко 
foo(t[0], t[1], t[2], 10, 20); // подробно
foo(t.x, t.y, t.z, 10, 20);    // если у элементов кортежа есть имена

Finally, the most difficult situation: a function with several parameters, and tuples of different lengths are transferred to some (or all) parameters. Similarly to the situation with binary operations, there are many possibilities for opening such a call: multiple function calls by the minimum number of elements, by the maximum with different types of addition of shorter tuples, Cartesian product, etc. All these methods are quite difficult to understand, and therefore any of them should be declared only explicitly - for example, using the appropriate keyword before calling the function.

META Variable "_"
Returning to the behavior of the meta variable "_", which is used in some languages to ignore the elements of the source tuple when assigning tuples. Let's see if it is possible to extrapolate this meta variable to more complex cases (after all, assignment is just a binary operation).

{x,y,z} = {1,2,3} + {a,_,c};

By analogy, the operation of adding the number 2 to "_" will be ignored, but what happens as a result? In general, there are two possibilities: either leave the number 2 in the resulting tuple, or distribute "_" there. In the first case, "_" can be considered as a " neutral element ". for any operation (that is, for any operation "@" and any argument "x", x @ _ == _ @ x == x is true). For example, the expression x = y * ~ (_ + z) can be transformed into x = y * ~ z.
However, not everything is clear here. For example, the unary operation of changing the sign “-x” can be written as a binary operation of subtracting a number from zero “0-x”. If instead of "x" put "_", then this expression will have different meanings depending on the recording method.

z = y * (0 - _) // z = y*0, то есть z = 0
z = y * (- _ )  // z = y

In the second case, when "_" appears in some position of the tuple, all further calculations for this position from the node of the syntax tree containing "_" to the root of the tree (that is, the end of the expression, usually semicolons), are discarded (i.e., x @ _ == _ @ x == _ is true). That is, the presence of at least one "_" in the i-th element of the tuple means that all calculations with the i-th element of all tuples in the entire expression are thrown out.
I find it difficult to say which way to work with "_" is better. This is another issue that requires careful consideration.

Nested Tuples
Another interesting aspect that is almost not considered in the documentation for languages is the nesting of tuples.

struct Foo { int x, y, z; };
struct Bar { Foo f; int i, j; };
Bar b = { {10, 20}, 30 };

Such code initializes the fields x, y, i, but leaves z and j uninitialized. Without nested tuples, this would not be so obvious (and by the way, this is an example of how tuples of different sizes participate in the assignment / initialization operator). Thus, nested tuples have very specific use cases. But this also means the need to think through all the features of extrapolating operations on nested tuples.

COMPARISON OPERATIONS
Tuple comparison operations should formally form a tuple of elements of type bool. Consider the equality operation as an example:

{x, y, z} == {1, 2, 3} // например {false, true, false}

But if, while, etc. operators require a single bool value! How to transform a tuple of logical variables into a single value?
There are at least two possibilities.

All elements are pairwise equal (that is, the entire resulting tuple consists of true)
There is at least one pair of equal elements (that is, at least one element of the resulting tuple is true).

This is very similar to the quantifiers of universality and existence from predicate logic. In programming languages where tuple comparisons are possible, the universality quantifier is usually assumed by default. This is obvious for structures and any composite types: indeed, two composite objects are equal if the corresponding component parts of these objects are equal. Therefore, if we consider a tuple as a type, this is the only possibility.
But is this always necessary for the user? Perhaps not always; therefore, it makes sense not only to introduce a quantifier of existence (if the condition is satisfied at least for one pair of elements), but also more complex comparison options.

When comparing a tuple with a single element, the intuitively expected behavior also implies a “logical AND” between the results of comparing each element of the tuple with a single value:

if( {x, y, z} == 0) { /*...*/ }

In the case of comparing tuples of different lengths, the situation is similar to any other operations - any “implied by default” logic is undesirable due to the possibility of potential errors (the programmer may forget to specify an element) and the absence of any unambiguous intuitive interpretation acceptable to everyone.

PRELIMINARY RESULTS
Congratulations if you read up to this place! As you can see, the introduction of such a syntax on the one hand allows you to beautifully record some simple expressions; on the other hand, there are much more questions and ambiguities than answers. This long article is essentially a squeeze on the subject of “tuples and group operations” from my personal notes that I have been making for quite some time (this usually happens - I sit, work, suddenly an idea comes up - I open Evernote or Wiznote and I write down). Perhaps you also had similar thoughts and ideas, or appeared in the process of reading an article - please comment!

Tags:

Tuples in programming languages. Part 2

Also popular now: