Llst internals, part 1. Introduction to Smalltalk

  • Tutorial
Good day. I bring to your attention the second article from the Low Level Smalltalk (LLST) series. Those who don’t know what it is about, I recommend reading the previous review article , which tells what llst is and why it was created.

In this part, we will concentrate on the Smalltalk language itself, its syntax and the "rules of the game."

In the following parts, we will smoothly move on to the features of the virtual machine implementation and the internal representation of objects in memory. We will touch upon the issues of organizing a memory manager and garbage collector. We will also talk about the bytecodes of a virtual machine. We will learn how the text of the Smalltalk method turns into a sequence of commands. Finally, we trace the path from loading the image into the machine’s memory to the processes that occur when sending messages between objects, and we also learn how closures are implemented in blocks.


Programming languages ​​are different. Some have a fairly low entry threshold. Others scare the potential adherent even at distant approaches, terrifying him with the quirky and alien syntax, excessive verbosity of the narrative, or the complexity of the concepts. There are languages ​​that require a programmer to literally turn the brain inside out in order to learn how to think the way it is necessary for successful programming in such a language. Some languages ​​only look simple, but in reality they demand a strong knowledge of a candidate from mathematics, lambda calculus and category theory ...

So, Smalltalk is a simple language. A simple language is not only in terms of syntax, but also in terms of understanding it by an unprepared person. Not surprisingly, the author initially positioned Smalltalk as a language for teaching children programming. Unfortunately, it did not grow together. Children prefer PHP and basic (or what is now fashionable there?). Well, let's not talk about that.

As you know, any theory must be confirmed by practice, therefore, in support of the above thesis, now we will go over the key concepts of the language and at the end of the introduction we can confidently read the source code of the program.

World of objects

Forget about Java, forget about C ++. Forget everything that you have been taught.

The phrase “in language X, everything is an object” is so ridiculous that I did not want to use it here. However, it is difficult to describe the full depth of this thought in relation to Smalltalk without resorting to such cliches.

Indeed, in Smalltalk, everything is an object. Strings, numbers (however, there is one useful exception), arrays - this is understandable. But each object has its own class. Which (surprise!) Is also an object. And yes, it also has its own class, which is also an object, etc. Class methods are also objects. Method bytecodes - well, you get the idea. Even pieces of code presented in the so-called language blocks are also objects, each of which you can talk to in a friendly way, and he will tell you everything he knows.

In the description of the basic image of LittleSmalltalk there is such a wonderful psychedelic place :
name         subclassOf    instanceOf
Object       MetaObject    nil
Class        MetaClass     Object
MetaObject   Class         Class
MetaClass    Class         MetaObject

It tells us that:
  • the Object class is a subclass of MetaObject and an instance of nil (objects came from nonexistence)
  • the Class class is a subclass of MetaClass and an instance of Object (all classes are also objects)
  • class MetaObject is a subclass of Class and its instance (uh ...)
  • MetaClass is a subclass of Class; an instance of MetaObject (metaclasses are also classes)
  • Note to Experts: Little Smalltalk does not have a Behavior class.

Brain explosion? Yeah. But this is the only place in the entire class hierarchy that looks contradictory. It is contradictory. But at the cost of this little insanity great opportunities are achieved.

I brought this place not to frighten the reader, but to demonstrate that the classes and objects in Smalltalk, like the Taoist symbol "Yin and Yang", are interpenetrating entities.

Returning to the question of simplicity, I note that this inconsistency does not interfere with programming at all, and in 99% of cases the programmer does not think about it (or does not know). In the remaining percentage of cases, it allows you to do things that greatly simplify the programming and reading of texts of programs in this language.


Like real living things, objects are born and die. They live in an image - a computer memory area in which objects of a virtual machine are stored. This image can be saved to disk and downloaded later. Booting from the disk, we get exactly the same view as it was at the time of recording. This applies to all objects without exception, from numbers and strings to user interface elements. It is not required to take any special measures to preserve the state of the system - this provides an image. In this sense, the user interfaces of Smalltalk programs, I think, would please Jeff Ruskin in terms of his persistence.

For example: the user closed the text editor, and then returned to it after a while. Booting from the image, he discovers that he has at his disposal a system exactly in the state in which he left it. The cursor position, selected areas of the text, the contents of the clipboard will be restored to its original form. This is fundamentally different from the regular program launch model. This concept is continued in modern desktop environments, however, from my point of view, this is a pale likeness of what could have been.

Message concept

Smalltalk programming is completely reduced to communicating with objects in the image. There is no traditional editing of the source sheet here. Rather, it is successfully replaced by work in the built-in IDE. But the basis is still the interaction of some objects with others.

For example, to create a new class, we send a message to his ancestor asking him to create an heir. To create a new method, we ask the class to create a method, and then fill it with meaning. In fact, we spawn a method object and then add it to the list of methods of the class. There are no backstage intrigues, operations in the native code, or other arrangements. Everything is exclusively within the protocol.

So, sending a message- This is the only operation of the language that allows objects to interact with each other (and even with themselves). Moreover, this is actually the only complex operation that a virtual machine can do. All other instructions are used to provide the main task.

However, the easiest way to see this is with examples (how to run it can be found in the first article ):
->2 + 3

Here we took the object 2and sent it a message +with the parameter 3. The result of the message is a sum object that has been returned outside and displayed by the shell. This is an example of a binary message in which two objects are involved.

And here is an example of a unary message . Let's ask the object 2which class corresponds to it. This is done by sending a message class:
->2 class
Excellent. The two turned out to be an object of the class SmallInt.

What else can class instances SmallIntdo? Let's ask:
->SmallInt listMethods

Yeah. The known + operator and a bunch of arithmetic operations. To get this information, we sent a message to the listMethodsclass SmallInt. This is made possible because the class is SmallIntalso an object to which messages can also be sent. And all thanks to the above-described "psychedelic" tricks with inheritance. It is important to note that sending messages to classes and objects is implemented in the same way, that is, it is the same mechanism (without crutches). Classes and objects really coexist nearby and in no way interfere with each other.

What are some objects?

There are: ordinary objects (which are the instance of a certain class), the classes themselves, metaclasses. Metaclasses are such objects that ordinary classes are their instances. Tricky, but at first you can not pay any attention to it.

And there is true , false and nil . The first two are the only instances of classes Trueand, Falserespectively. That is, in the whole image there is only one true object . All places where it is supposed to return or store a Boolean value explicitly or implicitly use these objects.

Now let's talk about nil. As you might have guessed, this object is an empty, not initialized (or erroneous) value. That's just, unlike the C ++ null pointer and nullfrom the Java world, nil is a full-fledged object.

Let's check:
-> 1 isNil
-> nil isNil
-> nil class

As we can see, sending messages to this object is no different from others, which, it seems to me, is very convenient.


Another important type of object is symbols . A symbol in Smalltalk is an object that resembles a string in its properties, but, like nil , true and false , is present in the image in a single instance.

This is how ordinary lines behave:
->'hello' = 'hello'
->'hello' == 'hello'
-> 'hello' + 'World' = 'helloWorld'
-> 'hello' + 'World' == 'helloWorld'

Here, the operator is =used to formally compare the values ​​of two strings, while the operator ==checks the objects for identity . The operator ==will return true only if the object and the passed parameter are the same object. In the case described above, this is not so, since two instances of the class are checked for identity String, which are created one after another, but are not the same object.

And here's what happens with the characters:
-> #helloWorld = #helloWorld
-> #helloWorld == #helloWorld
-> ('hello' + 'World') asSymbol == #helloWorld

Smalltalk controls the creation of characters and ensures that they do not lose their uniqueness. Symbols are usually used as various identifiers, keys in collections, as well as method selectors (see below).

Cascading Messages

So far, we have operated with a single object, sending him messages and observing the result. But the result is also an object. So you can send a message to him.

We try:
-> Array parent
-> Object parent
-> Array parent isNil
-> Object parent isNil

In this example, we first print the ancestors of the Array and Object classes, and then send the result an isNil message to check for the value. The Object class is the top of the hierarchy of regular classes, so it returns nil in response to the parent message . As we can see, to combine several messages, it is enough to write them with a space. And such queues can be of any length:

-> 12
-> 12 class
-> 12 class methods
Dictionary (* -> Method, + -> Method, - -> Method, / -> Method, < -> Method, = -> Method, asInteger -> Method, asSmallInt -> Method, bitAnd: -> Method, bitOr: -> Method, bitShift: -> Method, hash -> Method, quo: -> Method, rem: -> Method, truncSmallInt -> Method)
-> 12 class methods keys
OrderedArray (* + - / < = asInteger asSmallInt bitAnd: bitOr: bitShift: hash quo: rem: truncSmallInt)
-> 12 class methods keys size

Another type of aggregation of message packages is cascading. In this case, a series of messages is sent to the same object without the need to indicate the name of the destination object each time. In this case, messages are written one after another through a semicolon. At the end of the whole sentence, a period ends.

Note: cascading messages in Little Smalltalk now does not work as it does in standard implementations. Why this is happening remains to be seen.

Key messages

At the moment, we already know two types of messages: unary and binary. There are still key messages that can take one or more parameters. Let's take a dictionary of SmallInt class methods from the previous example and ask which key is under index 7:

-> SmallInt methods keys at: 7

Here we send the keysmessage #at:with parameter 7 to the object . The indices in Smalltalk are counted from 1, so the first element has index 1 and the last is equal to the size of the container.

Here is another example of a key message:
-> (Array new: 5) at: 1 put: 42
Array (42 nil nil nil nil)

First, we created an array by sending a message #new:to an Array object with parameter 5, indicating the number of elements. Then we placed the value 42 at index 1 in the newly created array. The resulting array was displayed on the screen. Note that the remaining 4 cells are filled with nil values .

A remarkable feature of key messages is that the line at: 1 put: 42is one parameterized message #at:put:, and not two, as one might think. In the style of C-like languages, this could be written keys->atPut(1, 42)in kind, however, in such a record, the correspondence of the transferred parameters and their purpose is lost.

Let's say we have a certain classRectanglerepresenting a rectangle on some plane. In C ++ code, we came across these lines:
Rectangle* rect1 = new Rectangle(200, 100);
Rectangle* rect2 = new Rectangle(200, 100, 115, 120, 45);

How to understand what numbers correspond to what? Say, in the first case, our experience will tell us that most likely we are talking about the size of the rectangle, and that the first parameter corresponds to the size in X, and the second in Y. But to find out for sure, we need to look into the prototype of the class constructor Rectangle. The second option looks even less readable. Of course, a good programmer would add comments to the code, and change the function prototype so that it accepts "talking" types, sort of Point, but this is not about that.

Let's see what a similar Smalltalk construct might look like:
rect1 <- Rectangle width: 200 height: 100.
rect2 <- Rectangle new
             width: 200;
             height: 100;
             positionX: 115;
             positionY: 120;
             rotationDegrees: 45.

In the first case, we sent a message to the #width:height:class Rectanglethat created the instance itself and set the value of the corresponding fields from its parameters. In the second case, we created the instance in the usual way, sending a message #new, and then used the cascading of messages to set values ​​one by one. Pay attention to how visual the code becomes. We don’t even need to add comments so that the reader understands what is happening.

In principle, this code could be written “forehead”, however it looks less beautiful:

"создаем инстанции"
rect1 <- Rectangle width: 200 height: 100.
rect2 <- Rectangle new.
"устанавливаем значения"
rect2 width: 200.
rect2 height: 100.
rect2 positionX: 115.
rect2 positionY: 120.
rect2 rotationDegrees: 45.

The ability to interleave portions of the message selector with the transmitted parameters, it seems to me, is one of the strengths of the Smalltalk language. With the proper use of variable names and selectors, this allows you to write very understandable methods that practically do not require commenting.

Take a look at the following code example. This is a Dictionary class code that responds to a unary message.#keysAsArray:
keysAsArray | index result |
    result <- Array new: keys size.
    1 to: keys size do: [ :index | result at: index put: (keys at: index) ].
    ^ result

In the body of this method, we first create an array of return values, and then fill it with the contents of the field keys. Here, a message #to:do:with two parameters is transmitted to the unit . The first is this keys size, and the second is a piece of code that needs to be executed (expression in square brackets). In Smalltalk, such pieces of code are called blocks . Of course, they are objects and can be stored in a variable. Here the variable for the block is not created, but it is transferred immediately to the place of use. In order to execute a block, it needs to be sent a #value message, or #value: if it accepts a parameter. This is exactly what the class will do SmallIntin implementing its method #to:do:.

In our case, the block will be calledsizetimes, each time an iteration number will be passed to it, which will be interpreted as an index for selecting values ​​from keysand adding them to result.


At this point, we suddenly realize that we already know 90% of the entire Smalltalk syntax. It remains only to comment on certain points and explain the purpose of certain parts. In order not to be completely bored, we will do it on a real code example. I impudently borrowed it from the source image of the primary image. First, I will cite the entire text, and then go through the parts and comment on the purpose of individual lines.

METHOD Collection
    ^ self sort: [ :x :y | x < y ]
METHOD Collection
sort: criteria | left right mediane |
    (self isEmpty) ifTrue: [^self].
    mediane <- self popFirst.
    left  <- List new.
    right <- List new.
    self do: [ :x |
        (criteria value: x value: mediane)
            ifTrue:  [ left  add: x ]
            ifFalse: [ right add: x ] ].
    left  <- left  sort: criteria.
    right <- right sort: criteria.
    right add: mediane.
    ^ left appendList: right

The class Collectionrepresents some abstract collection of elements. CollectionHe doesn’t know how to store data, he only provides general algorithms for handling them. One such algorithm is sorting.

So, in the routine:
METHOD Collection
    ^ self sort: [ :x :y | x < y ]

Here we declare a default sorting method that takes no parameters and calls its partner - the #sort method: which takes as a parameter a block that compares two elements of the collection based on some criterion. We have a default criterion: more-less ratio. Although, for complex elements of the collection, no one forbids calling additional messages, like x someField < y someField.

The record [ :x :y |describes the formal parameters of the block, the symbol ^is the equivalent returnfrom the C world. The keyword is selfused to send a message to oneself, superto send to one's ancestor.

Move on:
sort: criteria | left right mediane |
    (self isEmpty) ifTrue: [^self].
    mediane <- self popFirst.

The #sort: method is declared here with one formal parameter criteria. Next are local variables, separated from the rest of the text by vertical bars. By style, it is allowed to write them on the same line, although you can transfer them to the following:
sort: criteria
    | left right mediane |
    (self isEmpty) ifTrue: [^self].
    mediane <- self popFirst.

Then we check the recursion base. In the case of an empty collection, the result of the sort will also be an empty collection.

In the common sense, there is no syntax in the Smalltalk language. There are individual messages and keywords that have a special meaning, but there are no hard-coded rules. Therefore, there are no conditional statements. Instead, it successfully applies what objects can do so well - to exchange messages.

The design is (self isEmpty) ifTrue: [^self]fundamentally no different from any other similar to it. Parentheses are optional here and are inserted solely for decorative purposes. First, we send a message to ourselves #isEmpty, and then we send a message to the result of this action (one of the instances of the class Boolean)#ifTrue:with the parameter of the block to be executed in case of truth.

The last line, we bind the local variable mediane to an object that returns the current object in response to the message #popFirst. I intentionally used the verb “bind” instead of “assign” in order to emphasize that no copying is taking place here. All variables store only references to objects, not values. Collections also store links, so we don’t have to worry about copying large amounts of data. For the explicit creation of a copy of the object, separate messages are provided for full or superficial (non-recursive) copying.

The next part of the code, sorting itself:
    left  <- List new.
    right <- List new.
    self do: [ :x |
        (criteria value: x value: mediane)
            ifTrue:  [ left  add: x ]
            ifFalse: [ right add: x ] ].

We create a pair of lists to store items that satisfy and do not meet the sorting criteria. Conventionally, we call them “left” and “right” halves. Then we go through the contents of the current collection (method #do:), for each element ( x) we call the comparison block with the median and arrange the elements in lists, based on the comparison result.

Please note that the blocks, being actually separate objects, calmly access the above declared variables. So, the block inside #do:uses the variable mediane, while the block when #ifTrue:refers to both the variable xand the leftone declared even higher in the hierarchy. This is made possible because the blocks in Smalltalk are closures.and tied to the lexical context of their use.

Finally, the rest of the method:
    left  <- left  sort: criteria.
    right <- right sort: criteria.
    right add: mediane.
    ^ left appendList: right

We recursively sort the received parts, and then combine the sorted left, median and sorted right parts into one list, which we return as the result of sorting.

Now let's see how sorting can be used:
"простая сортировка"
-> #(13 0 -6 221 64 7 -273 42 1024) sort
Array (-273 -6 0 7 13 42 64 221 1024)
"сортировка по убыванию"
-> #(13 0 -6 221 64 7 -273 42 1024) sort: [ :x :y | x > y ]
Array (1024 221 64 42 13 7 0 -6 -273)
"лексикографическая сортировка"
-> #(13 0 -6 221 64 7 -273 42 1024) sort: [ :x :y | x asString < y asString ]
Array (-273 -6 0 1024 13 221 42 64 7)
"сортировка по длине строчного представления"
-> #(13 0 -6 221 64 7 -273 42 1024) sort: [ :x :y | x asString size < y asString size ]
Array (7 0 13 -6 42 64 221 1024 -273)
"разбиение строки на слова"
->'а вообще, мне очень нравится этот язык!' words
List (а вообще, мне очень нравится этот язык!)
"разбиение строки на слова и сортировка по длине слова"
->'а вообще, мне очень нравится этот язык!' words sort: [ :x :y | x size < y size ]
List (а мне этот язык! очень вообще, нравится)

Thus, a single generalized algorithm can be successfully used to process any type of data. It is enough to set the correct criterion.


A couple of words need to be said about the List container, which is actually a descendant of Collection and stores data in the form of a unidirectional list represented by a chain of objects. In this case, the head of the list is stored in his field, and then each next element (instance of the Link class) refers to the stored object and the next element of the list.

Therefore, the fastest way to add data to the head of the list is:
add: anElement
    elements <- Link value: anElement next: elements.
    ^ anElement

We simply add a new object to the elements field, associating it with the current value and with the new element that needs to be stored.

Adding an element to the tail of the list is much more complicated, since we need to get there first:
addLast: anElement
    elements isNil
        ifTrue:  [ self add: anElement ]
        ifFalse: [ elements addLast: anElement ].
    ^ anElement
addLast: anElement
    next notNil
        ifTrue:  [ ^ next addLast: anElement ]
        ifFalse: [ next <- Link value: anElement ]

In this sense, Smalltalk lists resemble the corresponding structures in the Haskell language (of course, minus the requirement of homogeneity of the list in Haskell). In this case, the operator :matches #add :, and the operator ++matches #addLast:.

Finally, the #appendList: method quickly bundles two lists into one. He finds the end of the first list and sticks the second to it:
appendList: aList | element |
    (elements isNil)
        ifTrue: [ elements <- aList firstLink. ^self ].
    element <- elements.
    [element next isNil]
        whileFalse: [element <- element next].
    element next: aList firstLink.

This is faster than iteratively adding items one by one.


This time we tried to look into the world of Smalltalk. I hope that I was able to demonstrate the main points of working with this language without losing the meaning and sense of simplicity along the way. Now we have a basis for subsequent articles, which will already touch on lower-level things. Thanks for attention!

To be continued :)

PS: Habrachelove sheknitrtch made a patch that allows you to compile llst on MSVC10, for which many thanks to him. If someone wants to make a build for posting in downloads - please knock on the PM.

Also popular now: