Halt December 28, 2012 at 12:37

LLST: A New Life for Little Smalltalk

Hello! Happy end of the world and upcoming holidays :)
As a gift to the Open Source community, as well as to lovers of antiques, we (together with comrade humbug) decided to publish our latest research project.

We present an implementation of a virtual machine compatible with Little Smalltalk, rewritten from scratch in C++. At the moment, the virtual machine code is written and the basic primitives are implemented. humbug wrote a series of simple tests, which nonetheless helped to detect problems in the original version of the VM. The implementation is binary compatible with the images of the original LST version 5.

A month of work, 300+ commits. Read on to find out what came of it.

But why?

I have always liked Smalltalk: its clinical simplicity (forgive me, Lisp and Forth fans) and its no less clinically wide capabilities. I believe that it has been undeservedly forgotten by the programming community, although a lot of benefit can still be drawn from it in the 21st century. However, the existing industrial implementations are too cumbersome for a first acquaintance and do not shine with beauty of form. First impressions, as we know, are made by appearance. A newcomer who sees such an interface for the first time is unlikely to regard it as something modern or innovative.

Little Smalltalk is compact enough to figure out in a couple of hours. At the same time, it is a full-fledged Smalltalk, although it is not compatible with the Smalltalk-80 or ANSI-92 standards. From my point of view, a competent implementation of such a microsystem could be a good aid in teaching students at technical universities. It is especially useful for studying OOP, since the concepts of encapsulation, polymorphism, and inheritance take on a very clear and natural expression here. Many of my friends were confused by these concepts or did not understand their original meaning. With such a tool at hand, in 10 minutes you can show, literally "on your fingers", the advantages of OOP and how it works. Moreover, unlike in other languages, these principles do not look far-fetched, because they form the actual core of the language.

After all, it is rather fun to have something that is essentially written in itself and that fits a virtual machine, a compiler, and a standard library with the full source of all its methods into 100 KB.

But I digress, and this post is not quite about that. Let's talk about the project and its goals instead. So,

Goal number 1. New VM (achieved).

Rewrite the Little Smalltalk code in C++, eliminate the design flaws of the original, comment the code, and make it readable and easy to modify.

Unfortunately, the original code was written either by students or by someone else entirely. From my point of view, such source code is unacceptable for an educational project (which is exactly how the author positioned Little Smalltalk). Switch blocks a thousand lines long sprinkled with goto and macros, the same variable reused in five different places for different purposes... all in all, fun. On top of that, the whole code base has one and a half comments, in the style of Landau and Lifshitz: “it obviously follows from this...”

Of course, it was impossible to live like that. So the code was analyzed and, in an attempt to understand the Grand Design, the current implementation appeared. A convenient type system, container templates, and template pointers to heap objects were developed so that you do not have to think about the garbage collector every time you create an object. It is now possible to work with virtual machine objects from C++ as easily as with regular structures. All the work with memory, calculating object sizes, and properly initializing objects now falls on the shoulders of the compiler.
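To give a feel for what "pointers that take care of the collector" can mean, here is a minimal sketch. It is not the LLST code: RootSet and Handle are hypothetical names invented for this illustration, and the real implementation may well differ. The idea shown is an RAII wrapper that registers its pointer slot as a GC root while it is alive, so that a moving collector can find the slot and update it.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical registry of pointer slots that a collector would treat as roots.
class RootSet {
public:
    void add(void** slot) { slots_.push_back(slot); }

    void remove(void** slot) {
        auto it = std::find(slots_.begin(), slots_.end(), slot);
        if (it != slots_.end()) slots_.erase(it);
    }

    std::size_t size() const { return slots_.size(); }

    // Stand-in for what a moving collector does after copying an object:
    // every registered slot that pointed at `from` is rewritten to `to`.
    void relocate(void* from, void* to) {
        for (void** slot : slots_)
            if (*slot == from) *slot = to;
    }

private:
    std::vector<void**> slots_;
};

// Hypothetical handle: while it is alive, its slot is registered as a root.
template <typename T>
class Handle {
public:
    Handle(RootSet& roots, T* object) : roots_(roots), target_(object) {
        roots_.add(&target_);
    }
    ~Handle() { roots_.remove(&target_); }

    // Copying is disabled to keep the sketch short.
    Handle(const Handle&) = delete;
    Handle& operator=(const Handle&) = delete;

    T* operator->() const { return static_cast<T*>(target_); }
    T* get() const { return static_cast<T*>(target_); }

private:
    RootSet& roots_;
    void* target_;
};
```

With something like this, code that allocates between two uses of an object does not need to push and pop that object on a root stack by hand: the handle's constructor and destructor do it.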

As an example, I will give the implementation code for opcode number 12 “PushBlock”.

This is how it was (the author's formatting and comments are preserved):

        case PushBlock:
        DBG0("PushBlock");
        /*
           create a block object 
         */
        /*
           low is arg location 
         */
        /*
           next byte is goto value 
         */
        high = VAL;
        bytePointer += VALSIZE;
        rootStack[rootTop++] = context;
        op = rootStack[rootTop++] =
          gcalloc(x = integerValue(method->data[stackSizeInMethod]));
        op->class = ArrayClass;
        memoryClear(bytePtr(op), x * BytesPerWord);
        returnedValue = gcalloc(blockSize);
        returnedValue->class = BlockClass;
        returnedValue->data[bytePointerInContext] =
          returnedValue->data[stackTopInBlock] =
          returnedValue->data[previousContextInBlock] = NULL;
        returnedValue->data[bytePointerInBlock] = newInteger(bytePointer);
        returnedValue->data[argumentLocationInBlock] = newInteger(low);
        returnedValue->data[stackInBlock] = rootStack[--rootTop];
        context = rootStack[--rootTop];
        if(CLASS(context) == BlockClass)
        {
          returnedValue->data[creatingContextInBlock] =
            context->data[creatingContextInBlock];
        }
        else
        {
          returnedValue->data[creatingContextInBlock] = context;
        }
        method = returnedValue->data[methodInBlock] =
          context->data[methodInBlock];
        arguments = returnedValue->data[argumentsInBlock] =
          context->data[argumentsInBlock];
        temporaries = returnedValue->data[temporariesInBlock] =
          context->data[temporariesInBlock];
        stack = context->data[stackInContext];
        bp = bytePtr(method->data[byteCodesInMethod]);
        stack->data[stackTop++] = returnedValue;
        /*
           zero these out just in case GC occurred 
         */
        literals = instanceVariables = 0;
        bytePointer = high;
        break;

And this is what it became:

void SmalltalkVM::doPushBlock(TVMExecutionContext& ec)
{
    hptr<TByteObject>  byteCodes = newPointer(ec.currentContext->method->byteCodes);
    hptr<TObjectArray> stack     = newPointer(ec.currentContext->stack);

    // Block objects are usually inlined in the wrapping method code
    // pushBlock operation creates a block object initialized
    // with the proper bytecode, stack, arguments and the wrapping context.

    // Blocks are not executed directly. Instead they should be invoked
    // by sending them a 'value' message. Thus, all we need to do here is initialize
    // the block object and then skip the block body by incrementing the bytePointer
    // to the block's bytecode size. After that bytePointer will point to the place
    // right after the block's body. There we'll probably find the actual invoking code
    // such as sendMessage to a receiver (with our block as a parameter) or something similar.

    // Reading new byte pointer that points to the code right after the inline block
    uint16_t newBytePointer = byteCodes[ec.bytePointer] | (byteCodes[ec.bytePointer+1] << 8);

    // Skipping the newBytePointer's data
    ec.bytePointer += 2;

    // Creating block object
    hptr<TBlock> newBlock = newObject<TBlock>();

    // Allocating block's stack
    uint32_t stackSize = getIntegerValue(ec.currentContext->method->stackSize);
    newBlock->stack    = newObject<TObjectArray>(stackSize, false);

    newBlock->argumentLocation = newInteger(ec.instruction.low);
    newBlock->blockBytePointer = newInteger(ec.bytePointer);

    // Assigning creatingContext depending on the hierarchy
    // Nested blocks inherit the outer creating context
    if (ec.currentContext->getClass() == globals.blockClass)
        newBlock->creatingContext = ec.currentContext.cast<TBlock>()->creatingContext;
    else
        newBlock->creatingContext = ec.currentContext;

    // Inheriting the context objects
    newBlock->method      = ec.currentContext->method;
    newBlock->arguments   = ec.currentContext->arguments;
    newBlock->temporaries = ec.currentContext->temporaries;

    // Setting the execution point to a place right after the inlined block,
    // leaving the block object on top of the stack:
    ec.bytePointer = newBytePointer;
    stack[ec.stackTop++] = newBlock;
}

And this is the situation with almost all of the code. Readability, it seems to me, has improved, though at the cost of a slight drop in performance. However, proper profiling has not yet been done, so there is room for creativity. Besides, there are LST forks out there that are claimed to be faster.
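One detail of doPushBlock that is easy to miss is how the jump target is stored: as two bytes in the bytecode stream, low byte first. Below is a small stand-alone sketch of just that step. The helper name readUint16 and the use of std::vector are assumptions made for the illustration; the VM itself works on its own byte object type.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: reads a 16-bit little-endian value located at
// `bytePointer` and advances `bytePointer` past the two bytes it consumed.
inline std::uint16_t readUint16(const std::vector<std::uint8_t>& byteCodes,
                                std::size_t& bytePointer)
{
    // at() throws std::out_of_range on a truncated stream instead of
    // reading past the end of the buffer.
    const std::uint16_t low  = byteCodes.at(bytePointer);
    const std::uint16_t high = byteCodes.at(bytePointer + 1);
    bytePointer += 2;
    return static_cast<std::uint16_t>(low | (high << 8));
}
```

For the bytes 0x34 0x12 this yields 0x1234, and the byte pointer ends up on the first byte of the block body, which is exactly the value the new block remembers as its starting point.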

Goal number 2. Integration with LLVM.

Some developers believe that JIT compilation is of little use for Smalltalk because its methods are so fine-grained. However, what they usually have in mind is a “literal” translation of virtual machine instructions into native code.

LLVM, on the contrary, offers ample opportunities for code optimization in addition to the JIT itself. Thus, the main task is to “explain” to LLVM what can be optimized and how best to do it.

I am curious how successfully LLVM can be used in such a "hostile" environment (a large number of small methods, extremely late binding, etc.). This is the next major task, and it will be tackled in the near future. humbug's experience with LLVM will come in handy here.

Goal number 3. Use as a control system in embedded devices.

As I wrote above, this project is not purely research. One real application of our VM may be the control module of a smart home system, which I am developing together with another Habr user (droot).

Using Smalltalk in embedded systems is nothing out of the ordinary. On the contrary, history knows examples of its rather successful application. For example, the oscilloscopes of the Tektronix TDS 500 series have a graphical interface implemented in Smalltalk.

This device has an MC68020 + DSP processor on board. The control code is written in Smalltalk, critical sections in assembler. The image consists of approximately 250 classes and is entirely hosted in ROM. Less than 64 KB of DRAM is required for operation.

For more on possible uses, there is a presentation that covers many of these points. Caution! Garish design and Comic Sans MS.

Goal number 4. Try to imagine what a Smalltalk “with a human face” might be.

Alan Kay, who worked at the Xerox PARC lab in the 1970s, developed Smalltalk. He laid the foundations of what we now call the graphical user interface. Moreover, the first application of this interface was the Smalltalk IDE itself; in fact, it was created for it. Later, these ideas were used in the Lisa and Macintosh projects by another nimble fellow, whom many now call the “father of the GUI”, and of the PC as well.

Harsh VisualAge is harsh

Classic Smalltalk has always been distinguished by its austere appearance and a boxy, nested arrangement of elements. The austerity of the interface, which rivals that of the Motif library, has never added to its appeal.

Today, customers are accustomed to “wet floor” reflections and gradients, so only nerds in “professorial” tortoiseshell glasses can comfortably use Smalltalk to solve problems. As a means of developing modern applications, it is not very suitable. Unless, of course, the customer happens to be a fan of such systems, which is unlikely.

Dolphin

Almost the only one that stands out from the orderly ranks of Squeak, Pharo, and assorted VisualAges is Dolphin Smalltalk, which from the start focused on tight integration with the OS.

Unfortunately, it is paid and Windows-only, and the community version has been cut down with rusty scissors to the bare minimum. After you work through a number of exercises from the documentation (which is good, by the way), there is absolutely nothing left to do. You can write your own classes, and that is all. The community version does not provide proper tools for building user interfaces. As a result, we have fast native widgets, transparent WinAPI calls, and zero portability. An excellent product, which its owners do not want to release into the wild from the depths of financial captivity.

As part of the LLST project, I want to integrate the Qt library and to experiment with the user interface. Later on, the library could be ported to industrial Smalltalk implementations.

Where to get the sources and what to do with them?

Since you have read this far (which is amazing in itself!), you probably want to get the sources. I have them! The main working repository is currently hosted on GitHub at github.com/0x7CFE/llst (the llst.org domain also leads there).

Note 1: Due to its specifics, the code is built in 32-bit mode. Therefore, to build and run it on x64 you need the 32-bit libraries (ia32-libs in the case of Ubuntu), as well as the g++-multilib package.

sudo apt-get install ia32-libs g++-multilib

Note 2: Anyone who does not want to struggle with compilation can download a ready-made statically linked package from the releases page.

UPD: It is better to follow the new build instructions in the Usage section of the repository's main page (remember to read the LLVM section).

Build it like this:

~ $ git clone https://github.com/0x7CFE/llst.git
~ $ cd llst
~/llst $ mkdir build && cd build
~/llst/build $ cmake ..
~/llst/build $ make llst

With the right moon phase and a bit of personal luck, the llst executable will appear in the build directory, ready to be put to good use.

For example, like this:

build$ ./llst

If all is well, then the output should be something like this:

Image read complete. Loaded 4678 objects
Running CompareTest
equal (1) OK
equal (2) OK
greater (int int) OK
greater (int symbol) ERROR
true (class True): does not understand asSmallInt
VM: error trap on context 0xf728d8a4
Backtrace:
error: (True, String)
doesNotUnderstand: (True, Symbol)
= (SmallInt, True)
assertEq: withComment: (Block, True, String)
assertWithComment: (Block, String)
greater (CompareTest)
less (int int) OK
less (symbol int) OK
nilEqNil OK
nilIsNil OK
Running SmallIntTest
add OK
div OK
mul OK
negated (1) OK
negated (2) OK
negative (1) OK
negative (2) OK
quo (1) OK
quo (2) OK
sub OK
Running looptest
loopCount OK
sum OK
symbolStressTest OK
Running ClassTest
className (1) OK
className (2) OK
sendSuper OK
Running MethodLookupTest
newline (Char) OK
newline (String) OK
parentMethods (1) OK
parentMethods (2) OK
Running StringTest
asNumber OK
asSymbol OK
at (f) OK
at (o) OK
at (X) OK
at (b) OK
at (A) OK
at (r) OK
copy OK
indexOf OK
lowerCase OK
plus (operator +. 1) OK
plus (2) OK
plus (3) OK
plus (4) OK
plus (5) OK
plus (6) OK
plus (7) OK
plus (8) OK
plus (9) OK
reverse OK
size (1) OK
size (2) OK
size (3) OK
size (4) OK
Running ArrayTest
at (int) OK
at (char) OK
atPut OK
Running gctest
copy OK
Running ContextTest
backtrace (1) OK
backtrace (2) OK
instanceClass OK
Running PrimitiveTest
SmallIntAdd OK
SmallIntDiv OK
SmallIntEqual OK
SmallIntLess OK
SmallIntMod OK
SmallIntMul OK
SmallIntSub OK
bulkReplace OK
objectClass (SmallInt) OK
objectClass (Object) OK
objectSize (SmallInt) OK
objectSize (Char) OK
objectSize (Object) OK
objectsAreEqual (1) OK
objectsAreEqual (2) OK
smallIntBitAnd OK
smallIntBitOr OK
smallIntShiftLeft OK
smallIntShiftRight OK
->

The error shown above comes from the image code and is not a problem in the VM. The same behavior is observed when running the test image on the original lst5.

Next, you can play around with the image and talk to it:

-> 2 + 3
5
-> (2+3) class
SmallInt
-> (2+3) class parent
Number
-> Object class
MetaObject
-> Object class class
Class
-> 1 to: 10 do: [ :x | (x * 2) print. $ print ]
2 4 6 8 10 12 14 16 18 20 1

...and so on. The methods listMethods, viewMethod and allMethods are also useful:

-> Collection viewMethod: #collect:
collect: transformBlock | newList |
        newList <- List new.
        self do: [:element | newList addLast: (transformBlock value: element)].
        ^ newList

Any class can be asked about its parent (via parent) and about its descendants:

-> Collection subclasses
Array
    ByteArray
    MyArray
    OrderedArray
    String
Dictionary
    MyDict
Interval
List
Set
    IdentitySet
Tree
Collection
->

You can end the session by pressing Ctrl+D:

-> Exited normally
GC count: 717, average allocations per gc: 25963, microseconds spent in GC: 375509
9047029 messages sent, cache hits: 4553006, misses: 53201, hit ratio 98.85 %
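The hit ratio in the last line can be checked by hand. Judging by the numbers, it is computed over cache lookups (hits plus misses) rather than over the total number of messages sent: 4553006 / (4553006 + 53201) is about 0.9885. A small sketch of that arithmetic follows; the function name hitRatioPercent is made up for the example and is not taken from the VM.

```cpp
#include <cstdint>

// Hit ratio in percent: hits / (hits + misses) * 100.
// Returns 0 when there were no lookups at all, to avoid dividing by zero.
inline double hitRatioPercent(std::uint64_t hits, std::uint64_t misses)
{
    const std::uint64_t lookups = hits + misses;
    if (lookups == 0)
        return 0.0;
    return 100.0 * static_cast<double>(hits) / static_cast<double>(lookups);
}
```

Feeding it the hits and misses from the listing above gives a value that rounds to the 98.85 % printed by the VM.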

In general, an image can tell a lot of interesting things about itself. Even more can be found in its sources, which are in the file llst/image/imageSource.st.

For convenience, I wrote a syntax highlighting scheme for Katepart, which lives in the same repository at llst/misc/smalltalk.xml. To make it work, copy this file to the directory /usr/share/kde4/apps/katepart/syntax/ or to its counterpart under ~/.kde and restart the editor. It will work in all editors that use Katepart: Kate, KWrite, Krusader, KDevelop, etc.

Conclusion

I hope I did not bore you with my lengthy thoughts on Smalltalk and its place in the programmer's arsenal. I would really like to hear feedback on the project as a whole and on the readability of its source code in particular.

The next article will discuss the Smalltalk language itself and introduce the basic concepts needed to read the sources successfully. It will be followed by a series of articles in which I will describe the internal structure of the virtual machine in more detail and focus on how objects are represented in memory. Finally, the last articles will most likely be devoted to the results of working with LLVM and Qt. Thanks for your attention! :)

PS: At the moment I am looking for a paid outlet for my efforts (a job, that is). If you have interesting projects (especially of a similar kind), please send me a PM. I am based in the Novosibirsk Akademgorodok.
