mmans April 24, 2013 at 12:09

A radical approach to application development

From the sandbox

From a translator: In 2007, in search of a web engine, I came across a very interesting and unusual dialect of Lisp. And after reading several articles, I was fascinated by its principles. Since my main work is far from web programming, I don’t use it professionally, but from time to time I come back to it and a little “storm”.

For all the time of acquaintance with this language, he practically does not flash anywhere, and in Russian there is almost no information about it. Let's try to fill this gap. Despite the fact that the original article dates from the 2006th year, the topic is quite relevant.
Many thanks for the help in translating Nadezhda Zakharova and the wonderful site Notabenoid .

1. Introduction

I work as a consultant and developer of free software. For twenty years, my partners and I have worked on projects such as image processing, computer-aided design, modeling systems, as well as various financial and business applications.

For almost all of these projects, we used Lisp. My daily work is to listen to customer requests, analyze business processes and develop software according to their needs.

Usually - in business applications such as ERP or CRM - this is a process of constant change. At the beginning of a new project, neither the developer nor the customer knows exactly what is needed or what the final product should look like.

This comes during an iterative process (some call it “extreme programming”). The client evaluates each new version, then discusses the strategy for further development of the program. Often, unforeseen requirements force you to rewrite large parts of a project. This does not mean that the project was poorly planned, because the process that I describe is planning. In an ideal world, software development is just planning, the time spent writing code directly should go to zero.

We need a programming language that allows us to directly express what we want the program to do. We believe that the code should be as simple as possible so that any programmer can at any time understand what is happening in the program.

Over the years, the Pico Lisp system has evolved from a minimalist implementation of Lisp to a dedicated application server. Please note that we are not talking about a rapid prototyping tool. At each stage of development, the result is a fully functional program, not a prototype, which grows to a serial (possibly latest) version. On the contrary, it can be called a powerful tool of a professional programmer who is used to keeping his development environment in order and wants to express the logic of his application and data structure in a concise presentation.

First, we want to introduce you to Pico Lisp, explain why Pico at a low level is significantly different from other Lisp-s or development systems, and then show its advantages at higher levels.

2. The radical approach

Perhaps the (Common-) Lisp community will not be thrilled with Pico Lisp because it destroys some of the beliefs and dogmas that have become traditional. Some of them are just myths, but they can become the reason for the excessive complexity and slowness of Lisp. Practical experience with Pico Lisp proves that lightweight and fast Lisp is optimal for many types of effective application development.

2.1. Myth 1: Lisp requires a compiler

In fact, this is the most important myth. If you participated in Lisp group discussions, then you know that the compiler plays a major role. It may seem that it is almost synonymous with the runtime. It is important for people what actions the compiler performs with the code and how efficiently. If it seems to you that the Lisp program is running slowly, then you decide that you need a better compiler.

The idea of interpreted Lisp is regarded as an old delusion. Modern Lisp requires a compiler, and an interpreter is just a useful addition, and is needed for interactive debugging of an application. It is too slow and bloated to run programs that are already in a state of readiness for release.

We are sure that the opposite point of view is true. On the one hand (and not only from a philosophical point of view), a compiled Lisp program is no longer Lisp at all. This violates the fundamental rule of "formal equivalence of code and data." The resulting code does not contain S-expressions and cannot be processed by Lisp. The source language (Lisp) is converted to another language (machine code) with inevitable incompatibilities on different machines.

In practice, the compiler complicates the system as a whole. Features such as multi-linking strategies, typed variables, and macros have been developed to meet the needs of compilers. The system is bloated because it must also support the interpreter, and accordingly, two different architectures.

But is it worth the effort? Of course, the execution speed is higher, and creating a compiler is interesting for training purposes. But we argue that in everyday life, a well-designed “interpreter” can often surpass a compiled system.

You understand that in reality we are not talking about “interpretation”. The Lisp system immediately converts the data passed to it into internal pointer structures called S-expressions. True "interpretation" works with one-dimensional character codes, and this significantly slows down the execution process. Lisp "calculates" S-expressions quickly following these pointer structures. There are no searches, so nothing is really “interpreted”. But we will adhere to this familiar term.

A Lisp program as an S-expression forms a tree of executable nodes. The code for these nodes is usually written in optimized C or assembler, so the task of the interpreter is to transfer control from one node to another. Because many of these built-in Lisp functions are very powerful and do a lot of computation, most of the runtime is for nodes. The tree itself functions as a kind of glue.

The Lisp compiler removes a bit of this glue and replaces some nodes with primitive or stream functionality directly into machine code. But, since in any case most of the execution time falls on built-in functions, these improvements are not as dramatic as, for example, the Java bytecode compiler, for which each node (bytecode) has relatively primitive functionality.

Of course, compilation by itself also takes a lot of time. An application server often executes Lisp source files on the fly in one run and immediately discards the code as soon as it is executed. In such cases, either the initially slower interpreter of the Lisp-system based on the compiler, or the additional time spent by the compiler, will significantly reduce the overall performance.

The internal structures of Pico Lisp were originally designed for ease of interpretation. Although they were completely written in C, and were not specifically optimized for speed, there was never a problem of insufficient performance. The first commercial system written in Pico Lisp was a system for processing and retouching images, as well as for creating page layouts for printing. It was created in 1988 and was used on the Mac II with a 12 MHz CPU and 8 MB RAM.

Of course, there was no Lisp compiler, there were only low-level pixel manipulations and bezier functions written in C. Even then, when working on a computer that was hundreds of times slower than modern ones, no one complained about performance.

For the sake of interest, I installed CLisp and compared it with Pico Lisp using simple tests as an example. Of course, this does not mean that the test results show the usefulness of a particular system as an application server, but they give an approximate idea of the performance of these systems. At first, I tried to execute a simple recursive Fibonacci function.

(defun fibo (N)
   (if (< N 2)
      1 
      (+ (fibo (- N 1)) (fibo (- N 2)) ) ) )

When calling this function with parameter 30 (fibo 30), I got the following results (testing was performed on a Pentium-I 266 MHz laptop):

Pico (interpretation)	12 seconds
CLisp interpretation	37 seconds
CLisp compiled	7 seconds

The CLisp interpreter is almost three times slower, and the compiler is almost two times faster than Pico Lisp.

However, the Fibonacci function is not a good example of a typical Lisp program. It consists only of a primitive stream and arithmetic functions, which is easily optimized by the compiler and can be written directly in C if it is time critical (in this case, it would take only 0.2 s)

Therefore, I took another extreme case, with a function that performs extensive list processing:

(defun tst ()
   (mapcar
      (lambda (X) (cons (car X) (reverse (delete (car X) (cdr X))))) 
      '((a b c a b c) (b c d b c d) (c d e c d e) (d e f d e f)) ) )

Calling this function 1 million times, I got:

Pico (interpretation)	31 seconds
CLisp interpretation	196 seconds
CLisp compiled	80 seconds

The CLisp interpreter is now over 6 times slower, but to my surprise, even compiled code is 2.58 times slower than Pico Lisp.

Maybe CLisp has a slow compiler? And maybe the code can be accelerated with some tricks. But these results still leave a lot of doubt as to whether compilation overhead can be justified. Fiddling with optimizing compilers is the last thing I want to do when it comes to application logic, and when the user still doesn't notice the delays.

2.2. Myth 2: Lisp needs a lot of data types

The Fibonacci function described in the example above can be accelerated by declaring the variable N as an integer. But then this example will show how strongly compiler support requirements affect Lisp. The compiler can produce more efficient code if the data types are hardcoded. Common Lisp supports many different data types, including various integer types, fixed / floating point types, fractional numbers, characters, strings, structures, hash tables, and vector types in addition to lists.

Pico Lisp, on the other hand, only supports three built-in data types — numbers, characters, and lists — and it does just fine with those types. A Lisp system runs faster with fewer data types because fewer options need to be checked at runtime. Maybe this will entail less efficient use of memory, but a smaller number of types allows you to save space due to the fact that less bits are needed for tags.

The main reason for using all three data types is simplicity, and the advantage in simplicity outweighs the benefit of speed and memory compensation.

In fact, Pico Lisp at the lowest level uses only one data type, cells that are used to form numbers, characters, and lists. A small number or a minimum character occupies only one memory cell, dynamically increasing if necessary. This memory model allows efficient garbage collection and completely avoids fragmentation (as it would be, for example, with vectors).

At the highest level, you can always emulate other data types using these three primitive data types. So, we emulate trees using lists, strings, classes, and objects using symbols. While there are no performance issues, why complicate it?

2.3. Myth 3: Dynamic linking is bad

Pico Lisp uses a simple implementation of dynamic surface binding. The contents of the cell storing the value of the symbol are saved when entering the lambda expression or binding environment, and then a new value is set. Upon return, the original value is restored. As a result, the current value of the symbol is determined dynamically from the history and execution status, and not from static checks of the lexical environment.

Perhaps for the interpreted system this is the simplest and fastest strategy. To search for a cell value, no searches are required (only access to the cell value is required) and all characters (local or global) are processed the same way. On the other hand, the compiler can produce more efficient code for lexical binding, so compiled Lisp code usually complicates things by supporting several types of binding strategies.

Dynamic linking is a very powerful mechanism. You can access the current value from anywhere, the variable itself and its value are always physically existing “real things”, and not what they “seem” (as is the case with lexical binding, and to some extent using transit characters in Pico Lisp (see below)).

Unfortunately, great opportunities are impossible without big risks. The programmer must be familiar with the basic principles in order to take advantage of them and avoid pitfalls. However, as long as we stick to the conventions recommended by Pico Lisp, the risks will be minimal.

There are two types of situations where the results of calculations using dynamic linking can get out of control of the programmer:

the symbol is connected with itself, and we are trying to change the meaning of the symbol;
the problem of funarga (functional argument), when the value of a symbol is dynamically changed through code that is invisible in the environment of the current source code.

Such situations can be avoided by using transit symbols.

Transitional characters are characters in Pico Lisp that look like strings (and are often used as strings) and that are only temporarily interned for the duration of a single file with the source code (or only parts of it). Thus, they have lexical capabilities comparable to static identifiers in C programs, only their behavior is completely dynamic, because they are normal characters in all other respects.

So, the rules are simple: whenever a function must change the value of the variable passed to it or calculate the result of the passed Lisp expression (directly or indirectly), the parameters of this function must be written using transit characters. Practical experience shows that such cases are rare in high-level software development processes and occur mainly in auxiliary libraries and system tools.

2.4. Myth 4: Property Lists Are Bad

Properties is an elegant, understandable way to associate information with symbols in addition to a value / function cell. They are extremely flexible since the amount and type of data are not statically fixed.

Many seem to think that property lists are too outdated and primitive to be used in our time. Instead, more advanced data structures should be used. Although this is true in some cases, depending on the total number of properties in the symbol, the payback threshold may be higher than expected.

Previous versions of Pico Lisp experimented with hash tables and self-balancing binary trees to store properties, but we found that regular lists are more efficient. We must take into account the total effect of the entire system, and the overhead for both supporting a large number of internal data structures (see above) and more complex search algorithms is often greater than using a simple linear search. And when we also touch upon the issue of memory efficiency, the advantages of property lists are clearly winning.

Pico Lisp implements properties as a list of key-value pairs. The only concession in favor of speed optimization is the “most recently used” scheme, which speeds up repeated access a little, but we have no concrete signs that this was actually necessary.
Another argument against properties is their declared global visibility. This is true to the same extent as a global element in a C structure or an instance variable in a Java object.

Of course, in a global symbol, a property is also global, but in a typical application development, properties are stored in anonymous characters, objects, or database elements that are available only in a well-defined context. Therefore, the property “color” can be used in a certain sense in one context, and in a completely different sense in another context, without any mutual interference.

3. Application Server

Based on this simple Pico Lisp machine, we have developed a vertically structured application server. It unifies the database engine (based on PicoLisp's implementation of persistent (persistent) objects as a first-class data type) and an abstract graphical interface (generating, for example, HTML or Java applets).

A key element in this unified system is the Lisp-based markup language, which is used to implement individual application modules.

Whenever a new view from the database, a document or report, or some other service is requested from the application server, the Lisp source file is downloaded and executed on the fly. This is similar to a URL request followed by the submission of an HTML file in a traditional web server.

However, Lisp expressions evaluated in such a scenario usually have the side effect of building and processing an interactive user interface.

These Lisp expressions describe the structure of GUI components, their behavior in response to user actions and their interaction with database objects. In short, they contain a complete description of the software module. To make this possible, we found that it is important to strictly adhere to the Locality Principle and use the Prefix Classes and Link Support Demons mechanisms (the last two are described in another document ).

3.1. Locality Principle

As we said, the development of business applications is a process of constant change. The principle of Locality has been of great help in the development of such projects. This principle requires that all information relating to one module must be stored with this module in one place. This allows the programmer to focus on only one place where all this is stored.

Of course, all this seems quite obvious, but in contrast, software development methodologies require encapsulating behavior and data and hiding them from the rest of the application. Usually, this leads to the fact that the application logic is written in one place (the source file), but the functions, classes and methods for implementing this logic are defined elsewhere. Of course, this is a good recommendation, but it brings a lot of problems, manifested in the need to constantly move to different storage locations: modifications and context switching occur simultaneously in several places. If a function is deprecated, some modules may also expire, but we forget to delete them.

Thus, we believe that it is optimal to create an abstract library of functions, classes and methods - universal as constant as possible in time and various applications, and used to build a strict markup language with a high degree of expressiveness for creating applications.

This language should have compact syntax and allow describing all static and dynamic aspects of the application. Locally, in one place. Without the need to define behavior in separate files.

3.2. Lisp

And this is the very main reason why from the very beginning we argued that Lisp is the only language that suits us.

Only Lisp can handle code and data in the same way, and this is the basis of the Pico Lisp application development model. It allows intensive use of function blocks and calculated expressions that are freely mixed with static data and which can be either transferred somewhere or stored in internal data structures at runtime.

To our knowledge, in other languages this is not possible, at least with the same simplicity and elegance. To some extent this can be done using scripting languages, using interpreted lines of text, but this solution will be quite limited and awkward. And, as we described above, compiled Lisp systems can be too heavy and inflexible. In order for all of these data structures and code fragments to work seamlessly, a dynamic surface binding strategy is a big advantage, because expressions can be evaluated without the need to bind environment settings.

Another reason is that Lisp allows you to directly manipulate complex data structures, such as characters and nested lists, without the need to explicitly declare, allocate, initialize, or free these structures from memory. This contributes to the compactness and readability of the code and gives the programmer a powerful expression tool that allows you to perform different things on one line where other languages require a separate module.

In addition to this, since Pico Lisp does not make a formal distinction between database objects and internal characters, all these advantages also apply to working with the database, resulting in direct communication between GUI and database operations in the same local context using the same code.

4. Conclusion

The Lisp community seems to be suffering from the paranoia of the “inefficient” Lisp. This is probably due to the fact that for decades they have been forced to defend their language from claims that “Lisp is slow” and “Lisp is bloated”.

This was partly true. But on today's hardware, execution speed does not matter for many practical applications. And in cases where it has, encoding several critical functions in C usually solves this problem.

Now let's focus on more practical aspects. Some may be surprised how compact and fast the supposedly “ancient” Lisp system can be. Thus, we must be careful not to make Lisp really “bloated”, overloading the core of the language with more and more features, but should decide to use simple solutions that give full flexibility to the programmer.

Pico Lisp can be seen as proof of the concept of “Less can be more.”

Tags: