eForth for a programmable calculator
This is the first article in the series about 161eForth v0.5b, ending here: habr.com/en/post/452572
The eForth translator is now also available on the domestic Elektronika MK-161 calculator! On May 17, version v0.5b passed my own tests as well as the five author-supplied tests TEST-TEST4. I have achieved what can be done alone, but I think this is only half the battle. It is time to introduce the new tool to the community by opening the 161eForth code for public testing. I have a list of what to improve and where to "work on stability." Your suggestions and comments will be taken into account when finishing the work and releasing version 1.0.
Porting the latest version of eForth to the domestic platform meant overcoming two obstacles: the relatively low speed of an 8-bit machine that is programmed in its own input language, and the modest amount of available binary memory (see 2.4.1), only 4096 bytes.
161eForth reuses ready-made solutions prepared for Callisto, the next-generation input language for domestic PMKs (programmable microcalculators). These are a technique for implementing a Forth machine on top of a decimal ALU and a “Harvard” architecture, console drivers and an alphanumeric keyboard layout, and a software terminal built on them that operates via the RS-232 serial port. In addition to the Elektronika MK-161 and the 161eForth distribution, you may need a home-made keyboard overlay with the letters of the Russian and English alphabets labeled on the keys. The letters are arranged alphabetically line by line, from left to right and from top to bottom.
Dr. Chen-Hanson Ting, author of the modern versions of eForth, emphasizes in his book [1] the importance of understanding the two components of Forth. These are the internal ("address") interpreter, which lets the hardware execute Forth's threaded code, and the external ("text") interpreter, which is responsible for the dialogue with a person.
In two articles, I will dwell in detail on the most radical solutions used in implementing each of these two interpreters on the Elektronika. Studying these solutions can be useful and inspiring when porting eForth to other devices with limited memory and performance. A basic familiarity with programmable microcalculators (PMKs) and Forth will help in understanding the articles. I will explain the difficult points unique to the Elektronika MK and the eForth translator.
To begin with, eForth words are divided into ordinary and system words. Letter case matters. The names of ordinary words are defined in uppercase letters, and those of system words in lowercase. I also named my own additions to eForth in lowercase. The author of eForth suggests conducting the main dialogue in CAPS mode. When you need to use a system word, temporarily switch to lowercase letters (the FP key combination).
In this article, all words are written in capital letters so that they stand out from the text. In several early eForth implementations, the headers of system words were removed and not shown by the WORDS command. This simplified the appearance of eForth and spared the attention of those using Forth for the first time. In 161eForth, the headers of these words are kept, primarily because of the SEE decompiler for colon words (see video No. 3 at the end of the article), which cannot show the names of system words if their headers are removed.
To streamline the article and make it useful as a reference, I had to use several terms before defining them. Forth and PMK professionals should be familiar with these terms. Beginners will sometimes have to look in the neighboring sections (I put the links in the right places) or re-read the article a couple of times.
161eForth itself is posted here, along with the source text, a graphical on-screen keyboard and the help file words.txt with a description of all implemented words: http://the-hacker.ru/2019/161eforth0.5b.zip
I also posted 5 short videos on YouTube illustrating the operation of 161eForth for those who do not have an MK-161. You can watch the entire playlist on YouTube. Below is the first of them; the remaining 4 are at the end of the article.
eForth was designed as a modern replacement for the widely known fig-Forth translator. To port to the MK-161, I chose the 32-bit version 5.2 of the 86eForth translator with indirect threaded code, written in 2016 in MASM assembler for the Windows operating system. This version is described in detail in the third edition of eForth and Zen [1]. If you read English, I advise you to find and study this book; it is very useful for understanding 161eForth.
In a personal letter, the author confirmed that 86eForth502.asm from this book is the latest version of eForth. On the Internet you can find a lot of English-language information on this and on previous versions of eForth.
The development of eForth followed the scientific path taught by Professor Wirth on the example of his programming language Oberon. Each subsequent version of eForth was a simplification of the previous one. Everything that could be dispensed with was removed from the language. What remains is a carefully thought-out set of strong, expressive language constructs whose power has been tested in more than 40 eForth implementations for various platforms. Now on the calculator!
Although eForth is a minimalist dialect of Forth, it does not aim to win the race for the smallest Forth. The set of words it offers is quite practical and can easily be extended by the programmer in the direction needed for the task at hand.
The first version of eForth was released in 1990 in MASM assembler for 8086 processors and worked under MS-DOS. It contained 31 machine-dependent core words and 191 high-level words. The idea was simple - you translate only 31 words into your assembler, and immediately get eForth on your computer.
This approach has been criticized on the Internet, because minimizing the number of words in assembler led to performance that was extremely low for embedded systems. Already in the second version of eForth, as many words as possible were implemented in assembler, which shifted the balance toward a programming system that is not only easily portable but also practical.
For several years, Bill Muench, the original author of eForth, and his colleague Dr. Chen-Hanson Ting released their eForth versions in parallel. Each version had its own characteristics. Other programmers also ported eForth to various platforms.
Version 5.2, released in 2016, contains 71 words of “code” and 110 words of “colon”. A quarter century of searching for the ideal has led to a significant reduction in the total number of words. At the same time, for performance reasons, the percentage of words implemented at a low level increased.
The proposed 161eForth enjoys the generous benefits of this progress, but does not claim to develop the main line further. My implementation provides the programmer with all the tools present in version 5.2. Where the MK-161 architecture makes the implementation of some 86eForth words impossible or meaningless, instead of throwing them away I give programmers a full replacement taken from the ANSI / ISO standard [4]. Those seeking minimalism can throw out the “extra” words themselves, because by tradition 161eForth comes with source code.
When implementing eForth, I followed its author's conventions. For example, in my opinion, a FOR NEXT loop with an initial value of n should execute exactly n times. Chuck Moore, the author of the languages Forth and colorForth, eventually came to the same conclusion. Unfortunately, eForth uses an outdated convention and executes such a loop n + 1 times, with a counter running from n down to 0. I did not fix this and several other shortcomings, preferring to keep 161eForth compatible with implementations for other platforms.
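The eForth loop convention can be illustrated with a small Python sketch. This is not eForth code: the function name is made up for illustration, and n is assumed to be non-negative.

```python
def eforth_for_next(n):
    """Counter values seen by the body of `n FOR ... NEXT` in eForth."""
    values = []
    counter = n
    while counter >= 0:          # the body also runs when the counter is 0
        values.append(counter)
        counter -= 1
    return values


passes = eforth_for_next(3)
print(passes)        # [3, 2, 1, 0]
print(len(passes))   # 4, that is, n + 1 passes
```

A programmer who needs exactly n passes therefore has to start the loop from n - 1.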
Since 161eForth is the first practical on-board programming system for the Elektronika MK-161 apart from the factory language, I traced the long history of eForth and returned to the language a few words that were useful on other platforms and may be in demand now.
For example, the new-old variable 'BOOT contains the token (see 3.1) of the word that is executed first after the environment is initialized, but before the dialogue starts. By default, 'BOOT contains the token of TLOAD, which interprets code from the “text area” (see 2.4.2). This lets programmers customize eForth for themselves without recompiling the environment, which is still impossible on board the Elektronika.
The priority tasks of the implementation were saving binary memory (see 2.4.1) and improving performance. Solving them led to a dramatic decrease in the number of high-level words, whose code occupies this precious memory, in favor of more fast kernel words implemented in cheap program memory (see 2.4.3).
As a result, 161eForth contains 129 code words and 78 high-level words, and occupies 1816 bytes of MK-161 binary memory, that is, less than half of it. This gives hope for metacompiling its high-level part directly on board the Elektronika.
The source code for eForth MK-161 is divided into two large parts. The core written in the MK-161 command system is contained in the eForth0.mkl file. High-level words are defined in SP-Forth and placed in the eForth.f file.
The distribution also has a help file words.txt, which documents all 161eForth words with stack notation and a brief explanation, in one line.
The eForth kernel contains executable code that runs in MK-161 program memory (see 2.4.3). It is compiled on a computer into the eForth0.mkp file by standard means, for example, the manufacturer's MKL2MKP compiler.
The kernel source code in the eForth0.mkl file is written in Latin mnemonics. For example, the ИП E command, which reads register E (also known as R14), is written in these mnemonics as RME. Although unusual for owners of Soviet PMKs, Latin mnemonics are convenient to type on a computer keyboard. Indeed, it is easier to type the odd-looking F X^2 than the Fx² familiar from childhood.
The eForth0.mkp file is a kernel blank. In addition to the code of the primitives, it contains a kernel header and the name table tblNames, which eForth.f decodes and transfers to decimal registers (see 2.4.4). The final eForth.mkp kernel (see 2.4.3) is created from eForth0.mkp, so eForth0.mkl must be compiled first.
The eForth.f file is fed to the wonderful domestic compiler SP-Forth [5]. The file contains the definitions of all high-level words. Over time, they may be defined in eForth itself and possibly compiled directly on board the Elektronika MK-161.
During compilation, eForth.f reads the core blank eForth0.mkp and with its help creates three files in the current directory for subsequent loading into MK-161: eForth.mkp, eForth.mkd and eForth.mkb. It is eForth.mkb that contains the bodies of high-level words, although their headers are located in the eForth.mkd file.
The fourth file, eForth.mkt, is manually written in eForth and can be edited onboard the MK-161 using the built-in text editor. Each of these four files I will analyze in more detail below (see 2.4).
The Novosibirsk manufacturer refers to the MK-161 by an old acronym that was used for the very first calculators in the USSR. The MK-161 instruction set inherits that of the Soviet calculators "Elektronika B3-34" and "Elektronika MK-61." This means that programs written for Soviet calculators run on the MK-161 unchanged or with minor changes.
The converse is not true. eForth will not run on Soviet PMKs, because it uses many resources that first appeared in the MK-152/161 and were not available in earlier models of the series.
Let us consider the features of the input language and architecture of the MK-161 that influenced 161eForth (hereinafter simply eForth) and gave the implementation under discussion its “Russian accent”.
The first of these features is the most-significant-byte-first (big-endian) convention, which the MK-161 follows consistently. For example, the number 1000 = 3 × 256 + 232 is written in two consecutive bytes as 3 and 232.
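The byte order can be checked with a few lines of Python. This is only an illustration of the arithmetic; the function names are hypothetical and do not correspond to anything in 161eForth.

```python
def to_big_endian(value):
    """Split an unsigned 16-bit value into (high byte, low byte)."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value must fit in 16 bits")
    return value >> 8, value & 0xFF


def from_big_endian(high, low):
    """Rebuild the 16-bit value from two consecutive bytes."""
    return high * 256 + low


print(to_big_endian(1000))        # (3, 232)
print(from_big_endian(3, 232))    # 1000
```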
Those who have programmed Soviet PMKs have heard of indirect addressing. With direct addressing, we explicitly specify the number of the register we are referring to. For example, Р ИП 44 reads the contents of register 44. The Р key (transliterated as R below), which first appeared in the MK-152, is used to access registers numbered 15 and above; these registers were absent in Soviet PMKs.
With indirect addressing, the number of the required register is not known in advance; it is held in another register. For example, if register 8 contains the number 44, the command K IP 8 reads the contents of register 44 (R44).
The K and R keys can be combined. For example, the command R K BP 20 transfers control (GOTO in Latin mnemonics) to the address stored in R20.
A feature that turned out to be important for the internal eForth interpreter is the pre-increment / pre-decrement of registers during indirect addressing. This feature is inherited from the Soviet PMKs.
For example, the indirect read commands K IP 0, K IP 1, K IP 2 and K IP 3 decrease the contents of register 0, 1, 2 or 3 by one before accessing the desired register. The commands K IP 4, K IP 5 and K IP 6 increase the contents of register 4, 5 or 6 by one before reading.
Such a “modification” of the address register makes it possible to process whole groups of registers in a loop. It is similar to ++R and --R in C. The number of the address register matters: it determines whether the register is increased (registers 4-6) or decreased (registers 0-3) during indirect addressing.
The 161eForth architecture was affected by the fact that the increase of registers 4-6 during indirect addressing happens before the access. As a result, the interpretation pointer (IP), located in R6, always points to the last byte of threaded code that has already been read. In 86eForth, IP always points to the next byte, which has not yet been read.
The same is true for the return stack pointer (RP), stored in register 2. R2 always points to the top of the return stack.
A useful feature of the MK-161 is that the register is not increased or decreased when indirect addressing is used with the new R key. For example, RKIP02 reads the number on top of the return stack without changing the pointer. This is a ready-made Forth word R@. It follows from the above that the value read is one less than the address of the next token, which will be executed after returning from a colon word.
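The rules of indirect reading described above can be modelled in Python. This is a simplified sketch, not the calculator's firmware: the class and its method are hypothetical, registers are kept in a dictionary, and registers other than 0-6 are assumed to be left unchanged.

```python
class Registers:
    """Toy model of MK-161 registers with indirect reads."""

    def __init__(self, contents):
        self.r = dict(contents)          # register number -> value

    def indirect_read(self, pointer, keep_pointer=False):
        """Read the register whose number is stored in register `pointer`.

        keep_pointer=True models the R key, which suppresses the
        pre-decrement / pre-increment of the pointer register.
        """
        if not keep_pointer:
            if 0 <= pointer <= 3:
                self.r[pointer] -= 1     # pre-decrement for registers 0-3
            elif 4 <= pointer <= 6:
                self.r[pointer] += 1     # pre-increment for registers 4-6
        return self.r[self.r[pointer]]


# R6 plays the role of IP, R2 the role of RP; all values are made up.
regs = Registers({6: 1000, 1001: 12, 1002: 34, 2: 500, 500: 1233})
first = regs.indirect_read(6)                    # R6 becomes 1001, reads 12
second = regs.indirect_read(6)                   # R6 becomes 1002, reads 34
top = regs.indirect_read(2, keep_pointer=True)   # like R@: R2 stays at 500
print(first, second, regs.r[6])   # 12 34 1002
print(top, regs.r[2])             # 1233 500
```

After the two reads, R6 points at the last byte that was read, which is exactly the property of IP discussed above.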
When you develop or study words that interact closely with the eForth internal interpreter, be sure to fully understand this subtle point about pre-increment.
MK-161 tables are located in program memory (see 2.4.3). They first appeared in the Novosibirsk "Elektronika MK" and are completely unfamiliar to experts on Soviet PMKs. The address of the table in use is always stored in register 9042, but the two kinds of tables are accessed differently.
An ordered table is an array of unsigned 16-bit integers. eForth contains one such table, tblTokens, with the addresses of the primitives (see 3.1.1), that is, Forth words written in the MK-161 instruction set. The address interpreter (see 3.2) uses tblTokens to execute threaded code quickly, so eForth tries to keep the address of this table in R9042 at all times.
To access an ordered table, write the number of the desired element to R9210. The number n in register X is then replaced by the value of table element n; counting starts from zero.
Associative tables ("search by value") are actively used by eForth, primarily by the primitive (FIND), which looks up a word by its name. The tblCHPUT associative table is also used when printing characters to the screen, to process line feeds and other control codes.
To search for element n in an associative table, write n to R9212. The number n in register X (the manual calls it the “index”) is replaced by the 16-bit value stored in the table immediately after its “index” n.
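The two kinds of lookup can be sketched in Python as follows. The sketch rests on assumptions: an associative table is modelled as a flat sequence of (index, value) pairs, the result for a missing index is represented by None because the article does not describe that case, and all table contents are invented.

```python
from typing import Optional, Sequence


def ordered_lookup(table: Sequence[int], n: int) -> int:
    """Ordered table: element number n, counting from zero."""
    return table[n]


def associative_lookup(table: Sequence[int], index: int) -> Optional[int]:
    """Associative table: the value stored right after a matching index."""
    for position in range(0, len(table) - 1, 2):
        if table[position] == index:
            return table[position + 1]
    return None                      # behaviour on a miss is an assumption


tbl_tokens = [300, 320, 345]                 # hypothetical primitive addresses
tbl_control = [10, 700, 13, 710, 8, 720]     # hypothetical index/value pairs

print(ordered_lookup(tbl_tokens, 2))         # 345
print(associative_lookup(tbl_control, 13))   # 710
print(associative_lookup(tbl_control, 99))   # None
```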
This fast, if simple, search function, implemented in assembly language in the MK-161 firmware, helped eForth achieve acceptable performance when recognizing word names and compiling programs. Of course, for this I had to design rather intricate name recognition tables tailored to this function. We will discuss this in more detail in the second article.
The Elektronika MK lets its owners write programs in the input language that respond to certain events, such as a key being pressed or released, or a timer expiring.
eForth actively uses this interrupt system for keyboard input, for the blinking cursor shown while waiting for that input, and for input / output via the universal serial port (RS-232).
Letters entered from the keyboard are queued in bufKbd as you press the keys. This is very convenient and saves time on slow systems. Alphabet and case switching are handled by the KeyPress interrupt and do not take up queue space. A long press on a key triggers auto-repeat.
When the queue of 8 letters is full and eForth is not yet ready to process the input (a very rare situation), the MK-161 emits an unhappy squeak. Of course, I would have preferred not to implement all this natural keyboard behavior in the translator, but to get it out of the box as a service of the MK-161 built-in program (firmware). But, as they say, we make do with what we have.
After startup, all eForth output is directed to the MK-161 graphic screen. Characters are output to it by the relatively simple CNCut routine. The only difficulty here is implementing the BS control code, the backspace. The MK-161 uses a proportional font. Therefore, the positions of the displayed characters have to be remembered in a special buffer, tblBS, from which the BS output code later retrieves them.
During the dialogue, the user can use the word IO> to redirect all input / output to the RS-232 serial port, which makes it possible to program the MK-161 from a familiar computer keyboard or from another MK-161. The word CON> returns control to the calculator console.
The memory of the Elektronika MK-161 consists of separately addressed program memory and data register memory. The register memory, in turn, is heterogeneous and is divided into three large areas.
Registers numbered 0 to 999 store "decimal numbers". These are ordinary registers, as in the "Elektronika B3-34" and other calculators, except that they can store 12 decimal digits of “mantissa” rather than 8.
Registers numbered 1000 to 8167 store integers from 0 to 255. The last 3 Kbytes of this area, at addresses 5096 to 8167, are called the text area.
Registers numbered 9000 to 9999 are called function registers. This service area of the address space resembles the I / O ports of a microprocessor. Write and read commands at these addresses are used to access I / O devices, the interrupt system, and so on.
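The register map can be summarized with a short Python sketch. The function name and the area labels are chosen for illustration; the byte registers below the text area are labelled "binary", and numbers not covered by the description are reported as "not described" rather than guessed.

```python
def register_area(number):
    """Return the memory area an MK-161 register number belongs to."""
    if 0 <= number <= 999:
        return "decimal"
    if 1000 <= number <= 5095:
        return "binary"              # 4096 byte registers
    if 5096 <= number <= 8167:
        return "text"                # the last 3072 byte registers
    if 9000 <= number <= 9999:
        return "function"
    return "not described"


for sample in (14, 1000, 5095, 5096, 8167, 9042):
    print(sample, register_area(sample))

# Sizes implied by the ranges above.
print(5095 - 1000 + 1)   # 4096
print(8167 - 5096 + 1)   # 3072
```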
To install eForth on the Elektronika MK-161, it is enough to transfer four files to the calculator (eForth.mkp, eForth.mkd, eForth.mkb and eForth.mkt), for example, using the manufacturer's program MK.EXE.
After transferring them to the calculator, I recommend immediately saving these four files in a separate directory of the built-in “electronic disk”. Since they share the same name, eForth can then be loaded in one step as a “package”.
The Elektronika MK registers numbered 1000 to 5095 store numbers from 0 to 255. This area of the calculator's register memory is called binary. Two consecutive binary registers can be accessed from eForth as a single 16-bit “cell”, and (as everywhere on the MK-161) the upper 8 bits are in the register with the lower number.
eForth uses this tiny “binary memory” as its main memory. The words ! and @, HERE and ALLOT work with it, and only from here does the address interpreter execute threaded code (see 3.2). It holds the eForth variables, the terminal input buffer (TIB), the dictionary, and the tblBS rollback stack used to implement backspace.
4096 bytes is very modest by modern standards. Therefore, enormous effort was spent on moving everything possible to other areas of memory.
Immediately after the binary memory comes the text area, registers numbered 5096 to 8167. Technically, these are the same byte registers, but the ability to write them to disk and read them back as a separate file makes this area special.
The word TLOAD is used to work with “text” in eForth. It feeds this entire area to the text interpreter as a single string 3072 characters long.
There is no agreement on how to break text into lines. The editor built into the Elektronika MK insists on a line length of 24 characters. Callisto uses the Forth convention, in which a line contains 64 characters. eForth leaves the choice to the user by treating all the text as one long line. You can use the built-in MK-161 editor, or you can write your own, compatible with Callisto.
Here is the initial content of eForth.mkt, for convenience, divided into three lines:
The first line defines a new word, hi, that greets the user. The second line takes the token of this word (see 3.1) and places it in the variable 'BOOT (see 1). From now on, the text area is no longer compiled every time eForth starts. Instead, the already compiled greeting is executed.
The last line runs the word hi, displaying the greeting on the screen. The word \ ends the interpretation of the text, returning control to the console.
To compile an arbitrary text file, exit to the calculator with the BYE command, go to the main menu and load the desired file in DOS mode. You can also transfer the mkt file from a computer. The C / P key returns you to eForth, after which the TLOAD command compiles the file loaded into the text area.
MK-161 program memory is an isolated address space. It also stores bytes, but they are read-only. Program memory contains 10,000 "steps", which turned out to be more than eForth needs. More than a quarter of program memory remains free, which leaves a good reserve for developing the translator.
Only program memory can hold "code words". Name recognition tables and all text strings known in advance are also moved here, which saves binary memory.
Some words, such as C@, COUNT, and TYPE, can address program memory if the address is not a positive number. For example, the phrase 0 C@ reads the “step” (byte) at address 0 of program memory.
The Elektronika MK registers numbered 0 to 999 are called decimal and hold the numbers used for ordinary calculations on the calculator: 12 decimal digits of “mantissa” and 2 decimal digits of exponent. Forth is designed to work with integers up to 4 bytes long, so such a resource is clearly excessive for eForth.
Decimal memory is used to save precious binary memory. The data and return stacks are placed here. Word headers are stored here as well, both user-defined and built-in, one register per header. This approach makes it possible to redefine even words with standard names.
Keeping the stack in decimal memory gives Forth on the MK-161 a number of peculiarities. First, the range of values of a stack element is huge; it can accommodate 32-bit integers. The need for "double integers" on the MK-161 disappears, although for compatibility I have implemented the corresponding eForth words. A “double integer” is represented on the MK-161 as two stack elements containing numbers from 0 to 65535, which together encode one signed 32-bit integer in two's complement. The high 16 bits of this number are placed on top, that is, at the lowest address.
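The representation of a double integer can be illustrated with a Python sketch. The function names are hypothetical; the pair is returned as (low, high), with the high half being the element that ends up on top of the stack.

```python
def double_to_cells(value):
    """Split a signed 32-bit integer into two cells of 0..65535 each."""
    if not -2**31 <= value < 2**31:
        raise ValueError("value must fit in 32 bits")
    raw = value & 0xFFFFFFFF             # two's complement bit pattern
    return raw & 0xFFFF, raw >> 16       # (low cell, high cell)


def cells_to_double(low, high):
    """Rebuild the signed 32-bit integer from its two cells."""
    raw = (high << 16) | low
    return raw - 2**32 if raw >= 2**31 else raw


print(double_to_cells(70000))          # (4464, 1)
print(double_to_cells(-1))             # (65535, 65535)
print(cells_to_double(4464, 1))        # 70000
print(cells_to_double(65535, 65535))   # -1
```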
The bitwise logical operations AND, OR, XOR, and NOT treat their arguments as 16-bit integers. A result from 32768 to 65535 is converted to a negative number from -32768 to -1. In eForth, false is encoded as zero and true as minus one. Any other nonzero value is also treated as true.
The second peculiarity of the 161eForth data stack is that it contains signed numbers. When the word @ reads the number 65535 from a 16-bit “cell”, it is automatically converted to -1. A special “unsigned” word U@ is provided in order to read exactly 65535, with a plus sign.
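The sign conversions described above can be sketched in Python. These helpers only imitate the arithmetic; their names are made up and they are not part of 161eForth.

```python
def to_signed16(value):
    """Map a 16-bit pattern to the range -32768..32767."""
    value &= 0xFFFF
    return value - 0x10000 if value >= 0x8000 else value


def fetch(high, low):
    """Like @: read a big-endian cell and convert it to a signed number."""
    return to_signed16(high * 256 + low)


def fetch_unsigned(high, low):
    """Like U@: read the same cell without the sign conversion."""
    return high * 256 + low


def bit_and(a, b):
    """Like AND: 16-bit result, converted to signed."""
    return to_signed16(a & b)


def bit_not(a):
    """Like NOT: 16-bit inversion, converted to signed."""
    return to_signed16(~a)


print(fetch(255, 255))            # -1
print(fetch_unsigned(255, 255))   # 65535
print(bit_not(0))                 # -1, the canonical "true"
print(bit_and(40000, 65535))      # -25536
```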
Note that, for the sake of speed, the two top elements of the data stack are kept not in decimal memory but directly in the X and Y registers.
eForth does not use the fact that decimal registers can hold fractional and floating-point numbers. The eForth virtual machine uses these registers to store signed 12-digit decimal integers. Decimal registers are accessed by the words C@ and C!, the same ones that work with any single register.
The eForth kernel is a program written in the MK-161 input language. Its first command, BP MAIN, transfers control to the MAIN code, which first of all determines the circumstances of the restart. If it was caused by an invalid token, the MK-161 squeaks. On the first start, and also after the MK-161 is switched on, the screen is cleared. Next, MAIN calls the Init subroutine to initialize the interrupt system and everything that the MK-161 console drivers need.
After the data and return stacks are initialized, the low-level part of the startup is complete. Then something incredible for a machine with a Harvard architecture happens: eForth proceeds to execute “threaded code” from binary memory. The honor of being first belongs to the word whose header address is stored in R43. Usually this is the word COLD.
How are high-level words arranged? Any word consists of two parts, a body and a header. The header is stored in decimal memory. It helps the external interpreter and the decompiler find the name and body of the word. The header also contains a “lexicon” field, a set of flags that help the external interpreter process the found word correctly. For the internal interpreter, the body of a high-level word, located in binary memory and stored in the dictionary, is much more important. The internal interpreter can even execute words that have no header.
The body of a high-level word starts with the code field byte, which contains the address of the handler of the given word. The four high-level word handlers are written in the MK-161 input language and begin on the first page of program memory. We will analyze them all (see 3.3), but the main one is called DOLST and is located at address 02, immediately after the BP MAIN command already discussed. This handler executes Forth words defined with a colon.
The code field byte is followed by a parameter field of arbitrary length. In “colon words”, the parameter field contains “threaded code”, a sequence of 16-bit tokens, each of which denotes one action assigned to it.
First, we will look at tokens in more detail. Then we will study the INEXT internal interpreter, which passes from one token to the execution of the next. eForth calls INEXT a primitive handler. We will conclude this tour of the internal interpreter by analyzing all four high-level word handlers.
A token represents a word in threaded code and on the stack, allowing it to be executed quickly. A token is a pointer to the body of a word, but the harsh architecture of the MK-161 made its own adjustments to this simple idea. Let us go through all types of tokens, starting with the primitive token.
All words included in the eForth distribution are numbered from 0 to 206. This numbering is continuous and covers both primitives and high-level words. It is done so that the name of a word can easily be recovered from its number. The names are stored in program memory. A link to the desired name is easily found through the header table.
The number of a primitive is its token. Like any token, a primitive token takes two bytes in threaded code. The first is zero. The second contains the number of the primitive. The tblTokens table makes it possible to quickly find the address of the primitive's code by this number. The tblTokens address is permanently stored in R9042 (see 2.2), so everything needed to execute a primitive is always at hand.
The word XT> returns the address of a primitive's code given its number (token). Since the code of primitives is always located in program memory, the resulting address is always negative (see 2.4.3).
A high-level word can have its own number and an associated standard name, or it can be completely new, created by the user. In all cases, the token of a high-level word is the address of its code field (see 3), that is, a number from 1000 to 5095.
In threaded code, the token of a high-level word is written in a very unusual way. The number of hundreds (a number from 10 to 50) is written in the first byte, and the remainder of dividing the token by 100 (a number from 0 to 99) in the second byte.
For example, token 1234 is represented by the two bytes 12 and 34. This token, like any other, is compiled with the word COMPILE, taken from the ANSI standard. The words XT! and XT@ are used to write and read high-level word tokens in threaded code. They access addresses (see 3.1.4), and the word XT@ can also read a primitive token.
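The two-byte token format can be illustrated in Python. This is a sketch of the encoding only; the function names are hypothetical and are not the eForth words XT! and XT@.

```python
def encode_high_level_token(address):
    """Encode a code field address as (hundreds, remainder)."""
    if not 1000 <= address <= 5095:
        raise ValueError("code field must lie in binary memory")
    return divmod(address, 100)


def decode_token(first, second):
    """Classify a two-byte token and recover its value."""
    if first == 0:
        return "primitive", second             # second byte is the number
    return "high-level", first * 100 + second  # address of the code field


print(encode_high_level_token(1234))   # (12, 34)
print(decode_token(12, 34))            # ('high-level', 1234)
print(decode_token(0, 7))              # ('primitive', 7)
```

Because the first byte of a high-level word token is at least 10, a zero first byte unambiguously marks a primitive.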
Integer literals are a kind of primitive token. They are unusual enough to be considered separately.
In threaded code, the DOLIT and DOLITM tokens occupy four bytes. The first two bytes contain the primitive token already discussed, that is, 0 and the number of the primitive. The next two bytes contain the integer that the literal pushes onto the data stack when executed.
DOLITM differs in that it changes the sign of the number before pushing it onto the stack. It is intended for implementing negative numbers.
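A possible layout of a compiled integer literal is sketched below in Python. Several details are assumptions: the primitive numbers of DOLIT and DOLITM are placeholders (the real ones are defined by the kernel), and the 16-bit value is assumed to be stored high byte first, like other 16-bit quantities on the MK-161.

```python
DOLIT = 1     # placeholder primitive number, not the real one
DOLITM = 2    # placeholder primitive number, not the real one


def compile_literal(n):
    """Return the four bytes of threaded code for the integer n."""
    magnitude = abs(n)
    if magnitude > 0xFFFF:
        raise ValueError("magnitude must fit in 16 bits")
    token = DOLIT if n >= 0 else DOLITM      # DOLITM negates at run time
    return [0, token, magnitude >> 8, magnitude & 0xFF]


print(compile_literal(1000))   # [0, 1, 3, 232]
print(compile_literal(-1))     # [0, 2, 0, 1]
```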
Like integer literals, the three address literals BRANCH, ?BRANCH, and DONXT occupy 4 bytes each in threaded code. The first 2 bytes contain the primitive token, and the last two bytes are the jump address.
The address is written in the same format as a high-level word token (see 3.1.2). The first byte contains the number of hundreds, and the second contains the remainder of dividing the address by 100. Recall that because of the pre-increment (see 2.1), the jump address is not the address of the target token but a number one less.
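The off-by-one in jump addresses is easy to get wrong, so here is a Python sketch of the arithmetic. The function names and the sample address are made up for illustration.

```python
def encode_jump(target_token_address):
    """Bytes stored after BRANCH, ?BRANCH or DONXT for a given target."""
    stored = target_token_address - 1    # IP is pre-incremented before reading
    return divmod(stored, 100)           # (hundreds, remainder)


def decode_jump(first, second):
    """Address of the first byte of the token that runs after the jump."""
    return first * 100 + second + 1


print(encode_jump(1300))     # (12, 99)
print(decode_jump(12, 99))   # 1300
```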
The DONXT token helps implement the FOR-NEXT counted loop (see 1). The unconditional jump BRANCH is needed to implement the infinite BEGIN-AGAIN loop. The conditional jump ?BRANCH transfers control if the top of the data stack is zero (false). It is used to implement the conditional IF-THEN statement and the exits from the "indefinite loops" BEGIN-UNTIL and BEGIN-WHILE-REPEAT.
String literals are a kind of high-level word token. In threaded code, the token of a string literal is followed by a byte with the string length, and then by the string itself, from the first byte to the last.
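The layout of a string literal can be sketched in Python. The token address used below is invented, and the characters are encoded as ASCII purely for illustration; the actual character codes of the MK-161 are not covered here.

```python
def compile_string_literal(token_address, text):
    """Bytes of threaded code: token, length byte, then the characters."""
    data = text.encode("ascii")          # encoding is an assumption
    if len(data) > 255:
        raise ValueError("length must fit in one byte")
    hundreds, remainder = divmod(token_address, 100)
    return [hundreds, remainder, len(data), *data]


# 1234 stands in for the code field address of a string literal word.
print(compile_string_literal(1234, "hi"))   # [12, 34, 2, 104, 105]
```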
eForth has three string literals: $"|, ."| and abort"|. They are defined in the eForth0.mkl file as the tokens STRQP, DOTQP, and ABORQ, respectively. The main "literal" work is done for them by the word do$, the DOSTR token.
To keep the article to a reasonable size, I cannot dwell on this interesting topic, but it is good to know that string literals are available in eForth.
It is time to consider the token interpreter, whose address is always held in register 9. Most primitives finish their work with the command K BP 9, which transfers control to the INEXT label.
First, the address interpreter reads the first byte of the next token with the KIP6 command. If it is zero, the token is a primitive and is handled by the code under the label NPrime.
The label NData marks the processing of a high-level word token. The first byte is multiplied by one hundred by the VP 2 command, after which KIP6 + adds the second byte of the token to the result (see 3.1.2). The token just read is stored by the P7 command in the “working register” WP (R7).
We know that the token of a high-level word is the address of its code field, which contains the address of the handler. The KIP7 P8 commands read the code field byte into R8, and the KBP8 command transfers control to the handler. The handler knows that R7 contains a number one less than the address of the parameter field of the word being processed.
Commands F⟳ with code 25 are “tidied up” on the stack. The fact is that eForth stores the top two elements of the data stack directly in the X and Y registers of the MK-161 stack. Such a solution speeds up the work, but makes it necessary to ensure that these important data are not lost.
It remains to understand how the address interpreter executes the primitives.
The KIP6 command reads the second byte of the primitive token. The RRP9210 P8 commands read the address of this primitive from the tblTokens table (see 2.2 and 3.1.1), and KBP8 transfers control to the primitive.
As above, the F⟳ commands remove the excess from the stack, restoring the contents of the X and Y registers.
The eForth address interpreter is so tiny that it is duplicated several times in the program memory. The main copy is the one reached by the command K BP 9, which completes most of the primitives.
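The token fetch described above can be modelled in a few lines of Python. This is only an illustrative sketch, not the MK-161 code from eForth0.mkl: the function name fetch_token, the dictionary standing in for binary memory and the sample addresses are assumptions made for the example. It shows the pre-incremented IP, the zero first byte that marks a primitive, and the "first byte times one hundred plus second byte" rule for VCA tokens.

```python
def fetch_token(memory, ip):
    """Model of the INEXT token fetch; returns (kind, value, new_ip).

    memory is a hypothetical dict mapping byte-register numbers to bytes,
    ip models the interpretation pointer (R6), which points at the last
    byte already read.
    """
    ip += 1                      # KIP6 increments R6 before reading
    first = memory[ip]
    ip += 1                      # the second KIP6 does the same
    second = memory[ip]
    if first == 0:
        # Primitive: the second byte is its number in tblTokens.
        return ("primitive", second, ip)
    # VCA: the token is the code field address, first * 100 + second.
    return ("vca", first * 100 + second, ip)


# Hypothetical threaded code: primitive number 5, then the VCA at 1234.
code = {2000: 0, 2001: 5, 2002: 12, 2003: 34}
print(fetch_token(code, 1999))
print(fetch_token(code, 2001))
```

In this model the returned IP again points at the last byte read, so the same function can be called repeatedly to walk a token list.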
As an exercise, I recommend studying the implementation of the word EXECUTE, placed after the EXECU label. This is a variant of INEXT that takes the token from the data stack instead of reading it from the threaded code.
The four varieties of VCA have four different handlers: DOLST, DOVAR, DOCON, and DOCONM. As we saw above, before calling a handler the address interpreter leaves the address of the code field of the word being processed in R7.
eForth.f learns the addresses of these handlers by reading the kernel header from the eForth0.mkp file. This allows it to compile the VCA for the Electronics MK-161 correctly, placing the result in the eForth.mkb file.
The next important topic after INEXT is what the internal interpreter does when it encounters the token of a word defined with a colon. The code field of such a word contains the number 2, so INEXT transfers control to the DOLST handler, which does the work needed to start interpreting a new list of tokens.
Register 2, as already discussed (see 2.1), holds the return stack pointer RP. The IP6 KP2 commands push the value of R6, the interpretation pointer (IP), onto the return stack. Later this makes it possible to recall the position in the old token list where INEXT came across the colon word. Then IP7 P6 moves IP to the beginning of the new list.
The INEXT code is placed immediately after the DOLST code and executes the first word of the new token list. As elsewhere, the F⟳ commands help preserve the top two elements of the data stack.
Colon words usually end with the EXITT token, which does the opposite of DOLST: it takes the old IP value from the return stack and resumes interpretation of the old token list.
Commands RKIP02 P6 read the old IP value from the top of the return stack (see 2.1). After that, the Cx 1 IP2 + P2 commands correct the value of RP, increasing it by one. The F⟳ command restores the stack, after which INEXT executes the next word from the old token list.
Of course, INEXT cannot be placed immediately after both DOLST and EXITT at the same time. To get around this, I applied an ancient trick from Soviet times. You can master it too by examining the corresponding lines in the eForth0.mkl file.
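The DOLST and EXITT steps described above can be sketched as a small Python model. It is not the code from eForth0.mkl: the class name ThreadModel, the dictionary used for the return stack and the sample register numbers are hypothetical, and the model assumes that an indirect store through R2 pre-decrements the pointer in the same way as an indirect read.

```python
class ThreadModel:
    """Toy model of IP (R6), WP (R7), RP (R2) and the return stack."""

    def __init__(self, rp):
        self.cells = {}   # hypothetical registers holding the return stack
        self.rp = rp      # models R2
        self.ip = 0       # models R6
        self.wp = 0       # models R7

    def dolst(self):
        # IP6 KP2: R2 is decremented first (assumed), then IP is stored.
        self.rp -= 1
        self.cells[self.rp] = self.ip
        # IP7 P6: IP now holds the code field address; the pre-increment
        # in INEXT makes the next fetch start at the parameter field.
        self.ip = self.wp

    def exitt(self):
        # RKIP02 P6: read the top of the return stack without moving R2.
        self.ip = self.cells[self.rp]
        # Cx 1 IP2 + P2: drop the element by adding one to RP.
        self.rp += 1


m = ThreadModel(rp=900)
m.ip, m.wp = 2001, 3000   # hypothetical positions in binary memory
m.dolst()
entered = (m.ip, m.rp)
m.exitt()
left = (m.ip, m.rp)
print(entered, left)
```

After exitt the model is back in the state it had before dolst, which is exactly the symmetry between the two handlers that the text describes.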
Words created by CREATE and VARIABLE share the same DOVAR handler. It pushes onto the stack the address of the variable located in the parameter field, which immediately follows the byte of the code field. VARIABLE variables occupy 2 bytes, and arrays created with CREATE contain as many bytes as the programmer wants.
The ⇔ KP3 commands save the contents of register Y on the data stack. At the same time, the number from the top of the stack moves into RY, freeing RX for the new value. After the Cx 1 IP7 + commands, this new top of the stack is the address of the parameter field of the word being executed. KBP9 transfers control to INEXT without any tricks, moving on to the next word.
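As an illustration of how the two cached stack elements behave during a push, here is a hedged Python model. The class DataStackModel and its field names are invented for the example, and it assumes that R3 is the data stack pointer and that the indirect store KP3 pre-decrements it.

```python
class DataStackModel:
    """Data stack whose two top elements live in X and Y."""

    def __init__(self, sp, x=0, y=0):
        self.cells = {}   # hypothetical decimal registers below X and Y
        self.sp = sp      # models R3
        self.x = x        # top of the stack (RX)
        self.y = y        # second element (RY)

    def push(self, value):
        self.x, self.y = self.y, self.x   # the exchange command
        self.sp -= 1                      # KP3: pre-decrement (assumed)
        self.cells[self.sp] = self.x      # the old RY goes to memory
        self.x = value                    # RX is free for the new value

    def dovar(self, wp):
        # Cx 1 IP7 +: the parameter field starts one byte after WP.
        self.push(wp + 1)


s = DataStackModel(sp=500, x=20, y=10)
s.dovar(3000)
print(s.x, s.y, s.cells[s.sp])
```

The old top element ends up in RY and the old second element in memory, so nothing is lost while RX receives the parameter field address.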
Unlike DOVAR, the constant handler accesses the parameter field of its word itself. DOCON reads a 16-bit constant value from it. This value is always positive.
The ⇔ KP3 ⇔ commands save RY on the data stack, but this time the old top of the data stack returns to RX. The IP7 P5 commands push it back into RY while preparing the pointer register R5 for reading the value of the constant. Next, Cx 256 replaces the garbage in register X with the number 256.
The KIP5 × KIP5 + commands read the constant from the parameter field to the top of the data stack, that is, into RX. As we recall, on the MK-161 the high byte always comes first. It is multiplied by 256, after which the low byte of the constant is added to the product. The work is done, and KBP9 transfers control to the next word.
DOCONM works in exactly the same way, except that the sign of the constant is reversed after reading. For the sake of speed, negative constants are implemented on the MK-161 with a separate handler.
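The arithmetic performed by DOCON and DOCONM can be checked with a short Python sketch. The function name read_constant, the negate flag and the sample memory contents are assumptions for illustration; only the "high byte times 256 plus low byte" rule and the sign reversal come from the description above.

```python
def read_constant(memory, wp, negate=False):
    """Model of DOCON (negate=False) and DOCONM (negate=True).

    wp is the code field address held in R7; the two bytes of the
    constant follow it, high byte first.
    """
    high = memory[wp + 1]        # first KIP5 after IP7 P5
    low = memory[wp + 2]         # second KIP5
    value = high * 256 + low
    return -value if negate else value


# Hypothetical word body at 3000: code field byte, then the bytes 3 and 232.
body = {3000: 0, 3001: 3, 3002: 232}
print(read_constant(body, 3000))
print(read_constant(body, 3000, negate=True))
```

With the bytes 3 and 232 the model yields 1000 for DOCON and -1000 for DOCONM.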
We have now worked out completely how eForth executes its code from the data area on the Electronics MK-161, even touching on the deeper topic of string literals (see 3.1.5).
In the second article of the series, I will talk about the external "text" interpreter of 161eForth and analyze the structure of the header tables and name recognition. That part of the translator required much more radical solutions, against which everything discussed above is good old traditional Forth.
Happy Forth programming!
The remaining four short 161eForth videos follow. The first video is at the beginning of the article.
Part 2 of 5. Tests TEST-TEST4 from the book "eForth and Zen", 3rd edition, on the MK-161.
Part 3 of 5. SEE decompiler.
Part 4 of 5. Breakpoint BYE, RS-232 terminal and remote access to MK-161.
Part 5 of 5. Concluding words.
To begin with, eForth words are divided into ordinary and system words. Letter case matters: the names of ordinary words are defined in uppercase, and those of system words in lowercase. My own additions to eForth are also in lowercase. The author of eForth suggests conducting the main dialogue in CAPS mode and switching temporarily to lowercase (FP key combination) when a system word is needed.
In this article, all words are written in capital letters to make them stand out from the text. In several early eForth implementations, the headers of system words were removed and not shown by the WORDS command. This simplified the appearance of eForth and spared the attention of first-time Forth users. In 161eForth these headers were kept, primarily because of the SEE colon-word decompiler (see video No. 3 at the end of the article), which cannot show the names of system words if their headers are removed.
To streamline the article and make it useful as a reference, I had to use several terms before defining them. Forth and PMK professionals should be familiar with these terms. Beginners will sometimes have to look into neighboring sections (I have put links in the right places) or re-read the article a couple of times.
161eForth itself is posted here, along with the source code, a graphic on-screen keyboard and the help file words.txt describing all implemented words: http://the-hacker.ru/2019/161eforth0.5b.zip
I also posted 5 short videos on YouTube illustrating 161eForth in operation for those who do not have an MK-161. You can watch the entire playlist on YouTube. The first of them is below, and the remaining 4 are at the end of the article.
1. eForth and its implementation
eForth was designed as a modern replacement for the widely known fig-Forth translator. For porting to the MK-161, I chose the 32-bit version 5.2 of the 86eForth translator with indirect threaded code, written in 2016 in MASM assembler for the Windows operating system. This version is described in detail in the third edition of eForth and Zen [1]. I advise those who know English to find and study this book; it is very useful for understanding 161eForth.
In a personal letter, the author confirmed that 86eForth502.asm from this book is the latest version of eForth. On the Internet you can find a lot of English-language information on this and on previous versions of eForth.
The development of eForth followed the scientific path taught by Professor Wirth through the example of his Oberon programming language. Each subsequent version of eForth was a simplification of the previous one. Everything that could be dispensed with was removed from the language. What remains is a carefully thought-out set of strong, expressive language constructs whose power has been tested in more than 40 eForth implementations for various platforms. And now on a calculator!
Although eForth is a minimalist dialect of Forth, it does not aim to win the race for the smallest Forth. The set of words it offers is quite practical and can easily be extended by the programmer in whatever direction the task requires.
The first version of eForth was released in 1990 in MASM assembler for 8086 processors and worked under MS-DOS. It contained 31 machine-dependent core words and 191 high-level words. The idea was simple - you translate only 31 words into your assembler, and immediately get eForth on your computer.
This approach has been criticized on the Internet, because minimizing the number of words written in assembler led to performance that was extremely low for embedded systems. Already in the second version of eForth, as many words as possible were implemented in assembler, which shifted the balance toward a programming system that is not only easily portable but also practical.
For several years, Bill Muench, the original author of eForth, and his colleague Dr. Chen-Hanson Ting released their versions of eForth in parallel. Each version had its own characteristics. Other programmers also contributed eForth ports for various platforms.
Version 5.2, released in 2016, contains 71 code words and 110 colon words. A quarter century of searching for the ideal has led to a significant reduction in the total number of words. At the same time, for performance reasons, the share of words implemented at a low level has increased.
The proposed 161eForth enjoys the generous benefits of this progress, but does not claim to advance the main line of development. My implementation provides the programmer with all the tools present in version 5.2. Where the MK-161 architecture makes the implementation of some 86eForth words impossible or meaningless, I do not simply drop them but give programmers a full replacement taken from the ANSI/ISO standard [4]. Those seeking minimalism can throw out the "extra" words themselves, because by tradition 161eForth comes with source code.
When implementing eForth, I adhered to its author's understanding of the language. For example, in my opinion, a FOR NEXT loop with an initial value of n should execute exactly n times. Chuck Moore, the author of the Forth and colorForth languages, eventually came to the same conclusion. Unfortunately, eForth uses an outdated convention and executes such a loop n + 1 times, with the counter running from n down to 0. I did not fix this and several other shortcomings, preferring to keep 161eForth compatible with implementations for other platforms.
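The loop convention just mentioned is easy to get wrong, so here is a tiny Python model of it. The function name for_next_counts is hypothetical and the code only imitates the counting rule (n + 1 passes, counter from n down to 0) for a non-negative n; it does not reproduce the DONXT primitive itself.

```python
def for_next_counts(n):
    """Return the counter values seen by the body of an eForth FOR NEXT loop."""
    seen = []
    counter = n
    while True:
        seen.append(counter)     # the loop body runs with this counter
        if counter == 0:         # NEXT leaves the loop after the pass with 0
            break
        counter -= 1
    return seen


print(for_next_counts(3))
```

For n = 3 the body runs four times, with the counter values 3, 2, 1 and 0.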
Since 161eForth is the first practical on-board programming system for the MK-161 Electronics, with the exception of the factory language, I traced the long history of eForth and returned a few words to the language that were useful on other platforms and may be in demand now.
For example, the new-old variable 'BOOT contains the token (see 3.1) of the word, which is executed first after the environment is initialized, but before the dialogue starts. By default, 'BOOT contains a TLOAD token for interpreting code from the “text area” (see 2.4.2). This allows the programmer to customize eForth for himself without recompiling the environment, which is still impossible to produce on board the "Electronics".
The priority tasks of the implementation were saving binary memory (see 2.4.1) and improving performance. Solving them led to a dramatic decrease in the number of high-level words, whose code occupies this precious memory, in favor of a larger number of fast kernel words implemented in cheap program memory (see 2.4.3).
As a result, 161eForth contains 129 code words, 78 high-level words and occupies 1816 bytes of MK-161 binary memory, that is, less than half of it. This gives hope for the metacompilation of its high-level part directly on board the Electronics.
The source code for eForth MK-161 is divided into two large parts. The core written in the MK-161 command system is contained in the eForth0.mkl file. High-level words are defined in SP-Forth and placed in the eForth.f file.
The distribution also has a help file words.txt, which documents all 161eForth words with stack notation and a brief explanation, in one line.
1.1 The source code of the kernel eForth0.mkl
The eForth core contains executable code operating in the memory of MK-161 programs (see 2.4.3), which is compiled on a computer into the eForth0.mkp file by standard means, for example, the proprietary MKL2MKP compiler.
The kernel source code in the eForth0.mkl file is written in Latin mnemonics. For example, the IPE command that reads register E (aka R14) is written in these mnemonics as RME. Although unusual for owners of Soviet PMKs, Latin mnemonics are convenient to type on a computer keyboard. Indeed, it is easier to type the strange FX ^ 2 than the Fx² familiar from childhood.
The eForth0.mkp file is a kernel blank. In addition to the code of the primitives, it contains the kernel header and the name table tblNames, which eForth.f transfers to decimal registers during decoding (see 2.4.4). The eForth.mkp kernel (see 2.4.3) is created from eForth0.mkp, so eForth0.mkl must be compiled first.
1.2 Source code for high-level words eForth.f
The eForth.f file is fed to the wonderful domestic compiler SP-Forth [5]. The file contains the definitions of all high-level words. In time, they may be defined in eForth itself and possibly compiled directly on board the Electronics MK-161.
During compilation, eForth.f reads the kernel blank eForth0.mkp and uses it to create three files in the current directory for subsequent loading into the MK-161: eForth.mkp, eForth.mkd and eForth.mkb. The bodies of high-level words are in eForth.mkb, while their headers are in the eForth.mkd file.
The fourth file, eForth.mkt, is written by hand in eForth and can be edited on board the MK-161 using the built-in text editor. I will analyze each of these four files in more detail below (see 2.4).
2. Electronics MK-161
The manufacturer from Novosibirsk refers to the MK-161 by an old acronym that was the name of the very first calculators in the USSR. The MK-161 instruction set inherits the command system of the Soviet calculators "Electronics B3-34" and "Electronics MK-61". This means that programs written for Soviet calculators will run on the MK-161 unchanged or with minor changes.
The converse is not true. eForth will not run on Soviet PMKs, because it uses many resources that first appeared in the MK-152/161 and were not available in earlier models of the series.
Let us consider the features of the input language and architecture of the MK-161 that influenced 161eForth (hereinafter simply eForth) and gave the implementation discussed here its "Russian accent".
The first of these features is the high-byte-first (big-endian) convention, consistently maintained on the MK-161. For example, the number 1000 = 3 × 256 + 232 is written in two consecutive bytes as 3 and 232.
2.1 Indirect Addressing
Those who have programmed Soviet PMKs have heard of indirect addressing. With direct addressing, we explicitly state the number of the register we are referring to. For example, Р ИП 44 reads the contents of register 44. The Р key, which appeared in the MK-152, is used to access registers numbered 15 and above, which were absent in Soviet PMKs.
With indirect addressing, the number of the required register is not known in advance; it is held in another register. For example, if register 8 contains the number 44, the command K IP 8 reads the contents of register 44 (R44).
The K and R keys can be combined. For example, the command R K BP 20 transfers control (GOTO in Latin mnemonics) to the address stored in R20.
A feature that turned out to be important for the internal eForth interpreter is the pre-increment or pre-decrement of registers during indirect addressing. This feature is inherited from Soviet PMKs.
For example, the indirect read commands K IP 0, K IP 1, K IP 2 and K IP 3 decrease the contents of register 0, 1, 2 or 3 by one before accessing the desired register. The commands K IP 4, K IP 5 and K IP 6 increase the contents of register 4, 5 or 6 by one before reading.
Such "modification" of the address register makes it possible to process whole groups of registers in a loop. It is similar to ++R and --R in C. The number of the address register matters: it determines whether the register is incremented (registers 4-6) or decremented (registers 0-3) during indirect addressing.
The 161eForth architecture was affected by the fact that registers 4-6 are incremented before the access. As a result, the interpretation pointer (IP) in R6 always points to the last byte of threaded code that has been read. In 86eForth, IP always points to the next byte, which has not yet been read.
The same holds for the return stack pointer (RP) stored in register 2: R2 always points to the top of the return stack.
A useful feature of the MK-161 is that the register is not incremented or decremented if indirect addressing is used with the new R key. For example, RKIP02 reads the number at the top of the return stack without changing the pointer. This is a ready-made Forth R@ command. It follows from the above that the value read is one less than the address of the next token, which will be executed after returning from the colon word.
When you have to develop or study words that interact closely with the eForth internal interpreter, be sure to understand fully this subtle point related to pre-increment.
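The addressing rules of this section can be summarized in a short Python model. It is only a sketch under stated assumptions: the function indirect_read, the keep_pointer flag (standing for the R key prefix) and the sample register contents are invented for the example, and registers 7 and above are assumed to be left unchanged.

```python
def indirect_read(regs, n, keep_pointer=False):
    """Read the register whose number is stored in register n."""
    if not keep_pointer:
        if 0 <= n <= 3:
            regs[n] -= 1         # registers 0-3: pre-decrement
        elif 4 <= n <= 6:
            regs[n] += 1         # registers 4-6: pre-increment
    return regs[regs[n]]


# Hypothetical register contents for the demonstration.
regs = {2: 100, 6: 200, 8: 44, 44: 7, 99: 11, 100: 22, 201: 33}
plain = indirect_read(regs, 8)                      # R8 stays at 44
fetched = indirect_read(regs, 6)                    # R6 becomes 201 first
peeked = indirect_read(regs, 2, keep_pointer=True)  # like RKIP02, R2 unchanged
popped = indirect_read(regs, 2)                     # R2 becomes 99 first
print(plain, fetched, peeked, popped, regs[6], regs[2])
```

The third call behaves like the R@ example above: the value at the top of the stack is read while the pointer in R2 stays where it was.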
2.2 Tables, ordered and associative
MK-161 tables are located in program memory (see 2.4.3). They first appeared in the Novosibirsk "Electronics MK" and are completely unfamiliar to experts on Soviet PMKs. The address of the table in use is always stored in register 9042, but the two kinds of table are accessed differently.
An ordered table is an array of unsigned 16-bit integers. eForth contains one such table, tblTokens, with the addresses of the primitives (see 3.1.1), that is, Forth words written in the MK-161 command system. The address interpreter (see 3.2) uses tblTokens to execute threaded code quickly, so eForth tries to keep the address of this table in R9042 at all times.
To access an ordered table, write the number of the desired element to R9210. The number n in register X will be replaced by the value of table element n, counting from zero.
Associative tables ("search by value") are actively used by eForth, primarily by the primitive (FIND), which looks up a word by its name. The tblCHPUT associative table is also used when printing letters on the screen, to process line feeds and other control codes.
To look up the element n in an associative table, write n to R9212. The number n in register X (the manual calls it the "index") will be replaced by the 16-bit value recorded in the table immediately after its "index" n.
This fast, albeit simple, search function, implemented in assembly language in the MK-161 firmware, helped eForth achieve acceptable performance when recognizing word names and compiling programs. Of course, for this I had to design rather elaborate name recognition tables tailored to this function. We will talk about this in more detail in the second article.
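The difference between the two kinds of table can be shown with a small Python sketch. The function names, the sample data and the choice to return None for a missing "index" are assumptions; the article does not describe what the firmware does when the search fails.

```python
def ordered_lookup(table, n):
    """Ordered table: element number n, counting from zero."""
    return table[n]


def associative_lookup(table, index):
    """Associative table: the value stored right after a matching index."""
    for stored_index, value in table:
        if stored_index == index:
            return value
    return None                  # behaviour on a miss is an assumption


# Hypothetical contents: code addresses of three primitives ...
token_addresses = [120, 134, 151]
# ... and (control code, handler address) pairs for screen output.
control_codes = [(8, 4000), (10, 4020), (13, 4040)]

print(ordered_lookup(token_addresses, 1))
print(associative_lookup(control_codes, 10))
print(associative_lookup(control_codes, 65))
```

The ordered table is indexed by position, while the associative table is searched for a matching key, which is why it suits name recognition and control code dispatch.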
2.3 Interrupts and Console
“MK Electronics” allows its owners to write programs in the input language that respond to certain events - such as pressing or releasing a button, ending a timer count.
eForth actively uses this interrupt system for both keyboard input and a blinking cursor when prompted for such input, and for input / output via a universal serial port (RS-232).
Letters entered from the keyboard are placed in the bufKbd queue as the keys are pressed. This is very convenient and saves time on slow systems. Alphabet and case switching are handled by the KeyPress interrupt and do not take up space in the queue. Holding a key down triggers auto-repeat.
When the 8-letter queue is full and eForth is not yet ready to process the input (a very rare situation), the MK-161 emits an unhappy squeak. Of course, I would rather not implement all this natural keyboard behavior in the translator, but get it out of the box as a service of the MK-161 built-in program (firmware). But, as the saying goes, we make do with what we have.
After startup, all eForth output is directed to the MK-161 graphic screen. Letters are drawn on it by the relatively simple CHPUT routine. The only difficulty here is implementing the BS control code, the backspace. The MK-161 uses a proportional font, so the positions of the displayed characters have to be remembered in a special buffer, tblBS, from which the BS output code later retrieves them.
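Why a proportional font forces the translator to remember character positions can be seen from a minimal Python model. The class ConsoleModel and the glyph widths are hypothetical; the list positions merely plays the role that the text assigns to tblBS.

```python
class ConsoleModel:
    """Cursor handling for a proportional font with backspace support."""

    def __init__(self, widths):
        self.widths = widths     # hypothetical glyph widths in pixels
        self.x = 0               # current horizontal cursor position
        self.positions = []      # plays the role of tblBS

    def put(self, ch):
        self.positions.append(self.x)   # remember where the glyph starts
        self.x += self.widths[ch]

    def backspace(self):
        if self.positions:              # nothing to undo on an empty line
            self.x = self.positions.pop()


con = ConsoleModel({"i": 2, "W": 7})
con.put("i")
con.put("W")
after_output = con.x
con.backspace()
after_bs = con.x
print(after_output, after_bs)
```

With a fixed-width font the cursor could simply step back by a constant, whereas here the previous position has to be looked up, because "i" and "W" have different widths.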
During the dialogue, the user can use the word IO> to redirect all input/output to the RS-232 serial port, which makes it possible to program the MK-161 from a familiar computer keyboard or from another MK-161. The word CON> returns control to the calculator console.
2.4 Memory Areas and Installing eForth on the MK-161
The “MK-161 Electronics” memory consists of separately addressable program memory and data register memory. In turn, the register memory is heterogeneous and is divided into three large areas.
Registers numbered 0 to 999 store "decimal numbers". These are ordinary registers, as in the "Electronics B3-34" and other calculators, except that they can hold 12 rather than 8 decimal digits of the mantissa.
Registers numbered 1000 to 8167 store integers from 0 to 255. The last 3 Kbytes of this area, with addresses 5096 to 8167, are called the text area.
Registers with numbers from 9000 to 9999 are called function registers. This service area of the address space resembles microprocessor I / O ports. With the help of write and read commands, these addresses are used to access I / O devices, interrupt systems, etc.
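The register ranges of this section can be cross-checked with a few lines of Python. The function name memory_area and the label "unmapped" for numbers outside the listed ranges are assumptions made for the example; the ranges themselves are those given in the text.

```python
def memory_area(register):
    """Classify an MK-161 register number by the area it belongs to."""
    if 0 <= register <= 999:
        return "decimal"
    if 1000 <= register <= 5095:
        return "binary"
    if 5096 <= register <= 8167:
        return "text"
    if 9000 <= register <= 9999:
        return "function"
    return "unmapped"            # not described in the article


binary_size = 5095 - 1000 + 1    # bytes of binary memory
text_size = 8167 - 5096 + 1      # bytes of the text area
print(binary_size, text_size, memory_area(9042))
```

The sizes come out as 4096 bytes of binary memory and 3072 bytes (3 Kbytes) of text area, matching the figures quoted in the article.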
To install eForth on the Electronics MK-161, it is enough to transfer four files to the calculator, for example with the manufacturer's MK.EXE program:
- Write eForth.mkp to the program memory starting from page 0. Version 0.5b occupies 74 pages.
- Write eForth.mkd to decimal data memory
- Write eForth.mkb to binary data memory
- Write eForth.mkt to text memory
After transferring them to the calculator, I recommend immediately saving these four files in a separate directory of the built-in "electronic disk". Since they share the same name, eForth can then be loaded all at once as a "package".
2.4.1 Binary ("byte") memory MK-161: eForth.mkb
The Electronics MK registers with numbers from 1000 to 5095 are used to store numbers from 0 to 255. This area of the register memory of the calculator is called binary. Two consecutive binary registers can be accessed from eForth as a single 16-bit “cell”, and (as everywhere on the MK-161), the upper 8 bits are in the register with a lower number.
eForth uses this tiny "binary memory" as its main memory. The words ! and @, HERE and ALLOT work with it, and only from here does the address interpreter execute threaded code (see 3.2). It holds the eForth variables, the text input buffer (TIB), the dictionary, and the tblBS rollback stack used to implement backspace.
4096 bytes is very modest by modern standards. Therefore, enormous effort has gone into moving everything possible to other areas of memory.
2.4.2 Text area: eForth.mkt
Immediately after the binary memory comes the text area, registers numbered 5096 to 8167. Technically, these are the same byte registers, but the ability to write them to disk and read them back as a separate file makes this area special.
The word TLOAD is used to work with "text" in eForth. It feeds this entire area to the input of the text interpreter as a single string 3072 letters long.
There is disagreement on how to break text into lines. The editor built into MK Electronics insists on a line length of 24 characters. Callisto uses the Forth convention, where a line contains 64 characters. eForth leaves the choice to the user by treating all the text as one long line. You can use the built-in MK-161 editor, or write your own, compatible with Callisto.
Here is the initial content of eForth.mkt, for convenience, divided into three lines:
: hi ." Привет, %user%!" CR ;
' hi 'boot !
hi \
The first line defines a new word hi that greets the user. The second line takes the token of this word (see 3.1) and stores it in the variable 'BOOT (see 1). From now on, the text area will no longer be compiled every time eForth starts. Instead, the already compiled greeting will be executed.
The last line starts the word hi, displaying a greeting on the screen. The word \ completes the interpretation of the text, returning control to the console.
To compile an arbitrary text file, exit to the calculator with the BYE command, go to the main menu and load the desired file in DOS mode. You can also transfer the mkt file from a computer. The C / P key will return you to eForth, after which the TLOAD command compiles the file loaded into the text area.
2.4.3 Program memory: eForth.mkp
MK-161 program memory is an isolated address space. It also stores bytes, but they are read-only. Program memory contains 10,000 "steps", which turned out to be more than eForth needs. More than a quarter of it remains free, which leaves a good reserve for developing the translator.
Only program memory can hold "code words". Name recognition tables and all known text strings are also moved here, which saves binary memory.
Some words, such as C@, COUNT, and TYPE, can address program memory if the address is not a positive number. For example, the phrase 0 C@ reads the "step" (byte) at address 0 of program memory.
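One possible reading of this rule is sketched below in Python. It is an assumption, not a description of the actual C@ primitive: the article only says that non-positive addresses refer to program memory and that 0 C@ reads step 0, so the mapping of address -a to step a, the function name c_fetch and the sample contents are all hypothetical.

```python
def c_fetch(address, registers, program):
    """Model of C@ choosing between register memory and program memory."""
    if address <= 0:
        # Assumed mapping: address -a refers to step a of program memory.
        return program[-address]
    return registers[address]


# Hypothetical contents of the two address spaces.
program_steps = [81, 2, 40]
byte_registers = {1000: 17}

print(c_fetch(0, byte_registers, program_steps))
print(c_fetch(-2, byte_registers, program_steps))
print(c_fetch(1000, byte_registers, program_steps))
```

A single address argument is thus enough to reach both address spaces, which is what lets words such as COUNT and TYPE print strings kept in program memory.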
2.4.4 Decimal memory: eForth.mkd
The Electronics MK registers numbered 0 to 999 are called decimal and hold the numbers used for ordinary calculations on the calculator: 12 decimal digits of mantissa and 2 decimal digits of exponent. Forth is designed to work with integers up to 4 bytes long, so such a resource is clearly excessive for eForth.
Decimal memory is used to save precious binary memory. The data and return stacks are located here. Word headers are stored here too, both user-defined and built-in, one register per header. This approach makes it possible to redefine even words with standard names.
Keeping the stacks in decimal memory leads to a number of features specific to Forth on the MK-161. First, the range of values of the stack elements is huge; they can hold 32-bit integers. The need for "double integers" disappears on the MK-161, although for the sake of compatibility I have implemented the corresponding eForth words. A "double integer" is represented on the MK-161 as two stack elements containing numbers from 0 to 65535, which together encode one signed 32-bit integer in two's complement. The high 16 bits of this number are placed on top, that is, at the lowest address.
The bitwise logical operations AND, OR, XOR, and NOT treat their arguments as 16-bit integers. A result from 32768 to 65535 is converted to a negative number from -32768 to -1. In eForth, false is encoded as zero and true as minus one. Any other nonzero value also counts as true.
The second feature of the 161eForth data stack is that it contains signed numbers. When the word @ reads the number 65535 from a 16-bit "cell", it is automatically converted to -1. A special "unsigned" word U@ is provided to read exactly 65535, with a plus sign.
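The conversions described in this section can be verified with a short Python sketch. The helper names to_signed16, not16, and16 and double_to_int are invented for the example; they only reproduce the 16-bit sign rule and the two-element encoding of double integers stated above.

```python
def to_signed16(value):
    """Map 32768..65535 to -32768..-1, as the word @ does."""
    value &= 0xFFFF
    return value - 0x10000 if value >= 0x8000 else value


def not16(value):
    """Bitwise NOT on a 16-bit argument with a signed result."""
    return to_signed16(~value)


def and16(a, b):
    """Bitwise AND on 16-bit arguments with a signed result."""
    return to_signed16(a & b)


def double_to_int(high, low):
    """Combine two stack elements (0..65535 each) into a signed 32-bit value."""
    value = (high << 16) | low
    return value - (1 << 32) if value >= (1 << 31) else value


print(to_signed16(65535), not16(0), and16(-1, 255))
print(double_to_int(65535, 65535), double_to_int(0, 1000))
```

Note that not16(0) gives -1, so inverting "false" yields exactly the canonical "true" value mentioned above.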
Recall that for the sake of speed, the two upper elements of the data stack are kept not in decimal memory but directly in the X and Y registers.
eForth makes no use of the fact that decimal registers can hold fractional and floating-point numbers. The eForth virtual machine uses these registers to store signed 12-digit decimal integers. Decimal registers are accessed with the words C@ and C!, the same ones that work with any single registers.
3. The internal interpreter
The eForth core is a program written in the MK-161 input language. Her first MAIN command transfers control to the MAIN code, which first of all finds out the circumstances of the reboot. If it was caused by the wrong token, MK-161 will squeak. At the first start-up, and also after turning on the MK-161, the screen is cleared. Next, MAIN calls the Init subroutine to initialize the interrupt system and everything that the MK-161 console drivers need.
After initializing the data stacks and returns, the low-level part of the start is complete. Incredible things happen for machines with Harvard architecture - eForth goes on to execute “wired code” from byte memory. The honor of being the first belongs to a word whose header address is recorded in R43. This is usually the word COLD.
How are high-level words (VCA) arranged? Any word consists of two parts, a body and a header. The header is stored in decimal memory. It helps the external interpreter and the decompiler find the name and body of the word. The header also contains a “lexicon” field - a set of flags that help the external interpreter process the found word correctly. What matters far more to the internal interpreter is the body of the VCA, which is located in binary memory and stored in the dictionary. The internal interpreter can even execute words that have no header.
The body of a VCA starts with the code field byte, which contains the address of the handler for the given word. The four VCA handlers are written in the MK-161 input language and begin on the first page of program memory. We will analyze them all (see 3.3), but the main one is called DOLST and is located at address 02, immediately after the BP MAIN command already discussed. This handler executes Forth words defined with a colon.
The code field byte is followed by a parameter field of arbitrary length. In “colon words”, the parameter field contains “sewn code” - a sequence of 16-bit tokens, each of which denotes the one action assigned to it.
First, we will look at tokens in more detail. Then we will study the internal interpreter INEXT, which passes from one token to the execution of the next. eForth calls INEXT a primitive handler. We conclude this tour of the internal interpreter by analyzing all four VCA handlers.
3.1 Tokens
The token represents the word in the sewn code and stack, allowing it to be executed quickly. The token is a pointer to the body of the word, but the harsh architecture of the MK-161 made its own adjustments to this simple idea. Let's analyze all types of tokens, starting with the primitive token.
3.1.1 Primitive Token
All words included in the eForth distribution are numbered from 0 to 206. The numbering is continuous and covers both primitives and VCA. This makes it easy to recover a word's name from its number. The names are stored in program memory. The link to the desired name is easily found through the header table.
The number of a primitive is its token. Like any token, a primitive token takes two bytes in the sewn code. The first byte is zero. The second contains the primitive's number. The tblTokens table makes it possible to quickly find the address of the primitive's code from this number. The tblTokens address is permanently stored in R9042 (see 2.2), so everything needed to execute a primitive is always at hand.
The word XT> allows you to find out the address of a primitive code by its number (token). Since the code of primitives is always located in the program memory, the received address is always negative (see 2.4.3).
3.1.2 VCA token
VCA can have its own number and associated standard name, or it can be completely new, created by the user. In all cases, the VCA token is the address of its code field (see 3), that is, a number from 1000 to 5095.
In the sewn code, the VCA token is written in a very unusual way. The number of hundreds (a number from 10 to 50) is written in the first byte, the remainder from dividing the token by 100 (a number from 0 to 99) in the second byte.
For example, token 1234 is represented by the two bytes 12 and 34. This token, like any other, is compiled with the word COMPILE taken from the ANSI standard. The words XT! and XT@ write and read VCA tokens in the sewn code. They also work with addresses (see 3.1.4), and XT@ can read a primitive token as well.
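The two token layouts from 3.1.1 and 3.1.2 can be sketched in a few lines of Python. This is an illustrative model, not code from the 161eForth distribution: the names encode_token and decode_token are hypothetical, and the numeric ranges are the ones quoted in the text.

```python
def encode_token(token: int) -> tuple[int, int]:
    """Split a token into the two bytes stored in the sewn code."""
    if 0 <= token <= 206:
        # Primitive token: a zero byte, then the primitive's number.
        # (The text numbers all distribution words 0..206; only
        # primitives use that number as their token.)
        return (0, token)
    if 1000 <= token <= 5095:
        # VCA token: number of hundreds, then the remainder modulo 100.
        return divmod(token, 100)
    raise ValueError(f"not a valid token: {token}")


def decode_token(first: int, second: int) -> tuple[str, int]:
    """Rebuild a token from its two bytes and report its kind."""
    if first == 0:
        return ("primitive", second)
    return ("vca", first * 100 + second)


print(encode_token(1234))    # (12, 34)
print(decode_token(12, 34))  # ('vca', 1234)
print(encode_token(7))       # (0, 7)
```

Because a VCA token is at least 1000, its first byte is never zero, so a zero first byte is enough to recognize a primitive.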
3.1.3 Integer literals
Integer literals are a kind of primitive token. They are unusual enough to be considered separately.
In the sewn code, the DOLIT and DOLITM tokens occupy four bytes. The first two bytes contain the primitive token already considered, that is, 0 and the number of the primitive. The next two bytes contain an integer that the given literal will put on the data stack during execution.
DOLITM differs in that it changes the sign of the number before putting it on the stack. It is designed to implement negative numbers.
3.1.4 Address Literals
Like integer literals, the three address literals BRANCH, ?BRANCH, and DONXT occupy 4 bytes each in the sewn code. The first 2 bytes contain the primitive token, and the last 2 bytes contain the jump address.
The address is written in the same format as a VCA token (see 3.1.2): the first byte contains the number of hundreds, and the second contains the remainder of dividing the address by 100. Recall that because of pre-increment (see 2.1), the jump address holds not the address of the target token but a number one less.
The DONXT token helps implement the FOR-NEXT “finite loop” (see 1). The unconditional jump BRANCH is needed to implement the infinite BEGIN-AGAIN loop. The conditional jump ?BRANCH transfers control if the top of the data stack is zero (false). It serves to implement the conditional IF-THEN statement and the exits from the “indefinite loops” BEGIN-UNTIL and BEGIN-WHILE-REPEAT.
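The “one less” rule for jump addresses is easy to get wrong, so here is a small Python sketch of it. It assumes that the interpretation pointer is incremented before each read, which is why the stored value is the target address minus one. The function names are hypothetical.

```python
def encode_jump_address(target_token_addr: int) -> tuple[int, int]:
    """Return the two address bytes of BRANCH, ?BRANCH or DONXT."""
    stored = target_token_addr - 1   # one less than the target token
    return divmod(stored, 100)       # hundreds, then remainder


def decode_jump_address(hundreds: int, remainder: int) -> int:
    """Return the value loaded into IP when the jump is taken."""
    return hundreds * 100 + remainder


hundreds, remainder = encode_jump_address(1300)
print(hundreds, remainder)                         # 12 99
print(decode_jump_address(hundreds, remainder) + 1)  # 1300
```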
3.1.5 String literals
String literals are a kind of VCA token. In the sewn code, the token of a string literal is followed by a byte with the string length, and then by the string itself, from the first byte to the last.
eForth has three string literals: $"|, ."| and abort"|. They are defined in the eForth0.mkl file as the STRQP, DOTQP, and ABORQ tokens, respectively. The main “literal” work is done for them by the word do$, the DOSTR token.
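The layout of a string literal (token, length byte, string bytes) can be illustrated as follows. This Python sketch models only the layout. It does not reproduce do$, and the fragment and function name are hypothetical.

```python
def read_string_literal(code: bytes, pos: int) -> tuple[bytes, int]:
    """Read a counted string whose length byte is at code[pos].

    Returns the string bytes and the position just past the string.
    """
    length = code[pos]
    start = pos + 1
    end = start + length
    if end > len(code):
        raise ValueError("string literal runs past the end of the code")
    return code[start:end], end


# Hypothetical fragment: token bytes 12 34, length 2, "OK", next token 0 5.
fragment = bytes([12, 34, 2]) + b"OK" + bytes([0, 5])
text, next_pos = read_string_literal(fragment, 2)
print(text, next_pos)  # b'OK' 5
```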
To make the article size reasonable, I cannot dwell too much on this interesting topic, but it's nice to know about their availability in eForth.
3.2 Address interpreter
It is time to consider the token interpreter, whose address is always stored in register 9. Most primitives finish their work with the command KBP9, which transfers control to the INEXT label.
INEXT: КИП6 Fx≠0 NPrime
NData: ВП 2 КИП6 + П7 F⟳
КИП7 П8 F⟳ КБП8
First, the address interpreter reads the first byte of the next token with the KIP6 command. If it is zero, the token is a primitive and is handled by the code under the NPrime label.
The NData label marks the processing of a VCA token. The first byte is multiplied by one hundred by the VP 2 command, after which KIP6 + adds the second byte of the token to the result (see 3.1.2). The P7 command stores the token just read in the “working register” WP (R7).
We know that a VCA token is the address of its code field, which contains the address of the handler. The KIP7 P8 commands read the code field byte into R8, and the KBP8 command transfers control to the VCA handler. The handler knows that R7 contains a number one less than the address of the parameter field of the word being processed.
The F⟳ commands (code 25) tidy up the stack. The point is that eForth keeps the top two elements of the data stack directly in the X and Y registers of the MK-161 stack. This speeds things up, but it means care must be taken not to lose this important data.
It remains to understand how the address interpreter executes the primitives.
NPrime: F⟳ КИП6 РРП9210 П8 F⟳ КБП8
The KIP6 command reads the second byte of the primitive token. The RRP9210 P8 commands read the address of this primitive from the tblTokens table (see 2.2 and 3.1.1), and KBP8 transfers control to the primitive.
As above, the F⟳ commands remove the excess from the stack, restoring the contents of the X and Y registers.
The eForth address interpreter is so tiny that it is duplicated several times in program memory. The main copy is reached by the command KBP9, which ends most primitives.
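The two branches of INEXT can be modeled in Python as a single function that fetches one token and decides where to go. This is a sketch under stated assumptions: memory and the token table are plain dictionaries, the pointer is incremented before each read, and the sample addresses are invented for illustration.

```python
def inext(memory: dict[int, int], ip: int, tbl_tokens: dict[int, int]):
    """Model one pass of INEXT.

    Returns (new_ip, wp, handler); wp is None for a primitive.
    """
    ip += 1                      # KIP6: advance R6, then read
    first = memory[ip]
    ip += 1                      # second KIP6
    second = memory[ip]
    if first == 0:               # NPrime: primitive token
        return ip, None, tbl_tokens[second]
    wp = first * 100 + second    # NData: VP 2, KIP6 +, P7
    handler = memory[wp]         # KIP7 P8: read the code field byte
    return ip, wp, handler


# Hypothetical data: a colon word at 1234 (code field = 2, i.e. DOLST)
# and primitive number 7 whose code sits at program address -500.
memory = {1234: 2, 2000: 12, 2001: 34, 2002: 0, 2003: 7}
tbl_tokens = {7: -500}

state = inext(memory, 1999, tbl_tokens)
print(state)                                 # (2001, 1234, 2)
print(inext(memory, state[0], tbl_tokens))   # (2003, None, -500)
```

The model leaves out the F⟳ bookkeeping, since Python has no X and Y registers to protect.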
As an exercise, I recommend studying the implementation of the word EXECUTE, placed after the EXECU label. This is an INEXT variant, which reads the token not from the sewn code, but takes it from the data stack.
3.3 VCA handlers
The four kinds of VCA have four different handlers: DOLST, DOVAR, DOCON, and DOCONM. As we saw above, before calling a handler the address interpreter leaves the address of the code field of the word being processed in R7.
eForth.f learns the addresses of these handlers by reading the kernel header from the eForth0.mkp file. This lets it compile VCA for the Electronics MK-161 correctly, placing the result in the eForth.mkb file.
3.3.1 Colon Words: DOLST and EXIT
The next important topic after INEXT is what the internal interpreter does when it encounters the token of a word defined with a colon. The code field of such a word contains the number 2, so INEXT transfers control to the DOLST handler, which does the work needed to start interpreting the new list of tokens.
DOLST: ИП6 КП2 F⟳
ИП7 П6 F⟳
INEXT:
Register 2, as we have already discussed (see 2.1), holds the return stack pointer RP. The IP6 KP2 commands write the value of R6, the Interpretation Pointer (IP), to the return stack. This records the current position in the old list of tokens, where INEXT came across the colon word. Then IP7 P6 moves IP to the beginning of the new list.
The INEXT code is placed immediately after the DOLST code and executes the first word of the new token list. As elsewhere, the F⟳ commands help preserve the top two elements of the data stack.
Colon words usually end with the EXITT token, which does the opposite of DOLST: it takes the old IP value from the return stack and returns to interpreting the old token list.
EXITT: РКИП02 П6 Сx 1 ИП2 + П2 F⟳
INEXT:
The RKIP02 P6 commands read the old IP value from the top of the return stack (see 2.1). After that, the Cx 1 IP2 + P2 commands adjust RP, increasing it by one. The F⟳ command restores the stack, after which INEXT executes the next word from the old token list.
Of course, INEXT cannot directly follow both DOLST and EXITT at the same time. To get around this, I applied an old trick from the days of the USSR. You can learn it too by studying the corresponding lines in the eForth0.mkl file.
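How DOLST and EXITT mirror each other can be shown with a toy Python model. It assumes that the return stack grows downward, so a push decrements RP before storing. This is inferred from the fact that EXITT increases RP by one. The class and attribute names are hypothetical.

```python
class Interp:
    """Toy model of IP (R6), WP (R7), RP (R2) and the return stack."""

    def __init__(self, rp: int) -> None:
        self.ip = 0
        self.wp = 0
        self.rp = rp
        self.rstack: dict[int, int] = {}   # stands in for decimal memory

    def dolst(self) -> None:
        self.rp -= 1                       # assumed: KP2 decrements R2 first
        self.rstack[self.rp] = self.ip     # IP6 KP2: save the old IP
        self.ip = self.wp                  # IP7 P6: IP := code field address

    def exitt(self) -> None:
        self.ip = self.rstack[self.rp]     # RKIP02 P6: restore the old IP
        self.rp += 1                       # Cx 1 IP2 + P2: drop the entry


m = Interp(rp=100)
m.ip, m.wp = 2001, 1234    # INEXT has just fetched the token 1234
m.dolst()
print(m.ip, m.rp)          # 1234 99
m.exitt()
print(m.ip, m.rp)          # 2001 100
```

After DOLST, IP holds the code field address, so the next pre-incrementing read lands on the first byte of the parameter field.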
3.3.2 DOVAR, variable and array handler
Words created by CREATE and VARIABLE use the same DOVAR handler. This handler pushes onto the stack the address of the variable located in the parameter field, which comes immediately after the code field byte. VARIABLE variables occupy 2 bytes, and arrays created with CREATE contain as many bytes as the programmer wants.
DOVAR: ⇔ КП3 Сx 1 ИП7 + КБП9
The ⇔ KP3 commands save the contents of register Y on the data stack. At the same time, the number from the top of the stack moves into RY, freeing RX for the new value. After the Cx 1 IP7 + commands, this new value at the top of the stack is the address of the parameter field of the word being executed. KBP9 transfers control to INEXT, moving on to the next word without any tricks.
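The effect of DOVAR on the split data stack (top two elements in X and Y, the rest in memory) can be modeled like this. The DataStack class is a hypothetical illustration. Only the final state of X, Y and the in-memory part mirrors the description above.

```python
from dataclasses import dataclass, field


@dataclass
class DataStack:
    x: int = 0                                       # top of stack (RX)
    y: int = 0                                       # second element (RY)
    deeper: list[int] = field(default_factory=list)  # part kept in memory

    def push(self, value: int) -> None:
        self.deeper.append(self.y)   # ⇔ KP3: old RY goes to memory
        self.y = self.x              # old top moves into RY
        self.x = value               # new value lands in RX


def dovar(stack: DataStack, wp: int) -> None:
    """Push the parameter field address, which is WP + 1."""
    stack.push(wp + 1)               # Cx 1 IP7 +


s = DataStack(x=10, y=20, deeper=[30])
dovar(s, 1234)
print(s.x, s.y, s.deeper)  # 1235 10 [30, 20]
```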
3.3.3 Constant Handlers: DOCON and DOCONM
Unlike DOVAR, the constant handler accesses the parameter field of its word itself. DOCON reads a 16-bit constant value from it. This value is always positive.
DOCON: ⇔ КП3 ⇔
ИП7 П5 Сx 256
КИП5 × КИП5 + КБП9
The ⇔ KP3 ⇔ commands save RY on the data stack. This time, though, the old top of the data stack returns to RX. The IP7 P5 commands push it back into RY while preparing the pointer register R5 for reading the value of the constant. Next, Cx 256 replaces the garbage in register X with the number 256.
The KIP5 × KIP5 + commands read the constant from the parameter field to the top of the data stack, that is, into RX. As we recall, on the MK-161 the high byte always comes first. It is multiplied by 256, after which the low byte of the constant is added to the product. The work is done, and KBP9 transfers control to the next word.
DOCONM works in exactly the same way, except that the sign of the constant is reversed after reading. For the sake of speed, negative constants are implemented on the MK-161 with a separate handler:
DOCONM: ⇔ КП3 ⇔
ИП7 П5 Сx 256
КИП5 × КИП5 + /-/ КБП9
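The arithmetic performed by the two constant handlers fits in a few lines of Python. As before, this is a hedged sketch: memory is a plain dictionary, the sample addresses are invented, and the stack handling is left out so that only the value computation remains.

```python
def docon(memory: dict[int, int], wp: int) -> int:
    """Read the 16-bit constant that follows the code field at wp."""
    high = memory[wp + 1]        # first KIP5: the high byte comes first
    low = memory[wp + 2]         # second KIP5: the low byte
    return high * 256 + low      # 256 × high + low


def doconm(memory: dict[int, int], wp: int) -> int:
    """Same as docon, but with the sign reversed (the /-/ command)."""
    return -docon(memory, wp)


# Hypothetical constant at 1234: code field byte, then bytes 0x12 0x34.
memory = {1234: 0, 1235: 0x12, 1236: 0x34}
print(docon(memory, 1234))   # 4660
print(doconm(memory, 1234))  # -4660
```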
Now we have fully worked out how eForth executes its code from the data area on the Electronics MK-161, even touching on the deeper topic of string literals (see 3.1.5).
In the second article of the series, I will talk about the external “text” interpreter of 161eForth and analyze the structure of the header tables and name recognition. That part of the translator required much more radical solutions, next to which everything discussed above is good old traditional Forth.
Happy Forth programming!
Literature
- Dr. Chen-Hanson Ting. eForth and Zen - 3rd Edition, 2017. Available on Amazon Kindle.
- Baranov S.N., Nozdrunov N.R. The Forth language and its implementation. - L.: Mechanical engineering, Leningrad Department, 1988.
- Semenov Yu.A. Programming in the Forth language. - M.: Radio and communications, 1991.
- ANS Forth standard. X3.215-1994. Translation.
- SP-Forth Documentation.
- Offete Store (works of Dr. Chen-Hanson Ting), where you can download 86eForth v5.2 for Windows and documentation in English.
Video illustrations
These four short videos continue the 161eForth series. The first video is at the beginning of the article.
Part 2 of 5. Tests TEST-TEST4 from the book "eForth and Zen", 3rd edition, on the MK-161.
Part 3 of 5. SEE decompiler.
Part 4 of 5. Breakpoint BYE, RS-232 terminal and remote access to MK-161.
Part 5 of 5. Concluding words.