Anti-debugging techniques in Sinclair BASIC

    A screenshot from Rebelstar, a well-known Spectrum game

    I understand perfectly well that this article comes about 20 years too late to be of any commercial use. The Spectrum has not been manufactured since 1992, yet the army of fans of this platform has not shrunk over the years. So this article may be useful to anyone researching programs written for the ZX Spectrum.



    0. Introduction
    Sinclair BASIC was born in 1979 and was originally developed for the ZX80. A successful implementation and a small size allowed it to migrate almost unchanged first to the ZX Spectrum 16/48k, and then to the more advanced models of the ZX Spectrum line. This article is about anti-debugging techniques that were widely used in BASIC programs.

    First of all, it is worth explaining that the Spectrum ROM contains no additional tools to help with debugging. So studying a BASIC program simply came down to loading it from cassette, pressing the BREAK key (or SHIFT + SPACE), and methodically examining the source with the LIST command. This sometimes worked, but studying most programs requires at least minimal tooling. Today we have emulators that include a disassembler, a debugger, and a memory editor (EmuZWin, for example).

    1. Sinclair Basic inside and out
    Not that many programs were written directly in BASIC. When I got a ZX Spectrum and five cassettes of games in 1994, only one of roughly a hundred games was written in BASIC. But almost all the other games had a small loader that loaded blocks of machine code and transferred control to them. It was this loader that was written in BASIC. Naturally, there were many varieties of such loaders. We will analyze some of them below.

    The four-kilobyte size of the interpreter inevitably imposed a number of restrictions. During design and implementation the developers had to overcome many technical difficulties. The first thing to be sacrificed was the parser. The creators of BASIC came up with a very elegant solution: bytecode. Moreover, this bytecode is generated on the fly. This saves processor time both when a command is entered and when the program runs.

    Each BASIC keyword has its own code in the character set. The table itself can be viewed on Wikipedia. What matters for us is that each keyword occupies exactly 1 byte in memory.

    The five-byte format for representing numbers also deserves attention. Each five-byte record is preceded by the marker byte 0x0e. Integers between -65535 and 65535 are encoded with the following pattern: 0x00 0x00 LSB MSB 0x00. Note that the sign of the integer does not appear here; it is taken from the textual representation of the number. For our purposes, though, positive integers are quite sufficient.
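
    The pattern above can be tried out away from the Spectrum. The following is a minimal Python sketch, assuming only the non-negative integer layout described in this section; the function names are invented for illustration, and floating point records are deliberately rejected rather than decoded.

```python
NUMBER_MARKER = 0x0E  # marker byte that precedes every five-byte record


def encode_small_int(value: int) -> bytes:
    """Encode 0..65535 using the 0x00 0x00 LSB MSB 0x00 pattern."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("only non-negative 16-bit integers are handled here")
    return bytes([0x00, 0x00, value & 0xFF, value >> 8, 0x00])


def decode_small_int(record: bytes) -> int:
    """Decode a five-byte record that follows the same pattern."""
    if len(record) != 5:
        raise ValueError("a number record is exactly five bytes long")
    if record[0] != 0x00 or record[1] != 0x00 or record[4] != 0x00:
        raise ValueError("not in the non-negative integer pattern")
    return record[2] | (record[3] << 8)


if __name__ == "__main__":
    # 32768 is 0x8000, so LSB = 0x00 and MSB = 0x80.
    print(encode_small_int(32768).hex(" "))
    print(decode_small_int(bytes([0x00, 0x00, 0x43, 0x5D, 0x00])))
```

    In a program line the record is always found right after the NUMBER_MARKER byte, which is how a tool can locate it.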

    Each line of a BASIC program has the following format:

    2 bytes - line number (big endian)
    2 bytes - length of the line in bytes (little endian, not counting the first two fields)
    n bytes - the line itself
    1 byte - 0x0d

    A program line contains data both for display on the screen and for calculation. For example, the line

    10 LET a=32768

    has the following representation in bytecode:

    0x00 0x0a - line 10
    0x0f 0x00 - the length is 15 bytes
    0xf1 - LET
    0x61 0x3d 0x33 0x32 0x37 0x36 0x38 - a=32768
    0x0e 0x00 0x00 0x00 0x80 0x00 - 32768 in the five-byte format
    0x0d - end of line

    And now the fun part begins. What can you do once you know the inner workings of Sinclair BASIC?

    2. Line number zero





    Line number 0 cannot be entered from the keyboard, no matter how hard you try. If a program does contain such a line, it shows up in the listing but cannot be edited. However, there is a system variable, called PROG in the documentation (on the ZX Spectrum 48k it is located at 0x5c53), that holds the address at which the BASIC program is loaded in memory. A very simple manipulation at that address lets you change a line number. For example, like this:

    10 LET addr=PEEK(23635) + (PEEK(23636) * 256)
    20 POKE addr, 0
    30 POKE addr + 1, 0

    After this program is run, the first line in its listing will have number 0.

    3. Unordered listing
    Building on the previous example, you can shuffle the line numbers any way you like. Nothing prevents line 1 from coming after line 10, followed by line 600, and so on. The program will still be executed in the order in which the lines are placed in memory.
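
    Both the zero line and the shuffled line numbers can be spotted mechanically by walking the line records in memory order. Below is a rough Python sketch, assuming the record layout from section 1 and a dump that holds only the program area (the bytes starting at the address stored in PROG). The names walk_lines, suspicious_lines and make_line are invented for this illustration.

```python
def walk_lines(program: bytes):
    """Yield (line_number, body) for each line record, in memory order."""
    pos = 0
    while pos + 4 <= len(program):
        number = (program[pos] << 8) | program[pos + 1]       # big endian
        length = program[pos + 2] | (program[pos + 3] << 8)   # little endian
        body = program[pos + 4 : pos + 4 + length]
        if len(body) < length:
            break  # truncated dump, stop quietly
        yield number, body
        pos += 4 + length


def suspicious_lines(program: bytes):
    """Report line 0 and every line whose number does not increase."""
    findings = []
    previous = -1
    for number, _body in walk_lines(program):
        if number == 0:
            findings.append((number, "line number zero"))
        elif number <= previous:
            findings.append((number, "out of order"))
        previous = number
    return findings


def make_line(number: int, text: bytes) -> bytes:
    """Hypothetical helper: build one line record for the demo below."""
    body = text + b"\x0d"
    header = bytes([number >> 8, number & 0xFF, len(body) & 0xFF, len(body) >> 8])
    return header + body


if __name__ == "__main__":
    # Lines stored in memory as 0, 10, 1, 600; the bodies are just one token each.
    dump = b"".join(make_line(n, b"\xf1") for n in (0, 10, 1, 600))
    for number, reason in suspicious_lines(dump):
        print(number, reason)
```

    A real memory dump continues with the variables area after the program, so a practical tool would also need the end address of the program; that part is left out here.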






    4. We execute machine code brazenly
    There is a reserved word, USR, that lets you execute a machine code subroutine in memory. It cannot be used on its own, but there are plenty of workarounds. For example:

    RANDOMIZE USR addr
    PRINT USR addr
    LET a=USR addr

    Machine code can be loaded with the keywords READ, POKE, and DATA. Here is one such loader found in the wild:

    1 REM FANTASY WORLD DIZZY
    10 CLEAR 24319
    20 FOR j=24576 TO 24594
    30 READ a
    40 POKE j,a
    50 NEXT j
    60 RANDOMIZE USR 24576
    70 DATA 17,0,1,221,33,198,92,62,255,55,205,86,5,212,0,0,195,198,92

    In this fragment, CLEAR limits the memory area used by BASIC to address 24319; everything above it can be used freely. The loop in lines 20 to 50 reads the data from line 70 and writes it to memory starting at address 24576, and then line 60 transfers control there.
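
    To see what those 19 bytes actually do, they can be decoded off-line. The Python sketch below is a toy table-driven decoder that knows only the seven Z80 opcodes occurring in this DATA line; it is not a general disassembler, and the names OPCODES and disassemble are made up for the example.

```python
DATA = [17, 0, 1, 221, 33, 198, 92, 62, 255, 55, 205, 86, 5,
        212, 0, 0, 195, 198, 92]

# opcode bytes -> (mnemonic template, operand size in bytes)
OPCODES = {
    (0x11,): ("LD DE,{:#06x}", 2),
    (0xDD, 0x21): ("LD IX,{:#06x}", 2),
    (0x3E,): ("LD A,{:#04x}", 1),
    (0x37,): ("SCF", 0),
    (0xCD,): ("CALL {:#06x}", 2),
    (0xD4,): ("CALL NC,{:#06x}", 2),
    (0xC3,): ("JP {:#06x}", 2),
}


def disassemble(code, origin):
    """Return (address, mnemonic) pairs; fail on any opcode not in the table."""
    listing = []
    pos = 0
    while pos < len(code):
        for prefix, (template, size) in OPCODES.items():
            if tuple(code[pos:pos + len(prefix)]) == prefix:
                break
        else:
            raise ValueError(f"opcode {code[pos]:#04x} is not in the table")
        start = pos
        pos += len(prefix)
        # Z80 operands are little endian; an empty operand decodes to 0.
        operand = int.from_bytes(bytes(code[pos:pos + size]), "little")
        pos += size
        listing.append((origin + start, template.format(operand)))
    return listing


if __name__ == "__main__":
    for address, text in disassemble(DATA, 24576):
        print(address, text)
```

    Read this way, the routine sets DE to 0x0100 and IX to 0x5cc6, calls 0x0556, and finally jumps to 0x5cc6. Address 0x0556 is commonly documented as the ROM tape-loading routine (LD-BYTES), which would make this a loader for a 256-byte block that is then executed, with CALL NC,0x0000 resetting the machine if loading fails. Treat that interpretation as an assumption to be checked against a ROM disassembly.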



    5. We hide machine code in comments
    In fact, nothing prevents you from storing machine code in a comment. Of course, you cannot type it in from the keyboard. But you can first reserve enough space in a line that starts with the REM keyword, and then write the machine code into it with the POKE command. I came across this trick combined with the "unordered listing" trick. Here is what it looked like:
    28725 REM [3 spaces here for camouflage, followed by the machine code]
    20 CLEAR 24499: BORDER 0: PAPER 0: INK 0: CLS: RANDOMIZE USR 23875

    This example is taken from the Dizzy 3.5 loader, converted to a file that runs in an emulator. I do not know whether this code was on the original disk or was added during the conversion, but the example is fairly typical either way. Every BASIC program has an entry point (the line number from which execution begins). Here it was hard-coded as 28725. In principle this differs little from the previous case, except that the machine code is already in memory and control can simply be transferred to it.
    It is much more interesting where the jump address came from. To make a universal loader, you would take the value of the PROG variable, add 8 (2 bytes for the line number, 2 for the line length, 1 for REM, 3 for the spaces), and jump to the resulting address. But we know the conditions under which the program will be launched. It is loaded from a floppy disk under TR-DOS, so a disk drive is present. This means that PROG will point to 23867 rather than 23755 (there is a good document on TR-DOS variables), and 23867 + 8 gives exactly the 23875 used in the USR call.
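
    The address arithmetic can also be done from a dump instead of by hand. Here is a small Python sketch, assuming the first program line starts with REM followed by padding spaces. The value 0xEA for the REM token comes from the standard Spectrum token table rather than from this article, and rem_payload_address is an invented name.

```python
REM_TOKEN = 0xEA  # REM in the standard Spectrum token table (assumption, not quoted above)
SPACE = 0x20


def rem_payload_address(prog: int, program: bytes) -> int:
    """Return the address of the first non-space byte after REM in the first line."""
    if len(program) < 5 or program[4] != REM_TOKEN:
        raise ValueError("the first line does not start with REM")
    offset = 5  # 2 (line number) + 2 (line length) + 1 (REM)
    # Skip the camouflage spaces. Caveat: machine code that itself begins
    # with 0x20 (JR NZ) would be skipped too, so check the result by eye.
    while offset < len(program) and program[offset] == SPACE:
        offset += 1
    return prog + offset


if __name__ == "__main__":
    # Line 28725 (0x7035): REM, three spaces, then DI / RET as stand-in code.
    first_line = bytes([0x70, 0x35, 0x07, 0x00, REM_TOKEN,
                        SPACE, SPACE, SPACE, 0xF3, 0xC9, 0x0D])
    print(rem_payload_address(23867, first_line))  # PROG under TR-DOS
    print(rem_payload_address(23755, first_line))  # PROG without a disk interface
```

    With the two PROG values mentioned above this yields 23875 and 23763 respectively, which is why a loader hard-wired to one of them breaks in the other environment.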

    6. It says “Liverpool”, but reads “Manchester”
    Remember the five-byte format for storing numbers that we looked at earlier? This technique exploits the double representation of numbers to the full. The point is that numbers are converted to the five-byte format and added to the bytecode by the BASIC interpreter itself. But we can always change either the way the number is displayed in the listing or its actual value. There is a whole class of loaders that first load machine code into memory (for example, as in section 4 or 5) and then appear to commit suicide in a simple way:

    RANDOMIZE USR 0

    (In case you are not aware, under normal conditions this command has an effect similar to rebooting the computer.) But if you look carefully at the bytecode, you will find something like this:

    0xf9 0xc0 0x30 - RANDOMIZE USR 0
    0x0e 0x00 0x00 0x43 0x5d 0x00 - zero is displayed, but the jump goes to 23875

    The trick works even better against an inattentive researcher if the conspicuous zero is replaced with a more conventional address that holds more or less meaningful machine code.
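
    A mismatch between the displayed digits and the stored value is easy to detect automatically. The Python sketch below scans one line body for the 0x0e marker and compares the ASCII digits in front of it with the hidden record. It assumes the non-negative integer pattern from section 1 and ignores fractions and floating point records; hidden_number_mismatches is a name chosen for this example.

```python
NUMBER_MARKER = 0x0E


def is_ascii_digit(byte: int) -> bool:
    return 0x30 <= byte <= 0x39


def hidden_number_mismatches(body: bytes):
    """Return (shown_text, stored_value) pairs where LIST and the bytecode disagree."""
    mismatches = []
    pos = 0
    while pos < len(body):
        if body[pos] != NUMBER_MARKER:
            pos += 1
            continue
        record = body[pos + 1 : pos + 6]
        # Walk back over the digits that the listing would display.
        start = pos
        while start > 0 and is_ascii_digit(body[start - 1]):
            start -= 1
        shown = body[start:pos].decode("ascii")
        small_int = (len(record) == 5 and record[0] == 0
                     and record[1] == 0 and record[4] == 0)
        if shown and small_int:
            stored = record[2] | (record[3] << 8)
            if int(shown) != stored:
                mismatches.append((shown, stored))
        pos += 6  # skip the marker and its five-byte record
    return mismatches


if __name__ == "__main__":
    # RANDOMIZE USR 0, with 23875 stored in the hidden record.
    fake = bytes([0xF9, 0xC0, 0x30, 0x0E, 0x00, 0x00, 0x43, 0x5D, 0x00, 0x0D])
    print(hidden_number_mismatches(fake))
```

    The scan does not know about REM lines, where a stray 0x0e inside machine code may follow digits by coincidence, so hits inside comments should be treated with suspicion.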

    7. Conclusion
    Of course, this is far from everything that can be said about such tricks. Many ways of hindering code analysis can be invented (and probably already have been), and only some of them are described in this article. More information on anti-debugging techniques (not only in BASIC, but for the ZX Spectrum in general) can be found in the interesting guide How to Hack on the ZX Spectrum. And do not be too lazy to check for yourself, from time to time, what you are feeding to the emulator; the utilities from http://www.zxmodules.de/ will help you with that. I assure you that many interesting discoveries await you.

    UPD: Moved to the blog “World 8 bit”
