Modifications to machine code: “selection” vs. "Genetic Engineering"



    About the myths regarding GMOs have already been written here . In addition to this, I would like to share an excellent analogy with the analysis of real examples from the field of reverse engineering and modification of machine code, which will be understandable and close to programmers.

    Mutations of machine code


    As an example, let's take the NES prefix (known as Dendy), which uses the 6502 processor. The command system is very simple - the opcode is always represented by one byte, and each of 256 does something. No "protection" from the fool is provided, and almost any random set of bytes will be performed without resistance from the processor. Thus, we can take the ROM of some game, fix random bits in it (we will call it “mutations”) - and after starting to observe funny glitches in various unexpected places, but in general the game will most likely be workable. YouTube seems to have a whole genresimilar video. The machine code thus obtained is probably not very correct, but in most cases the processor will be able to execute it and do something.

    As it turned out, this technique is used not only for fun (playing familiar games with unexpected glitches is very funny), but also for getting quite specific modifications: they make a large number of “mutants” and look for the one in which the desired effect is manifested. It’s exactly the same as in modern selection methods, when the embryos of organisms are exposed to mutagens (which leads to random changes in the genetic code), and then those that have the desired trait are selected from what could grow. Organisms obtained in this way receive in the appendage a host of other undesirable mutations. They get rid of them by gradual crossing with a normal look, achieving a more or less sane organism with the desired trait and a minimum of other mutations that are noticeable. The same can be done with machine code.

    "Genetic Engineering" in machine code


    As a direct analogy, reverse engineering suggests itself here. We study machine (binary) code, understand how it works (in whole or only some of its parts), after which we make exactly those changes that will surely lead us to the target effect.

    Programmers have a huge number of tools for analyzing machine code (all kinds of disassemblers, debuggers, and the like). Despite the fact that machine code itself was not intended for human reading, it was invented by people and documented, so creating tools to convert it into a more understandable form (assembly listing) was not a problem. Also, almost any program under investigation was most likely written by a person, so what makes its code usually looks logical and understandable.

    It is clear that geneticists are much more complicated. There is no official documentation that could shed light on all emerging issues. The genetic code was “written” by chance and selected according to the principle “could it survive, adapt and give offspring”, so it’s surely far from logical and optimal, rather, even any obfuscation of the program code that we invented will seem like a mere babble. Nevertheless, this does not mean that it is impossible to study the genetic code and try to change it.

    Background


    For me, Dendy has always been more than just a prefix. I not only played it, but also spent considerable time inside it with a soldering iron in my hands for some simple modifications. On the way to somewhere, I often thought about how these games are created and how it works inside. Surely, many of you also once asked similar questions, such is the nature of future IT employees.

    Years passed. At some intervals, I immersed myself in the emu theme, learning everything new on thematic sites, but I did not dare to plunge into the study of assembler 6502 and NES architecture. The internal conflict of the rational and the irrational. I persuaded myself for a long time that I did not need to spend time on this, but ... broke. Looking at what interesting things the enthusiasts of the emu scene do, I took up my long-standing idea with a bright thought: “I can too!”.

    Surely many of you remember cartridges of the “9999-in-1” type, which were notable for a beautiful menu with a romantic plot by the sea, flying gulls and pleasant music (Unchained Melody). I disassembled this menu and made a tribute demo of Unchained Nostalgia based on it .



    The main idea is no list of games, and slides switch to music. In a free sky, I drew meaningful clouds and stars; implemented automatic slide switching synchronously with music and the ability to manually switch; added auto-correction of playback speed and pitch for systems other than Dendy (after all, this menu was made specifically for Chinese sourceclones, and they differed from the original NES, which makes the original menu work too fast on NES NTSC, and on NES PAL it turns out too low sound); I could not resist the temptation to add a number of curious Easter eggs. That is, the approach used does not limit the imagination in any way and allows, in principle, to make any changes for some finite time.

    However, similar ideas come to different people’s minds.


    After the release of my demo, I discovered that someone also tried to implement the same idea. The author indicated that he used the “corruptor”, which is usually used to create corrupted versions of games with various glitches.

    If someone had told me earlier that such a “random” method is actually used to modify machine code with the necessary effects, I would probably not have believed it. After all, for this you need a lot of time and luck. Although, on the other hand, there is no need to learn and understand machine code. Since I already had a “living” example of such a hack in my hands, there was nothing left but to investigate its work.



    Slides switch automatically, approximately every 2 seconds (which is too fast). There is no manual control. When changing a slide, the screen for some reason flickers twice. The main thing is that it works! But how?

    Traces of surgery


    In the cartridge, program code and tiles for graphics are stored separately. The code is in PRG ROM, the tiles are in CHR ROM. The second can be easily changed using standard tools, for this reason there are a lot of graphic hacks for famous games for Dendy, where the code is left absolutely unchanged.

    Compare the CHR ROM of the original menu and the investigated hack:

    On the face are traces of direct "surgical" intervention. For some reason, tiles of numbers, dots, and a small seagull were cut out. Let's try to return the original CHR ROM and see what exactly was hidden in this way.



    Now it’s a little clearer. The input was broken in the program code, because of which the menu thinks that the down button is always pressed. The output of the names of the games was also broken. Apparently, it did not work to break the output of game numbers and a small seagull (this is the cursor pointing to the selected game), so they had to be hidden by cutting the corresponding tiles in CHR ROM.

    ... and some magic!


    Digging deeper. With the help of a disassembler, let's see what the bytes “mutated” in PRG ROM do exactly.
    Original codeHack code
    NMI:
        PHA
        TXA
        PHA
        TYA
        PHA
        LDA $1E
        BEQ loc_C181
        JSR sub_C592
    loc_C181:
        LDA #0
        JSR sub_C428
    NMI:
        PHA
        TXA
        PHA
        TYA
        PHA
        LDA $1E
        BEQ loc_C181
        JSR sub_C592
    loc_C181:
        LDA #0
        JSR sub_C445
    This is an NMI interrupt handler that is called when the next frame is drawn. As you can see, the address of the called function has been fixed. The original function was engaged in reading the status of all buttons and writing their values ​​in one byte in the form of a bit mask. The new call executes only the last instruction in this function, which saves the result from register Y into a variable with the mask of the buttons pressed at the moment. Thus, the garbage value of register Y, which remains after the previous code is executed, is used. In most cases, the garbage value 0x07 appears here, which in binary form looks like 00000111 and corresponds to the Down, Left and Right buttons pressed simultaneously (the main code ignores the Left and Right buttons, processing the Down button) . However, immediately after switching the slide to Y, the garbage value 0xFF remains, which means that all the buttons were pressed at the same time, as a result of which the debugging code is called, which displays the revision number on the screen in the original menu (by Left + Start + B), which accompanied by disabling PPU for one frame. This causes a double flicker of the screen.
    Original codeHack code
        LDA #0
        STA $2000
        STA $2001
        LDA #$22
        STA $2006
        LDA #$AC
        STA $2006
        LDA #0
        STA $2000
        STA $2001
        SBC #$22
        STA $2006
        LDA #$AC
        STA $2006
    Here, the code for displaying the revision number on Left + Start + B is broken, so even though this debugging code is executed every time due to the problem described above, we still do not see the revision number on the screen. It turned out in a funny way: the instruction that set the value A to 0x22 in register A was replaced by an instruction that subtracts the number 0x22 from register A. Since register A is always 0 before, the result is 0xDE. The original code set the start of output for PPU at address 0x22AC in the video memory, which is within the first screen page. The new code, it turns out, sets the address 0xDEAC, which is far beyond the available video memory (the maximum allowed address is $ 3FFF), and as a result the inscription is displayed "nowhere."
    Original codeHack code
        ASL A
        TAX
        LDA $EAB3,X
        STA $0A
        LDA $EAB4,X
        STA $0B
        LDY #0
    loc_C380:
        LDA ($0A),Y
        BEQ loc_C38B
        STA $2007
        INY
        JMP loc_C380
        ASL A
        TAX
        LDA $EAB3,X
        STA $FA
        LDA $EAB4,X
        STA $0B
        LDY #0
    loc_C380:
        LDA ($FA),Y
        BEQ loc_C38B
        STA $2007
        INY
        JMP loc_C380
    Here we had a broken code that displays the names of the games. When writing a pointer to the output line, the low byte of the 16-bit pointer is written at 0xFA instead of 0x0A, and the high byte is written as it should at 0x0B. It was fortunate that 0xFA was not used, and it did not break anything further. The program then reads the line pointed to by a pair of bytes 0xFA and 0xFB (0xFB is always 0). By a fortunate coincidence, all the resulting pointers point to a memory filled with zeros, so lines with the name of the games are not displayed.

    conclusions


    You could feel how everything is "famously twisted" in the modifications created in a similar "random" way? One breakage props another breakdown, and somehow it generally gives a result close to what is needed. In this case, there are simply incredible dependencies between random and at first glance unrelated parts of the code, and with the slightest further modifications (even correct ones per se), everything may collapse. Full reverse engineering is much more reliable. The same can be said of genetic engineering, in comparison with traditional breeding.

    File links



    Also popular now: