Creating tools for researching NES games

    I decided to continue a series of publications about the internal structure of NES games, this time I will talk about the research tools I use.

    Most of what the researcher needs is already in the FCEUX emulator , which is well suited for debugging games. The documentation should thoroughly study the Debug section , each tool from there is useful to the researcher, and the ability to use them together with each other enhances the hacker's capabilities.

    However, I will not retell the documentation, but focus on cases where the emulator’s capabilities are not enough and you need to add new ones, or when there are unusual ways to find the desired directly in the ROM file, bypassing a long study of the game code.



    Using Lua Scripts


    Actually, the first method, an example of which is shown in the picture to attract attention, is to create auxiliary tools using the Lua script built into the interpreter emulator.
    In the example above, for the study of the game (and just cheating, if desired), this script feature is used, such as displaying images on the screen on top of the image drawn by the emulator .

    Thus, the researcher can notice what is inaccessible to the ordinary player, for example, in the screenshot above of the three hidden diamonds, the player can only jump to the first two, and in no way can take the third or even just guess about its existence. In “Duck Tales 2” there are even such jewels that are generally located outside the game level.

    Another example of a script that displays additional data on the screen is a compass to the nearest gem in the Jungle Book:



    Naturally, the visualization of information from RAM or ROM games is not the only possibility of scripts.

    Another frequently used feature is logging what is happening in the game code , for example, a script template for dumping unzipped data immediately after unpacking it (for SMD games, but the principle also applies for NES).

    Well, no one forbids creating full-fledged utilities on Lua scripts , such as the TasEditor keystroke editor already included in the emulator .

    Also, in my opinion, the idea of partially rewriting the game code in scripts is underestimatedwhen game data is patched by a script on the fly to modify the gameplay. Proof-of-concept of such a script modifying enemies in "New Ghostbusters 2":



    However, for complex processing of a specific game or the creation of new hack methods, it is worth thinking about using the following method.

    Modification of the source code of the emulator


    There is room for imagination on various topics that are not related to the study of games, such as adding support for achievements , 3d rendering or improved graphics to emulators , but I will try to stay within the scope of the article.

    One of the directions of expanding the emulator in order to improve the possibilities for reverse engineering is to throw as many of its internal capabilities as possible into Lua libraries . In the second article of the cycle, I already showed how, by scrolling through a whole pair of new functions, it became possible to make a universal (suitable for exploring any game) research tool.



    Another simple and useful example that is not yet available in the latest version of the emulator is the ability to modify PPU from a memory script .

    Modification of the emulator can also be used to embed an editor for a specific game in it with the ability to run it on the fly and check the changes made:



    Scripts for static analysis of game code


    The previous two categories of modifications concerned the dynamic analysis of the game during its execution. However, most of the research is a static analysis of the game's ROM file (or dumps of any data from it).

    The main program for this code analysis is the IDA Interactive Disassembler . It supports assembler 6502, however, it requires both a plug-in for the correct download of files in nes format and a set of scripts to automate routine actions to convert the downloaded file into combed code. A set of scripts specific to researching NES games is compiled here .

    The IDA scripts themselves can be written in the idc built-in command language orpython , in any case, it is best to open them with a text editor and learn, in most cases, it helps to better understand the IDA commands that are useful in working with it and learn how to write such scripts yourself. This is very useful when you need to carry out several hundred of the same type of actions, such as combining bytes into pointers or allocating arrays according to some rules.

    Tools for static analysis of game data


    The IDA is a good tool for code analysis, so good that some game research gurus even think that it’s enough to research and modify games. However, even with a game disassembled to compiled and commented sources, it’s difficult to modify game data - levels, graphic cards, character animations. Unfortunately, the game data format is often very different from game to game, so creating universal tools suitable for most games is quite difficult.

    Tile map editors


    The format for storing graphic banks (the lowest level of storing graphics) is standard for all NES games, so there are many tile map editors , however, among them I did not find a single library that would allow rendering these tiles in my application.

    Such programs can edit graphics tiles in games with the presence of CHR-ROM - whole banks of graphics. In other games, CHR-RAM is used - the video memory of the tiles in them is read in parts from the bank with data and code and copied to the video memory (it is sometimes quite tricky ways, but it’s better to talk about them in an article about data compression).

    At a higher level, the games are already so different that there are practically no common editing programs, there are at most editors covering several games on the same engine. I’ll write about my attempts to make a universal level editor at the end of the article, for now I’ll give you some more general ideas on how to find data in games and utilities that implement these ideas.

    As an implementation language, I use python for the fact that you can quickly and easily check some guess on it, sometimes even directly in interactive mode.

    Corrapt ROM


    Actually, just about this idea was the second article of the cycle - if you sort through all the possible options for changing one byte in ROM and see how it affects the screen, then this can help clarify the internal structure of the game. After that, it is even possible to make a simple version of the game editor - you need to prepare a set of top-level block pictures from which the screen is built, without going to the end how these pictures are built from ROM data and display the array of these pictures discovered by this method.

    Block search


    You can also go from the other side.

    The background that is displayed on the screen is set by an array of indexes of video memory tiles at a fixed PPU address - for NES there are 4 screens that can be displayed in different ways depending on the PPU settings. It doesn’t matter what exactly will be on the screen, just grab some loaded page for analysis.

    The first screen (Name Table) is located at PPU addresses $ 2000- $ 23BF. Its contents in the FCEUX emulator can be viewed in the Debug → Name Table Viewer window :



    As well as bytes in the Debug → Hex Editor window , View → PPU Memory (go to the address $ 2000).

    Here you can dump the entire video memory, which is useful to us for analysis ( File → Dump to File → PPU Memory ).

    This is just an array of 960 indexes of small 8x8 pixels video memory tiles. Moreover, after reversing a large number of games, it is known that game screens are often described in larger blocks, for example, 16x16 or 32x32 pixels. Thus, if we assume a certain block size (for a start we will try the most standard ones - 2x2 tiles, highlighted in the screenshot with a red frame), then we can divide the data from the screen into sections, each of which will contain a description of one block.

    This results in a list of all the blocks that are present on the screen. Moreover, we have “clean” descriptions of blocks, without information about character sprites (sprites are drawn in a different way), and independent of the animation (background animations are almost always done using palette or video memory changes, the tile numbers in the Name Table remain unchanged). However, we do not know the block numbers.

    We have a description of the blocks on the screen, but we do not know their storage order in ROM. Nevertheless, we can with some probability assume where exactly the description of the blocks is located. The algorithm for this is as follows:

    1. We go around the entire ROM and mark all the addresses where a block is detected, while maintaining its number (the real number may be different, it is important for us to note only the differences between the blocks from each other).

    2. We find the area in ROM in which the largest number of DIFFERENT blocks are found. With the greatest probability, this is precisely the description of the blocks.

    https://gist.github.com/spiiin/500262e8d9da86f10a093bbb41833360

    Thus, we can find 2x2 blocks in games in which they are stored sequentially.

    This is already not bad, but there is a way to radically improve the results of the algorithm. The fact is that there are a limited number of basic sizes of blocks and how to store them in ROM, and we can sort through them all.

    The main block sizes: 2x2, 4x2, 2x4 and 4x4, but if necessary, it is easy to add other sizes.

    With a way to store them in ROM a little trickier, blocks can be stored both linearly and broken into parts by arrays ( Structure of Arrays, abbreviated SoA), i.e. First, an array of only the first parts of the blocks is stored in ROM, followed by arrays with the following parts. Most often, such arrays are stored one after another, while the gap between the beginning of the arrays is equal to the number of blocks. To find such SoA arrays in ROM, we need to find out their length, which can be done by enumerating all the options (often games use 256 blocks each, so start checking from this number and gradually reduce it).

    All this looks rather confusing, because we rely only on the probability that the game uses a certain type of blocks, but in practice the utility finds blocks in 80-90% of tested games!



    https://github.com/spiiin/NesBlockFinder

    In addition, it allows you to weed out games with an unusual structure (non-block) in order to study them more closely.

    Comparison of CDL files


    The FCEUX emulator is able to emulate each instruction during emulation, which bytes were interpreted as code and which bytes as data (menu Debug → Code / Data Logger ... ). This feature is useful in itself and is tightly integrated with other debugging capabilities of the emulator - try turning this mode on and see how other debug windows have changed. However, I want to talk about its one particular application. If you save two such cdl files, one BEFORE the study action is performed, and the other immediately AFTER its completion, the difference between the two such files will show only the data (or code) that was used during the action. With proper cut-off, you can find the necessary data, just choosing the right two points in time between the measured events.

    File format is enoughit is simple , like a comparison script , but it can bring a lot of benefits, on it you can generally build a separate debugging methodology.

    Compressors / Decompressors


    This topic cannot be disclosed in a couple of paragraphs, and it will be too simplistic in the context of only NES games, so it deserves a separate article.

    Universal CadEditor Level Editor


    Actually, this program was originally created to display levels in the game “Chip & Dale” ( C hip A nd D ale Editor ), then, at the request, it was redone into an editor and eventually acquired support for other Capcom games (“Darkwing Duck”, “Duck” Tales 1-2 "," Tale Spin "," Little Mermaid ").



    Later it also became clear that the principles of block level construction in these games are very similar to the ways of organizing levels from many other games, but the devil is in the details - small differences in each game require describing the editor’s internal structures as flexibly as possible so that using combinations of these structures describe the level format for games for which it was not originally intended, without changing the core of the editor itself.

    A feature with which the editor’s versatility is provided is the so-called game configs . These are script files in the C # language, which describe how to load the data of a specific game. Why exactly C #? The editor was already written in this language and this made it easy to transfer code from the kernel to configs without changing it, which would have to be done if a more classical scripting language such as Lua were used.

    Using a full-fledged language instead of a simple settings file allows you to describe in your config your functions for loading and saving data of any required complexity. Scripts are ordinary text files, which allows users to create their own configs if necessary without recompiling the editor, using existing configs as templates. About 500 configs for 60 different games are bundled with the editor, about 100 of them are made by users of the editor without my participation, for games, some of which I have never even played:



    However, at the moment, despite attempts to make a universal editor, games are found that cannot be described without modifying the editor itself (however, many games can already be added this way). To collect information about such games with an unusual structure, I went a little further and began to use the editor itself as a library for Python , to examine the format of games before adding them to the editor and to test the correct understanding of building the level of a specific game. I implemented this in the form of Jupyter notebooks , due to the fact that it is convenient in them how to write code in interactive mode, so document it .

    Compiling large game structures from the basic tiles and assembling an entire level as a result resembles assembling a puzzle of thousands of pieces, and gives the same pleasure when, at last, each piece is in its place.

    The next article will not have such an abundance of technical information and I will give examples of assembling game levels with a non-standard structure or using unusual modifications of the standard block architecture. Also in the comments you can name the game on NES, the level format of which is interesting to you, maybe I will explore it too.

    Also popular now: