The reverse of the Neuromancer. Part 1: Sprites


    It turned out that I am a big fan of the work of William Gibson . My acquaintance with this wonderful prose writer happened as a result of my deep passion for the aesthetics of cyberpunk and the subsequent appeal to the “roots” of the genre, which Gibson is considered to be the founding father (let him deny it in every possible way). The Cyberspace trilogy ( Sprawl trilogy, 1984-1988 ), namely, the opening trilogy novel Neuromancer ( Neuromancer, 1984 ), Gibson popularized the ideas of cyberspace, virtual reality and the global computer network (the term cyberspace itself was invented by the author himself and first introduced in the story "Burning Chrome, 1982 ), but the term became widespread only after the publication of " Neuromancer "). In addition to the obvious impact on pop culture ( Sonic Youth , U2 , Ghost in the Shell , The Matrix , Deus Ex , Shadowrun, and many more titles that somehow influenced the author), there is an opinion about Gibson's less obvious influence on the sphere of information technology . In the preface to the jubilee edition of The Neuromancer , American science fiction writer Jack Womack wonders:


    Could it be that Gibson’s vision of the global information space ultimately became the reason why the Internet today looks the way it looks and works the way it works?

    I am inclined towards a positive answer to this question.


    I learned about the igronization of Neuromancer at the very beginning of my acquaintance with Gibson's personality. The video game of the same name was developed by Interplay Productions ( Wasteland , Fallout ) and published by Mediagenic (now known as Activision ) in 1988 on Amiga , Apple II , Apple IIGS , Commodore 64 and MS-DOS platforms . At one time, the game was warmly received by the press and the gaming community, and in 1996 was included by Computer Game Magazine in the list of 150 best video games of all time. Obsession - "porting this game to Win64, "appeared in my head during the next launch of Neuromancer under DosBox . Having received the first results from its implementation, I decided to document my actions in the form of an article (in the future, I hope, a series of articles) that you see in front of you.


    I don’t think I can bring this work to the end. However, at the moment, I intend to continue, because for me it is:


    • ideal platform for training reverse engineering skills
    • great opportunity to increase assembly proficiency
    • how time-consuming, just as addictive and exciting process

    I hope my experience will be useful to you in one way or another. In the end, given the contents of the original, do you find the Neuromancer reverse engineering process itself somewhat ironic? :)




    The Neuromant distribution package that came to me includes one 16-bit executable in the MZ - format NEURO.EXE, and two binaries - NEURO1.DATand NEURO2.DAT(probably - resources). Having loaded the “patient” into the IDA, I immediately received a warning “ Possibly packed file ” followed sp-analysis failedby a clear indication that the file is either compressed or encrypted, or both. But rather the first. In an era of struggle for every byte of disk space, compression is a must. In a few minutes, googled the UNP dos utility . Having run the executable through it, he confirmed his hypothesis of compression:


    > С:\UNP411\UNP.EXE C:\NEURO\NEURO.EXE
    UNP 4.11 Executable file restore utility, written by Ben Castricum, 05/30/95
    processing file : C:\NEURO\NEURO.EXE
    DOS file size   : 94921
    file-structure  : executable (EXE)
    EXE part sizes  : header 512, image 94409, overlay 0 bytes
    processed with  : EXEPACK V4.05 or V4.06
    action          : decompressing... done
    new file size   : 313376
    writing to file : C:\NEURO\NEURO.EXE

    The first layer of "ice" is broken, and now the IDA works out silently and without errors.




    I look at what I'm dealing with. [ FLASHFORWARD: then I did not think about the fact that you can use binary file visualization programs for a superficial study. Having tested them now, I understand that I could save myself a lot of time. According to the unpacked executable service binvis.io paints a picture that you see on the left (clickable image).]


    Quickly going over the IDA, I estimate that the code itself occupies about 1/5 of the total volume (~ 64k, the upper part in the image). According to the code, system calls MS-DOS ( int 21) and calls to various standard Sishn functions are regularly found . In the data segment, I find a large number of readable lines (blue sectors at the bottom). The vast majority of the binary is occupied by zeros (black sectors) - most likely, statically allocated working arrays and structures.


    In general, everything is cool! Taking into account recognized system and library calls, many of which are associated with file I / O, I decided to start by reversing the resource management system.




    Tracing the procedures that open file descriptors, I came across a function that turned out to be the key to understanding the structure of .DATgame files. [The obfuscated disassembly product is far from the most pleasant-looking thing, but, nevertheless, sometimes I will accompany my words with a code] . The function determines the number, opens the descriptor ( _dos_open) and sets the read / write offset ( lseek) of the file containing the resource, the name of which is passed to the function as a pointer to a null-terminated string. Let's look at one of the locate_resource_in_datchains of calls leading to this function ( ):


        ...
        mov     ax, 4CD2h
        push    ds
        push    ax
        mov     ax, 505Eh
        push    ax                      ; resource_name
        call    sub_127DA               ; sub_127DA(char *resource_name, void *, void *)
        ...
    sub_127DA proc:
        ...
        arg_0 = word ptr  4             ; resource_name
        arg_2 = word ptr  6
        arg_4 = word ptr  8
        push    bp
        mov     bp, sp
        sub     sp, 6
        ...
        push    [bp+arg_0]              ; resource_name
        call    locate_resource_in_dat  ; locate_resource_in_dat(char *resource_name)
        ...

    In the data segment, the address 0x505Eis the line:


    dseg:505E aConfigNmc      db 'config.nmc',0

    The belonging of a resource to a specific .DAT-file is determined inside the function locate_resource_in_datby comparing this static string ( sub_127DA("config.nmc", ...)) with the rows located in the data segment every 22 ( 0x16) bytes, starting from explicitly specified addresses. Here's what one of them is:


    dseg:00A2                 db 'R1.BIH', 0
    dseg:00A9                 db 0
    dseg:00AA                 db 0
    dseg:00AB                 db 0
    dseg:00AC                 db 0
    dseg:00AD                 db 0
    dseg:00AE                 db 0
    dseg:00AF                 db 0
    dseg:00B0                 db 0
    dseg:00B1                 db 0
    dseg:00B2                 db 0
    dseg:00B3                 db 0
    dseg:00B4                 db 0EEh
    dseg:00B5                 db 5
    dseg:00B6                 db 0
    dseg:00B7                 db 0
    dseg:00B8                 db 'R1.PIC', 0
    dseg:00BF                 db 0
    dseg:00C0                 db 0
    dseg:00C1                 db 0
    dseg:00C2                 db 0
    dseg:00C3                 db 0
    dseg:00C4                 db 0
    dseg:00C5                 db 0
    dseg:00C6                 db 0EEh
    dseg:00C7                 db 5
    dseg:00C8                 db 0
    dseg:00C9                 db 0
    dseg:00CA                 db 046h
    dseg:00CB                 db 013h
    dseg:00CC                 db 0
    dseg:00CD                 db 0
    dseg:00CE                 db 'R1.ANH', 0
                              ...

    It looks like an array of view structures:


    struct resource_t
    {
        char name[14]; // имя ресурса
        int offset;    // смещение от начала файла
        int size;      // байтовый размер ресурса
    };
    struct resource_t resources[] = {
        { "R1.BIH",     0,  0x5EE },
        { "R1.PIC", 0x5EE, 0x1346 },
        ...
    };

    There were two such arrays. One per .DATfile. They are located at addresses 0xA2and 0x81Cconsist of 87 and 151 elements, respectively. In total I counted 8 different resource types (by extension names): .BIH, .PIC, .ANH, .BIN, .IMH, .NMC, .TXH, .SAV. Names with extensions .NMCand .SAVoccur only once and speak for themselves: CONFIG.NMCand SAVEGAME.SAV. Descriptive names also have ests with the expansion .IMH: CURSORS.IMH, SPRITES.IMH, TITLE.IMHand others. With the rest is not so obvious. There is an assumption that .TXH- this is text, but .PIC- images. [Now I’m almost sure that the .PICbacks of the locations are stored]. The content of the remaining types is still unknown.




    While walking around the code under the debugger, I accidentally figured out the purpose of the .BINresources. It all started with the fact that in the listing I could not find a place where the 256-color VGA mode ( mode 0x13 ) would be initialized . However, the debugger output in DosBox showed that, and, roughly, when this happens:


    43597504: INT10:Set Video Mode 13
    43597504: VGA:Blinking 0   

    Having traced the program up to this point, I found out that the code that directly performs these actions is loaded into memory from the resource TVGA.BIN, and the program simply does it callat the address to which it is loaded. Thus, they .BINcontain some dynamically loaded, compiled code. Something like dynamic libraries, but not really. And it’s not yet clear why this was done that way. [So I thought, until I discussed this with a practitioner in those days. It turned out that this is a completely standard approach for DOS, which was widely used to reduce the volume of executable files.]




    I wanted to get some tangible result. I decided to try to promote the format .IMH, because judging by the names with this extension ( SPRITES.IMH) - this is graphics. Of all the resources of this type, I selected one of the minimum size and saved it in a separate file:


    CURSORS.IMH: 243 байта
    0x0000:     0001 0203 0400 060D 0009 0A0B 0000 000F
    0x0010:     0001 0203 0405 080B 0809 0A0B 0C0D 0E0F
    0x0020:     3F01 0000 0440 7EEC D0C2 2D0A 46A3 3FF2
    0x0030:     7111 6230 640E 104C E0C5 8505 FCFF 5001
    0x0040:     04C0 260F E2C0 85C2 5017 FA05 54B0 6C2D
    0x0050:     7413 E9C7 451D D8B6 2C18 E714 7B8B D79B
    0x0060:     BBE5 EB60 4D2F EF70 3A1E 42FE D9C0 5DC0
    0x0070:     EBCD EE07 B9BF 821C 0141 BB15 360B 743A
    0x0080:     4BC8 749E 45D7 5B26 63F4 24E9 DAEA 5CC6
    0x0090:     E859 D793 41F7 94DF B7AC 4DB7 EFC2 CC4F
    0x00A0:     5D5F 66D1 E5E3 3F2B AC42 7C5E 3AF1 9F95
    0x00B0:     D7D9 E8E9 3D75 62DE 9B05 86E6 0EBF 794C
    0x00C0:     1D0B 3A76 9A97 31BA 1274 ED75 2663 FE79
    0x00D0:     175F BC87 FD58 9B6F DD0B 12F2 652F 12FB
    0x00E0:     1E99 5E37 B24C 424C 4F4C AF1B 6CC4 BEC7
    0x00F0:     C99F C0

    This is definitely not a bitmap, which means you need to figure out options for what it can be [in order of increasing complexity] :


    • this is a compressed image of any known format
    • this is data compressed by some well-known general-purpose algorithm
    • this is data compressed by some proprietary (for Interplay ) algorithm

    The first option dropped quickly enough. Having tried with a dozen different extensions and viewers, I still could not open this file. I got to the exotic - I googled that images with the extension .imhare used in a special astronomical (literally - they are used by astronomers) IRAF (Image Reduction and Analysis Facility) software package . But this is past.


    At this stage, for me there was not much difference between the second and third options. Of course, knowing a specific algorithm, it would be much easier to implement it in a high-level language or, better yet, take a ready-made implementation. However, not being an expert (and not even an amateur) in the field of data compression, it is extremely difficult to identify the algorithm by disassembled code or by some signatures. In any case, the task was to select a code section in the IDA responsible for data decompression and adapt it to 64-bit Assembler (in my case, MASM ).


    And I got lucky. Mine is CURSORS.IMHloaded in main-e one of the first:


        ...
        mov     ax, 2
        mov     dx, seg seg009
        push    dx
        push    ax
        mov     ax, 506Ah       ; "cursors.imh"
        push    ax
        call    sub_126CB
        ...

    This was followed by a long and persistent trace of the function sub_126CB. But the game was worth the candle. The result of this function is detected at the address stored in dx:


    0x0000:     0000 0000 0800 0A00 0666 6666 6666 6600
    0x0010:     6777 7777 7777 7760 6777 7777 7766 7776
    0x0020:     0666 6676 6677 7776 0006 7767 7777 7766
    0x0030:     0000 6667 7777 7676 0000 6776 6666 6776
    0x0040:     0000 0666 6667 7766 0000 0067 7677 7660
    0x0050:     0000 0006 6666 6600 0400 0000 0600 0C00
    0x0060:     0000 3000 0000 0003 B300 0000 003B BB30
    0x0070:     0000 03BB BBB3 0000 3BBB BBBB 3000 333B
    0x0080:     BB33 3000 003B BB30 0000 003B BB30 0000
    0x0090:     003B BB30 0000 003B BB30 0000 003B BB30
    0x00A0:     0000 0033 3330 0000 0B00 0400 0600 0900
    0x00B0:     0000 0033 0000 0000 003B 3000 3333 333B
    0x00C0:     B300 3BBB BBBB BB30 3BBB BBBB BBB3 3BBB
    0x00D0:     BBBB BB30 3333 333B B300 0000 003B 3000
    0x00E0:     0000 0033 0000 0400 0B00 0600 0C00 0033
    0x00F0:     3330 0000 003B BB30 0000 003B BB30 0000
    0x0100:     003B BB30 0000 003B BB30 0000 003B BB30
    0x0110:     0000 333B BB33 3000 3BBB BBBB 3000 03BB
    0x0120:     BBB3 0000 003B BB30 0000 0003 B300 0000
    0x0130:     0000 3000 0000 0000 0400 0600 0900 0000
    0x0140:     3300 0000 0003 B300 0000 003B B333 3333
    0x0150:     03BB BBBB BBB3 3BBB BBBB BBB3 03BB BBBB
    0x0160:     BBB3 003B B333 3333 0003 B300 0000 0000
    0x0170:     3300 0000

    Now this looks like a bitmap! And if this is true, by twisting this data, I should get a picture. I try to make rows using different widths - to no avail. I draw attention to the values ​​of the 5th ( 0x08) and 7th ( 0x0A) bytes. Let's say it's width and height, we multiply, count the following 80 bytes ( 0x08 * 0x0A = 50 (80)):


    0x0008:     0666 6666 6666 6600 6777 7777 7777 7760
    0x0018:     6777 7777 7766 7776 0666 6676 6677 7776
    0x0028:     0006 7767 7777 7766 0000 6667 7777 7676
    0x0038:     0000 6776 6666 6776 0000 0666 6667 7766
    0x0048:     0000 0067 7677 7660 0000 0006 6666 6600

    We compose so that there would be 8 bytes in a row, and there would be 10 rows in total:


    0666 6666 6666 6600
    6777 7777 7777 7760
    6777 7777 7766 7776
    0666 6676 6677 7776
    0006 7767 7777 7766
    0000 6667 7777 7676
    0000 6776 6666 6776
    0000 0666 6667 7766
    0000 0067 7677 7660
    0000 0006 6666 6600

    This is it! [I knew this because I saw how the cursor looked on the screen] . I try to wrap this data in a .BMPheading with a standard palette - it draws something, but not at all what I expected. I put a breakpoint in the debugger on the name of the state of the video bytes ( 0xA000+) to see what is actually displayed on the screen. I understand that you need to add zeros [do a search in the browser for the value “06” and, in the array below, you will see something that looks like a palm with a protruding index finger] :


    00 06 06 06 06 06 06 06 06 06 06 06 06 06 00 00
    06 07 07 07 07 07 07 07 07 07 07 07 07 07 06 00
    06 07 07 07 07 07 07 07 07 07 06 06 07 07 07 06
    00 06 06 06 06 06 07 06 06 06 07 07 07 07 07 06
    00 00 00 06 07 07 06 07 07 07 07 07 07 07 06 06
    00 00 00 00 06 06 06 07 07 07 07 07 07 06 07 06
    00 00 00 00 06 07 07 06 06 06 06 06 06 07 07 06
    00 00 00 00 00 06 06 06 06 06 06 07 07 07 06 06
    00 00 00 00 00 00 06 07 07 06 07 07 07 06 06 00
    00 00 00 00 00 00 00 06 06 06 06 06 06 06 00 00



    I wrap this (and the remaining in the original array) data in BMPand get the following pictures on the output (here, slightly enlarged):





    Обнаружилось, что первые 20 байт CURSORS.IMH[и остальных .IMH] загружаются в другую область памяти и не учавствуют в декомпрессии. Сперва подумал, что это палитра, но нет, корректные цвета получаются с использованием стандартной (на рисунке выше). [В коде я нашёл место, в котором происходит обращение к этой памяти, но пока не разбирался, что именно там происходит].


    Переписал функцию sub_126CB, всего вышло около 800 строк MASM-а. Но оно работает, и теперь я могу распаковать любой .IMH-ресурс. Иллюстрация в заголовке, кстати, оттуда же (TITLE.IMH). А вот спрайты главного героя (SPRITES.IMH):



    Да, я могу распаковать любой.IMH-ресурс, но не все сразу. При переносе кода на 64-битный Ассемблер я допустил ошибки связанные с переносом разрядов при сложении адресов, из-за этого, даже на одном ресурсе, программа работает через раз. Но это поправимо, нужно лишь переписать этот код так, как если бы я изначально проектировал его под 64-битную платформу. В идеале, переписать его на Си, тогда эти ошибки упразднятся сами собой. Однако, по тому, что я увидел, у меня никак не складывался алгоритм, а переписывая обфусцированный Ассемблер на Си, есть риск получить ещё более обфусцированный Си. Мне нужна была помощь. Сняв данные в ключевых точках, я опубликовал вопрос на Reverse Engineering StackExchange в надежде, что по этим данным кто-нибудь сможет определить — с каким-именно алгоритмом сжатия я имею дело.


    Need help in compression algorithm identification

    Currently I'm in reversing an old MS-DOS Neuromancer game (Interplay Productions, 1988. Based on William Gibson's novel). For now I have already written an utility that parses game resources and extracts sprites. Sprites are bitmaps that compressed with some fancy algorithm. I have restored that algorithm by porting it's 16-bit disassembly to 64-bit assembly (I'm working on Windows 10 and using MASM in MSVS 2017). It results in ~800 lines of pretty obfuscated assembly (which became even more obfuscated after porting it to 64-bit) that works and I'm about rewriting it in plain old C. However it makes no sense because i can't identify an actual algorithm.


    And here is the problem. I assume that the game uses some widly known data compression algorithm. I will provide an example by posting chunks of data that are observable during the decompression process. I hope that there are some compression experts who will recognize the algorithm (if my theory about "widly known algorithm" is correct). Thank you in advance!




    Here we go, decompressing CURSORS.IMH file that contains a series of bitmaps that shows diffrent in-game cursors:


    SRC: Initial data, 211 bytes stored in 512 byte buffer:
    0x0000  3f 01 00 00 04 40 7e ec d0 c2 2d 0a 46 a3 3f f2
    0x0010  71 11 62 30 64 0e 10 4c e0 c5 85 05 fc ff 50 01
    0x0020  04 c0 26 0f e2 c0 85 c2 50 17 fa 05 54 b0 6c 2d
    0x0030  74 13 e9 c7 45 1d d8 b6 2c 18 e7 14 7b 8b d7 9b
    0x0040  bb e5 eb 60 4d 2f ef 70 3a 1e 42 fe d9 c0 5d c0
    0x0050  eb cd ee 07 b9 bf 82 1c 01 41 bb 15 36 0b 74 3a
    0x0060  4b c8 74 9e 45 d7 5b 26 63 f4 24 e9 da ea 5c c6
    0x0070  e8 59 d7 93 41 f7 94 df b7 ac 4d b7 ef c2 cc 4f
    0x0080  5d 5f 66 d1 e5 e3 3f 2b ac 42 7c 5e 3a f1 9f 95
    0x0090  d7 d9 e8 e9 3d 75 62 de 9b 05 86 e6 0e bf 79 4c
    0x00A0  1d 0b 3a 76 9a 97 31 ba 12 74 ed 75 26 63 fe 79
    0x00B0  17 5f bc 87 fd 58 9b 6f dd 0b 12 f2 65 2f 12 fb
    0x00C0  1e 99 5e 37 b2 4c 42 4c 4f 4c af 1b 6c c4 be c7
    0x00D0  c9 9f c0

    Processing starts from the fist byte of SRC and suspends on 44th byte. Here is the intermediate result of that processing stored in 2048 byte buffer located right after the SRC buffer (actually, there are 10 bytes between them, where among other things an address of 44th byte is stored):


    0x0200  00 80 01 00 5c ea 3f 01 00 00
    0x020A  02 00 02 00 0e 00 04 00 1f 00 05 00 04 00 04 00
    0x021A  18 00 05 00 3a 00 07 00 0d 00 05 00 00 00 00 00
    0x022A  05 00 04 00 df 00 08 00 0d 00 08 00 de 00 08 00
    0x023A  05 00 07 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x024A  00 00 05 00 05 00 05 00 00 00 00 00 00 00 00 00
    0x025A  00 00 00 00 00 00 00 00 0c 00 08 00 00 00 00 00
    0x026A  00 00 00 00 0f 00 08 00 0e 00 08 00 00 00 00 00
    ... ZEROES ...
    0x02CA  1a 00 05 00 00 00 00 00 00 00 00 00 0c 00 05 00
    0x02DA  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x02EA  04 00 05 00 00 00 00 00 00 00 00 00 00 00 00 00
    ... ZEROES ...
    0x038A  6e 00 07 00 1c 00 06 00 00 00 00 00 00 00 00 00
    0x039A  00 00 00 00 00 00 00 00 09 00 08 00 00 00 00 00
    ... ZEROES ...
    0x040A  19 00 05 00 00 00 00 00 00 00 00 00 07 00 05 00
    0x041A  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x042A  06 00 05 00 00 00 00 00 00 00 00 00 00 00 00 00
    ... ZEROES ...
    0x05EA  00 00 00 00 3b 00 07 00 00 00 00 00 08 00 08 00
    0x05FA  36 00 06 00 0f 00 05 00 1e 00 05 00 01 00 04 00
    0x060A  05 10 00 00 05 10 00 00 05 10 00 00 05 10 00 00
    0x061A  05 10 00 00 05 10 00 00 05 10 00 00 05 10 00 00
    0x062A  08 fb 00 00 08 66 00 00 07 0c 00 00 07 0c 00 00
    0x063A  08 16 00 00 08 0a 00 00 08 1a 00 00 08 19 00 00
    0x064A  04 ff 00 00 04 ff 00 00 04 ff 00 00 04 ff 00 00
    0x065A  04 ff 00 00 04 ff 00 00 04 ff 00 00 04 ff 00 00
    0x066A  04 ff 00 00 04 ff 00 00 04 ff 00 00 04 ff 00 00
    0x067A  04 ff 00 00 04 ff 00 00 04 ff 00 00 04 ff 00 00
    0x068A  05 38 00 00 05 38 00 00 05 38 00 00 05 38 00 00
    0x069A  05 38 00 00 05 38 00 00 05 38 00 00 05 38 00 00
    0x06AA  05 11 00 00 05 11 00 00 05 11 00 00 05 11 00 00
    0x06BA  05 11 00 00 05 11 00 00 05 11 00 00 05 11 00 00
    0x06CA  05 88 00 00 05 88 00 00 05 88 00 00 05 88 00 00
    0x06DA  05 88 00 00 05 88 00 00 05 88 00 00 05 88 00 00
    0x06EA  05 83 00 00 05 83 00 00 05 83 00 00 05 83 00 00
    0x06FA  05 83 00 00 05 83 00 00 05 83 00 00 05 83 00 00
    0x070A  04 03 00 00 04 03 00 00 04 03 00 00 04 03 00 00
    0x071A  04 03 00 00 04 03 00 00 04 03 00 00 04 03 00 00
    0x072A  04 03 00 00 04 03 00 00 04 03 00 00 04 03 00 00
    0x073A  04 03 00 00 04 03 00 00 04 03 00 00 04 03 00 00
    0x074A  04 08 00 00 04 08 00 00 04 08 00 00 04 08 00 00
    0x075A  04 08 00 00 04 08 00 00 04 08 00 00 04 08 00 00
    0x076A  04 08 00 00 04 08 00 00 04 08 00 00 04 08 00 00
    0x077A  04 08 00 00 04 08 00 00 04 08 00 00 04 08 00 00
    0x078A  05 33 00 00 05 33 00 00 05 33 00 00 05 33 00 00
    0x079A  05 33 00 00 05 33 00 00 05 33 00 00 05 33 00 00
    0x07AA  05 06 00 00 05 06 00 00 05 06 00 00 05 06 00 00
    0x07BA  05 06 00 00 05 06 00 00 05 06 00 00 05 06 00 00
    0x07CA  06 61 00 00 06 61 00 00 06 61 00 00 06 61 00 00
    0x07DA  07 05 00 00 07 05 00 00 07 f9 00 00 07 f9 00 00
    0x07EA  05 fd 00 00 05 fd 00 00 05 fd 00 00 05 fd 00 00
    0x07FA  05 fd 00 00 05 fd 00 00 05 fd 00 00 05 fd 00 00
    0x080A  02 00 00 00 02 00 00 00 02 00 00 00 02 00 00 00
    0x081A  02 00 00 00 02 00 00 00 02 00 00 00 02 00 00 00
    0x082A  02 00 00 00 02 00 00 00 02 00 00 00 02 00 00 00
    0x083A  02 00 00 00 02 00 00 00 02 00 00 00 02 00 00 00
    0x084A  02 00 00 00 02 00 00 00 02 00 00 00 02 00 00 00
    0x085A  02 00 00 00 02 00 00 00 02 00 00 00 02 00 00 00
    0x086A  02 00 00 00 02 00 00 00 02 00 00 00 02 00 00 00
    0x087A  02 00 00 00 02 00 00 00 02 00 00 00 02 00 00 00
    0x088A  02 00 00 00 02 00 00 00 02 00 00 00 02 00 00 00
    0x089A  02 00 00 00 02 00 00 00 02 00 00 00 02 00 00 00
    0x08AA  02 00 00 00 02 00 00 00 02 00 00 00 02 00 00 00
    0x08BA  02 00 00 00 02 00 00 00 02 00 00 00 02 00 00 00
    0x08CA  02 00 00 00 02 00 00 00 02 00 00 00 02 00 00 00
    0x08DA  02 00 00 00 02 00 00 00 02 00 00 00 02 00 00 00
    0x08EA  02 00 00 00 02 00 00 00 02 00 00 00 02 00 00 00
    0x08FA  02 00 00 00 02 00 00 00 02 00 00 00 02 00 00 00
    0x090A  05 04 00 00 05 04 00 00 05 04 00 00 05 04 00 00
    0x091A  05 04 00 00 05 04 00 00 05 04 00 00 05 04 00 00
    0x092A  05 80 00 00 05 80 00 00 05 80 00 00 05 80 00 00
    0x093A  05 80 00 00 05 80 00 00 05 80 00 00 05 80 00 00
    0x094A  05 30 00 00 05 30 00 00 05 30 00 00 05 30 00 00
    0x095A  05 30 00 00 05 30 00 00 05 30 00 00 05 30 00 00
    0x096A  06 fc 00 00 06 fc 00 00 06 fc 00 00 06 fc 00 00
    0x097A  07 60 00 00 07 60 00 00 08 0b 00 00 08 09 00 00
    0x098A  04 01 00 00 04 01 00 00 04 01 00 00 04 01 00 00
    0x099A  04 01 00 00 04 01 00 00 04 01 00 00 04 01 00 00
    0x09AA  04 01 00 00 04 01 00 00 04 01 00 00 04 01 00 00
    0x09BA  04 01 00 00 04 01 00 00 04 01 00 00 04 01 00 00
    0x09CA  05 fe 00 00 05 fe 00 00 05 fe 00 00 05 fe 00 00
    0x09DA  05 fe 00 00 05 fe 00 00 05 fe 00 00 05 fe 00 00
    0x09EA  05 02 00 00 05 02 00 00 05 02 00 00 05 02 00 00
    0x09FA  05 02 00 00 05 02 00 00 05 02 00 00 05 02 00 00

    Processing continues from 44th byte of SRC. The remaining bytes are processed with result stored in external intemediate buffer:


    IMM: Intermediate data, 319 bytes:
    0x0000  00 00 00 00 08 00 0a 00 ff 06 05 66 fe 00 61 05
    0x0010  11 ff 60 04 00 fc 11 00 16 61 01 11 ff 01 01 11
    0x0020  01 00 fe 06 60 02 11 01 00 fc 10 00 06 11 02 00
    0x0030  fe 01 10 01 00 ff 01 03 11 02 00 fc 61 10 00 01
    0x0040  01 10 01 00 fe 06 01 01 10 fe 01 06 02 00 fb 61
    0x0050  10 11 10 60 04 00 00 00 06 00 0c 00 01 00 ff 30
    0x0060  03 00 fe 03 83 03 00 fd 38 08 30 01 00 fc 03 80
    0x0070  00 83 01 00 ff 38 01 00 f9 08 30 00 08 80 00 88
    0x0080  01 00 ff 33 01 00 fe 03 30 19 00 fe 08 88 02 00
    0x0090  0b 00 04 00 06 00 09 00 02 00 ff 33 04 00 fd 08
    0x00A0  30 00 02 33 fc 00 83 00 08 01 88 fd 80 08 30 04
    0x00B0  00 ff 83 04 00 fe 83 08 01 88 fd 80 08 30 02 33
    0x00C0  fe 00 83 03 00 fd 08 30 00 04 00 0b 00 06 00 0c
    0x00D0  00 ff 00 01 33 ff 30 02 00 fe 08 88 1a 00 ff 33
    0x00E0  01 00 f9 03 30 00 08 80 00 88 01 00 ff 38 01 00
    0x00F0  f9 08 30 00 03 80 00 83 02 00 fd 38 08 30 02 00
    0x0100  fe 03 83 02 00 00 00 04 00 06 00 09 00 01 00 ff
    0x0110  33 03 00 fe 03 80 03 00 fe 38 00 02 33 fd 03 80
    0x0120  08 01 88 fe 80 38 04 00 ff 38 04 00 fd 03 80 08
    0x0130  01 88 fc 80 00 38 00 02 33 fd 00 03 80 02 00

    Finally, some processing performed on IMM and we have the result:


    DST: The Result, 372 bytes:
    0x0000  00 00 00 00 08 00 0A 00 06 66 66 66 66 66 66 00 
    0x0010  67 77 77 77 77 77 77 60 67 77 77 77 77 66 77 76
    0x0020  06 66 66 76 66 77 77 76 00 06 77 67 77 77 77 66
    0x0030  00 00 66 67 77 77 76 76 00 00 67 76 66 66 67 76
    0x0040  00 00 06 66 66 67 77 66 00 00 00 67 76 77 76 60
    0x0050  00 00 00 06 66 66 66 00 04 00 00 00 06 00 0C 00
    0x0060  00 00 30 00 00 00 00 03 B3 00 00 00 00 3B BB 30
    0x0070  00 00 03 BB BB B3 00 00 3B BB BB BB 30 00 33 3B
    0x0080  BB 33 30 00 00 3B BB 30 00 00 00 3B BB 30 00 00
    0x0090  00 3B BB 30 00 00 00 3B BB 30 00 00 00 3B BB 30
    0x00A0  00 00 00 33 33 30 00 00 0B 00 04 00 06 00 09 00
    0x00B0  00 00 00 33 00 00 00 00 00 3B 30 00 33 33 33 3B
    0x00C0  B3 00 3B BB BB BB BB 30 3B BB BB BB BB B3 3B BB
    0x00D0  BB BB BB 30 33 33 33 3B B3 00 00 00 00 3B 30 00
    0x00E0  00 00 00 33 00 00 04 00 0B 00 06 00 0C 00 00 33
    0x00F0  33 30 00 00 00 3B BB 30 00 00 00 3B BB 30 00 00
    0x0100  00 3B BB 30 00 00 00 3B BB 30 00 00 00 3B BB 30
    0x0110  00 00 33 3B BB 33 30 00 3B BB BB BB 30 00 03 BB
    0x0120  BB B3 00 00 00 3B BB 30 00 00 00 03 B3 00 00 00
    0x0130  00 00 30 00 00 00 00 00 04 00 06 00 09 00 00 00
    0x0140  33 00 00 00 00 03 B3 00 00 00 00 3B B3 33 33 33
    0x0150  03 BB BB BB BB B3 3B BB BB BB BB B3 03 BB BB BB
    0x0160  BB B3 00 3B B3 33 33 33 00 03 B3 00 00 00 00 00
    0x0170  33 00 00 00

    Decompression completed and it's easy to derive bitmaps like this:


    1. Get first 8 bytes of DST: 00 00 00 00 08 00 0A 00
    2. Multiply last 2 words: 08 * 0A = 50 (80)
    3. Get next 80 bytes of DST :


      06 66 66 66 66 66 66 00 67 77 77 77 77 77 77 60
      67 77 77 77 77 66 77 76 06 66 66 76 66 77 77 76
      00 06 77 67 77 77 77 66 00 00 66 67 77 77 76 76
      00 00 67 76 66 66 67 76 00 00 06 66 66 67 77 66
      00 00 00 67 76 77 76 60 00 00 00 06 66 66 66 00

    4. Arrange those bytes assuming that 08 and 0A are width and height respectively:


      06 66 66 66 66 66 66 00
      67 77 77 77 77 77 77 60
      67 77 77 77 77 66 77 76
      06 66 66 76 66 77 77 76
      00 06 77 67 77 77 77 66
      00 00 66 67 77 77 76 76
      00 00 67 76 66 66 67 76
      00 00 06 66 66 67 77 66
      00 00 00 67 76 77 76 60
      00 00 00 06 66 66 66 00

    5. Extend this with zeroes:


      00 06 06 06 06 06 06 06 06 06 06 06 06 06 00 00
      06 07 07 07 07 07 07 07 07 07 07 07 07 07 06 00
      06 07 07 07 07 07 07 07 07 07 06 06 07 07 07 06
      00 06 06 06 06 06 07 06 06 06 07 07 07 07 07 06
      00 00 00 06 07 07 06 07 07 07 07 07 07 07 06 06
      00 00 00 00 06 06 06 07 07 07 07 07 07 06 07 06
      00 00 00 00 06 07 07 06 06 06 06 06 06 07 07 06
      00 00 00 00 00 06 06 06 06 06 06 07 07 07 06 06
      00 00 00 00 00 00 06 07 07 06 07 07 07 06 06 00
      00 00 00 00 00 00 00 06 06 06 06 06 06 06 00 00


    Thats it! Wrapping it in BMP header gives us a cute cursor image.


    Мне опять повезло! Оказалось, что там последовательно применяются два алгоритма. Первый — некий неизвестный алгоритм сжатия, а второй — разновидность Run-Length кодирования [здесь, я не буду расписывать какая именно — это я сделал в ответе на свой же вопрос на Reverse Exchange]. В итоге, вторая часть, вместо ~400 строк Ассемблера, уместилась в 50 строк на Си:


    decode_rle
    typedef struct rle_hdr_t
    {
        uint32_t unknown;
        uint16_t width;
        uint16_t height;
    } rle_hdr_t;
    static int decode_rle(uint8_t *_src, uint32_t len, uint8_t *_dst)
    {
        uint32_t total_len = 0;
        uint8_t *src = _src, *dst = _dst;
        rle_hdr_t *rle;
        while (len)
        {
            uint32_t bm_size = 0, bm_width = 0, bm_height = 0;
            uint8_t *p = NULL;
            rle = (rle_hdr_t*)src;
            bm_width = rle->width;
            bm_height = rle->height;
            bm_size = bm_width * bm_height;
            memmove(dst, src, sizeof(rle_hdr_t));
            total_len += sizeof(rle_hdr_t) + bm_size;
            src += sizeof(rle_hdr_t);
            dst += sizeof(rle_hdr_t);
            len -= sizeof(rle_hdr_t);
            p = dst;
            while (bm_size)
            {
                if (*src > 0x7F)
                {
                    int i = 0x100 - *src++;
                    len--;
                    while (i--)
                    {
                        *dst++ = *src++;
                        bm_size--;
                        len--;
                    }
                }
                else
                {
                    int num = *src++, val = *src++;
                    len -= 2;
                    memset(dst, val, (size_t)++num);
                    dst += num;
                    bm_size -= num;
                }
            }
            for (uint32_t i = 0; i < bm_height - 1; i++)
            {
                for (uint32_t j = 0; j < bm_width; j++)
                {
                    p[((i + 1)*bm_width) + j] ^= p[(i*bm_width) + j];
                }
            }
        }
        return total_len;
    }

    После, я довёл до ума первую часть функции и закончил программу распаковывающую все.IMH-ресурсы. Из 27-ми таких ресурсов получилось 108 BMP изображений.




    That's all for now. As I said at the beginning, I intend to continue. The next step I plan to finally deal with the graphics ( .PIC) and go to the text. Perhaps in the near future I will post my developments on github . And let's see where it leads me. I hope not in court :)


    The reverse of the Neuromancer. Part 2: Render Font

    Also popular now: