The riddle of the invisible file

    image

    Not long ago, having finished another article for Habr, I decided to send it to a friend for review. I saved the HTML page with all of its assets (pictures, styles, etc.), packed it into a ZIP archive and sent it off. Five minutes later I received feedback which, contrary to my expectations, was not about the article itself but about the fact that the archive was completely empty. After scratching my head and deciding that I had botched the archiving, I repeated the procedure, making sure that I had selected all the necessary files. A few minutes later my friend again replied in surprise, "Are you kidding?", while I was not joking at all.

    I began to put the pieces of the puzzle together. First, I asked what he was using to open the archive. What if his viewer was some third-rate junk from an unknown developer? However, it turned out to be the default explorer.exe. I used Total Commander both for packing and for viewing the resulting archive, and in my case it was not empty at all:

    image

    Well, was it some xxxWindowsUltimateEditionxxx build that let us down? I tried to open the same archive on my own computer using explorer.exe and finally believed my friend, because the archive really did look empty:

    image

    Who is to blame for this behavior? Let's figure it out.

    Read on to find out how the investigation went and what came of it (careful, many screenshots). Before reading this article, I also strongly recommend that you familiarize yourself with the previous ones.

    Experimentally, I found that the problem reproduces at least when the name of the archived file contains the character '«' (for example, "«some_file.txt"). Then I found out that when 7-Zip is used as the archiver, the contents of the resulting archive are displayed in explorer.exe without any trouble. Checking the "problem" archive for errors with 7-Zip's built-in tools did not reveal anything either:

    image

    By the way, did you notice that instead of the characters '«' and '»' in the original archive, Total Commander shows an underscore ('_')? 7-Zip File Manager, in turn, replaced '«' with the character '<':

    image

    What is going on? Let's not guess; instead, let's see how the ZIP archives differ when one is packed with Total Commander's built-in archiver and the other with 7-Zip.

    To begin with, we create a minimal example that reproduces the problem. I settled on an empty file named "«some_file.txt" (the resulting archive will be called "«some_file.zip"). Next, we archive it both ways without compression and pick up XVI32, in which we open both resulting archives:

    Problem archive
    image

    Normal archive
    image

    It is visible to the naked eye that their contents differ. However, this should not cause any particular emotion yet, because archivers may well write information about themselves into some "additional" fields. To see exactly what the difference between these files is, let's look at the specification and a third-party description of the ZIP format and break our bytes into their components:

    Problem archive
    Local file header
    50 4B 03 04 - signature
    14 00 - PKZip version needed to extract
    02 00 - General purpose bit flag
    00 00 - Compression
    83 55 - Mod. time
    DC 46 - Mod. date
    00 00 00 00 - CRC-32 checksum
    00 00 00 00 - Compressed size
    00 00 00 00 - Uncompressed size
    0E 00 - File name len
    00 00 - Extra field len
    3C 73 6F 6D 65 5F 66 69 6C 65 2E 74 78 74 - File name ("<some_file.txt")

    Normal archive
    Local file header
    50 4B 03 04 - signature
    0A 00 - PKZip version needed to extract
    00 08 - General purpose bit flag
    00 00 - Compression
    84 55 - Mod. time
    DC 46 - Mod. date
    00 00 00 00 - CRC-32 checksum
    00 00 00 00 - Compressed size
    00 00 00 00 - Uncompressed size
    0F 00 - File name len
    00 00 - Extra field len
    C2 AB 73 6F 6D 65 5F 66 69 6C 65 2E 74 78 74 - File name ("«some_file.txt")
    Central directory file header
    50 4B 01 02 - Signature
    3F 00 - Version
    0A 00 - PKZip version needed to extract
    00 08 - Flags
    00 00 - Compression
    84 55 - Mod. time
    DC 46 - Mod. date
    00 00 00 00 - CRC-32 checksum
    00 00 00 00 - Compressed size
    00 00 00 00 - Uncompressed size
    0F 00 - File name len
    24 00 - Extra field len
    00 00 - File comm. len
    00 00 - Number of the disk on which this file exists
    00 00 - Internal attr.
    20 00 00 00 - External attr.
    00 00 00 00 - Offset of local header
    C2 AB 73 6F 6D 65 5F 66 69 6C 65 2E 74 78 74 - File name ("«some_file.txt")
    0A 00 20 00 00 00 00 00 00 01 00 18 00 F0 88 3F D4 6D B1 D0 01 F0 88 3F D4 6D B1 D0 01 F0 88 3F D4 6D B1 D0 01 - Extra field
    End of central directory record
    50 4B 05 06 - Signature
    00 00 - Number of this disk
    00 00 - Number of the disk on which the central directory starts
    01 00 - Number of central directory entries on this disk
    01 00 - Total number of entries in the central directory
    61 00 00 00 - Central directory size
    2D 00 00 00 - Offset of cd wrt to starting disk
    00 00 - Comment len
    


    As you can see, in the problem archive the character '«' really has for some reason "turned" into '<' (0x3C), while in the normal archive it remains itself (0xC2 0xAB, which is how it is represented in UTF-8).
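
    As a cross-check, here is a minimal Python sketch that parses the two local file headers from the dumps above. The function name parse_local_header is made up for this illustration, and decoding names as CP437 when bit 11 is unset follows the specification, not necessarily what any particular archiver or viewer does:

```python
import struct

# Local file header bytes copied from the two dumps above.
PROBLEM = bytes.fromhex(
    "504B0304 1400 0200 0000 8355 DC46 00000000 00000000 00000000 0E00 0000"
    "3C736F6D655F66696C652E747874"
)
NORMAL = bytes.fromhex(
    "504B0304 0A00 0008 0000 8455 DC46 00000000 00000000 00000000 0F00 0000"
    "C2AB736F6D655F66696C652E747874"
)


def parse_local_header(data):
    """Split a ZIP local file header into the fields we care about."""
    (sig, version, flags, method, mtime, mdate,
     crc, csize, usize, name_len, extra_len) = struct.unpack_from("<4s5H3I2H", data)
    if sig != b"PK\x03\x04":
        raise ValueError("not a local file header")
    raw_name = data[30:30 + name_len]  # the fixed part is 30 bytes long
    # Bit 11 of the flags selects UTF-8; otherwise the spec implies CP437.
    encoding = "utf-8" if flags & 0x0800 else "cp437"
    return {
        "flags": flags,
        "name_len": name_len,
        "raw_name": raw_name,
        "name": raw_name.decode(encoding),
    }


problem = parse_local_header(PROBLEM)
normal = parse_local_header(NORMAL)
print(hex(problem["flags"]), problem["name_len"], problem["name"])
print(hex(normal["flags"]), normal["name_len"], normal["name"])
```

    The name in the problem archive is one byte shorter because '<' takes a single byte, while '«' takes two bytes in UTF-8.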

    And what happens if we simply replace the '<' in the problem archive with '«', adjusting along the way every other byte that this change affects? Replace 0x3C with 0xC2 0xAB (note that this must be done in two places), 0x0E 0x00 (File name len) with 0x0F 0x00 (also in two places), 0x3C 0x00 0x00 0x00 (Central directory size) with 0x3D 0x00 0x00 0x00 (since the central directory header has grown by one byte) and 0x2C 0x00 0x00 0x00 (Offset of cd wrt to starting disk) with 0x2D 0x00 0x00 0x00 (since the local file header in front of it has grown by one byte as well). The result should be the following:
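
    The arithmetic behind these numbers can be sketched in a few lines of Python. The helper single_entry_layout is hypothetical and only covers an archive with a single entry; the constants 30 and 46 are the sizes of the fixed parts of the local and central directory headers in the ZIP format:

```python
LOCAL_HEADER_FIXED = 30    # bytes before the file name in a local file header
CENTRAL_HEADER_FIXED = 46  # bytes before the file name in a central directory header


def single_entry_layout(name_len, data_len=0, local_extra=0,
                        central_extra=0, comment_len=0):
    """Return (central directory offset, central directory size)
    for an archive that contains exactly one entry."""
    cd_offset = LOCAL_HEADER_FIXED + name_len + local_extra + data_len
    cd_size = CENTRAL_HEADER_FIXED + name_len + central_extra + comment_len
    return cd_offset, cd_size


# Problem archive before the edit: 14-byte name "<some_file.txt"
print([hex(v) for v in single_entry_layout(14)])   # ['0x2c', '0x3c']
# Problem archive after the edit: 15-byte UTF-8 name
print([hex(v) for v in single_entry_layout(15)])   # ['0x2d', '0x3d']
# Normal archive: 15-byte name plus a 0x24-byte extra field in the central header
print([hex(v) for v in single_entry_layout(15, central_extra=0x24)])  # ['0x2d', '0x61']
```

    The last line matches the values 0x61 and 0x2D found in the end of central directory record of the normal archive.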

    image

    Open the resulting archive in explorer.exe and see:

    image

    Yes, the file is now visible, but there is clearly something wrong with its name. We search the specification for the word "unicode" and find the following:

    APPENDIX D - Language Encoding (EFS)

    D.1 The ZIP format has historically supported only the original IBM PC character
    encoding set, commonly referred to as IBM Code Page 437. This limits storing
    file name characters to only those within the original MS-DOS range of values
    and does not properly support file names in other character encodings, or
    languages. To address this limitation, this specification will support the
    following change.

    D.2 If general purpose bit 11 is unset, the file name and comment should conform
    to the original ZIP character encoding. If general purpose bit 11 is set, the
    filename and comment must support The Unicode Standard, Version 4.1.0 or
    greater using the character encoding form defined by the UTF-8 storage
    specification. The Unicode Standard is published by the The Unicode
    Consortium (www.unicode.org). UTF-8 encoded data stored within ZIP files
    is expected to not include a byte order mark (BOM)

    Let's see whether bit 11 of the flags field is set in our two cases:

    Problem archive
    image

    Normal archive
    image
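
    The same check takes a couple of lines of Python. This is only a sketch: the names UTF8_NAME_FLAG and uses_utf8_names are chosen for this illustration, and the flag bytes are the ones from the local file headers shown earlier (the field is stored in little-endian order):

```python
UTF8_NAME_FLAG = 1 << 11  # general purpose bit 11, i.e. 0x0800


def uses_utf8_names(flag_bytes):
    """Tell whether bit 11 is set in a two-byte general purpose flag field."""
    flags = int.from_bytes(flag_bytes, "little")
    return bool(flags & UTF8_NAME_FLAG)


print(uses_utf8_names(bytes.fromhex("0200")))  # problem archive: False
print(uses_utf8_names(bytes.fromhex("0008")))  # normal archive: True
```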

    Let's set this flag in the problem file (Tools -> Bit manipulation):

    image

    Then we try to open our archive in explorer.exe again:

    image

    Right, we forgot that there are actually two flag fields (one in the local file header and one in the central directory file header):

    image

    Set the necessary bit in this field and open our archive again:

    image
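
    The manual patching can also be reproduced programmatically. The Python sketch below is an illustration under stated assumptions, not a general-purpose tool: build_minimal_zip and set_utf8_flag are hypothetical helpers, the archive holds a single empty stored file and has no archive comment, and Python's zipfile module stands in for a ZIP reader that honours bit 11 (explorer.exe may decode names differently):

```python
import io
import struct
import zipfile

UTF8_NAME_FLAG = 0x0800


def build_minimal_zip(name, flags):
    """Hypothetical helper: an archive with one empty, stored file."""
    local = struct.pack("<4s5H3I2H", b"PK\x03\x04", 20, flags, 0,
                        0x5583, 0x46DC, 0, 0, 0, len(name), 0) + name
    central = struct.pack("<4s6H3I5H2I", b"PK\x01\x02", 20, 20, flags, 0,
                          0x5583, 0x46DC, 0, 0, 0,
                          len(name), 0, 0, 0, 0, 0x20, 0) + name
    end = struct.pack("<4s4H2IH", b"PK\x05\x06", 0, 0, 1, 1,
                      len(central), len(local), 0)
    return local + central + end


def set_utf8_flag(data):
    """Set bit 11 in both flag fields of a single-entry archive."""
    patched = bytearray(data)
    # Without an archive comment the end record is the last 22 bytes;
    # the central directory offset sits 16 bytes into that record.
    cd_offset = struct.unpack_from("<I", patched, len(patched) - 22 + 16)[0]
    # The flag field is at offset 6 in the local header
    # and at offset 8 in the central directory header.
    for flags_at in (6, cd_offset + 8):
        flags = struct.unpack_from("<H", patched, flags_at)[0]
        struct.pack_into("<H", patched, flags_at, flags | UTF8_NAME_FLAG)
    return bytes(patched)


raw_name = "«some_file.txt".encode("utf-8")
before = build_minimal_zip(raw_name, flags=0x0002)
after = set_utf8_flag(before)

names_before = zipfile.ZipFile(io.BytesIO(before)).namelist()
names_after = zipfile.ZipFile(io.BytesIO(after)).namelist()
print(names_before, names_after)
```

    Before the patch the reader falls back to a legacy code page and the name comes out garbled; after the patch the same bytes are interpreted as UTF-8.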

    Great! But why does Total Commander "turn" the character '«' into '<'? To understand this, we pick up OllyDbg and run the file manager under it. But wait, let's first check whether ASLR is enabled for totalcmd.exe. We load it into PE Tools using Alt-1, click the "Optional Header" button and see that the image base will not change (for a more detailed description of this process, I recommend that you look at the previous article):

    image

    Obviously, to create the archive TC has to call the CreateFile WinAPI function, so we set breakpoints on its calls (most likely, in our case the Unicode version of this function is used):

    image

    Remove the breakpoints that trigger on actions we are not interested in (for example, when the TC window receives focus, at 0x00567264):

    image

    Press Alt-F5 (the key combination for archiving files in Total Commander), click "OK" and end up here:

    image

    Let's try to find out whether TC has already converted the character '«' to '<'. To do this, open the "Memory" window with Alt-M -> left-click on the first line -> Ctrl-B -> enter "<some_file.txt":

    image

    We click the "OK" button and see that the application has already done its conversion:

    image

    Press Alt-K to look at the Call Stack:

    image

    Set breakpoints at the beginning of each procedure in the list, press F9, delete the resulting archive and make sure that the string "<some_file.txt" is no longer in process memory:

    image

    Again, we search the entire process memory for the same string and... we find it:

    image

    Well, the last logical option at the moment is to set a breakpoint at the beginning of the last procedure in the Call Stack, the one from which we were ultimately called:

    image

    Jump to the call (right-click on the line with the address of the current procedure in the Call Stack window -> Show Call), scroll up to the start of the procedure suggested by OllyDbg and set a breakpoint on it:

    image

    Repeat the same actions as before (press F9, delete the archive, check that the string "<some_file.txt" is absent from process memory):

    image

    Given that the Call Stack is currently empty, we can assume that we are executing in a thread other than the main one, or that we got here via a conditional or unconditional jump. Press Ctrl-R and see:

    image

    We jump to the only reference by pressing Enter:

    image

    We look at what, in turn, refers to this line:

    image

    We jump there:

    image

    We step into several procedures and see a call to the CreateThread WinAPI function:

    image

    In principle, this could have been verified another way: just look at the title bar of the CPU window, which in my case reported that the thread ID is 0x000013B8:

    image

    At the same time, in the "Log" window, opened by pressing Alt-L, you can see that the ID of the main thread is 0x00001E30:

    image

    Press Alt-F5 and set breakpoints on the CreateThread calls:

    image

    Then click the "OK" button and stop at the place we already know:

    image

    Look at the Call Stack and, using the "binary search" method (halve the range of candidates and check the result), unwind the chain of procedure calls until it becomes clear after which call the string "<some_file.txt" appears in process memory. It turns out to be the procedure at 0x00491780. Looking carefully at what happens inside it, we find a call to the CharToOem WinAPI function:

    image

    According to the official documentation, this function translates the passed string into the OEM-defined character set, and if the ANSI version is used, the translation can be done "in place" (src and dest may point to the same address, which eliminates the need for a separate buffer for the result), which is exactly what happens in TC:

    lpszDst [out]
    Type: LPSTR
    The destination buffer, which receives the translated string. If the CharToOem function is being used as an ANSI function, the string can be translated in place by setting the lpszDst parameter to the same address as the lpszSrc parameter. This cannot be done if CharToOem is being used as a wide-character function

    Indeed, after the call, the buffer passed to it contains the character '<' instead of '«':

    image
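
    Why '<' of all characters? A plausible explanation is that the OEM code page simply has no '«', so the conversion falls back to a similar-looking character. The Python sketch below illustrates this under two assumptions that the debugging session itself does not prove: that the OEM code page is CP866 (common on Russian Windows installations) and that the fallback is the single substitution observed above. Python codecs raise an error instead of substituting, so the fallback table OBSERVED_BEST_FIT and the function emulate_char_to_oem are hypothetical stand-ins, not the real CharToOem logic:

```python
def encode_or_none(text, codepage):
    """Encode text in the given code page, or return None if it cannot be done."""
    try:
        return text.encode(codepage)
    except UnicodeEncodeError:
        return None


# The only substitution actually observed in this article.
OBSERVED_BEST_FIT = {"«": "<"}


def emulate_char_to_oem(text, oem_codepage="cp866"):
    """Rough emulation: exact mapping where possible, observed fallback otherwise."""
    out = bytearray()
    for ch in text:
        encoded = encode_or_none(ch, oem_codepage)
        if encoded is None:
            encoded = OBSERVED_BEST_FIT.get(ch, "?").encode("ascii")
        out += encoded
    return bytes(out)


print(encode_or_none("«", "cp1251"))  # present in the ANSI code page CP1251
print(encode_or_none("«", "cp866"))   # absent from the OEM code page CP866
print(emulate_char_to_oem("«some_file.txt"))
```

    For comparison, CP437 does contain '«', so with that OEM code page the character would survive the conversion as a single byte.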

    So, shall we patch it? Let's first look at the archiving options in TC:

    image

    By default, the option "Pack Unicode names" is set to "Ask every time a Unicode name is encountered". So TC did not consider the name it encountered to be Unicode. And what if we try to archive a file with, for example, Chinese characters in its name?

    image

    As you can see, in this case TC displays a message about a file name containing characters that are outside the code page in use. For the '«' character there was no such message, most likely because many code pages (in my case, apparently, CP1251) contain this character in the upper range allocated to them. But if you set this option to "All as UTF-8 if at least one contains characters > 127", then our file "«some_file.txt" is packed correctly and subsequently displayed in explorer.exe.
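
    For comparison, Python's standard zipfile module follows a similar policy on its own: names that fit into ASCII are stored as is, and anything else is written as UTF-8 with bit 11 set. A small sketch that shows this (name_flag_bits is a helper name made up for this example):

```python
import io
import zipfile


def name_flag_bits(filename):
    """Pack an empty file under the given name and return its flag bits."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(filename, b"")
    with zipfile.ZipFile(buf) as zf:
        return zf.infolist()[0].flag_bits


print(bool(name_flag_bits("some_file.txt") & 0x0800))   # plain ASCII name
print(bool(name_flag_bits("«some_file.txt") & 0x0800))  # name with '«'
```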

    You may ask, "Why does explorer.exe not show the file even though its name has merely changed from "«some_file.txt" to "<some_file.txt"?" The point is that '<' is one of the characters prohibited in directory and file names on NTFS:

    The following reserved characters:

    < (less than)
    > (greater than)
    : (colon)
    " (double quote)
    / (forward slash)
    \ (backslash)
    | (vertical bar or pipe)
    ? (question mark)
    * (asterisk)
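
    A quick way to test a name against this list is sketched below in Python. The names RESERVED_CHARS and reserved_chars_in are chosen for this example, and the check covers only the nine characters listed above, not the other Windows naming rules:

```python
RESERVED_CHARS = frozenset('<>:"/\\|?*')


def reserved_chars_in(name):
    """Return the reserved characters found in a file name, in order."""
    return [ch for ch in name if ch in RESERVED_CHARS]


print(reserved_chars_in("<some_file.txt"))  # name as stored by TC
print(reserved_chars_in("«some_file.txt"))  # original name
```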

    Moreover, it is also impossible to extract such an archive using the built-in Windows tools. First, because of the '«' character in the name of the archive itself:

    image

    Second, because of its contents:

    image

    Afterword


    Absolutely any product has bugs / features (call them what you will) that can surface in the most unexpected places, and the more complex the software, the more bugs can be found in it. Do not be too lazy to investigate the causes of such behavior, because it is quite possible that while researching the application you will learn something new.

    Thank you for your attention, and again I hope that the article was useful to someone.
