Methods for detecting "glued" files



    Many have probably heard of rarjpeg files. A rarjpeg is a JPEG image and a RAR archive glued together into one file, which makes it an excellent container for hiding the fact that information is being transferred. You can create a rarjpeg with the following commands:

    UNIX: cat image1.jpg archive.rar > image2.jpg
    WINDOWS: copy /b image1.jpg + archive.rar image2.jpg

    Alternatively, the same can be done with a hex editor.

    Of course, JPEG is not the only format that can hide the fact of information transfer; many others work as well. Each format has its own characteristics that make it more or less suitable as a container. Below I describe how to find glued files in the most popular formats, or at least detect signs of gluing.

    Methods for detecting glued files can be divided into three groups:

    1. Checking the area after the EOF marker. Many popular file formats have a so-called end-of-file marker that tells the reading application where the data it needs ends. For example, photo viewers read all bytes up to this marker and ignore everything after it. This method suits JPEG, PNG, GIF, ZIP, RAR, and PDF.
    2. Checking the file size. The structure of some formats (audio and video containers) makes it possible to calculate the expected file size and compare it with the actual size. Formats: AVI, WAV, MP4, MOV.
    3. Checking CFB files. CFB (Compound File Binary Format) is a document format developed by Microsoft; it is a container with its own internal file system. This method is based on detecting structural anomalies in the file.

    Is there life after the end of the file?


    JPEG


    To answer this question, we need to dig into the specification of the host format, the "ancestor" of the glued file, and understand its structure. Every JPEG starts with the signature 0xFF 0xD8 (SOI, start of image).

    This signature is followed by service information, optionally an embedded thumbnail, and finally the compressed image itself. The end of the image is marked with the two-byte signature 0xFF 0xD9 (EOI, end of image). An embedded thumbnail is itself a small JPEG with its own 0xFF 0xD8 ... 0xFF 0xD9 pair, so the first 0xFF 0xD9 in a file does not necessarily end the main image.

    PNG


    The first eight bytes of every PNG file are the signature 0x89 0x50 0x4E 0x47 0x0D 0x0A 0x1A 0x0A. The data stream ends with 0x49 0x45 0x4E 0x44 0xAE 0x42 0x60 0x82: the type of the final IEND chunk ("IEND") followed by its constant CRC.

    RAR


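    All RAR archives begin with the common signature 0x52 0x61 0x72 0x21 ("Rar!"), followed by information about the archive version and other related data. Experiments suggested that an archive ends with 0x0A 0x25 0x25 0x45 0x4F 0x46, but these bytes spell "\n%%EOF" in ASCII, which is the PDF end marker (see below) rather than a RAR structure. A RAR 4.x archive normally ends with an end-of-archive block, typically 0xC4 0x3D 0x7B 0x00 0x40 0x07 0x00; RAR 5.0 archives use a different end-of-archive header.

    Table of formats and their signatures:

    Format | Initial signature | End signature
    JPEG | 0xFF 0xD8 | 0xFF 0xD9
    PNG | 0x89 0x50 0x4E 0x47 0x0D 0x0A 0x1A 0x0A | 0x49 0x45 0x4E 0x44 0xAE 0x42 0x60 0x82
    RAR (4.x) | 0x52 0x61 0x72 0x21 | 0xC4 0x3D 0x7B 0x00 0x40 0x07 0x00 (typical end-of-archive block)
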
    The gluing check algorithm in these formats is extremely simple:

    1. Find the initial signature;
    2. Find the end signature;
    3. If there is no data after the end signature, the file is clean and contains no attachments. Otherwise, search the data after the end signature for signatures of other formats.
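    The three steps can be sketched in Python for JPEG and PNG as follows. The function names and the KNOWN_STARTS list of signatures to look for in the trailing data are illustrative assumptions, not part of any library. The sketch stops at the first end signature, so for a JPEG with an embedded thumbnail it may stop too early; the multi-marker algorithm for GIF and PDF handles that case better.

```python
from typing import List, Optional

# Initial and end signatures from the table above.
FORMATS = {
    "JPEG": (b"\xFF\xD8", b"\xFF\xD9"),
    "PNG": (b"\x89PNG\r\n\x1a\n", b"IEND\xAE\x42\x60\x82"),
}

# Assumed, illustrative set of start signatures to look for after the end.
KNOWN_STARTS = {
    b"Rar!": "RAR",
    b"PK\x03\x04": "ZIP",
    b"%PDF": "PDF",
    b"\xFF\xD8\xFF": "JPEG",
    b"\x89PNG": "PNG",
}


def trailing_data(data: bytes, fmt: str) -> Optional[bytes]:
    """Return the bytes after the first end signature, or None if the
    file does not look like `fmt` at all."""
    start_sig, end_sig = FORMATS[fmt]
    if not data.startswith(start_sig):        # step 1: initial signature
        return None
    end = data.find(end_sig, len(start_sig))  # step 2: end signature
    if end == -1:
        return None
    return data[end + len(end_sig):]          # step 3: whatever follows


def embedded_formats(tail: bytes) -> List[str]:
    """Names of known formats whose start signature occurs in `tail`."""
    return sorted({name for sig, name in KNOWN_STARTS.items() if sig in tail})
```

    An empty result from trailing_data means the file is clean; a non-empty result can be passed to embedded_formats to see what was glued on.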

    GIF and PDF

    Format | Initial signature | End signature
    GIF | 0x47 0x49 0x46 0x38 | 0x00 0x3B
    PDF | 0x25 0x50 0x44 0x46 | 0x0A 0x25 0x25 0x45 0x4F 0x46
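    A PDF document may contain more than one EOF marker, for example after incremental updates (each update appends a new section ending in %%EOF) or because the document was generated incorrectly. In a GIF file the end signature can also occur more than once: the file ends with a single trailer byte 0x3B, but the pair 0x00 0x3B can appear earlier in the image data, and files with many frames contain more such data. Taking these features into account, the gluing check algorithm can be improved: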

    1. Find the initial signature (as in the previous algorithm);
    2. Find the end signature (as in the previous algorithm);
    3. Each time an end signature is found, remember its position and continue searching;
    4. If the file ends exactly at the last end signature found this way, the file is clean;
    5. If the file does not end with an end signature, go back to the position of the last end signature found and examine the data after it.

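    A large difference between the file size and the position right after the last end signature indicates a glued attachment. A threshold of more than ten bytes can be used, although other values may be chosen; some allowance is needed because files often carry a few trailing bytes, such as a line break after the final %%EOF in a PDF.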
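    A minimal Python sketch of the improved check, assuming the ten-byte threshold mentioned above. Searching backwards with rfind gives the same position as remembering every end signature while scanning forward. The helper names are hypothetical, and the PDF marker is checked only with a "\n" line ending, so files that use a bare "\r" before %%EOF would need an extra pattern.

```python
from typing import Optional

END_SIGNATURES = {
    "GIF": b"\x00\x3B",
    "PDF": b"\n%%EOF",
}


def gap_after_last_end(data: bytes, end_sig: bytes) -> Optional[int]:
    """Bytes between the end of the last end signature and the end of
    the file, or None if the signature never occurs."""
    last = data.rfind(end_sig)  # position of the last end signature
    if last == -1:
        return None
    return len(data) - (last + len(end_sig))


def looks_glued(data: bytes, fmt: str, threshold: int = 10) -> bool:
    """True if more than `threshold` bytes follow the last end signature.
    A missing end signature is also treated as suspicious (assumption)."""
    gap = gap_after_last_end(data, END_SIGNATURES[fmt])
    return gap is None or gap > threshold
```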

    ZIP


    The peculiarity of ZIP archives is the presence of three different signatures:
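    Signature | Description
    0x50 0x4B 0x03 0x04 | Normal archive signature (local file header)
    0x50 0x4B 0x05 0x06 | Signature of an empty archive (end of central directory record)
    0x50 0x4B 0x07 0x08 | Spanned (split) archive signature (also used for data descriptors)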
    The structure of the archive is as follows:
    Local File Header 1
    File data 1
    Data Descriptor 1
    Local File Header 2
    File data 2
    Data Descriptor 2
    ...
    Local File Header n
    File data n
    Data Descriptor n
    Archive decryption header
    Archive extra data record
    Central directory
    The most interesting part is the central directory, which contains metadata about the files in the archive. Each central directory entry starts with the signature 0x50 0x4B 0x01 0x02, and the directory is closed by the end of central directory record, which starts with the signature 0x50 0x4B 0x05 0x06 followed by 18 bytes of metadata. An empty archive consists only of this final signature and 18 zero bytes. The last two of these 18 bytes hold the length of the archive comment (little endian), and the comment itself follows them. This comment area is an ideal container for hiding a file.

    To check a ZIP archive, find the end of central directory signature, skip the 18 bytes of metadata, and look for signatures of known formats in the comment area. An unusually large comment, or more data after the record than the declared comment length, also indicates gluing.
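    The ZIP check can be sketched in Python with the standard struct module. It is a simplified illustration: it takes the last occurrence of 0x50 0x4B 0x05 0x06, so a second archive glued after the first would be found instead. The KNOWN_STARTS set and the function name are assumptions made for this example.

```python
import struct
from typing import List, Optional, Tuple

EOCD_SIG = b"PK\x05\x06"  # end of central directory signature
EOCD_LEN = 22             # 4-byte signature + 18 bytes of metadata

# Assumed, illustrative set of start signatures to look for in the comment.
KNOWN_STARTS = {
    b"Rar!": "RAR",
    b"\xFF\xD8\xFF": "JPEG",
    b"\x89PNG": "PNG",
    b"%PDF": "PDF",
}


def inspect_zip_tail(data: bytes) -> Optional[Tuple[int, int, List[str]]]:
    """Return (declared comment length, bytes actually present after the
    18 metadata bytes, known formats found there), or None if no end of
    central directory record is found."""
    pos = data.rfind(EOCD_SIG)
    if pos == -1 or pos + EOCD_LEN > len(data):
        return None
    # The last two of the 18 metadata bytes hold the comment length.
    (declared,) = struct.unpack_from("<H", data, pos + 20)
    tail = data[pos + EOCD_LEN:]
    found = sorted({name for sig, name in KNOWN_STARTS.items() if sig in tail})
    return declared, len(tail), found
```

    A mismatch between the first two numbers, a large declared comment, or any name in the list is a sign of gluing.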

    Size matters


    AVI


    The structure of an AVI file is as follows. Each file begins with the RIFF signature (0x52 0x49 0x46 0x46). At offset 8 there is the AVI format signature (0x41 0x56 0x49 0x20, "AVI "). The 4 bytes at offset 4 contain the size of the data block (little-endian byte order). To find the offset of the next block and its size field, add the header size (8 bytes) to the size read from bytes 4-8. Repeating this gives the full file size. The calculated size may be smaller than the actual file size: after the calculated size, the file may contain only zero bytes, which align the file to a 1 KB boundary.

    Size calculation example:


    Offset | Size | Next offset
    4 | 31442 | 8 + 31442 = 31450

    WAV


    Like AVI, a WAV file starts with the RIFF signature, but at offset 8 it has the signature WAVE (0x57 0x41 0x56 0x45). The file size is calculated in the same way as for AVI, and the actual size must match the calculated size exactly.
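    The RIFF size rule for AVI and WAV can be sketched in Python as a walk over top-level RIFF chunks starting at offset 0. The sketch assumes that each following top-level chunk also starts with "RIFF" (as in large AVI files split into several RIFF chunks) and, following the RIFF convention, adds one pad byte after odd-sized chunks. The function names are hypothetical.

```python
import struct


def riff_computed_size(data: bytes) -> int:
    """Follow top-level RIFF chunks ("RIFF" + 4-byte little-endian size)
    from offset 0 and return the offset just past the last one."""
    pos = 0
    while pos + 8 <= len(data) and data[pos:pos + 4] == b"RIFF":
        (size,) = struct.unpack_from("<I", data, pos + 4)
        # next offset = 8-byte header + size (+1 pad byte if size is odd)
        pos += 8 + size + (size & 1)
    return pos


def riff_looks_clean(data: bytes, allow_zero_padding: bool) -> bool:
    """WAV: sizes must match exactly. AVI: only zero bytes may follow."""
    end = riff_computed_size(data)
    if end > len(data):
        return False  # inconsistent (truncated) file: not clean
    tail = data[end:]
    if not tail:
        return True
    return allow_zero_padding and tail.strip(b"\x00") == b""
```

    For WAV files call riff_looks_clean(data, allow_zero_padding=False); for AVI files pass True so that alignment zeros are accepted.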

    MP4


    MP4 (MPEG-4) is a media container format used to store video and audio streams; it can also hold subtitles and images.
    At offset 4 there is the file type signature ftyp (66 74 79 70), followed by the brand that identifies the file subtype, in this example mmp4 (6D 6D 70 34). To detect hidden files, what interests us is the ability to calculate the file size.



    Consider an example. The size of the first block is stored at offset 0 and equals 28 (00 00 00 1C, big-endian byte order); it is also the offset at which the size of the second block is stored. At offset 28 we find the next block size, equal to 8 (00 00 00 08). Each subsequent offset is the sum of the sizes of all blocks found so far. The file size is calculated as follows:
    Offset | Value | Next offset
    0 | 28 | 28 + 0 = 28
    28 | 8 | 28 + 8 = 36
    36 | 303739 | 36 + 303739 = 303775
    303775 | 6202 | 303775 + 6202 = 309977

    MOV


    This widely used format is Apple's QuickTime container, on which the MPEG-4 container format was based. MOV has a structure similar to MP4 and is used for the same purposes: storing audio and video data, as well as related materials.
    Like MP4, a MOV file usually has the 4-byte signature ftyp at offset 4, but the next signature is "qt" followed by two spaces (71 74 20 20). The rule for calculating the file size is unchanged: starting from the beginning of the file, read the size of each block and add it to the running total.
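    The size rule shared by MP4 and MOV can be sketched in Python as a walk over top-level blocks (boxes), each starting with a 4-byte big-endian size and a 4-byte type. Two details come from the ISO base media file format rather than from the text above: a size of 1 means a 64-bit size follows the type, and a size of 0 means the block runs to the end of the file (such a block would swallow any glued data). Stopping at the first implausible size is an assumption of this sketch.

```python
import struct


def mp4_computed_size(data: bytes) -> int:
    """Add up top-level block sizes from offset 0 and return the offset
    where the chain of plausible blocks ends."""
    pos = 0
    while pos + 8 <= len(data):
        (size,) = struct.unpack_from(">I", data, pos)
        if size == 1:          # 64-bit size follows the 4-byte type
            if pos + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, pos + 8)
        elif size == 0:        # block extends to the end of the file
            size = len(data) - pos
        if size < 8 or pos + size > len(data):
            break              # not a plausible block: stop counting here
        pos += size
    return pos
```

    If len(data) is noticeably larger than mp4_computed_size(data), the bytes after the computed size are a candidate for a glued file.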

    The check for this group of formats is to calculate the size according to the rules above and compare it with the size of the file being checked. If the actual file size is noticeably larger than the calculated size, this indicates gluing. For AVI files, the calculated size may legitimately be smaller than the file size because of the zero bytes added for alignment; in this case, check that only zeros follow the calculated size.

    Checking Compound File Binary Format


    This file format, developed by Microsoft, is also known as OLE (Object Linking and Embedding) or COM (Component Object Model). DOC, XLS, PPT files belong to the group of CFB formats.

    A CFB file consists of a 512-byte header and sectors of equal length that store data streams or service information. Each sector has a non-negative number. All sector chains are defined in the FAT table: for each sector it stores the number of the next sector in the chain, or a special value. "-1" marks a free sector, "-2" marks the sector that ends a chain, and "-3" and "-4" mark sectors that hold the FAT and DIFAT themselves.



    Suppose an attacker has modified a .doc file and appended another file to its end. There are several ways to detect this, or at least to point to an anomaly in the document.

    Abnormal file size


    As mentioned above, any CFB file consists of a header and sectors of equal length. To find the sector size, read the two-byte number at offset 30 from the beginning of the file and raise 2 to the power of this number. The number must be either 9 (0x0009) or 12 (0x000C), giving a sector size of 512 or 4096 bytes respectively. Having found the sector size, check the following equality, where HeaderSize is 512 bytes (in files with 4096-byte sectors the header is padded to fill a whole sector, so HeaderSize is 4096):

    (FileSize - HeaderSize) mod SectorSize = 0

    If this equality does not hold, the file has probably been glued. However, this method has a significant drawback: an attacker who knows the sector size only needs to pad the appended file with n extra bytes so that the size of the glued data is a multiple of the sector size.
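    The size check can be sketched in Python as follows. The 8-byte CFB signature D0 CF 11 E0 A1 B1 1A E1 comes from the Microsoft CFB specification; the function name is hypothetical. The usage below also reproduces the drawback just described: padding the glued data to a whole sector defeats the check.

```python
import struct

CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def cfb_size_is_aligned(data: bytes) -> bool:
    """Check (FileSize - HeaderSize) mod SectorSize == 0."""
    if not data.startswith(CFB_SIGNATURE):
        raise ValueError("not a CFB file")
    (shift,) = struct.unpack_from("<H", data, 30)  # sector shift at offset 30
    if shift not in (9, 12):
        raise ValueError(f"unexpected sector shift: {shift}")
    sector_size = 1 << shift
    # 512-byte header; in 4096-byte-sector files it fills a whole sector.
    header_size = sector_size
    return (len(data) - header_size) % sector_size == 0
```

    Usage with a synthetic header: a clean 1536-byte file passes, appending four bytes fails, and appending the same four bytes plus 508 bytes of padding passes again.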

    Unknown sector type


    Even if the attacker circumvents the previous check in this way, the following method can detect the presence of sectors of undefined type.

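    Define the equality:

    FileSize = HeaderSize + CountReal * SectorSize, where FileSize is the file size, HeaderSize is the header size (512 bytes, or 4096 bytes in files with 4096-byte sectors), SectorSize is the sector size, and CountReal is the number of sectors.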

    We also define the following variables:

    1. CountFAT - the number of FAT sectors, stored at offset 44 from the beginning of the file (4 bytes);
    2. CountMiniFAT - the number of MiniFAT sectors, stored at offset 64 from the beginning of the file (4 bytes);
    3. CountDIFAT - the number of DIFAT sectors, stored at offset 72 from the beginning of the file (4 bytes);
    4. CountDE - the number of Directory Entry sectors. The number of the first DE sector is stored at offset 48; follow the DE sector chain through the FAT and count its sectors;
    5. CountStreams - the number of sectors holding data streams;
    6. CountFree - the number of free sectors;
    7. CountClassified - the number of sectors of a known type:

    CountClassified = CountFAT + CountMiniFAT + CountDIFAT + CountDE + CountStreams + CountFree

    If CountClassified differs from CountReal, some sectors have no defined type, which suggests that another file has been glued to the document.
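    Counting CountDE, CountStreams, and CountFree requires walking sector chains through the FAT, which does not fit in a short example. The Python sketch below therefore covers only the parts that come straight from the header and the file size: it reads the fields at offsets 44, 48, 64, and 72, computes CountReal, and compares the sum. The chain-derived counts are assumed to be supplied by the caller, and all names are hypothetical.

```python
import struct
from dataclasses import dataclass


@dataclass
class CfbCounts:
    sector_size: int
    count_real: int        # CountReal, derived from the file size
    count_fat: int         # CountFAT, offset 44
    first_dir_sector: int  # first Directory Entry sector, offset 48
    count_minifat: int     # CountMiniFAT, offset 64
    count_difat: int       # CountDIFAT, offset 72


def read_cfb_counts(data: bytes) -> CfbCounts:
    sector_size = 1 << struct.unpack_from("<H", data, 30)[0]
    # 512-byte header; in 4096-byte-sector files it fills a whole sector.
    header_size = sector_size
    count_real = (len(data) - header_size) // sector_size
    count_fat, first_dir = struct.unpack_from("<II", data, 44)
    (count_minifat,) = struct.unpack_from("<I", data, 64)
    (count_difat,) = struct.unpack_from("<I", data, 72)
    return CfbCounts(sector_size, count_real, count_fat, first_dir,
                     count_minifat, count_difat)


def counts_mismatch(h: CfbCounts, count_de: int, count_streams: int,
                    count_free: int) -> bool:
    """True if CountClassified != CountReal. count_de, count_streams and
    count_free must come from walking the FAT (not shown here)."""
    classified = (h.count_fat + h.count_minifat + h.count_difat
                  + count_de + count_streams + count_free)
    return classified != h.count_real
```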

