CPIO under the microscope

A CPIO archive lets you collect any number of files, directories and other filesystem objects (symbolic links, etc.) into a single stream of bytes.
Let's look at the format of this archive using an example.
Each filesystem object is stored in the archive as a record: a header with basic metadata, followed by the path name of the object and then its contents. The header contains a set of integer values that largely repeat the fields of the stat(2) structure on *nix systems. The end of the archive is marked with a special entry (laid out like the others) with the name 'TRAILER!!!'.
File format.
At the moment, the most common is the old binary CPIO record format. That is the format described here.
The record header has the following structure:
struct header_old_cpio {
unsigned short c_magic;
unsigned short c_dev;
unsigned short c_ino;
unsigned short c_mode;
unsigned short c_uid;
unsigned short c_gid;
unsigned short c_nlink;
unsigned short c_rdev;
unsigned short c_mtime[2];
unsigned short c_namesize;
unsigned short c_filesize[2];
};
It is assumed here that the unsigned short type is 16 bits in size.
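Since the C standard does not guarantee the width of unsigned short, the same layout can be written with fixed-width types. This is only a sketch, assuming a C11 compiler; the name header_old_cpio_fixed is my own and does not come from any cpio source:

```c
#include <stdint.h>

/* Same field order as header_old_cpio, with explicit 16-bit fields.
 * The struct name is hypothetical, chosen for this illustration. */
struct header_old_cpio_fixed {
    uint16_t c_magic;
    uint16_t c_dev;
    uint16_t c_ino;
    uint16_t c_mode;
    uint16_t c_uid;
    uint16_t c_gid;
    uint16_t c_nlink;
    uint16_t c_rdev;
    uint16_t c_mtime[2];
    uint16_t c_namesize;
    uint16_t c_filesize[2];
};

/* Thirteen 16-bit fields and no padding: the header occupies 26 bytes. */
_Static_assert(sizeof(struct header_old_cpio_fixed) == 26,
               "old binary cpio header must be 26 bytes");
```

The static assertion makes the build fail early on a platform where the layout would not match the on-disk header.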
c_magic
An integer equal to 070707 in octal, or 0x71c7 in hexadecimal. It marks the start of a header and is also used to determine the byte order (little-endian vs big-endian) in which the archive was written.
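The byte-order check can be sketched as follows. The enum and the function name cpio_detect_order are illustrative assumptions, not part of any cpio implementation; the function simply looks at the order in which the two bytes of the magic number appear:

```c
#include <stdint.h>

/* Hypothetical names, introduced only for this sketch. */
enum cpio_order { CPIO_LITTLE, CPIO_BIG, CPIO_UNKNOWN };

/* Inspect the first two bytes of a record and report how
 * the 16-bit header fields are stored. */
enum cpio_order cpio_detect_order(const uint8_t *p)
{
    if (p[0] == 0xC7 && p[1] == 0x71)
        return CPIO_LITTLE;   /* 0x71c7 stored low byte first */
    if (p[0] == 0x71 && p[1] == 0xC7)
        return CPIO_BIG;      /* 0x71c7 stored high byte first */
    return CPIO_UNKNOWN;      /* not an old binary cpio header */
}
```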
c_dev , c_ino
The device and inode numbers of the object. They match the corresponding values in the stat structure. If the inode number is greater than 65535, its most significant bits are lost.
c_mode
This field defines both the access rights and the type of the object:
0170000 | Mask for the file type bits |
0140000 | Socket |
0120000 | Symbolic link. For symbolic links, the body of the record contains the path to the file the link refers to. |
0100000 | Regular file |
0060000 | Block special device |
0040000 | Directory |
0020000 | Character special device |
0010000 | Named pipe (FIFO) |
0004000 | SUID |
0002000 | SGID |
0001000 | Sticky bit |
0000777 | The lower 9 bits hold the access rights to the object. |
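The table above maps directly onto a mask and a switch. A minimal sketch; the function names cpio_type_name and cpio_permissions and the returned strings are my own choices for illustration:

```c
#include <stdint.h>

#define CPIO_TYPE_MASK 0170000u   /* file type bits of c_mode */

/* Return a human-readable name for the file type encoded in c_mode. */
const char *cpio_type_name(uint16_t mode)
{
    switch (mode & CPIO_TYPE_MASK) {
    case 0140000: return "socket";
    case 0120000: return "symlink";
    case 0100000: return "regular file";
    case 0060000: return "block device";
    case 0040000: return "directory";
    case 0020000: return "character device";
    case 0010000: return "fifo";
    default:      return "unknown";
    }
}

/* The lower 9 bits are the rwx access rights. */
unsigned cpio_permissions(uint16_t mode)
{
    return mode & 0777u;
}
```

For example, a c_mode of 040775 yields "directory" with access rights 0775.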
c_uid , c_gid
User and group identifiers of the file owner.
c_nlink
The number of links to this file. For directories, the value of this field is always at least two.
c_rdev
Only for special character and block devices. The field contains the
associated device number. For all other file types, the value of
this field must be zero.
c_mtime
The time the file was last modified, in seconds since the beginning of the UNIX epoch. The 32-bit integer is written as an array of two 16-bit integers: first the most significant 16 bits, then the least significant 16 bits.
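Putting the two halves back together takes a single shift. A small sketch, where cpio_join32 is a hypothetical helper name; it assumes each 16-bit word has already been read in the archive's byte order. The same helper would apply to c_filesize, which the struct declares as two 16-bit words as well:

```c
#include <stdint.h>

/* Join two 16-bit words, most significant word first,
 * into one 32-bit value. */
uint32_t cpio_join32(const uint16_t words[2])
{
    return ((uint32_t)words[0] << 16) | words[1];
}
```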
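c_namesize
The length of the path name in bytes, including the terminating NUL.
c_filesize
The size of the file contents in bytes. Like c_mtime, it is stored as two 16-bit integers, most significant half first.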
The path name of the object follows immediately after the header. If the length of the name is odd, one more NUL byte is appended to it. Then come the contents of the file. If the size of the contents is odd, they are likewise padded with one zero byte.
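The padding rule can be sketched like this, assuming that names and bodies are each padded to an even number of bytes and that the header occupies 26 bytes (thirteen 16-bit fields). The names cpio_pad_even and cpio_record_size are hypothetical:

```c
#include <stdint.h>

/* Round a length up to the next even number of bytes. */
uint32_t cpio_pad_even(uint32_t len)
{
    return len + (len & 1u);
}

/* Total bytes one record occupies in the archive:
 * 26-byte header, padded name, padded body. */
uint32_t cpio_record_size(uint16_t namesize, uint32_t filesize)
{
    return 26u + cpio_pad_even(namesize) + cpio_pad_even(filesize);
}
```

With a namesize of 19 and a filesize of 30, cpio_record_size returns 76: the name is padded to 20 bytes and the body needs no padding.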
Example archive.
Now let's get out the microscope. I'll use Bless for that. I can't say I really like this hex editor, but I have forgotten the name of the one I do like.
Create a simple directory:
cpio_test
|
+ test.txt
|
+ testl.txt
Here testl.txt is a symbolic link to the test.txt file.
The contents of the test.txt file:
Simple example of cpio usage.
Then create an archive:
$ find cpio_test | cpio -ov > example.cpio
and open the resulting archive in your favorite hex editor.
This archive looks like this for me:
0000 | C7 71 09 08 9A 34 FD 41 F4 01 F4 01 02 00 00 00 | .q...4.A........
0010 | 8C 4E 09 31 0A 00 00 00 00 00 63 70 69 6F 5F 74 | .N.1......cpio_t
0020 | 65 73 74 00 C7 71 09 08 A2 34 B4 81 F4 01 F4 01 | est..q...4......
0030 | 01 00 00 00 8C 4E 09 31 13 00 00 00 1E 00 63 70 | .....N.1......cp
0040 | 69 6F 5F 74 65 73 74 2F 74 65 73 74 2E 74 78 74 | io_test/test.txt
0050 | 00 00 53 69 6D 70 6C 65 20 65 78 61 6D 70 6C 65 | ..Simple example
0060 | 20 6F 66 20 63 70 69 6F 20 75 73 61 67 65 2E 0A |  of cpio usage..
0070 | C7 71 09 08 9C 34 FF A1 F4 01 F4 01 01 00 00 00 | .q...4..........
0080 | 8C 4E 1A 2F 14 00 00 00 08 00 63 70 69 6F 5F 74 | .N./......cpio_t
0090 | 65 73 74 2F 74 65 73 74 6C 2E 74 78 74 00 74 65 | est/testl.txt.te
00A0 | 73 74 2E 74 78 74 C7 71 00 00 00 00 00 00 00 00 | st.txt.q........
00B0 | 00 00 01 00 00 00 00 00 00 00 0B 00 00 00 00 00 | ................
00C0 | 54 52 41 49 4C 45 52 21 21 21 00 00 00 00 00 00 | TRAILER!!!......
Well then, let's figure it out.
0x71c7 = 070707 - the beginning of the header. The bytes appear in the file as C7 71, low byte first, so we can already say that the archive was created with little-endian byte order.
0x0809 is c_dev - the device number on which the file is located.
0x349a is c_ino - the inode number. In this case the high-order bits were indeed lost.
0x41fd = 0040775 - c_mode. That is, the header describes a directory with access rights 0775.
0x01f4 = 500 - c_uid.
0x01f4 = 500 - c_gid.
0x0002 - c_nlink. Every directory has at least two links (its entry in the parent directory and its own . entry).
0x0000 - c_rdev.
0x4e8c and 0x3109 are the high and low 16-bit halves of the 32-bit file modification time. 0x4e8c3109 = 1317810441.
0x000a = 10 - c_namesize, the length of the directory name including the terminating NUL.
0x00000000 - the directory does not have a body.
Next is the name of the directory.
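The manual decoding above can be repeated in code. This is a sketch for little-endian archives only; the struct cpio_fields, the helper le16 and the function cpio_decode_le are names I made up for the illustration. The byte array holds the first 26 bytes of the dump:

```c
#include <stdint.h>

/* Header fields converted to host integers (hypothetical struct). */
struct cpio_fields {
    uint16_t magic, dev, ino, mode, uid, gid, nlink, rdev, namesize;
    uint32_t mtime, filesize;
};

/* Read the 16-bit little-endian word number idx of a header. */
static uint16_t le16(const uint8_t *p, int idx)
{
    return (uint16_t)(p[2 * idx] | (p[2 * idx + 1] << 8));
}

/* Decode a 26-byte little-endian header. */
struct cpio_fields cpio_decode_le(const uint8_t *p)
{
    struct cpio_fields f;
    f.magic    = le16(p, 0);
    f.dev      = le16(p, 1);
    f.ino      = le16(p, 2);
    f.mode     = le16(p, 3);
    f.uid      = le16(p, 4);
    f.gid      = le16(p, 5);
    f.nlink    = le16(p, 6);
    f.rdev     = le16(p, 7);
    /* 32-bit values: most significant 16-bit word comes first. */
    f.mtime    = ((uint32_t)le16(p, 8) << 16) | le16(p, 9);
    f.namesize = le16(p, 10);
    f.filesize = ((uint32_t)le16(p, 11) << 16) | le16(p, 12);
    return f;
}

/* First 26 bytes of the example dump: the cpio_test directory header. */
const uint8_t example_dir_header[26] = {
    0xC7, 0x71, 0x09, 0x08, 0x9A, 0x34, 0xFD, 0x41,
    0xF4, 0x01, 0xF4, 0x01, 0x02, 0x00, 0x00, 0x00,
    0x8C, 0x4E, 0x09, 0x31, 0x0A, 0x00, 0x00, 0x00,
    0x00, 0x00
};
```

Calling cpio_decode_le(example_dir_header) gives a structure whose fields can be compared one by one with the values read off the dump by hand.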
The header of the next record follows immediately. We will not go through it in detail, just note the differences:
c_mode: 0x81b4 = 0100664 - shows that this is a regular file with access rights 0664.
0x0000001e = 30 - the size of the file contents.
The name is 19 bytes long (0x0013), which is odd, so one padding NUL follows it.
The rest of the entry is analogous to the directory entry.
Next is the record for the symbolic link. The body of a symbolic link record is the name of the file that the link points to. Otherwise, the header with metadata and the path name are laid out the same way as for a regular file.
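The whole walk, record by record up to the trailer, can be sketched as a loop. It assumes a little-endian archive already loaded into memory, a 26-byte header, and names and bodies padded to an even length; cpio_count_entries is a hypothetical name:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read one 16-bit little-endian word. */
static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Walk an old-format archive held in memory.
 * Returns the number of records before the TRAILER!!! record,
 * or -1 if the data is truncated or a magic number is wrong. */
int cpio_count_entries(const uint8_t *buf, size_t len)
{
    size_t off = 0;
    int count = 0;

    while (len - off >= 26) {
        const uint8_t *h = buf + off;
        if (le16(h) != 0x71c7)
            return -1;                      /* bad magic */

        uint64_t namesize = le16(h + 20);   /* c_namesize */
        uint64_t filesize = ((uint64_t)le16(h + 22) << 16) | le16(h + 24);
        uint64_t name_pad = namesize + (namesize & 1);
        uint64_t body_pad = filesize + (filesize & 1);

        if (name_pad + body_pad > (uint64_t)(len - off - 26))
            return -1;                      /* record runs past the buffer */

        /* The trailer name is 10 characters plus the terminating NUL. */
        if (namesize == 11 && memcmp(h + 26, "TRAILER!!!", 11) == 0)
            return count;

        count++;
        off += (size_t)(26 + name_pad + body_pad);
    }
    return -1;                              /* no trailer found */
}
```

The bounds check before each step keeps the loop from reading past the end of a damaged or truncated archive.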
That is how a CPIO archive is put together. In a future article I would like to examine the file format produced by gzip in the same way. In particular, the cpio + gzip combination is used to create the ramfs image used by the GNU/Linux kernel.
I hope the article proves helpful.
Related links:
CPIO utility description
CPIO format description