How I found a bug in GNU Tar

Original author: Chris Siebenmann
The author of the article is Chris Siebenmann, a Unix system administrator at the University of Toronto.

Every so often something happens in my work that makes me stop and think, even if it isn't immediately clear what conclusions follow. Recently I mentioned that we had found a bug in GNU Tar, and the story of how that happened is one such case.

We use Amanda and GNU Tar to back up our fileservers. For a long time we had a rare but recurring problem: when backing up the filesystem containing /var/mail, tar would go crazy and produce an enormous amount of output. Usually this went on forever and we had to kill the dump. In other cases it did eventually finish, having produced terabytes of what was evidently very compressible data. The next time we captured one of these giant tar files, I examined it and found that part of it was a long run of zero bytes, which tar -t was quite unhappy about, after which the archive went back to normal.

(This made me wonder whether people's mailboxes naturally contain zero bytes. It turns out that searching text files for zero bytes is not entirely straightforward, and yes, they do.)
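One likely reason the search is awkward is that GNU grep treats a file containing NUL bytes as binary. A more direct approach is a small program that counts them. The following is a minimal Python sketch. The function name count_nul_bytes is hypothetical and is not part of any tool mentioned in this article.

```python
import sys

def count_nul_bytes(path, chunk_size=64 * 1024):
    """Return how many zero (NUL) bytes the file at `path` contains."""
    total = 0
    with open(path, "rb") as f:  # binary mode: no decoding, no newline translation
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty result means end of file
                break
            total += chunk.count(b"\0")
    return total

if __name__ == "__main__":
    # Usage: python count_nul.py MAILBOX [MAILBOX ...]
    for name in sys.argv[1:]:
        print(f"{name}: {count_nul_bytes(name)} zero bytes")
```

Reading in fixed-size chunks keeps memory use flat even for multi-gigabyte mailboxes.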

We recently moved the /var/mail filesystem to our new Ubuntu 18.04 Linux fileservers. That also meant switching to a newer and more standard version of GNU Tar than the one on our OmniOS machines. We hoped this would solve our problem, but almost immediately the same thing happened again. This time GNU Tar was running on an Ubuntu machine, where I know all the available debugging tools well, so I examined the running tar process. Tracing its system calls showed that tar was making an endless stream of read() calls, each returning 0 bytes:

read(6, "", 512)   = 0
read(6, "", 512)   = 0
[...]
read(6, "", 512)   = 0
write(1, "\0\0\0\0\0"..., 10240) = 10240
read(6, "", 512)   = 0
[...]

lsof said that file descriptor 6 was someone's mailbox.

Using apt-get source tar, I downloaded the source code and started looking for read() calls that do not check for end of file. After working through several levels of indirection, I found an obvious place where such a check seemed to be missing: the function sparse_dump_region in sparse.c. And then I remembered something.
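A read() that returns 0, as in read(6, "", 512) = 0 in the trace above, is how the kernel reports end of file. Any loop that wants a fixed number of bytes has to treat that return as a stop condition, or it will retry forever. Here is a minimal Python sketch of such a check. read_exactly is a hypothetical helper and is not tar's code, which is written in C.

```python
import os

def read_exactly(fd, size, block_size=512):
    """Read exactly `size` bytes from `fd`, `block_size` bytes at a time.

    Raises EOFError if read() reports end of file (returns 0 bytes)
    before `size` bytes have been read, instead of retrying forever.
    """
    chunks = []
    remaining = size
    while remaining > 0:
        data = os.read(fd, min(block_size, remaining))
        if not data:
            # read() returned 0: end of file came earlier than expected.
            raise EOFError(f"end of file with {remaining} of {size} bytes unread")
        chunks.append(data)
        remaining -= len(data)  # short reads are fine; just ask again
    return b"".join(chunks)
```

Without the `if not data` branch, this loop would keep calling os.read() on a file that has no more data, producing the same endless sequence of 0-byte reads seen above.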

A few months ago we ran into an NFS problem with Alpine. While working on that bug, I traced the Alpine process and noticed, among other things, that it used ftruncate() to resize mailboxes. Sometimes it extends them, temporarily creating a sparse section of the file until it fills it in, and perhaps sometimes it shrinks them. This looked like a match for the current situation. Sparse regions were involved, and shrinking a file with ftruncate() creates exactly the situation in which tar unexpectedly hits end of file.
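The ftruncate() behaviour described here can be seen directly. The following Python sketch grows a scratch file with ftruncate() and confirms that the extension reads back as zero bytes. It then shrinks the file and shows that a read at an offset past the new end returns nothing. The file contents and sizes are made up, and whether the extension is stored as an actual hole depends on the filesystem.

```python
import os
import tempfile

def grow_then_shrink(initial=b"From alice\n", grow_to=1 << 20):
    """Extend a scratch file with ftruncate(), then shrink it again.

    Returns (size after growing, zero bytes read from the extension,
    size after shrinking, bytes read at an offset past the new end).
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, initial)
        # Growing with ftruncate() writes no data. The new region reads back as
        # zero bytes and, on filesystems that support it, is left as a hole.
        os.ftruncate(fd, grow_to)
        grown = os.fstat(fd).st_size
        zeros = os.pread(fd, 16, len(initial))
        # Shrinking discards everything past the new end of file.
        os.ftruncate(fd, len(initial))
        shrunk = os.fstat(fd).st_size
        # A reader still working from the old size now hits end of file here.
        past_end = os.pread(fd, 512, grow_to // 2)
        return grown, zeros, shrunk, past_end
    finally:
        os.close(fd)
        os.unlink(path)

if __name__ == "__main__":
    print(grow_then_shrink())
```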

(This even explains why tar sometimes recovers. If new mail arrives in the mailbox later, the file grows back to the expected size and tar no longer runs into an unexpected end of file.)

After some poking around with GDB, Ubuntu's debugging symbols, and the tar source code I had downloaded, I was able to reproduce the bug. It turned out to be somewhat different from my original theory. sparse_dump_region doesn't dump the sparse regions of a file; it dumps the non-sparse ones (of course). It is also used for all files, sparse or not, if you run tar with --sparse. So the actual bug is this: if you run GNU Tar with --sparse and a file shrinks while tar is reading it, tar fails to handle hitting end of file earlier than expected. If the file grows again, tar recovers.
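The failure mode can be modelled without tar at all. In the following Python sketch, copy_member and demo are hypothetical names, and this is not GNU Tar's code. The sketch records the file's size up front, as an archiver must in order to write the member's header before its data. It then shrinks the file with ftruncate() after the first 512-byte read. With check_eof=False, 0-byte reads are simply retried, which is the kind of missing check described above; max_reads exists only so the demo terminates. With check_eof=True, a 0-byte read is treated as premature end of file and the rest of the member is padded with zeros.

```python
import os
import tempfile

BLOCK = 512  # matches the 512-byte reads in the trace

def copy_member(fd, expected_size, after_first_read=None,
                check_eof=True, max_reads=10_000):
    """Read `expected_size` bytes from `fd` in BLOCK-sized reads.

    Returns (data, number of read() calls made).
    """
    out = bytearray()
    reads = 0
    while len(out) < expected_size and reads < max_reads:
        data = os.read(fd, min(BLOCK, expected_size - len(out)))
        reads += 1
        if reads == 1 and after_first_read is not None:
            after_first_read()  # e.g. the mailbox shrinks mid-read
        if not data:
            if check_eof:
                # Premature EOF: pad so the member keeps its promised size.
                out += bytes(expected_size - len(out))
                break
            continue  # missing check: read() will return 0 again, forever
        out += data
    return bytes(out), reads

def demo(check_eof):
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"x" * 4096)
        os.lseek(fd, 0, os.SEEK_SET)
        size = os.fstat(fd).st_size  # size recorded before reading starts

        def shrink():
            os.ftruncate(fd, 1024)

        return copy_member(fd, size, shrink, check_eof=check_eof, max_reads=100)
    finally:
        os.close(fd)
        os.unlink(path)

if __name__ == "__main__":
    for flag in (True, False):
        data, reads = demo(flag)
        print(f"check_eof={flag}: {len(data)} bytes after {reads} read() calls")
```

Padding keeps the archive member the size its header promised. Whether that is the right behaviour for GNU Tar's --sparse path is an assumption of this sketch, not something the investigation above established.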

(The exception is a file that is sparse only at the end and shrinks only within that sparse part; in that case everything is fine.)

It occurs to me that I could have investigated all of this years ago on our OmniOS fileservers. OmniOS has ways to trace a program's system calls and has equivalents of lsof. I could also have found and read the source code for our version of GNU Tar and run it under the OmniOS debugger (although GDB isn't installed there), and so on. But I didn't. Instead we shrugged and moved on. It took moving the filesystem to Ubuntu before I lifted a finger to work out what was going on.

(It wasn't just about the tools and the environment. We also automatically assumed that OmniOS had some old, unsupported version of GNU Tar that wasn't worth investigating, because of course the problem would have been fixed in a newer version.)

PS: As a quick fix, we will probably just stop Amanda from using tar's --sparse option when making backups. Mailboxes shouldn't be sparse, and if one is, we compress our filesystem backups anyway, so all of those zero bytes will compress well.

PPS: I haven't yet tried to report this bug to the GNU Tar developers, because I only found it on Friday and the university is now on its winter break. Feel free to beat me to it.
