Processing large archived files on a Mac and elsewhere

I once had to process a log file. The task itself is routine: I use Perl for this on both Linux and Windows. This time, though, everything was happening on a Mac, the file was inside an archive, and it was large: about 20 GB unpacked.
What is the usual solution?

If the file were small, I could simply extract it from the archive and feed it to the script. It is not small, though, and it would be a pity to waste the disk space. The standard solution is to unpack the file to STDOUT and have the handler read it straight from STDIN through an anonymous pipe (the "|" symbol). No sooner said than done: the standard unpacker on the Mac has an option for this.
unzip -p data.zip log.txt | process.pl > result.txt

Here process.pl is the log handler.
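A pipeline like this reports only the exit status of its last command, so a run in which the unpacker produces nothing can still look successful. A minimal sketch of a sanity check on the output, assuming a POSIX shell; the `unpack` and `process` functions are hypothetical stand-ins for `unzip -p data.zip log.txt` and `process.pl`, used only so that the example runs anywhere:

```shell
#!/bin/sh
# Hypothetical stand-ins (not the real tools):
# "unpack" imitates an unpacker that writes nothing to STDOUT,
# "process" imitates a handler that passes its input through.
unpack() { :; }
process() { cat; }

out=$(mktemp)                 # temporary file that plays the role of result.txt
unpack | process > "$out"
status=$?                     # exit status of "process" only, so 0 here

# -s is true only if the file exists and is not empty.
if [ -s "$out" ]; then
    verdict="ok"
else
    verdict="empty result"
fi

echo "exit status: $status, verdict: $verdict"
rm -f "$out"
```

With these stand-ins the script prints `exit status: 0, verdict: empty result`, which is exactly the situation where checking the exit status alone is not enough.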
After debugging everything on small files, I switched to the real one. A surprise awaited me: the file was processed instantly, but the result was empty. It turned out that files larger than 4 GB are not unpacked. And this on a 64-bit OS. Some googling confirmed that the problem is known; reportedly such a file can even be packed but not extracted. Some of the suggested programs looked good from their descriptions, for example The Unarchiver (http://wakaba.c3.cx/s/apps/unarchiver.html), but had only a graphical interface. Well, of course, this is a Mac. Fortunately, there is another utility from the same author, unar (http://code.google.com/p/theunarchiver/downloads/list), which works from the command line. All very nice, but it can only unpack to a file, and only under the original name.

What to do? I was about to look for something else when I remembered named pipes. A named pipe is a pseudo-file on disk that acts as a pipeline: one program writes to it, another reads from it, and both believe they are working with a real file. So the plan of action was as follows:

1. Create a named pipe with the same name as the file inside the archive:
mkfifo log.txt

2. Start the handler, which will read its data from the pipe. Run it with the & symbol so that it works in the background; otherwise it will sit waiting for data and will not release the terminal until processing is complete:
./process.pl < log.txt > results.txt &

3. Now you can start unpacking:
./unar -D -f data.zip log.txt

The -D option tells unar not to create a directory for the extracted files.
The -f option forces overwriting when a file with the same name already exists (here, the named pipe).
4. After finishing work, delete the named pipe:
unlink log.txt
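
The four steps above can be combined into one script. The following is a minimal sketch, assuming a POSIX shell, and not the exact commands from this article: `unpack` and `process` are hypothetical stand-ins for `./unar -D -f data.zip log.txt` and `./process.pl`, and everything happens in a temporary directory so that no real files are touched. Compared with the manual steps, it adds a `wait` so that the pipe is removed only after the reader has finished:

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)              # scratch directory for the pipe and the results
trap 'rm -rf "$workdir"' EXIT     # clean up even if something fails

# Hypothetical stand-in for "./unar -D -f data.zip log.txt":
# writes three lines into the file named log.txt.
unpack() { printf 'line 1\nline 2\nline 3\n' > "$workdir/log.txt"; }

# Hypothetical stand-in for "./process.pl":
# counts the lines it reads from STDIN.
process() { wc -l | tr -d ' '; }

mkfifo "$workdir/log.txt"                                 # 1. create the named pipe
process < "$workdir/log.txt" > "$workdir/results.txt" &   # 2. reader in the background
reader=$!
unpack                                                    # 3. writer feeds the pipe
wait "$reader"                                            # wait until the reader is done
rm "$workdir/log.txt"                                     # 4. remove the named pipe

cat "$workdir/results.txt"
```

With these stand-ins the script prints `3`. The reader has to be started before the writer (or in parallel with it), because opening a named pipe blocks until the other end is opened as well.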

Everything works. Naturally, the same approach can be used on regular Linux as well.
