aruseni April 26, 2013 at 18:52

The sponge command: a sponge for standard input

We all know that when running commands in the shell, we can pipe the standard output of one command into the standard input of another, or write it to a file.

This is described in some detail in the "I/O Redirection" chapter of the Advanced Bash-Scripting Guide.

In particular, you sometimes need to read a file, process it somehow (for example, keep only the lines that match a certain regular expression), and then write the result back to the same file. Suppose the file is called “messages.log”, and you want to keep only the lines that begin with the word “Success” followed by a colon and a space, removing all other lines.

One might assume that the following command does the job:

grep "^Success:\s" messages.log > messages.log

But this assumption is wrong: when this line is executed, the shell opens messages.log for writing and truncates it before grep even starts reading it.
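The truncation is easy to observe with a command that reports the size of its input. This is only a small sketch: the scratch directory and the choice of wc are assumptions made for illustration, not part of the original example.

```shell
# Work in a scratch directory so that no real files are touched.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'Success: 123\nError: 123\n' > messages.log   # 24 bytes

# The shell opens and truncates messages.log first, and only then
# starts wc, so wc counts the bytes of an already empty file.
wc -c messages.log > messages.log

cat messages.log   # the reported byte count is 0
```

The original 24 bytes are gone before the command ever gets a chance to read them.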

Interestingly, though, grep still runs, detects that its output is redirected to the very file it is trying to read, and exits immediately with the following message:

grep: input file 'messages.log' is also the output

GNU cat does the same thing (try running cat messages.log > messages.log):

cat: messages.log: input file is output file

This is done by comparing the device and inode numbers of the input file with those of the file that standard output is written to. The implementation can be found in src/cat.c in the GNU coreutils sources.
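The same kind of comparison is available in the shell through the -ef operator of test, which bash and dash both support and which is true when two paths refer to the same device and inode. The sketch below is an analogy rather than what cat does internally; same_file is a hypothetical helper, and the file names are made up for the demonstration.

```shell
# Work in a scratch directory so that no real files are touched.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'Success: 123\n' > messages.log
ln messages.log alias.log   # hard link: a second name for the same inode
cp messages.log copy.log    # copy: same content, but a different inode

# Hypothetical helper: succeeds when both paths resolve to the same
# device and inode, regardless of what the names look like.
same_file() {
    [ "$1" -ef "$2" ]
}

if same_file messages.log alias.log; then
    echo "alias.log is the same file as messages.log"
fi
if ! same_file messages.log copy.log; then
    echo "copy.log is a different file"
fi
```

Comparing device and inode rather than names is what lets such a check see through hard links and differently spelled paths.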

By the way, BSD cat performs no such check, but in this case it does not matter much: the file has already been truncated either way, so there is nothing to read or write, and cat simply exits.

However, take another example:

cat messages.log >> messages.log

In this case we do not truncate messages.log, but append the output of cat to the end of the file. If cat checks whether the two files are the same and exits, the file stays as it was and the user sees an error. Without such a check, cat goes into a loop and keeps appending to the file until the disk fills up or the user kills the process.

Now let's think about how we can write the output to the same file that we are reading after all. The obvious solution is to use a temporary file. That is:

mv messages.log tmpmessages.log
grep "^Success:\s" tmpmessages.log > messages.log
rm tmpmessages.log

This is not very convenient, but at least it solves the problem completely.
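A slightly more defensive variant of the temporary-file approach might look like the sketch below. It is an assumption-laden illustration, not the canonical way: the temporary name comes from mktemp, the result is written there first, and the original is replaced only if grep did not fail (grep exits with 1 when nothing matches, and with a larger value on a real error).

```shell
# Work in a scratch directory so that no real files are touched.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'Success: 123\nError: 123\n' > messages.log

# Create the temporary file next to the original, so that the final
# mv stays on the same filesystem.
tmp=$(mktemp ./messages.log.XXXXXX) || exit 1

grep '^Success: ' messages.log > "$tmp"
status=$?

if [ "$status" -le 1 ]; then
    mv "$tmp" messages.log   # 0 = matches found, 1 = no matches
else
    rm -f "$tmp"             # grep failed: keep the original file
fi
```

One side effect to keep in mind: mktemp creates the file readable and writable by its owner only, so after the mv messages.log has those permissions too.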

Another option is to use sed:

sed -i -n -e '/^Success:\s/{p}' messages.log

But this solution is, of course, not very general: selecting lines that match a regular expression is only one of many text-processing tasks. Besides, the syntax here is already considerably more complicated.

By the way, sed actually uses a temporary file as well; you can verify this by looking at the output of strace:

open("messages.log", O_RDONLY) = 3
...
open("./sedWiaEAG", O_RDWR|O_CREAT|O_EXCL, 0600) = 4
...
read(3, "Success: 123\nError: 123\n", 4096) = 24
write(4, "Success: 123\n", 13) = 13
read(3, "", 4096) = 0
...
close(3) = 0
...
close(4) = 0
...
rename("./sedWiaEAG", "messages.log") = 0
close(1) = 0
close(2) = 0
exit_group(0) = ?

Obviously, it would be nice not to have to deal with intermediate files by hand at all. And there is a way to do that: the sponge program from moreutils. Its manual page describes it as follows:

sponge reads standard input and writes it out to the specified file. Unlike a shell redirect, sponge soaks up all its input before opening the output file. This allows constructing pipelines that read from and write to the same file.

So, using sponge, we can drop the shell redirection from our example and instead pass the name of the file we want to write the result to as an argument to the sponge command. The output of grep is passed to sponge through a pipe.

grep "^Success:\s" messages.log | sponge messages.log
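To make the "soak up first, write later" idea concrete, here is a rough shell stand-in. The soak function is hypothetical and is not how moreutils implements sponge; it only illustrates the ordering, and it buffers through a temporary file for simplicity.

```shell
# Work in a scratch directory so that no real files are touched.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'Success: 123\nError: 123\n' > messages.log

# Hypothetical stand-in for sponge, for illustration only.
soak() {
    soak_buf=$(mktemp) || return 1
    # Read standard input until EOF. EOF arrives only once the
    # upstream command has closed its output, i.e. finished reading.
    cat > "$soak_buf"
    # Only now is the target file opened (and truncated) for writing.
    cat "$soak_buf" > "$1"
    rm -f "$soak_buf"
}

grep '^Success: ' messages.log | soak messages.log
cat messages.log
```

The important difference from a plain > redirection is the moment at which the output file is opened: after all input has been read, rather than before the pipeline starts.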

In principle, this whole post could have been reduced to that one example, but I think it turned out more interesting this way, and perhaps it even covered a few nuances that some readers did not know about.

I wish you all a great Friday!
