Linux basics from the founder of Gentoo. Part 2 (4/5): Text Processing and Redirection

Original authors: Daniel Robbins, Chris Houser, Aron Griffis
  • Translation
In this part, you will learn about many interesting and useful Linux commands for processing text data. It also covers the basics of working with input/output streams in bash.



Navigation for the Linux basics series from the founder of Gentoo:

Part I
  1. BASH: Navigation Basics (Introduction)
  2. Managing files and directories
  3. Links and deleting files and directories
  4. Wildcards (summary and resources)

Part II
  1. Regular Expressions (Introduction)
  2. Directory layout, finding files
  3. Process management
  4. Text Processing and Redirection
  5. Kernel modules (summary and resources)



Text processing


Back to redirection


Earlier in this series of tutorials, we saw an example of using the > operator to redirect the output of a command to a file, as shown below:

$ echo "firstfile" > copyme

In addition to redirecting output to a file, we can use a powerful shell feature called pipes. Using pipes, we can pass the output of one command to the input of another. Consider the following example:

$ echo "hi there" | wc
1 2 9

The | symbol connects the output of the command on its left to the input of the command on its right. In the example above, the echo command prints the string "hi there" followed by a newline character. This output would normally appear in the terminal, but the pipe redirects it to the input of the wc command, which displays the number of lines, words and characters.



An example with pipes


Here is another simple example:

$ ls -s | sort -n

Normally, ls -s would print a listing of the current directory to the terminal, with each file's size shown in front of its name. Here, however, we pass that output to sort -n, which sorts it numerically. This is very handy for finding the files that take up the most space in a directory.

The following examples are more complex; they demonstrate the power and convenience you get from pipes. They use commands we have not covered yet, but don't focus on those. Instead, concentrate on understanding how pipes work so that you can put them to use in your daily work with Linux.
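Before moving on, you can try ls -s | sort -n on files whose sizes you know. The sketch below is only an illustration: the file names small, medium and large are made up, head -c (available in GNU coreutils and BSD) is assumed for creating the files, and tail (described later in this article) keeps just the last line of the sorted listing. Sizes from ls -s are reported in filesystem blocks, so the exact numbers will differ between systems.

```shell
# Work in a throwaway directory so the listing is predictable
cd "$(mktemp -d)"
# Three hypothetical files of roughly 1, 8 and 64 KiB of random data
head -c 1024  /dev/urandom > small
head -c 8192  /dev/urandom > medium
head -c 65536 /dev/urandom > large
# ls -s puts each file's size (in blocks) first, so sort -n orders the lines
# by size; the "total" line has no leading number, so sort -n treats it as 0
ls -s | sort -n
# The last line of the sorted listing names the largest file
ls -s | sort -n | tail -n 1
```

Swapping tail -n 1 for tail -n 10 gives a quick "ten biggest files" report for any directory.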





A pipe for unpacking


To decompress and unpack a tarball, you could do the following:

$ bzip2 -d linux-2.4.16.tar.bz2
$ tar xvf linux-2.4.16.tar

The disadvantage of this method is that it creates an intermediate, uncompressed file on disk. Since tar can read an archive directly from its standard input (instead of from a named file), we can get the same end result with a pipe:

$ bzip2 -dc linux-2.4.16.tar.bz2 | tar xvf -

Here, bzip2 -dc writes the decompressed data to standard output, and the "-" argument tells tar to read the archive from standard input. Woohoo! Our compressed tarball is unpacked, and we didn't need an intermediate file.
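Here is a self-contained sketch of the same technique that you can run in a scratch directory. It first builds a tiny archive (the names demo/ and hello.txt are hypothetical). It uses gzip instead of bzip2 only because gzip is installed almost everywhere; gzip -dc, like bzip2 -dc, decompresses to standard output, and the "-" argument makes tar read the archive from standard input.

```shell
cd "$(mktemp -d)"
# Build a small archive to play with
mkdir demo
echo "hello from the tarball" > demo/hello.txt
# tar writes the archive to stdout ("-"), gzip compresses it from stdin
tar cf - demo | gzip -c > demo.tar.gz
rm -r demo
# Decompress to stdout and let tar read the archive from stdin:
# no uncompressed demo.tar is ever written to disk
gzip -dc demo.tar.gz | tar xvf -
cat demo/hello.txt
```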



A longer pipeline


Here is another pipe example:

$ cat myfile.txt | sort | uniq | wc -l

We use cat to feed the contents of myfile.txt to the sort command. When sort receives the input, it sorts the lines alphabetically and passes them on to uniq. uniq removes duplicate lines (incidentally, uniq requires sorted input, because it only removes duplicates that sit next to each other) and sends the result to wc -l. We looked at wc earlier, but without any options. With the -l option, wc displays only the number of lines; the word and character counts are not shown. As you can see, this pipeline prints the number of unique lines in a text file. Try creating a couple of files in your text editor, run this pipeline on them, and look at the results you get.
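You can check this pipeline on a throwaway file. The sketch below creates a hypothetical myfile.txt with six lines but only three distinct values, arranged so that no two equal lines are adjacent. Running uniq on the unsorted file afterwards shows why the sort step matters.

```shell
cd "$(mktemp -d)"
# Six lines, three distinct values, no two equal lines next to each other
printf '%s\n' pear apple pear banana apple pear > myfile.txt
# sort groups identical lines, uniq collapses each group, wc -l counts the rest
cat myfile.txt | sort | uniq | wc -l     # 3
# Without sorting, uniq finds no adjacent duplicates and keeps every line
uniq myfile.txt | wc -l                  # 6
```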





The text processing whirlwind begins!


Now we will take a quick tour of the standard Linux text processing commands. Because we are covering so many programs, we don't have room to give an example for each of them. Instead, we encourage you to read the man pages of the commands below (by typing man echo, for example) and to get to know each command and its options by spending some time playing with them. As a rule, these commands print the result of their processing to the terminal rather than modifying the file directly. After this quick tour, we'll take a deeper look at I/O redirection. So yes, there is already light at the end of the tunnel. :)

echo prints its arguments to the terminal. Use the -e option if you want it to interpret escape sequences; for example, echo -e 'foo\nfoo' will print foo, then a newline, then foo again. Use the -n option to stop echo from appending a newline to the end of its output, which it does by default.

cat prints the contents of the specified file to the terminal. It is handy as the first command of a pipe, for example, cat foo.txt | blah.

sort prints the contents of the file specified on the command line in alphabetical order. Naturally, sort can also read its input from a pipe. Type man sort to see the options that control how sorting is done.

uniq takes an already sorted file or data stream (via a pipe) and removes duplicate lines.

wc displays the number of lines, words and characters in the specified file or in its input stream (from a pipe). Type man wc to learn how to configure the program's output.
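A few quick experiments with sort, uniq and wc, using arbitrary sample values. The options shown (-n and -r for sort, -c for uniq, -w for wc) are standard ones documented in the respective man pages.

```shell
# Default (alphabetical) ordering compares the lines as text
printf '%s\n' 10 9 100 | sort       # 10, 100, 9 (one per line)
# -n compares them as numbers, -r reverses the result
printf '%s\n' 10 9 100 | sort -n    # 9, 10, 100
printf '%s\n' 10 9 100 | sort -rn   # 100, 10, 9
# uniq -c prefixes each remaining line with how many times it occurred
printf '%s\n' b a b b | sort | uniq -c
# wc options select a single count: -l lines, -w words, -c bytes
echo "hi there" | wc -w             # 2
```

The first two lines are the reason ls -s | sort -n needs -n: sorted as text, 100 would land before 9.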

head prints the first ten lines of a file or stream. Use the -n option to specify how many lines to display.

tail prints the last ten lines of a file or stream. Use the -n option to specify how many lines to display.

tac is similar to cat, but prints all lines in reverse order; in other words, the last line is printed first.
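A short sketch for head, tail and tac on a generated file. The file name numbers.txt is hypothetical, seq is used only to produce twenty numbered lines, and tac is assumed to be the GNU coreutils version found on Linux systems.

```shell
cd "$(mktemp -d)"
# Twenty numbered lines to experiment with
seq 1 20 > numbers.txt
head numbers.txt                     # lines 1 to 10 (ten is the default)
head -n 3 numbers.txt                # lines 1 to 3
tail -n 3 numbers.txt                # lines 18 to 20
# Chained in a pipe: the last of the first five lines, i.e. line 5
head -n 5 numbers.txt | tail -n 1
# tac prints the file backwards, so its first line is 20
tac numbers.txt | head -n 1
```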

expand converts tab characters in its input to spaces. The -t option specifies the tab size.

unexpand converts spaces in its input to tab characters. The -t option specifies the tab size.
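Tabs are invisible on screen, so this sketch uses printf to produce a real tab and od (described below) to make the tab visible in unexpand's output. The sample strings are arbitrary.

```shell
# printf turns \t into a real tab; with tab stops every 4 columns,
# expand pads "a" with three spaces so that "b" lands on the next tab stop
printf 'a\tb\n' | expand -t 4          # "a   b"
# unexpand -t 4 turns the four leading spaces back into one tab
printf '    x\n' | unexpand -t 4 | od -c
```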

cut extracts fields, separated by a specified delimiter character, from an input file or stream. (Translator's note: try echo 'abc def ghi jkl' | cut -d ' ' -f2.)
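A few cut variations on the sample string from the note above, plus some made-up colon-separated records in the style of /etc/passwd:

```shell
# -d sets the delimiter (a single space here), -f picks the field(s)
echo 'abc def ghi jkl' | cut -d ' ' -f2       # def
echo 'abc def ghi jkl' | cut -d ' ' -f2,4     # def jkl
echo 'abc def ghi jkl' | cut -d ' ' -f2-      # def ghi jkl
# The same idea on colon-separated records (sample data): first field only
printf '%s\n' 'root:x:0:0' 'alice:x:1000:1000' | cut -d ':' -f1
```

Note that the delimiter must be exactly one character, which is why the space is quoted.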

nl adds a line number to each input line. Handy for printouts.

pr splits a file into pages and numbers them; it is commonly used for printing.

tr is a character translation tool; it maps specific characters in the input stream to specified characters in the output stream.
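Some typical tr one-liners, using arbitrary sample strings. Besides mapping one set of characters to another, tr's standard -d and -s options delete or squeeze characters.

```shell
# Map lowercase letters to uppercase
echo 'hello world' | tr 'a-z' 'A-Z'    # HELLO WORLD
# Map each comma to a newline, giving one item per line
echo 'a,b,c' | tr ',' '\n'
# -d deletes characters, -s squeezes runs of a repeated character into one
echo 'h:e:l:l:o' | tr -d ':'           # hello
echo 'hello    world' | tr -s ' '      # hello world
```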

sed is a powerful stream-oriented text editor. You can learn more about sed from the series of sed guides on the Funtoo website.
If you plan to take the LPI exam, be sure to read the first two articles in that series.
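As a first taste of sed, here is its most common use, the s (substitute) command, plus printing a single line. This is only a minimal sketch on arbitrary sample text; the Funtoo guides cover the rest of the language.

```shell
# s/old/new/ replaces the first match on each line...
echo 'one two one' | sed 's/one/1/'          # 1 two one
# ...and the g flag replaces every match
echo 'one two one' | sed 's/one/1/g'         # 1 two 1
# -n suppresses automatic printing; 2p prints only line 2
printf '%s\n' alpha beta gamma | sed -n '2p' # beta
```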

awk is a sophisticated language for line-by-line parsing and processing of an input stream according to given patterns. To learn more about awk, read the series of awk guides on the Funtoo website.

od displays its input in octal, hexadecimal, and other formats.
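A small sketch of both tools on made-up input. The awk program sums a column and filters on a field value; the od calls show the same bytes as characters and as hexadecimal (-An hides the offset column, -t x1 selects one-byte hex output).

```shell
# awk splits each line into fields $1, $2, ...; this sums the second field
# and prints the total once all input has been read (the END block)
printf '%s\n' 'alice 30' 'bob 25' | awk '{ total += $2 } END { print total }'   # 55
# A pattern before the braces selects lines: here, second field above 26
printf '%s\n' 'alice 30' 'bob 25' | awk '$2 > 26 { print $1 }'                  # alice
# od -c shows each byte as a character (note the \n at the end)
printf 'AB\n' | od -c
# The same three bytes in hexadecimal: 41 42 0a
printf 'AB\n' | od -An -t x1
```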

split breaks large files into several smaller, more manageable pieces.
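A sketch of splitting a file by line count and putting it back together. The file names big.txt and part_ are hypothetical; by default split names the pieces with a two-letter suffix (aa, ab, ...).

```shell
cd "$(mktemp -d)"
seq 1 10 > big.txt
# -l 4: at most four lines per piece; pieces are named part_aa, part_ab, ...
split -l 4 big.txt part_
ls part_*                        # part_aa  part_ab  part_ac
# The glob expands in name order, so cat puts the pieces back together
cat part_* > rejoined.txt
cmp big.txt rejoined.txt && echo identical
```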

fmt reformats ("wraps") long lines of text. It is not very useful today, since this feature is built into most text editors, but the command is still worth knowing.

paste takes two or more files as input, joins them line by line, and prints the result. It can be handy for creating tables or columns of text.

join is similar to paste, but it combines two files on a common field (by default, the first field of each line). Both files should be sorted on that field.
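The difference between paste and join is easiest to see side by side. All file names and data below are made up for illustration: paste pairs lines purely by position, while join pairs them by matching key.

```shell
cd "$(mktemp -d)"
printf '%s\n' alice bob carol > names.txt
printf '%s\n' 30 25 41 > ages.txt
# paste glues line N of each file together, separated by a tab
paste names.txt ages.txt
# join pairs up lines whose first field matches; both inputs are sorted on it
printf '%s\n' '1 alice' '2 bob' > users.txt
printf '%s\n' '1 admin' '2 staff' > roles.txt
join users.txt roles.txt         # "1 alice admin" and "2 bob staff"
```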

tee copies its standard input both to a file and to the screen (standard output) at the same time. This is useful when you want to keep a log of something while also watching it happen on screen.
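A minimal tee sketch; progress.log is a hypothetical log file name. Each message shows up on the terminal and is also written to the file, and the standard -a option appends instead of overwriting.

```shell
cd "$(mktemp -d)"
# The line appears on the screen and is also written to progress.log
echo "step 1 done" | tee progress.log
# -a appends to the file instead of overwriting it
echo "step 2 done" | tee -a progress.log
cat progress.log
```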

The whirlwind is over! Redirection


Just as you use > on the command line to redirect output to a file, you can use < to redirect a file into a command's input. For many commands, you can simply give the file name as an argument instead. Unfortunately, some programs only work with standard input.
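A sketch of input redirection with a hypothetical fruits.txt. Note how wc behaves differently when it only sees standard input, and that tr, which accepts no file arguments at all, is an example of a program that only works with standard input.

```shell
cd "$(mktemp -d)"
printf '%s\n' cherry apple banana > fruits.txt
# The shell opens fruits.txt and connects it to sort's standard input
sort < fruits.txt
# wc prints the file name when given one, but not when reading stdin
wc -l fruits.txt            # 3 fruits.txt
wc -l < fruits.txt          # 3
# tr takes no file arguments, so < is the usual way to feed it a file
tr 'a-z' 'A-Z' < fruits.txt
```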

Bash and other shells support the concept of a "here file" (more commonly called a here document). It lets you type several lines of input for a command directly on the command line, followed by a special word that marks the end of the input. This is easiest to show with an example:

$ sort <<END
apple
cranberry
banana
END
apple
banana
cranberry

In the example above, we typed the words apple, cranberry and banana, followed by "END" to mark the end of the input. The sort program then printed our words in alphabetical order.
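Here documents are also handy inside scripts. One detail worth knowing: if the delimiter word is unquoted, the shell expands variables inside the text, while quoting it (for example 'END') passes the text through literally. A small sketch; the variable names are arbitrary.

```shell
name="world"
# Unquoted delimiter (END): the shell expands $name inside the text
expanded=$(cat <<END
hello, $name
END
)
# Quoted delimiter ('END'): the text is passed through literally
literal=$(cat <<'END'
hello, $name
END
)
echo "$expanded"   # hello, world
echo "$literal"    # hello, $name
```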




Using ">>"


You might expect >> to be somewhat analogous to <<, but it isn't. It simply appends output to a file instead of overwriting it, as > does every time. Example:

$ echo Hi > myfile
$ echo there. > myfile
$ cat myfile
there.

Oops! We lost the "Hi" part! What we meant was this:

$ echo Hi > myfile
$ echo there. >> myfile
$ cat myfile
Hi
there.

Much better!

Thanks to Dmitry Minsky (Dmitry.Minsky@gmail.com) for the translation. To be continued ...










About the authors


Daniel Robbins


Daniel Robbins is the founder of the Gentoo community and the creator of the Gentoo Linux operating system. Daniel lives in New Mexico with his wife Mary and two energetic daughters. He is also the founder and head of Funtoo, and has written numerous technical articles for IBM developerWorks, Intel Developer Services, and the C/C++ Users Journal.

Chris Houser


Chris Houser has been a UNIX advocate since 1994, when he joined the team of administrators at Taylor University (Indiana, USA), where he earned a bachelor's degree in computer science and mathematics. Since then he has worked in a variety of areas, including web applications, video editing, UNIX drivers, and cryptographic protection. He currently works at Sentry Data Systems. Chris has also contributed to many free software projects, such as Gentoo Linux and Clojure, and co-authored The Joy of Clojure.

Aron Griffis


Aron Griffis lives in Boston, where he spent the last decade working at Hewlett-Packard on projects such as Tru64 UNIX network drivers, Linux security certification, Xen and KVM virtualization, and, most recently, the HP ePrint platform. When he isn't programming, Aron likes to mull over programming problems while riding his bicycle, juggling, or cheering for Boston's professional baseball team, the Red Sox.
