rkh May 11, 2011 at 11:06

About Vim Replacements Using Regular Expressions


Hi, Habr! It's no secret that good old Vim is well suited to a wide range of tasks. I would like to write a little about one of the components that make our favorite editor so powerful: its search-and-replace toolkit based on regular expressions. I plan to build my story around how I solved a couple of specific problems, supplementing it with some basic background information.

On the one hand, all of this is covered in detail in the built-in help, available via ':help usr_27.txt'; everything here was gleaned from there. On the other hand, when I needed to solve the problems described below, I spent considerable time on them, which lets me hope that my text will still be useful. I should say up front that I am far from being a programmer, so my terminology may seem strange or ridiculous; please forgive me for that.

Once I needed to remove all tags from an HTML file. After a little thought, I decided that I just needed to replace everything enclosed in triangular brackets with nothing, i.e. make the replacement

 <'phrase'> -> '' (nothing).

Search and replace in Vim is done with the ':substitute' command, but it is much more convenient to use its abbreviation, ':s'. The general syntax of the command looks roughly like this:

 :{limits}s/{what to replace}/{replacement}/{options}

The {limits} element specifies the range of lines in which the replacement should take place. If you omit it, search and replace is performed only on the line where the cursor is. Use the '%' character to process the entire file. To search and replace in the range from line l1 to line l2, {limits} must have the form 'l1,l2'; for example, ':14,17s/' searches and replaces in lines 14 through 17. Two lines deserve special mention: the line with the cursor, denoted by a dot, and the last line, denoted by a dollar sign. Thus, to search from the current line to the end of the file, use ':.,$s/'.
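The way {limits} selects lines can be imitated outside Vim. Below is a minimal Python sketch, not Vim code: the helper name substitute_range and the sample lines are made up for illustration, and Python's re module has its own pattern syntax.

```python
import re


def substitute_range(lines, first, last, pattern, repl):
    """Replace pattern with repl on lines first..last (1-indexed, inclusive).

    Roughly what ':first,lasts/pattern/repl/g' does; lines outside the
    range are returned unchanged.
    """
    return [
        re.sub(pattern, repl, line) if first <= number <= last else line
        for number, line in enumerate(lines, start=1)
    ]


# Hypothetical four-line "file".
lines = ["foo", "foo", "foo", "foo"]

# Like ':2,3s/foo/bar/g': only lines 2 and 3 are touched.
result = substitute_range(lines, 2, 3, r"foo", "bar")
print(result)
```

Passing 1 and len(lines) as the range would correspond to '%', the whole file.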

Within the specified limits, the command searches for a sequence that matches the "what to replace" element and replaces it with a sequence of characters built according to the "replacement" element, taking into account the options given after the last slash.

The first command I tried to solve my problem is shown below. The part before the first slash, ':%s', runs search and replace over the entire file. Between the first and second slashes is the pattern that Vim will look for; more about it after the command.

 :%s/<.*>//g

First comes an opening triangular bracket, which Vim matches literally. A dot denotes any character, and an asterisk means that the preceding item may occur any number of times, from zero to infinity. Thus the sequence '.*' denotes any sequence of any characters. Then comes the closing triangular bracket. And I apologize if the term "triangular brackets" offends those who remember that these are really the "less than" and "greater than" signs (:

Between the second and third slashes goes the sequence of characters that is substituted for each match. We just want to delete everything in bulk, so that part is empty.

The character g at the end of the command means that all matches in a line are replaced. Without it, Vim would replace only the first match in each line within {limits}. Other useful options are 'n', which only counts the matches without replacing anything (this helps to check that the pattern matches what you intended), and 'c', which asks for confirmation before each replacement.

So, the command above searches for any sequence of characters enclosed in triangular brackets, and Vim simply deletes every such sequence. Unfortunately, this command does not work properly, because between the triangular brackets it looks for any characters, including other triangular brackets, and the asterisk matches as many characters as it can. Therefore, if a line contains several pairs of triangular brackets, Vim selects the sequence that starts with the first opening bracket and ends with the last closing one.
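The same over-matching can be reproduced outside Vim. The following is a small Python sketch, not Vim code: the sample line is invented, and Python's re module is used only because '<.*>' means the same thing there.

```python
import re

# Hypothetical sample line with one pair of tags around the word "bold".
line = "Intro <b>bold</b> text"

# '.*' also matches '>' and grabs as much as it can, so the match runs
# from the first '<' to the last '>' and swallows the word "bold" too.
greedy = re.sub(r"<.*>", "", line)
print(repr(greedy))  # 'Intro  text'
```

The text between the two tags is lost, which is exactly the problem described above.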

The conclusion suggests itself: we need to look for any character between the triangular brackets except the closing triangular bracket. Vim has a construct for this. If you enclose a set of characters in square brackets in the pattern, Vim matches any one character from that set. For example, the pattern '[a-z]' matches any lowercase Latin letter. If the first character inside the square brackets is the caret '^', Vim matches anything except what is inside the brackets. In our case, the pattern

[^>]

matches anything but a closing triangular bracket. It should be added that a pair of square brackets matches exactly one character. That is, the pattern above is satisfied by any single character other than the closing triangular bracket. To let it match as many characters as needed, follow it with an asterisk. As a result, the required command takes the form

 :%s/<[^>]*>//g
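As a quick check of this pattern outside Vim, here is a minimal Python sketch. The sample line is invented; the pattern '<[^>]*>' happens to be written identically for Python's re module.

```python
import re

# Hypothetical sample line with one pair of tags around the word "bold".
line = "Intro <b>bold</b> text"

# '[^>]*' cannot cross a '>', so each tag is matched separately
# and the text between the tags survives.
stripped = re.sub(r"<[^>]*>", "", line)
print(repr(stripped))  # 'Intro bold text'
```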

You can compare how such a task would be solved in, say, Notepad and in Vim. In Notepad, I would first bulk-replace the most popular tags with nothing (starting, for example, with the 'p' tag), and then search for the remaining triangular brackets and delete them together with their contents by hand. Processing a really large file would take a lot of time. Here a single command does everything; it's that simple.

Now for another task. At work I have to use Wolfram Mathematica, which produces a lot of ASCII output that then has to be processed for readability. For example, Mathematica denotes the absolute value of an expression by the word 'Abs' followed by the expression in square brackets. I like to read math typeset with LaTeX, where the absolute value is naturally written with vertical bars. So I need to make the following replacement throughout the file:

 Abs[ 'expression' ] -> | 'expression' |

If I just had to delete every occurrence of the word 'Abs', it would be simple and similar to the previous task, but here we also need to keep the 'expression', which is different each time. What to do? Grouping comes to the rescue. If you enclose part of the pattern in escaped parentheses, '\(' and '\)', Vim stores the matched text under a number (the first group under number one, the second under number two, and so on). It can then be recalled in the replacement with '\x', where x is the number under which the text was stored.

Thus, the desired command will look something like this:

 :%s/Abs\[\([^\]]*\)\]/|\1|/g

Note that to match square brackets literally they are preceded by backslashes, since they are special characters. In general, any special character that should be matched literally is preceded by a backslash: '\^', '\*', etc. The backslash itself is also preceded by a backslash: to search for the sequence '\cos', enter '\\cos'.
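The grouping idea can be tried outside Vim as well. This is a Python sketch under stated assumptions, not Vim code: the input string is invented, and in Python's re module groups are written with plain parentheses, '(' and ')', rather than Vim's '\(' and '\)'.

```python
import re

# Hypothetical fragment of Mathematica-style output.
text = "Abs[x - 1] + Abs[y]"

# \[ and \] match literal square brackets; the group ([^\]]*) captures
# everything up to the first closing bracket; \1 inserts it back.
result = re.sub(r"Abs\[([^\]]*)\]", r"|\1|", text)
print(result)  # |x - 1| + |y|
```

Because the group stops at the first ']', an expression that itself contains square brackets would be cut short, so this approach fits only bracket-free expressions.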

Finally, the last task I would like to write about. The same Mathematica works with many quantities that are denoted by a capital Latin letter with a one-digit numerical index. In ASCII output the letter and the digit simply follow one another, for example 'U1'. For LaTeX to treat them as a letter with a subscript, the index must be preceded by the underscore character '_'.

 'Capital Latin letter''digit' -> 'Capital Latin letter'_'digit'

The most trivial solution is to go through all the combinations, if there are not many of them: first replace 'U1' -> 'U_1', then 'U2' -> 'U_2', and so on. Clearly, this is not our method. Recall the square brackets: to find one capital Latin letter, it is enough to use the pattern '[A-Z]'. But that is not the limit. Vim has a special abbreviation for this pattern: '\u' (from 'uppercase'). For digits there is '\d' (from 'digit'). More about such constructs can be found via ':help pattern.txt'. Using these abbreviations, the replacement command takes the form

 :%s/\(\u\)\(\d\)/\1_\2/g

Here grouping with parentheses appears again: it stores the matched letter and digit under their numbers during the search, and they are then retrieved in the replacement by those numbers: '\1' gives the letter and '\2' gives the digit.
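For comparison, the two-group replacement can be sketched in Python. This is not Vim code: the input string is invented, and since Python's re module has no '\u' class, the explicit sets '[A-Z]' and '[0-9]' are used in place of Vim's '\u' and '\d'.

```python
import re

# Hypothetical expression with letter-plus-digit names.
text = "U1 + V2*U3"

# Group 1 holds the capital letter, group 2 the digit;
# the replacement puts an underscore between them.
result = re.sub(r"([A-Z])([0-9])", r"\1_\2", text)
print(result)  # U_1 + V_2*U_3
```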

These three simple tasks, it seems to me, nicely demonstrate what Vim can do in search and replace. I believe that if I had to solve one of them with a text editor such as Notepad or, say, Notepad++, the time spent on the solution would significantly exceed the time needed to get to a machine with a copy of Vim (:
