Regular expressions in WinEdt: finding formulas with unused numbers

After a closer reading of the manual for the WinEdt editor (intended almost exclusively for creating LaTeX documents), I discovered additional capabilities in the program's search/replace tool. To activate the smart search, check the Regular Expressions checkbox in the Find or Find and Replace dialog; the search string then effectively turns into a command line with which you can work wonders. You can do almost anything with the text this way, although the result is sometimes too convoluted (so for serious tasks, writing the corresponding macros looks more appropriate).

Joke about the gynecologist
A gynecologist applies for a job at a car repair shop. He is asked to disassemble and reassemble an engine. He does so and asks how his work is rated. The answer: "Not bad on the whole, only this is the first time we have seen all of it done through the exhaust pipe."

Here is an example. You need to find all unused \label commands, that is, those to which no \ref in the text refers ("all labels that are never referred to", as the English-language manual puts it). An extra label, harmless in itself, can signal, in particular, that the LaTeX document is over-numbered (that is, that it contains formulas with unused numbers). If the text is large enough and has many numbered formulas, such labels are almost inevitable (you once referred to an equation, then edited the text and deleted the reference, and most likely forgot to remove the number from the equation). At the same time, finding the "extra" labels by hand turns (again, because of the volume of material) into cumbersome and, above all, mindless mechanical work.

So, we will solve the problem with the "smart" search of the WinEdt editor (version 5.3 should be enough). First of all, note that WinEdt reserves memory cells (registers) named %!0, ..., %!9 for user needs. Keep in mind that this memory is volatile: it is reset every time WinEdt restarts. We use it to save the contents of all \ref commands as one long string: press Ctrl+F, make sure the Regular Expressions checkbox in the dialog that opens is checked, and enter the following text in the search field:

\\ref\(\{*\}\)\X{\"|GetTag(0,0);LetReg(1,"%!1%!0");|}



Some explanations (partially revealing the meaning of the last line). When Regular Expressions mode is on, some characters (for example, \, { and }) take on a special meaning; if we need them in their literal meaning (that is, as the characters themselves), they must be preceded by a backslash (for example, \\, \{ and \}). But there are exceptions: parentheses by themselves are not special characters (and so mean literal parentheses), whereas in the combinations \( and \) they do have a special meaning. The text enclosed between \( and \) becomes a so-called tag (tagged expression, or marked text) and can be used later: this text is accessed (for example, in the same search string to detect a repeated fragment, or in the "replace with" field) with the \0 command (zero is the default number of a tagged fragment). If several parts need to be tagged, use constructions of the form:

\(0 some text \), …, \(9 some text \)

and the commands \0, ..., \9 to refer to the corresponding parts.
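For readers who know other regex dialects better, here is a rough Python analogue of a tagged fragment that is reused inside the same search string. It is only an analogy, not WinEdt syntax: Python writes the group as ( ... ) and numbers it from 1, and the helper name find_doubled_words is made up for this illustration.

```python
import re

# A capturing group plays the role of a WinEdt tag; \1 refers back to it,
# much like \0 refers to the default tag in WinEdt (analogy only).
DOUBLED = re.compile(r"\b(\w+) \1\b")

def find_doubled_words(text):
    """Return every word that is immediately repeated in the text."""
    return [m.group(1) for m in DOUBLED.finditer(text)]

print(find_doubled_words("this is is a test of the the tagging"))
# → ['is', 'the']
```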

And what is the asterisk * between \{ and \} at the beginning of the entered text? This asterisk is a wildcard and stands for an arbitrary (possibly empty) sequence of characters within one line (note that, starting with WinEdt 5.3, the combination ** matches arbitrary text, including line breaks).

Thus, the character sequence:

\\ref\(\{*\}\)

that is, the first part of the expression in question, searches for any combination of the form \ref{arbitrary text}. When such a combination is found, macros are run, as indicated by the second part of the expression, which starts with \X (without a macro launch command, WinEdt simply jumps to the found combination and highlights it in the document). The macro launch command can also start with \x (case matters!), as well as with \Xx or \xX. The point is that, depending on the result of running the macros, WinEdt can either stop at the found fragment (in our case, \ref{arbitrary text}) or ignore it (as if it did not match the search string) and go on to look for the next match. Which of the two it chooses is determined by the case of the "x-command" and by the value of the Boolean variable IFOK used by WinEdt (true by default), which some macros can modify. With \X, WinEdt's behavior agrees with IFOK: if IFOK is true, WinEdt stops at the found fragment; if it is false, WinEdt ignores the fragment. With \x, WinEdt's reaction to IFOK is exactly the opposite, and with \Xx or \xX, WinEdt shows the found text regardless of IFOK.
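The rule just described fits into a small truth table. The Python sketch below only restates the paragraph above; the function name stops_at_match is invented for the illustration and is not part of WinEdt.

```python
def stops_at_match(command: str, ifok: bool) -> bool:
    """Return True if the search stops at the candidate match.

    command is the macro launch prefix without the backslash:
    "X", "x", "Xx" or "xX"; ifok is the value of IFOK after the macros ran.
    """
    if command == "X":
        return ifok          # behavior agrees with IFOK
    if command == "x":
        return not ifok      # behavior is the opposite of IFOK
    if command in ("Xx", "xX"):
        return True          # the match is shown regardless of IFOK
    raise ValueError(f"unknown launch command: {command!r}")

for cmd in ("X", "x", "Xx", "xX"):
    print(cmd, stops_at_match(cmd, True), stops_at_match(cmd, False))
```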

Let us consider in more detail the second part of the analyzed string, i.e. the command:

\X{\"|GetTag(0,0);LetReg(1,"%!1%!0");|}

It runs two macros: GetTag(0,0) and LetReg(1,"%!1%!0"). The GetTag(n,m) macro writes the contents of the n-th tag (in our case tag 0, i.e., the argument of the \ref command together with the surrounding curly braces) to the m-th register, i.e., to the memory cell named %!m (in our case, %!0). The LetReg(k,"string") macro writes its second argument (without the enclosing quotes) to the k-th register. So in our case LetReg overwrites register 1 (initially empty) with its previous contents followed by the contents of register 0, that is, the brace-enclosed argument of the \ref command that WinEdt has just found. Thus, to collect in register %!1 the sequence of arguments of all the \ref commands found in the text, enter in the search field:

\\ref\(\{*\}\)\X{\"|GetTag(0,0);LetReg(1,"%!1%!0");|}

and step through the entire document. This is done relatively simply and quickly: after the first successful match, find all subsequent occurrences by pressing and holding the F3 key (for a document with many hundreds of numbered formulas, holding F3 took no more than 30 seconds). There is also an alternative: you can use WinEdt's replace tool. Press Ctrl+R, enter in the search field:

\\ref\(\{*\}\)\X{\"|GetTag(0,0);LetReg(1,"%!1%!0");|}

in the "replace with" field:

\ref\(\{*\}\)

and when asked to confirm the replacement, choose All (do not forget to check the Regular Expressions checkbox).
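To make the bookkeeping concrete, here is a minimal Python sketch of what this collection pass accumulates. It only imitates the two macros with ordinary variables. The names register0, register1 and collect_refs are invented for the illustration, and the pattern assumes that reference names contain no braces.

```python
import re

# Tag 0 is the braced argument of \ref, braces included.
REF = re.compile(r"\\ref(\{[^{}]*\})")

def collect_refs(tex: str) -> str:
    """Imitate stepping through every \\ref match in the document."""
    register1 = ""                        # register 1 starts out empty
    for match in REF.finditer(tex):
        register0 = match.group(1)        # like GetTag(0,0)
        register1 = register1 + register0 # like LetReg(1,"%!1%!0")
    return register1

print(collect_refs(r"see \ref{h1} and also \ref{h2}"))
# → {h1}{h2}
```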

The preparatory work is completed. Now detection of unused labels is done by calling the search with the expression:

\\label{\{\(*\)\}}\x{FindInString("%!1","\0");}

in the search field (recall the connection between \x and IFOK!). A search with the expression:

\\label{\{\(*\)\}}\X{FindInString("%!1","\0");}

solves the opposite problem, showing only the labels that appear in the argument of at least one \ref command. Note that the outer curly braces in the expression \\label{\(\{*\}\)} are syntactically redundant; without them, however, the WinEdt search gives, generally speaking, an incorrect result. This quirk has no rational explanation and simply has to be remembered (the English-language manual just says that it is important to use {\{\(*\)\}}, because \{\(*\)\} will not work here!).
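The two searches differ only in which labels they stop at. A rough Python counterpart, assuming label and reference names contain no braces, might look as follows; the function split_labels is a name invented here, and the braces are kept around each name in this sketch so that one name cannot match inside another.

```python
import re

REF = re.compile(r"\\ref(\{[^{}]*\})")
LABEL = re.compile(r"\\label(\{[^{}]*\})")

def split_labels(tex: str):
    """Return (unused, used) label names, in document order."""
    refs = "".join(REF.findall(tex))      # the accumulated "register 1"
    unused, used = [], []
    for braced in LABEL.findall(tex):
        # Substring test, comparable in spirit to FindInString.
        (used if braced in refs else unused).append(braced[1:-1])
    return unused, used

sample = r"\label{eq:a} \label{eq:b} as shown in \ref{eq:a}"
print(split_labels(sample))
# → (['eq:b'], ['eq:a'])
```

The first list corresponds to the \x search (unused labels), the second to the \X search.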

A joke about the Georgian school
A teacher at a Russian language lesson in a Georgian school: "Children, remember: the words salt, beans and noodles are written with a soft sign, and the words fork, bun and plate are written without one. This cannot be explained, you just have to remember it!"

I also note that the curly braces around the arguments of \ref commands, kept when writing to the %!0 and %!1 registers, are not strictly required but are very advisable, since they help avoid errors in cases such as the following:

… \label{h1} … \label{h2} … \label{1h} … \label{h3} … \ref{h1} … \ref{h2} … \label{h}

(if instead of the construction \\ref\(\{*\}\) we use \\ref\{\(*\)\}, leaving { and } out of the tagged fragment, the contents of the references form the string h1h2, and searching it will falsely report that the labels named h and 1h are used). This, however, does not protect us from all possible errors, since the arguments of labels and references may themselves contain (paired, of course) braces (for example, \label{h{1}}). To rule out such problems entirely, the easiest way is to avoid curly braces in label names; if you have already managed to create a huge document with an enormous number of references whose names contain braces, then you probably cannot do without a special macro.
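The effect of the braces is easy to check on the example above. The following Python sketch is not WinEdt code: it simply concatenates the reference arguments with and without braces and runs the same substring test. The function name unused and its keep_braces flag are invented for the illustration.

```python
import re

TEXT = (r"\label{h1} \label{h2} \label{1h} \label{h3} "
        r"\ref{h1} \ref{h2} \label{h}")

def unused(text: str, keep_braces: bool):
    """List labels not found in the concatenated \\ref arguments."""
    # With keep_braces the captured group includes { and }, otherwise not.
    arg = r"(\{[^{}]*\})" if keep_braces else r"\{([^{}]*)\}"
    refs = "".join(re.findall(r"\\ref" + arg, text))
    labels = re.findall(r"\\label" + arg, text)
    return [lab.strip("{}") for lab in labels if lab not in refs]

print(unused(TEXT, keep_braces=False))  # refs form "h1h2"
# → ['h3']
print(unused(TEXT, keep_braces=True))   # refs form "{h1}{h2}"
# → ['1h', 'h3', 'h']
```

Without braces, the labels h and 1h are wrongly treated as used because both occur inside the string h1h2.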

So, the method described here makes it possible (with the caveat mentioned above) to detect every case in which an environment that generates a number (for example, equation) contains an unused \label. But an "extra" number can also appear when such an environment contains no \label at all. Fortunately, with WinEdt's advanced search mechanisms it is easy to detect (Search for field):

\\begin\{equation\}\(**\)\end\{equation\}\x{FindInString("\0","\label");}

and even to fix (Replace with field):

\begin\{equation\*\}\0\end\{equation\*\}

all such cases (for definiteness, only the equation environment mentioned above is considered here).
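For comparison, the same clean-up can be sketched outside WinEdt in a few lines of Python. This is an illustration under stated assumptions rather than a replacement for the search above: the function name star_unlabelled is invented, equation environments are assumed not to be nested, and the match is made non-greedy so that each environment is handled separately (whether WinEdt's ** behaves the same way was not checked here).

```python
import re

# DOTALL lets the body span several lines, like ** in the text above.
EQUATION = re.compile(r"\\begin\{equation\}(.*?)\\end\{equation\}", re.DOTALL)

def star_unlabelled(tex: str) -> str:
    """Turn equation environments without \\label into equation*."""
    def fix(match):
        body = match.group(1)
        if r"\label" in body:
            return match.group(0)     # labelled: leave untouched
        return r"\begin{equation*}" + body + r"\end{equation*}"
    return EQUATION.sub(fix, tex)

source = (
    "\\begin{equation}\na = b\n\\end{equation}\n"
    "\\begin{equation}\\label{e}\nc = d\n\\end{equation}\n"
)
print(star_unlabelled(source))
```

Only the first environment in the sample is starred; the second keeps its number because it carries a \label.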
