Perl REGEXP example for quick text processing

A very useful recipe that makes working with the command line easier is described here, but I could not try it live, because there is no Go compiler on my system (OpenIndiana). So the idea arose to rewrite that program in a more universal language that is available on practically any platform: Perl.

Using the resulting code as an example, I would like to show how a quick and effective search can be done in a couple of lines with regular expressions.

Implementation


To begin with, we transform the hints passed in into part of the future regular expression:

$hints =~ s/(.)/$1\.\*\?/g;

Here $hints is the string glued together from all the hints, for example 'abcd'.
The search expression (.) matches a single character (every one of them, thanks to the 'g' flag). We replace it with itself ($1 is the value captured by the first parentheses of the search expression) and append the part we need, namely:

After each character we add the block '.*?', which means: any character, zero or more times, followed by a marker that makes the quantifier non-greedy (more on that below).

The resulting string is: 'a.*?b.*?c.*?d.*?'
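The transformation can be tried on its own with a minimal sketch like the one below. The helper name hints_to_pattern is hypothetical and not taken from the original script, and the quotemeta call is an addition: it assumes that hints containing characters such as '.' or '+' should be matched literally, which the one-liner above does not do.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): turn a hint string
# into a regex fragment by appending a non-greedy '.*?' to every character.
sub hints_to_pattern {
    my ($hints) = @_;
    # quotemeta escapes regex metacharacters so that they match literally.
    # This is an assumption added here; the original line inserts the
    # characters unescaped.
    (my $pattern = $hints) =~ s/(.)/quotemeta($1) . '.*?'/ge;
    return $pattern;
}

print hints_to_pattern('abcd'), "\n";   # prints a.*?b.*?c.*?d.*?
```

For plain letters and digits the result is the same as with the original substitution.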

Now to the main part, which compares a path from the list of "familiar" folders with the hints. The condition is:

if ($path =~ /^(.*)($hints)$/)

Here the symbol '^' is the "anchor" for the beginning of the line, and the expression in the first parentheses, '(.*)', is the line prefix. It is followed by our pre-built regexp containing the search hints, and the expression ends with the second "anchor", '$', which matches the end of the line.

Since every '*' quantifier in the expression except the first carries the '?' marker, the only quantifier without it stays "greedy", i.e. it tries to take as much of the line as possible.

In our example, the complete expression becomes: /^(.*)(a.*?b.*?c.*?d.*?)$/
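To see what the condition does with concrete input, here is a small sketch. The helper match_prefix and the sample paths are made up for illustration; only the regular expression itself comes from the text above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the greedy prefix captured by the first
# group, or undef when the path does not match the hints.
sub match_prefix {
    my ($path, $hints) = @_;
    return $path =~ /^(.*)($hints)$/ ? $1 : undef;
}

my $hints = 'a.*?b.*?c.*?d.*?';          # built from the hint string 'abcd'
for my $path ('/home/abcd', '/abcd/xx/a_b_c_d', '/home/dcba') {
    my $prefix = match_prefix($path, $hints);
    printf "%-20s %s\n", $path,
        defined $prefix ? "prefix '$prefix'" : 'no match';
}
```

The first path yields the prefix '/home/', the second '/abcd/xx/' (the prefix extends past the first 'abcd' because a later match exists), and the third does not match because the characters appear in the wrong order.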

In effect, the search in this case runs from right to left: starting from the end of the line, we find the nearest 'd', then the nearest 'c' to its left, then the nearest 'b' to the left of that, and finally the nearest 'a'. (The engine gets there by backtracking: the greedy prefix first takes the whole line and then gives characters back until the rest of the expression matches.) Everything to the left of the 'a' that was found falls into the "greedy" prefix. We determine the position of the result in the line from the length of this prefix, namely with the line $w = length($1); (here $1 holds the value from the first parentheses of the preceding regexp). The remaining condition (the further to the right, the better) has already been taken care of for us by the regexp.
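As a rough illustration of how the weight could be used, the sketch below picks one path out of several. The function best_path and the sample paths are hypothetical, and the sketch assumes that the path with the largest $w wins; the full script may combine the weight with other criteria.

```perl
use strict;
use warnings;

# Hypothetical helper: among the paths that match the hints, return the one
# whose hinted part starts furthest to the right (largest prefix length).
sub best_path {
    my ($hints, @paths) = @_;
    my ($best, $best_w) = (undef, -1);
    for my $path (@paths) {
        next unless $path =~ /^(.*)($hints)$/;
        my $w = length($1);              # where the hinted part begins
        ($best, $best_w) = ($path, $w) if $w > $best_w;
    }
    return $best;                        # undef if nothing matched
}

my @paths = ('/data/projects/old', '/home/user/projects', '/projects');
print best_path('p.*?r.*?j.*?', @paths), "\n";   # prints /home/user/projects
```

Here the three weights are 6, 11 and 1, so the second path is chosen.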

All that remains is to add the LoadPaths and add functions and finish the processing of the startup parameters.

Full script:

hg clone bitbucket.org/eugenet/perlre
