rsludge October 27, 2012 at 18:39

Using Regular Expressions in Ruby

Regular expressions are salvation from every trouble for some developers and a nightmare for others. Objectively speaking, they are a powerful tool that nonetheless demands great care. Ruby's regular expressions (regexes, regexps) are based on Perl 5 syntax, so they will look familiar to anyone who has used Perl, Python or PHP. What makes Ruby nice is that each part of the language takes its own approach, which makes the tool simpler to use and more powerful. In this short article I look at the features of regular expressions in Ruby and how they are used with various operators and methods.

In Ruby, everything is an object

First of all, note that a regular expression is an object of the Regexp class. Accordingly, it can be created by calling Regexp.new, or by combining existing expressions with Regexp.union:

r1 = Regexp.new "a"
r2 = Regexp.new "b"
ru = Regexp.union r1, r2

The resulting expression matches any string that matches at least one of the combined patterns.
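A minimal sketch of how the union behaves, runnable as a plain Ruby script (the sample strings are made up for illustration). It also shows a detail worth knowing: when Regexp.union is given plain strings instead of Regexp objects, it escapes them, so metacharacters such as + are matched literally.

```ruby
r1 = Regexp.new("a")
r2 = Regexp.new("b")
ru = Regexp.union(r1, r2)    # matches "a" or "b"

p(ru =~ "xxbx")              # 2, index of the first match
p(ru =~ "xyz")               # nil

# Plain strings passed to union are escaped, so "+" is literal here
lit = Regexp.union("1+1", "x")
p lit.source                 # "1\\+1|x"
p(lit =~ "1+1=2")            # 0
p(lit =~ "11")               # nil
```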

The =~ operator, which matches a string against a regular expression, returns the index of the first match or nil, but in many cases we need more information about the match. As in Perl, you can use special variables such as $~, $', $& and so on. The group variables $1, $2, ... are easy enough to remember, but how people actually use the rest has always been a mystery to me. Fortunately, Ruby offers another approach: the Regexp.last_match method, which returns a MatchData object.

"abcde" =~ /(b)(c)(d)/
Regexp.last_match[0]            # "bcd"
Regexp.last_match[1]            # "b"
Regexp.last_match[2]            # "c"
Regexp.last_match[3]            # "d"
Regexp.last_match.pre_match     # "a"
Regexp.last_match.post_match    # "e"
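For comparison, the Perl-style globals mentioned above are shortcuts into the same match data: $~ is the last MatchData itself, $& is the whole match, $` and $' are the text before and after it, and $1, $2, ... are the groups. A short sketch on the same string:

```ruby
"abcde" =~ /(b)(c)(d)/

m = Regexp.last_match        # the same data that $~ holds
p $~[0] == m[0]              # true
p $&                         # "bcd" (whole match, m[0])
p $`                         # "a"   (m.pre_match)
p $'                         # "e"   (m.post_match)
p $1                         # "b"   (first group, m[1])

p m.captures                 # ["b", "c", "d"]
p m.begin(0)                 # 1, the same value =~ returned
```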

Named Groups

Starting with version 1.9, Ruby supports named groups, written (?<name>...):

"a reverse b".gsub /(?<first>\w+) reverse (?<second>\w+)/, '\k<second> \k<first>'     # "b a"

The same example also demonstrates backreferences, although this feature is not specific to Ruby: other modern Perl-compatible (PCRE-style) engines support it as well.

\k<name> is the named-group counterpart of an ordinary backreference: it matches exactly the text that the group has already captured.
\g<name> re-applies the group's pattern (a subexpression call), so it matches any text that fits that pattern again. The difference is easiest to see in an example:

"1 1" =~ /(?<first>\d+) \k<first>/    # 0
"1 2" =~ /(?<first>\d+) \k<first>/    # nil
"1 a" =~ /(?<first>\d+) \k<first>/    # nil
"1 1" =~ /(?<first>\d+) \g<first>/    # 0
"1 2" =~ /(?<first>\d+) \g<first>/    # 0
"1 a" =~ /(?<first>\d+) \g<first>/    # nil

The text captured by a named group can also be retrieved by name from the MatchData object:

Regexp.last_match[:first]
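A slightly fuller sketch of named groups, runnable as a plain script. It reads captures from MatchData by symbol or string key, uses a related Ruby feature (a regexp literal with named groups on the left of =~ assigns the captures to local variables of the same names; this does not happen when the string is on the left), and shows \g<name> reusing a sub-pattern. The dotted-quad pattern is a hypothetical illustration, not a real IPv4 validator, since it does not check that each number is at most 255. MatchData#named_captures assumes Ruby 2.4 or later.

```ruby
m = /(?<first>\w+) reverse (?<second>\w+)/.match("a reverse b")

p m[:first]          # "a"
p m["second"]        # "b" (string keys work too)
p m.names            # ["first", "second"]
p m.named_captures   # Hash of group name => captured text (Ruby 2.4+)

# With a regexp literal on the LEFT of =~, named captures
# become local variables
if /(?<first>\w+) reverse (?<second>\w+)/ =~ "x reverse y"
  p [first, second]  # ["x", "y"]
end

# \g<octet> re-applies the pattern \d{1,3}, so the parts may differ
quad = /\A(?<octet>\d{1,3})(?:\.\g<octet>){3}\z/
# \k<octet> demands the very text the group captured
same = /\A(?<octet>\d{1,3})(?:\.\k<octet>){3}\z/

p(quad =~ "192.168.0.1")  # 0
p(quad =~ "192.168.0")    # nil (only three parts)
p(same =~ "192.168.0.1")  # nil (parts differ)
p(same =~ "7.7.7.7")      # 0
```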

Other ways to test for a match

Besides the traditional =~, Ruby offers other ways to test a string against a regular expression. In particular, there is the match method, which is especially convenient because it can be called both on a String and on a Regexp instance. But that is not all. You can also extract the matching substring with the ordinary indexing operator:

"abcde"[/bc?f?/]         # "bc"

as well as the slice method:

"abcde".slice(/bc?f?/)        # "bc"

Finally, there is one more way, which at first glance seems not very logical:

/bc?f?/ === "abcde"        # true

You are unlikely to write this form by hand, but this remarkable property of Ruby has a practical use, described below.
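To round off the methods from this section, here is a sketch of what they return on the same sample string. match returns a MatchData object or nil whichever side it is called from, and [] and slice also accept a group number or group name as a second argument; the patterns with groups are made up for illustration.

```ruby
s = "abcde"

# match can be called on the String or on the Regexp
p s.match(/b(c)/)[1]               # "c"
p(/b(c)/.match(s)[0])              # "bc"
p s.match(/z/)                     # nil

# [] and slice with a group number or a group name
p s[/b(c)(d)/, 2]                  # "d"
p s.slice(/c(?<rest>\w+)/, "rest") # "de"

# Regexp#=== returns true or false
p(/bc?f?/ === s)                   # true
p(/z/ === s)                       # false
```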

Using regular expressions in various methods

One of the most useful, though not very widespread, uses of regular expressions in Ruby is in the case statement. Example:

str = 'september'
case str
   when /june|july|august/
       puts "it's summer"
   when /september|october|november/
       puts "it's autumn"
end

This works because case compares values using the === operator mentioned above, which lets regexps be used very concisely and elegantly in such cases.
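A runnable variant of the example above, wrapped in a hypothetical season helper with an else branch. Because when clauses call Regexp#===, the same patterns also work with Enumerable#grep, which filters a collection with ===. The patterns are unanchored, as in the original, so any string containing a month name matches.

```ruby
# Hypothetical helper: classify a month name by season
def season(month)
  case month
  when /december|january|february/  then "winter"
  when /march|april|may/            then "spring"
  when /june|july|august/           then "summer"
  when /september|october|november/ then "autumn"
  else "unknown"
  end
end

puts season("september")   # autumn
puts season("july")        # summer

# grep also uses ===, so a regexp works as a filter
months = %w[may june july august]
p months.grep(/^ju/)       # ["june", "july"]
```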

Regular expressions can also be passed to the split method. An example from ruby-doc:

"1, 2.34,56, 7".split(%r{,\s*})         #  ["1", "2.34", "56", "7"]

One way to get a list of words from a string with split:

"one two three".split(/\W+/)     # ["one", "two", "three"]
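Two more details of split with a regexp, sketched on made-up input: if the pattern contains a capture group, the captured separators are kept in the result, and an optional second argument limits the number of fields.

```ruby
# A capture group keeps the separators in the result
p "1,2;3".split(/([,;])/)            # ["1", ",", "2", ";", "3"]

# The limit argument stops splitting after N fields
p "1, 2.34,56, 7".split(/,\s*/, 2)   # ["1", "2.34,56, 7"]
```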

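For Cyrillic strings \W does not help, because in Ruby \w matches only the ASCII word characters [a-zA-Z0-9_]; the POSIX bracket class [[:word:]] covers Unicode letters as well: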

"строка, из которой нужно получить список слов".split(/[^[:word:]]+/)     # ["строка", "из", "которой", "нужно", "получить", "список", "слов"]
(Ruby 1.9 and later)
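A quick check of why the ASCII-only \w is not enough here (a sketch; the two-word sample string is illustrative):

```ruby
# encoding: utf-8
# (the magic comment above is only needed on Ruby 1.9)
s = "один два"

p s.split(/\W+/)           # [] because every Cyrillic letter counts as \W
p s.split(/[^[:word:]]+/)  # ["один", "два"]
p(s =~ /\w/)               # nil
p(s =~ /[[:word:]]/)       # 0
```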

Sometimes it is much more convenient to break a string into parts with the scan method, which collects the matches themselves rather than the separators. Here is the previous example rewritten with scan:

"строка, из которой нужно получить список слов".scan(/[[:word:]]+/)    # ["строка", "из", "которой", "нужно", "получить", "список", "слов"]
(Ruby 1.9 and later)
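When the pattern contains groups, scan returns the groups of every match instead of the whole match, and with a block it yields each match in turn. A small sketch; the key=value input is made up for illustration:

```ruby
# With groups, scan returns the groups of every match
pairs = "a=1, b=22, c=333".scan(/(\w+)=(\d+)/)
p pairs    # [["a", "1"], ["b", "22"], ["c", "333"]]

# With a block, scan yields each match in turn
total = 0
"a=1, b=22, c=333".scan(/\d+/) { |num| total += num.to_i }
p total    # 356
```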

The sub method, which replaces the first match, also accepts a Regexp object as the pattern:

"today is september 25".sub(/\w+mber/, 'july')    # "today is july 25"

In the same way, regular expressions can be used with the sub!, gsub and gsub! methods.
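Besides a replacement string, sub and gsub accept a block, which receives each match, or a hash that maps matched text to replacements; the replacement string may also refer to groups as \1, \2 and so on. A short sketch on illustrative inputs:

```ruby
s = "today is september 25"

p s.sub(/\w+mber/, "july")                     # "today is july 25"

# A block receives each match; its value is the replacement
bumped = s.gsub(/\d+/) { |d| (d.to_i + 1).to_s }
p bumped                                       # "today is september 26"

# A hash maps each matched text to its replacement
swapped = "cat hat".gsub(/[ch]/, { "c" => "b", "h" => "m" })
p swapped                                      # "bat mat"

# \1, \2, ... refer to groups in the replacement string
date = "2012-10-27".sub(/(\d+)-(\d+)-(\d+)/, '\3.\2.\1')
p date                                         # "27.10.2012"

# Bang versions modify the receiver in place
t = s.dup
t.gsub!(/\s+/, "_")
p t                                            # "today_is_september_25"
```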

The partition method, which splits a string into three parts (the text before the first match, the match itself and the text after it), can also take a regular expression as the separator:

"12:35".partition(/[:\.,]/)        #  ["12", ":", "35"]

The rpartition method accepts a regular expression in the same way, but splits at the last match instead of the first.
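The difference between partition and rpartition shows up only when the separator occurs more than once, as in this sketch; when nothing matches, partition puts the whole string in the first element and rpartition puts it in the last.

```ruby
p "12:35:59".partition(/[:\.,]/)    # ["12", ":", "35:59"]
p "12:35:59".rpartition(/[:\.,]/)   # ["12:35", ":", "59"]
p "1235".partition(/[:\.,]/)        # ["1235", "", ""]
p "1235".rpartition(/[:\.,]/)       # ["", "", "1235"]
```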

The index and rindex methods also accept regular expressions; as you would expect, they return the index of the first and the last match in the string, respectively.
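A minimal sketch on an illustrative string; both methods return nil when there is no match, and index also takes an optional starting offset:

```ruby
s = "abcabc"

p s.index(/b/)       # 1
p s.rindex(/b/)      # 4
p s.index(/b/, 2)    # 4, the search starts at offset 2
p s.index(/z/)       # nil
```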

Further reading

1. Friedl - Mastering Regular Expressions
2. Flanagan, Matsumoto - The Ruby Programming Language
3. Ruby-doc: class Regexp
