Simulating intersection, subtraction and negation with lookaheads in ECMAScript regular expressions

    From the translator


    This is a translation of a short note written yesterday by Lea Verou. It proposes an interesting, though not new, technique for solving everyday tasks.

    The information in the article concerns ECMAScript, but it can be applied in other regular expression engines (although those may offer a more suitable solution).

    If the examples seem complicated, I recommend trying them in the console as you read. And I apologize in advance for the frightening title.

    Article


    If you have used regular expressions for a while, you have probably come across variants of the following tasks:

    • Intersection: “Something that matches pattern A and pattern B”
      For example: a password of at least 6 characters that contains at least one digit, at least one letter, and at least one special character

    • Subtraction: “Something that matches pattern A but not pattern B”
      For example: any integer that is not divisible by 50

    • Negation: “Anything that does not match pattern A”
      For example: a string that does not contain the word “foo”


    Although ECMAScript has the caret (^) for negating a character class, there is no way to negate anything more complex.

    In addition, we have the vertical bar (|) for "OR", but nothing that means "AND" and nothing that means "EXCEPT" (subtraction). All of these operations are possible on simple sets of characters by using character classes, but that does not work for complex sequences.

    Nevertheless, we can simulate all three operations by taking advantage of the fact that lookaheads do not consume characters and do not advance the search position. After a lookahead succeeds, the rest of the pattern continues matching from the same position, so it can still match the substring we need.
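
    A minimal sketch of that zero-width behaviour, which you can paste into the console. The sample string and variable names are illustrative assumptions, not part of the original note.

```javascript
// A lookahead checks what follows without consuming it.
const withLookahead = /^(?=\d{3})/.exec("123abc");
// The lookahead succeeded, yet the matched text is empty.
console.log(JSON.stringify(withLookahead[0])); // ""

// The rest of the pattern therefore starts from the same position.
const thenConsume = /^(?=\d{3})\w+/.exec("123abc");
console.log(thenConsume[0]); // "123abc"

// A negative lookahead works the same way, but succeeds only
// when its pattern does NOT match at the current position.
const startsWithNonDigit = /^(?!\d)\w+/;
console.log(startsWithNonDigit.test("123abc")); // false
console.log(startsWithNonDigit.test("abc123")); // true
```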

    Subtraction


    As a simple example, the expression /^(?!cat)\w{3}$/ matches any three-character word except “cat”. This is a simple subtraction.

    Here is a solution to the problem proposed above, for three-digit integers: /^(?!\d+[50]0)\d{3}$/
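
    Subtraction patterns like these can be checked in the console. The sketch below uses "cat" as an arbitrary three-letter word to exclude (an illustrative choice), and then cross-checks the divisibility pattern against plain arithmetic for every three-digit integer; the variable names are assumptions.

```javascript
// Any three-character word except "cat".
const notCat = /^(?!cat)\w{3}$/;
console.log(notCat.test("dog")); // true
console.log(notCat.test("cat")); // false

// Three-digit integers that are not divisible by 50.
const notDivisibleBy50 = /^(?!\d+[50]0)\d{3}$/;
console.log(notDivisibleBy50.test("123")); // true
console.log(notDivisibleBy50.test("150")); // false
console.log(notDivisibleBy50.test("200")); // false

// Cross-check the pattern against arithmetic for 100..999.
let mismatches = 0;
for (let n = 100; n <= 999; n++) {
  if (notDivisibleBy50.test(String(n)) !== (n % 50 !== 0)) mismatches++;
}
console.log(mismatches); // 0
```

    Note that the pattern relies on the input being exactly three digits long: the lookahead is not anchored at the end, so it is not a general test for integers of any length.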

    Intersection


    For intersection (AND), we can simply chain several positive lookaheads and match the string we need with the last pattern (if we leave only the lookaheads, testing still gives the correct answer, but the match itself will be empty). For example, /^(?=.*\d)(?=.*[a-z])(?=.*[\W_]).{6,}$/i solves the password problem above.
    Internet Explorer 8 and below have a bug in how they handle lookaheads, so if you want your regular expressions to work there, it is important to know about it and adjust your patterns accordingly.
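
    To see the chained lookaheads at work, here is a small sketch for the password task. The sample passwords and variable names are made up for illustration; "special character" is taken to mean anything matched by [\W_].

```javascript
// At least 6 characters, with at least one digit, one letter,
// and one special character (anything matched by [\W_]).
const password = /^(?=.*\d)(?=.*[a-z])(?=.*[\W_]).{6,}$/i;

console.log(password.test("abc123!")); // true
console.log(password.test("abcdef!")); // false: no digit
console.log(password.test("123456!")); // false: no letter
console.log(password.test("abc1234")); // false: no special character
console.log(password.test("a1!"));     // false: shorter than 6

// With lookaheads only, test() still answers the three
// "contains" conditions, but the matched text is empty
// because nothing is consumed.
const lookaheadsOnly = /^(?=.*\d)(?=.*[a-z])(?=.*[\W_])/i;
const emptyMatch = lookaheadsOnly.exec("abc123!");
console.log(lookaheadsOnly.test("abc123!"));   // true
console.log(JSON.stringify(emptyMatch[0]));    // ""
```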

    Negation


    Negation is the simplest of the three. We just need a negative lookahead and .+ to match the string that passed the test. For example, /^(?!.*foo).+$/ solves the problem proposed above. Admittedly, negation is also the least useful of the three.
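
    A quick console check of the negation pattern. The sample strings are illustrative assumptions; note that the pattern is case-sensitive and that .+ requires at least one character.

```javascript
const noFoo = /^(?!.*foo).+$/;

console.log(noFoo.test("bar baz"));   // true
console.log(noFoo.test("some food")); // false: contains "foo"
console.log(noFoo.test("foo"));       // false
console.log(noFoo.test("Foo"));       // true: no "i" flag, so "Foo" is not "foo"
console.log(noFoo.test(""));          // false: .+ needs at least one character

// The text outside the lookahead is what ends up in the match.
const whole = noFoo.exec("bar baz");
console.log(whole[0]); // "bar baz"
```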


    Conclusion


    This technique has some caveats, mostly concerning what ends up in the match. (Make sure that the pattern outside the lookaheads matches the entire string you need.)

    Steven Levithan digs even deeper and tries to simulate conditionals and atomic groups. Mind-blowing.
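
    One such pitfall, sketched here as an illustration that is not taken from the original note: if the pattern is not anchored, the engine can retry at a later position where the lookahead no longer applies and return a partial match.

```javascript
// Anchored: the lookahead is tried only at the start of the string.
const anchored = /^(?!.*foo).+$/.exec("foo");
console.log(anchored); // null

// Unanchored: the engine retries at index 1, where the rest of the
// string ("oo") no longer contains "foo", and matches from there.
const partial = /(?!.*foo).+/.exec("foo");
console.log(partial[0], partial.index); // oo 1
```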


    A couple of bonus links


    • A utility that breaks regular expressions into parts and explains them.
    • A JS library that greatly simplifies working with regular expressions and adds functionality to them.

