Warming up the brain with regular expressions: Regex Tuesday Challenge

    I'd like to suggest that you rack your brain for an evening or two over the interesting regular expression challenges that Callum Macrae posts on his GitHub-hosted site every Tuesday.

    Each challenge is presented as a set of tests. The goal is to write a regular expression that makes all the tests turn green.
    Some of the tasks are quite simple in themselves, and the most interesting part is writing the shortest possible regular expression.

    The tests use your browser's JavaScript regex engine, which has all the basic features of PCRE. More details can be found here (in English), in the ECMA column of the table.

    In this article I have collected translated versions of the tasks and materials that can help in solving them. I'd be glad to see the most interesting solutions in the comments.

    UPD: At the time of writing, ECMAScript regular expressions had no lookbehind assertions (they were added later, in ES2018).

    1. Highlight repeated words (the link leads to the task)

    Wrap repeated words in <strong> tags.

    This is a test => This is a test
    This is is a test => This is <strong>is</strong> a test
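
    As a starting point, here is a minimal sketch of one possible approach, not the shortest solution. It assumes that a "word" is a run of \w characters and that repeats are compared case-insensitively. The function name highlightRepeats is my own and is not part of the challenge.

```javascript
// Sketch only: wraps every immediate repeat of a word in <strong> tags.
// Assumption: a "word" is a run of \w characters, compared case-insensitively.
function highlightRepeats(text) {
  // \b(\w+) captures a word; (?:\s+\1\b)+ matches one or more repeats of it.
  return text.replace(/\b(\w+)((?:\s+\1\b)+)/gi, (match, word, repeats) =>
    // Keep the first occurrence as-is and wrap each repeated word.
    word + repeats.replace(/\w+/g, '<strong>$&</strong>'));
}
```

    The callback is used because a plain replacement string can only wrap the repeated run as a whole, not each repeated word in it.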

    2. Grayscale

    Match shades of gray written in different color notations.
    You can read about colors at this link.

    #FFF - yes
    rgb(2.5, 2.5,2.5) - yes
    rgb(2, 4, 7) - no
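
    A partial sketch that covers only two notations, hex and rgb(), might look like this. It relies on backreferences to require three identical channels. The names grayscale and isGrayscale are hypothetical, and the real task includes more notations than this handles.

```javascript
// Sketch only: a color is gray when all three channels are written identically.
// Assumption: only #RGB, #RRGGBB and rgb(r, g, b) are handled here;
// equal values written differently (2 vs 2.0) are not recognized.
const grayscale = /^(?:#([0-9a-f]{1,2})\1\1|rgb\(\s*(\d+(?:\.\d+)?)\s*,\s*\2\s*,\s*\2\s*\))$/i;

function isGrayscale(color) {
  return grayscale.test(color);
}
```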

    3. Dates

    Find strings matching this pattern: YYYY/MM/DD HH:MM(:SS)

    Match valid dates between the years 1000 and 2012. Seconds may be omitted.
    The author simplifies the task: assume 30 days in every month.

    2012/09/18 12:10 - yes
    2013/09/09 09:09 - no (after 2012)
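
    The rules above can be sketched as a single anchored pattern. I am assuming that the year range is inclusive on both ends and that the date and time are separated by exactly one space. The names datePattern and isValidDate are mine.

```javascript
// Sketch only. Assumptions: years 1000..2012 inclusive, every month has
// 30 days, a single space separates the date from the time.
const datePattern = new RegExp(
  '^(1\\d{3}|200\\d|201[0-2])' +   // year: 1000-1999, 2000-2009, 2010-2012
  '/(0[1-9]|1[0-2])' +             // month: 01-12
  '/(0[1-9]|[12]\\d|30)' +         // day: 01-30
  ' ([01]\\d|2[0-3]):[0-5]\\d' +   // hours and minutes
  '(:[0-5]\\d)?$'                  // optional seconds
);

function isValidDate(text) {
  return datePattern.test(text);
}
```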

    4. Italics in Markdown

    Convert text wrapped in single asterisks to italics. Do not touch text in double asterisks (bold).
    You can read more about Markdown on Wikipedia.

    *this is italic* => <em>this is italic</em>
    **bold text (not italic)** => **bold text (not italic)**
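
    One way to do this without lookbehind is to capture the character before the opening asterisk and put it back in the replacement. This is a rough sketch under that assumption. The function name italicize is hypothetical, and edge cases such as nested bold and italic text are not handled.

```javascript
// Sketch only: converts *text* to <em>text</em> and leaves **text** alone.
// (^|[^*]) stands in for a lookbehind: it captures the preceding character
// (or the start of the string) so it can be restored as $1.
function italicize(markdown) {
  return markdown.replace(/(^|[^*])\*([^*]+)\*(?!\*)/g, '$1<em>$2</em>');
}
```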

    5. Numbers

    Match numbers that use a comma or a space as the thousands separator. (fortunately there were no mommaye)

    8,205,500.4672 - yes
    1,5826,000 - no
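
    A minimal sketch of the grouping check follows. It assumes that the same separator must be used throughout the number, that the fractional part after the dot is not grouped, and that numbers longer than three digits without any separator are rejected. These are my assumptions, not rules confirmed by the task, and the names are mine.

```javascript
// Sketch only: 1-3 leading digits, then groups of exactly 3 digits that
// all use the same separator (captured once and reused via \1).
const groupedNumber = /^\d{1,3}(?:([, ])\d{3}(?:\1\d{3})*)?(?:\.\d+)?$/;

function isGroupedNumber(text) {
  return groupedNumber.test(text);
}
```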

    6. IPv4 addresses

    Match IPv4 addresses in all possible representations: decimal, hexadecimal, and octal, with and without dots. More information about IP addresses can be found on Wikipedia.

    Examples: yes
    0xFF.255.0377.0x12- yes - no

    7. Domain Names

    Match domain names with the http or https protocol and an optional trailing slash. Special characters are not used.

    Examples: yes - no
    кремль.рф - no :(
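
    Here is a rough sketch of such a pattern. I am assuming that "special characters are not used" means ASCII letters, digits, and inner hyphens only, and that the top-level domain consists of at least two letters. The names domainPattern and isDomain are mine, and the example.com style inputs in the usage note are hypothetical, not taken from the task.

```javascript
// Sketch only. Assumptions: labels are ASCII letters, digits and inner
// hyphens; the TLD is two or more letters; nothing may follow the optional
// trailing slash.
const domainPattern = /^https?:\/\/(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}\/?$/i;

function isDomain(text) {
  return domainPattern.test(text);
}
```

    With these assumptions, isDomain('http://example.com/') is accepted, while 'кремль.рф' is rejected both for the missing protocol and for the non-ASCII letters.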

    8. Duplicate Markdown list items

    Find repeated items in a Markdown list and make them bold (**).

    * First list item
    * Second list item
    =>
    * First list item
    * Second list item

    * Repeated list item
    * Repeated list item
    =>
    * Repeated list item
    * **Repeated list item**
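
    The task asks for a single regular expression, but the intended behavior is easier to see in a sketch that combines a simple pattern with a Set of items already seen. This is a swapped-in technique for illustration, not a solution to the challenge itself, and the function name boldRepeats is hypothetical.

```javascript
// Sketch only: uses a Set instead of a pure regex solution.
// Assumption: list items are lines of the form "* text".
function boldRepeats(markdown) {
  const seen = new Set();
  return markdown.replace(/^\* (.+)$/gm, (line, item) => {
    if (seen.has(item)) {
      return '* **' + item + '**'; // a repeat: make it bold
    }
    seen.add(item); // first occurrence: remember it and leave the line alone
    return line;
  });
}
```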

    9. Links in Markdown

    Convert Markdown links to HTML. They look like this: [text](
    The main thing is not to confuse them with images: ![alt text](image location)

    [Basic link]( => <a href="">Basic link
    [Invalid](javascript:alert()) => [Invalid](javascript:alert())
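
    A minimal sketch of the conversion is shown below. It assumes that only http and https URLs count as valid link targets, which is my reading of the "Invalid" example rather than a rule stated in the task. The function name convertLinks and the example.com URLs in the usage note are hypothetical.

```javascript
// Sketch only. Assumptions: valid targets start with http:// or https://,
// and link text contains no "]" characters.
// (^|[^!]) captures the preceding character so that images (![alt](src))
// are skipped without needing a lookbehind.
function convertLinks(markdown) {
  return markdown.replace(
    /(^|[^!])\[([^\]]+)\]\((https?:\/\/[^\s)]+)\)/g,
    '$1<a href="$3">$2</a>'
  );
}
```

    For example, convertLinks('[Basic link](http://example.com)') gives '<a href="http://example.com">Basic link</a>', while image syntax and javascript: targets are left untouched.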

    10. Split a sentence into tokens

    Split a sentence into tokens. This may be useful, for example, for a search engine.

    There are several rules:

    • Several words in quotation marks form a single token.
      This "huge test" is pointless => This,huge test,is,pointless
    • Hyphenated words also form a single token.
      Words joined by several hyphens (dashes), or with a hyphen at the beginning or end, are split into separate tokens.
      Suzie Smith-Hopper test--hyphens => Suzie,Smith-Hopper,test,hyphens
    • Contractions form a single token.
      I can't do it => I,can't,do,it
    • All punctuation except apostrophes and hyphens must be removed.
      Too long; didn't read => Too,long,didn't,read
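
    The rules above can be sketched as a single alternation: either a quoted phrase or a word with optional inner apostrophes and single hyphens. I am assuming that tokens consist of ASCII letters only and that quotes are always the straight double quote. The function name tokenize is mine.

```javascript
// Sketch only. Assumptions: words are ASCII letters; quoted phrases use
// straight double quotes; digits are not treated as tokens.
function tokenize(sentence) {
  // First alternative: a quoted phrase (captured without the quotes).
  // Second alternative: letters joined by single apostrophes or hyphens.
  const pattern = /"([^"]+)"|[A-Za-z]+(?:['-][A-Za-z]+)*/g;
  const tokens = [];
  for (const match of sentence.matchAll(pattern)) {
    tokens.push(match[1] !== undefined ? match[1] : match[0]);
  }
  return tokens;
}
```

    Because an apostrophe or hyphen is only accepted when letters follow it directly, a double hyphen or a leading or trailing hyphen naturally ends the current token.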

    11. Letters in alphabetical order

    Match a sequence of non-repeating characters in alphabetical order. Spaces should be ignored. Unfortunately, the solutions I know of are not very elegant.

    abcdefghijk - yes
    abbc - no
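
    One of those inelegant solutions can at least be generated rather than typed by hand. The sketch below builds a pattern in which every letter is optional and appears once, in order. I am assuming that gaps in the sequence are allowed (so "acf" counts) and that matching is case-insensitive. The names are hypothetical.

```javascript
// Sketch only: builds ^ *a? *b? * ... z? *$ so each letter may appear at
// most once and only in alphabetical order, with spaces allowed anywhere.
// Assumption: skipping letters is allowed; the empty string is rejected.
const letters = 'abcdefghijklmnopqrstuvwxyz';
const alphabetical = new RegExp(
  '^ *' + [...letters].map((c) => c + '? *').join('') + '$',
  'i'
);

function isAlphabetical(text) {
  return text.length > 0 && alphabetical.test(text);
}
```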

    12. Fixing spaces

    Remove duplicate spaces and tabs, leaving one space between words and two between sentences.

    Extra       spaces => Extra spaces
    Sentence.      Sentence. => Sentence.  Sentence.
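
    A two-step sketch makes the rule easy to follow: first collapse every run of spaces and tabs, then widen the gap after sentence-ending punctuation. I am assuming that a sentence ends with ".", "!" or "?", which means abbreviations would also get a double space. The function name fixSpaces is mine.

```javascript
// Sketch only. Assumption: ".", "!" and "?" always end a sentence.
function fixSpaces(text) {
  return text
    .replace(/[ \t]+/g, ' ')               // collapse runs of spaces and tabs
    .replace(/([.!?]) (?=\S)/g, '$1  ');   // two spaces between sentences
}
```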

    13. Repeated words directly below each other

    Match repeated words that are directly below each other.
    A monospace font is assumed. Lines longer than 32 characters are wrapped.

    This sentence is pretty long and
    this sentence is also a test - yes
    This sentence also shouldn't
    match as this has no words
    below. - no

    14. Brute-forcing chemical elements

    Match the first 50 chemical elements of the periodic table. The solution is pretty obvious, so the task is to find the shortest one.

    H - yes
    M - no
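
    For reference, the obvious (and long) solution is a plain alternation over the symbols of elements 1 to 50, which is the baseline a shorter pattern has to beat. The variable and function names below are my own.

```javascript
// Sketch only: the naive baseline, an alternation of the first 50 symbols
// (hydrogen through tin), anchored so that partial matches are rejected.
const firstFifty = [
  'H', 'He', 'Li', 'Be', 'B', 'C', 'N', 'O', 'F', 'Ne',
  'Na', 'Mg', 'Al', 'Si', 'P', 'S', 'Cl', 'Ar', 'K', 'Ca',
  'Sc', 'Ti', 'V', 'Cr', 'Mn', 'Fe', 'Co', 'Ni', 'Cu', 'Zn',
  'Ga', 'Ge', 'As', 'Se', 'Br', 'Kr', 'Rb', 'Sr', 'Y', 'Zr',
  'Nb', 'Mo', 'Tc', 'Ru', 'Rh', 'Pd', 'Ag', 'Cd', 'In', 'Sn',
];
const elementPattern = new RegExp('^(?:' + firstFifty.join('|') + ')$');

function isEarlyElement(symbol) {
  return elementPattern.test(symbol);
}
```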

    15. Musical chords

    Match musical chords such as Cmin or Bmaj. Both the short and the full notation must be supported. For this task, assume that the chords E♯, B♯, F♭, and C♭ do not exist.

    For those interested, there is a good article on chords in Russian and an English Wikipedia article that uses the appropriate characters.

    Also note that the sharp sign (♯) is not the same as the pound sign (#).

    C - yes
    Z - no

    16. Brute-forcing chemical elements

    Match chemical elements with an atomic number greater than 50.

    I - yes
    A - no

    17. A regular expression for regular expressions

    Match a well-formed regular expression. To start with, we restrict ourselves to literals (possibly escaped), character classes, and a few quantifiers.

    /regexp?/ - yes
    regex - no

    18. IRC messages

    Match a correctly formed IRC message.
    Here is a link to the Russian version of the specification.

    [_]!abc@test PRIVMSG #chat :Test - yes
    c.m! PRIVMSG #chat :Hello! - no
