You will not be able to solve this problem at the interview.

Hi, Habr. I want to share an interesting problem that many of us have been given at an interview, probably without ever realizing that we were solving it incorrectly.

First, a little history. While working as a team lead and tech lead, I sometimes had to conduct interviews, so I needed to prepare some theoretical questions and a couple of simple tasks that would take no more than 2 to 3 minutes to solve. The theory part is easy. My favorite question is “what is typeof null?”: the answer immediately tells you who is sitting in front of you. A junior will simply answer correctly, and a candidate for a senior position will also explain why. Practice is harder. For a long time I could not come up with a decent task of my own, something less hackneyed than fizz-buzz. So at interviews I gave the tasks that I had been given myself when applying for jobs. The first of them is the subject of this post.

Task text

Write a function that takes a string as input and returns this string “backwards”.

function strReverse(str) {};
strReverse('Habr') === 'rbaH'; // true

A very simple task with many possible solutions. For a long time I considered this one the best:

const strReverse = str => str.split('').reverse().join('');

But something about this solution always bothered me, namely the unreliability of split(''). After one of the interviews I wondered: “What could I pass in a string that would break my approach?” The answer came very quickly.

Oh yes, you may have already guessed what I mean: emoji! These damn emoticons were invented by the devil himself. Just look at what a thumbs-up turns into when you reverse it (no, not a thumbs-down).
I apologize in advance: the markup editor removes emoji from the code, so I insert pictures instead.
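As a minimal sketch of the problem (the thumbs-up input and the name `naiveReverse` are illustrative assumptions, not the original picture), this is roughly what happens: `split('')` cuts the string into UTF-16 code units, so the two halves of the emoji's surrogate pair get swapped.

```javascript
// The naive reversal from above: split('') works on UTF-16 code units.
const naiveReverse = str => str.split('').reverse().join('');

// U+1F44D (thumbs up) is stored as the surrogate pair \uD83D \uDC4D.
const thumbsUp = '\u{1F44D}';

console.log(thumbsUp.length);      // 2: two code units for one visible symbol
console.log(naiveReverse('Habr')); // 'rbaH'

// The surrogate pair comes back in the wrong order, which is not a valid character.
const brokenThumb = naiveReverse(thumbsUp);
console.log(brokenThumb === '\uDC4D\uD83D'); // true
```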

Okay, the task is simple: we just need to split the string into characters properly. Off we go: a couple of hours of brainstorming with the team and we have a solution. To be honest, I’m surprised we didn’t think of it before. This will be my new favorite implementation!
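The picture with that solution is missing here. One plausible reconstruction, assuming the idea was to iterate over code points instead of code units, uses the spread operator; the name `codePointReverse` is hypothetical and this is only a sketch, not necessarily the team's exact code.

```javascript
// String iteration (used by spread) yields whole code points,
// so a surrogate pair stays together as one array element.
const codePointReverse = str => [...str].reverse().join('');

const sample = 'Habr \u{1F44D}';              // 'Habr' + space + thumbs up
const reversedSample = codePointReverse(sample);

console.log(reversedSample === '\u{1F44D} rbaH'); // true
```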


Fire, it works, super, but... Wait a minute, these days we can also specify a skin color for an emoji. What happens if we pass such an emoji to the function?
This is a fiasco, bro!
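To illustrate why it fails (a sketch with an assumed input; the original screenshot is not available): a thumbs-up with a skin tone is two code points, the base emoji followed by a modifier, and a code-point reversal puts the modifier in front of the base.

```javascript
// Code-point-aware reversal; enough for single emoji, not for sequences.
const reverseByCodePoint = str => [...str].reverse().join('');

// U+1F44D (thumbs up) followed by U+1F3FD (medium skin tone modifier).
const tonedThumb = '\u{1F44D}\u{1F3FD}';
const tonedParts = [...tonedThumb];

console.log(tonedParts.length); // 2: base + modifier

// After reversal the modifier precedes the base and no longer applies to it.
const reversedToned = reverseByCodePoint(tonedThumb);
console.log(reversedToned === '\u{1F3FD}\u{1F44D}'); // true
```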

That is where I fell flat on my face. To be honest, I offered this problem at interviews a couple more times, mostly hoping that someone would suggest a solution that could handle it. No luck: the applicants shrugged and could not help me.
Chance helped, or maybe sporting interest. With the words “Do you want a puzzle from a special Olympiad?” I sent it to a former colleague. “Ok, I'll try it in the evening,” came the answer, and I tensed up... “What if he solves it? What if it really is possible? He can, and I can’t? That won’t do!” So I thought, and I started scouring the Internet.
Here I move on to the theoretical part, which some of you may find interesting and useful, and for others it will be a review of familiar material.

What do we need to know about Emoji?

First, it is a standard! And a well-described one.

The decisive moment in the life of emoji was arguably the day Unicode 8.0 and the accompanying Emoji 2.0 standard were adopted: that is when the first Unicode sequences and emoji sequences were described.

Let’s stop here a little longer and analyze the question in more detail.

According to the first version of the standard, an emoji is represented by a single Unicode character.


And so on ...

The second version of the standard allows us to combine several Unicode characters in a certain order to get an emoji.


Full list

These are the simple emoji sequences, and they are called simple only because there are also ZWJ sequences.
ZERO WIDTH JOINER (ZWJ) is a special zero-width Unicode character (U+200D). When it is inserted between several emoji, it “glues” the emoji on both sides of it together, and here is what we get:

Full list
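A ZWJ sequence can be inspected directly in code. The sketch below uses the family emoji (man, woman, girl) as an assumed example, since the original picture is not available, and lists the code points it is built from.

```javascript
// man + ZWJ + woman + ZWJ + girl, rendered as a single "family" emoji
// on platforms that support the sequence.
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}';

// Break the sequence into code points and print them as hex.
const familyCodePoints = [...family].map(ch =>
  ch.codePointAt(0).toString(16).toUpperCase()
);

console.log(familyCodePoints.join(' ')); // 1F468 200D 1F469 200D 1F467
console.log(family.length);              // 8 UTF-16 code units
```

One visible symbol, five code points, eight code units: any reversal that works below the level of the whole sequence will tear it apart.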

In subsequent standards, these sequences were only supplemented, so the number of emoji combinations only grew with time.

Well, we’ve covered the theory, but what can we do to reverse the string and still keep the sequences intact?

Regular expressions.

If you have a problem and you want to solve it with regular expressions, now you have two problems.
Digging deeper into the Unicode standard, we find a separate section on emoji sequences that describes how sequences are built, and everything turns out to be quite simple.

A sequence can be composed according to the following grammar:

emoji_sequence :=
  emoji_core_sequence
| emoji_zwj_sequence
| emoji_tag_sequence
# item by item
emoji_core_sequence :=
  emoji_character
| emoji_presentation_sequence
| emoji_keycap_sequence
| emoji_modifier_sequence
| emoji_flag_sequence
emoji_presentation_sequence :=
  emoji_character emoji_presentation_selector
emoji_presentation_selector := \x{FE0F}
emoji_keycap_sequence := [0-9#*] \x{FE0F 20E3}
emoji_modifier_sequence :=
  emoji_modifier_base emoji_modifier
emoji_modifier_base := \p{Emoji_Modifier_Base}
emoji_modifier := \p{Emoji_Modifier}
# we will come back to this a bit later
emoji_flag_sequence :=
  regional_indicator regional_indicator
regional_indicator := \p{Regional_Indicator}
emoji_zwj_sequence :=
  emoji_zwj_element ( ZWJ emoji_zwj_element )+
emoji_zwj_element :=
  emoji_character
| emoji_presentation_sequence
| emoji_modifier_sequence
emoji_tag_sequence :=
  tag_base tag_spec tag_term
tag_base :=
  emoji_character
| emoji_modifier_sequence
| emoji_presentation_sequence
tag_spec := [\x{E0020}-\x{E007E}]+
tag_term := \x{E007F}
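To give a feel for how these rules map onto regular expressions, here is a rough sketch of three of them. It deliberately uses explicit code point ranges instead of Unicode categories; the ranges for regional indicators (U+1F1E6 to U+1F1FF) and skin tone modifiers (U+1F3FB to U+1F3FF) and all variable names are assumptions made for this illustration, and the result is far from a complete emoji matcher.

```javascript
// emoji_keycap_sequence := [0-9#*] \x{FE0F 20E3}
const keycapSequence = /[0-9#*]\uFE0F\u20E3/u;

// emoji_flag_sequence := regional_indicator regional_indicator
const flagSequence = /[\u{1F1E6}-\u{1F1FF}]{2}/u;

// emoji_modifier, written as an explicit range of the five skin tones
const skinToneModifier = /[\u{1F3FB}-\u{1F3FF}]/u;

const keycapOne = '1\uFE0F\u20E3';        // keycap digit one
const flagRu = '\u{1F1F7}\u{1F1FA}';      // regional indicators R + U

console.log(keycapSequence.test(keycapOne));                 // true
console.log(flagSequence.test(flagRu));                      // true
console.log(flagSequence.test('RU'));                        // false: plain letters
console.log(skinToneModifier.test('\u{1F44D}\u{1F3FD}'));    // true
```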

In principle, this is already enough to compose a regular expression correctly (spoiler: it is not), but first a few more words about Unicode.

Unicode categories

Unicode defines categories that we can use in regular expressions to find, for example, all capital letters or all letters of the Latin alphabet. More details about the list can be found here. What matters to us is that the standard defines categories for emoji: {Emoji}, {Emoji_Presentation}, {Emoji_Modifier}, {Emoji_Modifier_Base}. It would seem that everything is fine and we can just use them, but they are not yet included in the ECMAScript implementation. More precisely, only one category has made it in: {Emoji}.


The rest are currently under consideration in TC39 (stage 2 as of April 10, 2019).
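For reference, this is how the {Emoji} category is used in a regular expression, assuming a runtime that supports Unicode property escapes (they require the `u` flag). Note one trap that matters for our task: digits also carry the Emoji property, because they are the base of keycap sequences.

```javascript
// \p{...} only works together with the u (unicode) flag.
const emojiProperty = /\p{Emoji}/u;

const matchesThumb = emojiProperty.test('\u{1F44D}'); // thumbs up
const matchesDigit = emojiProperty.test('7');         // digits have Emoji=Yes
const matchesLetter = emojiProperty.test('a');

console.log(matchesThumb);  // true
console.log(matchesDigit);  // true
console.log(matchesLetter); // false
```

So {Emoji} alone is not enough to pick emoji out of ordinary text.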

“Well, I’ll have to write the regex myself,” I thought, and about an hour later my ex-colleague sent me a link to the emoji-regex library. Well, yes, GitHub always already has what you were just about to write... A pity, but that’s beside the point. The library builds and exports a regular expression for finding emoji, which is basically what we need! Finally, we can try to write an implementation of our function:

    const emojiRegex = require('emoji-regex');
    const regex = emojiRegex();
    function stringReverse(string) {
        let match;
        const emojis = [];
        // Placeholder that temporarily stands in for every emoji sequence.
        const separator = `unique_separator_${Math.random()}`;
        const reversedSeparator = [...separator].reverse().join('');
        // Collect all emoji sequences in their original order.
        while (match = regex.exec(string)) {
            const emoji = match[0];
            emojis.push(emoji);
        }
        // Reverse the text with placeholders, then put the emoji back.
        // After the reversal the last emoji comes first, hence pop().
        return [...string.replace(regex, separator)].reverse().join('').replace(new RegExp(reversedSeparator, 'gm'), () => emojis.pop());
    }


To summarize

I love tasks from the “special” Olympiad: they make me learn something new and push the boundaries of my knowledge every time. I don’t understand people who say: “I don’t see why I need to know why null >= 0. It will never be useful to me!” It will come in handy, 100%. The moment you find out the cause of a particular phenomenon, you level up as a programmer and become better. Not better than someone else, but better than yourself a couple of hours ago, when you did not know how to solve the problem.

Thanks for reading, I will be glad to see any comments.

Required postscript:

The letter \u{0415}\u{0308} broke everything. This letter ё consists of 2 characters: it turns out that the Unicode standard allows combining not only emoji but ordinary characters as well... But that is a completely different story.

UPD: This is not about the letter “ё” itself, but about the combination of 2 Unicode characters, \u{0415} (Е) and \u{0308} (the combining diaeresis), which, placed one after another, form a Unicode sequence that we see on screen as the letter “ё”.
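A small sketch of that postscript, not part of the solution above. It assumes a runtime that provides `Intl.Segmenter` (recent Node.js and browsers), and the name `graphemeReverse` is hypothetical. Splitting by grapheme clusters keeps a base letter and its combining mark together, which code-point reversal does not.

```javascript
// U+0415 (capital Е) + U+0308 (combining diaeresis); rendered as Ё.
const decomposedYo = '\u0415\u0308';

// NFC normalization composes the pair into the single code point U+0401.
console.log(decomposedYo.normalize('NFC') === '\u0401'); // true

// Code-point reversal moves the diaeresis onto the neighbouring letter.
const yoByCodePoint = [...(decomposedYo + 'a')].reverse().join('');
console.log(yoByCodePoint === 'a\u0308\u0415'); // true

// Assumption: Intl.Segmenter is available in this runtime.
const graphemeSegmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
const graphemeReverse = str =>
  [...graphemeSegmenter.segment(str)].map(part => part.segment).reverse().join('');

// The base letter and its combining mark travel together.
const yoByGrapheme = graphemeReverse(decomposedYo + 'a');
console.log(yoByGrapheme === 'a' + decomposedYo); // true
```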
