
Emoji?! No, never heard of it

The latest versions of iOS and Android support more than 1200 emoji characters, but desktop platforms cannot boast the same. At Badoo we do everything we can so that users can chat comfortably on every platform, with no restrictions on what they can write.
Below I describe how we achieved 100% emoji support on the web.
This is how a Windows user would see a message in a browser without emoji support:

The main idea: take each emoji character, determine its Unicode code point, and replace it with an HTML element that displays correctly in the browser.
Theory
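Consider the emoji 😀 (U+1F600). Reading its UTF-16 code units one at a time, for example with charCodeAt(0) and charCodeAt(1), gives two 16-bit values.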

As a result, we get a surrogate pair: U+D83D U+DE00.
UTF-16 encodes characters as a sequence of 16-bit words, which makes it possible to represent the Unicode code points from U+0000 to U+D7FF and from U+E000 to U+10FFFF (1,112,064 in total). A character with a code point above U+FFFF is written as two words: the first half of the surrogate pair (in the range 0xD800 to 0xDBFF) followed by the second half (0xDC00 to 0xDFFF).
To get the code point of an emoji above U+FFFF from its surrogate pair, we use the formula:
(0xD83D - 0xD800) * 0x400 + (0xDE00 - 0xDC00) + 0x10000 = 0x1F600
And back again:
0xD83D = ((0x1F600 - 0x10000) >> 10) + 0xD800
0xDE00 = ((0x1F600 - 0x10000) % 0x400) + 0xDC00
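The two formulas above can be wrapped in a pair of helper functions. This is only a sketch to make the arithmetic concrete; the function names are made up for the example and do not come from the article's code.

```javascript
// Combine a high and a low surrogate into a single code point.
function surrogatesToCodePoint(high, low) {
    return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}

// Split a code point above U+FFFF into its two surrogates.
function codePointToSurrogates(codePoint) {
    var offset = codePoint - 0x10000;
    return [
        (offset >> 10) + 0xD800,   // high surrogate: upper 10 bits of the offset
        (offset % 0x400) + 0xDC00  // low surrogate: lower 10 bits of the offset
    ];
}

console.log(surrogatesToCodePoint(0xD83D, 0xDE00).toString(16)); // 1f600

var pair = codePointToSurrogates(0x1F600);
console.log(pair[0].toString(16), pair[1].toString(16)); // d83d de00
```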
This is rather fiddly and inconvenient, so let's see what ES2015 has to offer.
With the new JavaScript standard you can forget about surrogate pairs and make your life easier:
String.prototype.codePointAt // returns the code point of a character
String.fromCodePoint // returns the character for a code point
Both methods handle surrogate pairs correctly.
Code point escapes in string literals, which take the whole code point in braces:
\u{1F466} instead of \uD83D\uDC66
RegExp.prototype.unicode: the u flag in regular expressions improves Unicode handling:
/\u{1F466}/u
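A short sketch of how these ES2015 features behave on the same character, assuming an environment with ES2015 support (any current browser or Node.js):

```javascript
var boy = '\u{1F466}'; // the same string as '\uD83D\uDC66'

console.log(boy.length);                            // 2 (UTF-16 code units)
console.log(boy.codePointAt(0).toString(16));       // 1f466
console.log(String.fromCodePoint(0x1F466) === boy); // true

// Without the u flag "." matches one code unit; with it, a whole code point.
console.log(/^.$/.test(boy));  // false
console.log(/^.$/u.test(boy)); // true

// String iteration (used by Array.from) walks code points, not code units.
console.log(Array.from(boy).length); // 1
```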
Currently, the Unicode 8.0 standard contains 1281 emoji characters, not counting skin tone modifiers and groups (family emoji). Well-known companies ship their own implementations:

Emoji can be divided into several groups:
- simple: a single code unit in the range up to 0xD7FF;
- surrogate pairs: two code units in the range 0xD800 to 0xDFFF;
- numbers: a character from 0x0023 to 0x0039 followed by 0x20E3;
- country flags: 2 characters whose second code unit is in the range 0xDDE6 to 0xDDFF;
- skin tone modifiers: an emoji followed by a character whose second code unit is in the range 0xDFFB to 0xDFFF;
- family: a sequence of emoji joined by 0x200D or 0x200C.
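To make the groups above concrete, the following sketch prints the UTF-16 code units of one emoji from each group. The sample emoji are chosen for illustration and are not necessarily the ones pictured in the original post; codeUnits is a helper invented for this example.

```javascript
// List the UTF-16 code units of a string as hex, separated by spaces.
function codeUnits(str) {
    return str.split('').map(function (unit) {
        return unit.charCodeAt(0).toString(16);
    }).join(' ');
}

console.log(codeUnits('\u263A'));             // simple:         263a
console.log(codeUnits('\u{1F600}'));          // surrogate pair: d83d de00
console.log(codeUnits('1\u20E3'));            // number:         31 20e3
console.log(codeUnits('\u{1F1F7}\u{1F1FA}')); // flag:           d83c ddf7 d83c ddfa
console.log(codeUnits('\u{1F44D}\u{1F3FD}')); // skin tone:      d83d dc4d d83c dffd
// family: man, zero-width joiner, woman, zero-width joiner, boy
console.log(codeUnits('\u{1F468}\u200D\u{1F469}\u200D\u{1F466}'));
// d83d dc68 200d d83d dc69 200d d83d dc66
```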
Solution:
- take the source text and use a regular expression to find every emoji sequence in it;
- determine each character's code point with the codePointAt function;
- create an img element (it must be an img tag) whose URL contains the code of that symbol;
- replace the symbol in the source text with the img.
function emojiToHtml(str) {
    // Drop variation selector-16 (U+FE0F) so it does not end up in the image code
    str = str.replace(/\uFE0F/g, '');
    return str.replace(emojiRegex, buildImgFromEmoji);
}
var tpl = '
';
var url = 'https://badoocdn.com/big/chat/emoji/{code}.png';
var url2 = 'https://badoocdn.com/big/chat/emoji@x2/{code}.png';
function buildImgFromEmoji(emoji) {
    var codePoint = extractEmojiToCodePoint(emoji);
    return $tpl(tpl, {
        code: codePoint,
        src: $tpl(url, {
            code: codePoint
        }),
        src_x2: $tpl(url2, {
            code: codePoint
        })
    });
}
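function extractEmojiToCodePoint(emoji) {
    return emoji
        .split('') // split into UTF-16 code units
        .map(function (symbol, index) {
            // At a high surrogate this returns the full code point;
            // at a low surrogate it returns the lone surrogate value
            return emoji.codePointAt(index).toString(16);
        })
        .filter(function (codePoint) {
            // Discard the lone low surrogates left over from the step above
            return !isSurrogatePair(codePoint);
        })
        .join('-');
}
// True if the hex string is a code unit in the surrogate range
function isSurrogatePair(codePoint) {
    codePoint = parseInt(codePoint, 16);
    return codePoint >= 0xD800 && codePoint <= 0xDFFF;
}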
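The img template and the $tpl helper used above are not shown in this text. The sketch below fills them in with stand-ins so that the pipeline can be run end to end: fillTemplate, imgTemplate and the data-code attribute are assumptions made for this example, not Badoo's actual markup or helper. It also derives the code with Array.from, which iterates by code point, as an alternative to the split-and-filter approach.

```javascript
// Hypothetical stand-in for $tpl: replaces each {name} with data[name].
function fillTemplate(template, data) {
    return template.replace(/\{(\w+)\}/g, function (match, key) {
        return Object.prototype.hasOwnProperty.call(data, key) ? data[key] : match;
    });
}

// Assumed markup; the attribute names are illustrative only.
var imgTemplate = '<img class="emoji" data-code="{code}" src="{src}" srcset="{src_x2} 2x" alt="">';
var baseUrl = 'https://badoocdn.com/big/chat/emoji/{code}.png';
var retinaUrl = 'https://badoocdn.com/big/chat/emoji@x2/{code}.png';

// Code points of a string as lowercase hex, joined by "-".
function toCodePointKey(emoji) {
    return Array.from(emoji).map(function (ch) {
        return ch.codePointAt(0).toString(16);
    }).join('-');
}

function buildImg(emoji) {
    var code = toCodePointKey(emoji);
    return fillTemplate(imgTemplate, {
        code: code,
        src: fillTemplate(baseUrl, { code: code }),
        src_x2: fillTemplate(retinaUrl, { code: code })
    });
}

console.log(toCodePointKey('\u{1F1F7}\u{1F1FA}')); // 1f1f7-1f1fa
console.log(buildImg('\u{1F600}'));
```

Keeping the code in an attribute of the img is what makes the reverse conversion possible later, when the message is sent.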
The core of the approach is the regular expression that finds emoji characters:
var emojiRanges = [
    '(?:\uD83C[\uDDE6-\uDDFF]){2}', // flags
    '[\u0023-\u0039]\u20E3', // numbers
    '(?:[\uD83D\uD83C\uD83E][\uDC00-\uDFFF]|[\u270A-\u270D\u261D\u26F9])\uD83C[\uDFFB-\uDFFF]', // skin tone
    '\uD83D[\uDC68\uDC69][\u200D\u200C].+?\uD83D[\uDC66-\uDC69](?![\u200D\u200C])', // family
    '[\uD83D\uD83C\uD83E][\uDC00-\uDFFF]', // surrogate pair
    '[\u3297\u3299\u303D\u2B50\u2B55\u2B1B\u27BF\u27A1\u24C2\u25B6\u25C0\u2600\u2705\u21AA\u21A9]', // simple
    '[\u203C\u2049\u2122\u2328\u2601\u260E\u261d\u2620\u2626\u262A\u2638\u2639\u263a\u267B\u267F\u2702\u2708]',
    '[\u2194-\u2199]',
    '[\u2B05-\u2B07]',
    '[\u2934-\u2935]',
    '[\u2795-\u2797]',
    '[\u2709-\u2764]',
    '[\u2622-\u2623]',
    '[\u262E-\u262F]',
    '[\u231A-\u231B]',
    '[\u23E9-\u23EF]',
    '[\u23F0-\u23F4]',
    '[\u23F8-\u23FA]',
    '[\u25AA-\u25AB]',
    '[\u25FB-\u25FE]',
    '[\u2602-\u2618]',
    '[\u2648-\u2653]',
    '[\u2660-\u2668]',
    '[\u26A0-\u26FA]',
    '[\u2692-\u269C]'
];
var emojiRegex = new RegExp(emojiRanges.join('|'), 'g');
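The order of the alternatives matters: the multi-character groups (flags, skin tones, families) are listed before the generic surrogate pair, because a regular expression takes the first alternative that matches at a given position. The reduced sketch below, with only two alternatives and variable names invented for the example, shows what happens when the order is swapped.

```javascript
// Flag pattern first, generic surrogate pair second (same order as emojiRanges).
var specificFirst = /(?:\uD83C[\uDDE6-\uDDFF]){2}|[\uD83D\uD83C\uD83E][\uDC00-\uDFFF]/g;
// Same alternatives, generic pattern first.
var genericFirst = /[\uD83D\uD83C\uD83E][\uDC00-\uDFFF]|(?:\uD83C[\uDDE6-\uDDFF]){2}/g;

// A flag: regional indicators R (U+1F1F7) and U (U+1F1FA).
var text = 'hi \uD83C\uDDF7\uD83C\uDDFA!';

console.log(text.match(specificFirst).length); // 1: the flag is found as one unit
console.log(text.match(genericFirst).length);  // 2: the flag is split into two halves
```

With the generic pattern first, each half of the flag would be turned into its own img and the flag image would never be requested.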
Chat
Next, let's look at how to build a chat prototype with emoji support.
A contenteditable div serves as the message input field:
When a message is typed or pasted from the clipboard, we strip any HTML tags from its contents:
var tagRegex = /<[^>]+>/gim;
var styleTagRegex = /
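The cleanUp function called in the paste handler is not shown in this text. A minimal sketch of what such a function might look like, assuming it removes style blocks together with their contents and then strips all remaining tags; stripMarkup and the styleBlock pattern are guesses made for this example, not the original code.

```javascript
var anyTag = /<[^>]+>/gim; // same pattern as tagRegex above
// Assumed pattern: a <style> element together with everything inside it.
var styleBlock = /<style\b[^>]*>[\s\S]*?<\/style\s*>/gim;

function stripMarkup(html) {
    return html
        .replace(styleBlock, '') // remove CSS first, otherwise its text would survive
        .replace(anyTag, '');    // then remove every remaining tag
}

console.log(stripMarkup('<style>p{color:red}</style><p>Hi <b>there</b></p>')); // Hi there
```

Removing style blocks before the generic pass matters because the tag pattern only deletes the tags themselves and would leave the CSS rules behind as visible text. HTML entities are left untouched in this sketch.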
To process text pasted from the clipboard, we handle the paste event:
function onPaste(e) {
    e.preventDefault();
    var clp = e.clipboardData;
    if (clp !== undefined || window.clipboardData !== undefined) {
        var text;
        if (clp !== undefined) {
            text = clp.getData('text/html') || clp.getData('text/plain') || '';
        } else {
            // Older Internet Explorer exposes the clipboard on window
            text = window.clipboardData.getData('text') || '';
        }
        if (text) {
            text = cleanUp(text);
            text = emojiToHtml(text);
            var el = document.createElement('span');
            el.innerHTML = text;
            el.innerHTML = el.innerHTML.replace(/\n/g, '');
            // t is the input field
            t.appendChild(el);
            restore();
        }
    }
}
Then we replace every emoji found with an img tag, as shown above. It has to be img because contenteditable works best with it; other elements can cause bugs during editing.
After the img is inserted into the input field, the caret position must be restored so that the user can keep typing. For this we use the JavaScript Selection and Range objects:
function restore() {
    var range = document.createRange();
    range.selectNodeContents(t);
    // Collapse to the end so the caret sits after the inserted content
    range.collapse(false);
    var sel = window.getSelection();
    sel.removeAllRanges();
    sel.addRange(range);
}
Once the message has been typed, the reverse procedure is needed: each img is turned back into a character with the fromCodePoint function before the message is sent to the server:
var htmlToEmojiRegex = //gi;
function htmlToEmoji(html) {
    return html.replace(htmlToEmojiRegex, function (imgTag, codesStr) {
        var codesInt = codesStr.split('-').map(function (codePoint) {
            return parseInt(codePoint, 16);
        });
        var emoji = String.fromCodePoint.apply(null, codesInt);
        return emoji.match(emojiRegex) ? emoji : '';
    });
}
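The pattern assigned to htmlToEmojiRegex did not survive in this text. The sketch below shows the reverse conversion under the assumption that each img carries its hyphen-separated hex codes in a data-code attribute; that attribute name, the imgToCode pattern and the function name are all hypothetical. Unlike htmlToEmoji above, it does not validate the result against emojiRegex and only checks that the code points are in range.

```javascript
// Assumed markup shape: <img ... data-code="1f468-200d-1f466" ...>
var imgToCode = /<img\b[^>]*\bdata-code="([0-9a-f]+(?:-[0-9a-f]+)*)"[^>]*>/gi;

function imgsToText(html) {
    return html.replace(imgToCode, function (imgTag, codesStr) {
        var codePoints = codesStr.split('-').map(function (hex) {
            return parseInt(hex, 16);
        });
        // String.fromCodePoint throws a RangeError above U+10FFFF, so drop such tags
        var valid = codePoints.every(function (cp) {
            return cp <= 0x10FFFF;
        });
        return valid ? String.fromCodePoint.apply(null, codePoints) : '';
    });
}

var html = 'Hi <img class="emoji" data-code="1f600" src="1f600.png">!';
console.log(imgsToText(html) === 'Hi \u{1F600}!'); // true
```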
A working chat example is available here: https://jsfiddle.net/q9484hcc/
This is how we built emoji support so that our users can express their emotions fully and talk to each other without restrictions. If you have ideas for improving or changing our approach, write in the comments and we will be glad to discuss them!
Useful links:
http://emojipedia.org/
http://getemoji.com/
Polyfill for String.fromCodePoint
Polyfill for String.prototype.codePointAt
Artem Kunets
Frontend developer, Badoo