Code that isn't there
Hi, Habr readers!
About a year ago, Habr was swept by a wave of posts on the theme "%something% in N lines of JavaScript". I don't even remember how it all ended, but it all started with Excel in 30 lines. Many other interesting variations on the theme appeared, even a game in zero lines of JS, but that is a completely different story...
No matter how hard I tried to come up with something even more compact, nothing came of it. So I decided to look at the problem from a different angle. Around that moment a question flashed through my head: is it possible to "collapse" the code so that it does not exist at all? And then David Blaine called me.
I tried adding a little magic, and here is what I got.
"Disappearing" code
The task: write code that is somehow not there, yet still does something. Obviously, any such trick has to be accompanied by a function that can interpret it, so hiding the code entirely will not work, alas. Cutting the visible part down to the last two or three lines, though, is easy.
Many people know, or have at least heard, that computer typography has non-printing characters, which are effectively invisible. This is not some kind of bug or quirk but perfectly normal behavior: being invisible is their job. One of the generally accepted, standardized text encodings today is UTF-8, which is used on almost every modern site. What makes it valuable here is that it can represent a whole bunch of invisible characters! One of them is Zero Width Space (U+200B). Here it is: "". See it? No? But it is there.
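To make the point about U+200B concrete, here is a minimal sketch. The character is written as an escape sequence so that it stays visible in the source; the variable names are only illustrative.

```javascript
// U+200B (Zero Width Space), written as an escape so it can be seen here.
const zwsp = "\u200B";

const visible = "ab";
const hidden = "a" + zwsp + "b";

// The two strings usually render identically, but they are different strings.
console.log(hidden.length);                      // 3
console.log(visible === hidden);                 // false
console.log(hidden.codePointAt(1).toString(16)); // "200b"
```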
David Blaine Method
For those who want to try it hands-on, here is a link to an example from a year ago: watch a working demo for free online. Later, several months after my note in Habr's sandbox, I happened upon a post that touched on this idea (method number three), but without the twist.
In my version, the encoding was done in the most primitive way. The downside: the file size grows significantly. The upside: only two characters are needed for the encoding. It looked something like this:
var code = '1101101111110111111111111111110101101101111101111';
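The original two-character encoder is not shown in the article, so the following is only a sketch of how such a scheme might look. The choice of U+200B for 0 and U+200C for 1, the fixed width of 16 bits per character, and the names hideBinary and revealBinary are all assumptions made for illustration.

```javascript
// Assumption: U+200B stands for bit 0 and U+200C for bit 1.
const ZERO = "\u200B";
const ONE = "\u200C";

// Turn every UTF-16 code unit into 16 invisible "bits".
function hideBinary(text) {
  return text
    .split("")
    .map((ch) => ch.charCodeAt(0).toString(2).padStart(16, "0"))
    .join("")
    .replace(/0/g, ZERO)
    .replace(/1/g, ONE);
}

// Reverse the process: invisible characters -> bit string -> characters.
function revealBinary(hidden) {
  const bits = hidden.split("").map((ch) => (ch === ONE ? "1" : "0")).join("");
  const groups = bits.match(/[01]{16}/g) || [];
  return groups.map((g) => String.fromCharCode(parseInt(g, 2))).join("");
}

const hiddenF = hideBinary("f");
console.log(hiddenF.length);        // 16
console.log(revealBinary(hiddenF)); // "f"
```

With this layout every source character costs 16 invisible ones, which is the size penalty mentioned above.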
Quite a while later, I came back to this topic as part of a project I am working on. I tried to take it further, starting from the fact that each character can be encoded not with ones and zeros but with the four hexadecimal digits of its character code:
"f".charCodeAt(0).toString(16);
// "66"
// So the character code of "f" is 0x0066
String.fromCharCode("0x0066");
// "f"
As a result, with a set of 16 symbols you can cut down the size overhead:
var Symbols = ["й","ц","у","к","е","н","г","ш","щ","з","х","ъ","ф","ы","в","а"];
// Now the character "f" can be encoded:
var bar = invisibleJS("f");
// bar = "ййгг";
In this example the size overhead drops to 4x (four symbols to encode one character), and in theory, if you do not need Russian and/or other non-Latin characters, you can get down to 2x.
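The article does not show the body of invisibleJS, so here is a hedged sketch of what a table-based encoder along these lines could look like. The function name encodeWithTable and its width parameter are hypothetical; the table is the visible Cyrillic one from the example above.

```javascript
// The illustrative 16-symbol table from the article: index = hex digit value.
const symbols = ["й","ц","у","к","е","н","г","ш","щ","з","х","ъ","ф","ы","в","а"];

// Hypothetical encoder: each character becomes `width` hex digits,
// and each hex digit is replaced by the table symbol at that index.
function encodeWithTable(text, table, width = 4) {
  return text
    .split("")
    .map((ch) => {
      const hex = ch.charCodeAt(0).toString(16);
      if (hex.length > width) {
        throw new RangeError("character code does not fit into " + width + " hex digits");
      }
      return hex
        .padStart(width, "0")
        .split("")
        .map((digit) => table[parseInt(digit, 16)])
        .join("");
    })
    .join("");
}

console.log(encodeWithTable("f", symbols));    // "ййгг" (4x overhead)
console.log(encodeWithTable("f", symbols, 2)); // "гг"   (2x, codes below 0x100 only)
```

The 2x variant works only while every character code fits into two hex digits, which is why it rules out Russian and most other non-Latin text.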
Examples in the studio
Take this code, for example:
alert("Hello world!");
After feeding it to the obfuscator (I will not quote or dissect the obfuscator's code, there is nothing interesting in it), the output looks something like this:
var helloworld = "";
Note that the semicolon appears to be inside the quotation marks, although it actually is not (you can check this in almost any text editor, for example Sublime). On the one hand, this adds +5 to obfuscation: it is misleading and threatens a mild brain-fuck. On the other hand, a "correct" editor will not apply the characters that affect text direction (left to right, right to left).
This is what the decoding function looks like as a result:
var revealJS = function(s){return s.match(/(.{4})/g).map(function(b){return b.split('').map(function(i){return Array.apply(null,{length:10}).map(Number.call,Number).concat('abcdef'.split(''))[''.split('').indexOf(i)]})}).map(function(c){return String.fromCharCode(0+"x"+c.join(''))}).join('')}
I am more than sure that the code could be better, smaller and more elegant, but that is not the goal of this post. Note that at the end of the line the code runs right-to-left again. This quirk can be eliminated by picking slightly different invisible characters.
Now the invisible code can be "developed":
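For readers who want to follow the decoding logic step by step, here is a more readable sketch of the same idea. It is not the author's function: the invisible characters of the original table are not recoverable from the text above, so the 16-character table below is an assumption, as are the names TABLE and revealWithTable.

```javascript
// Assumption: one possible table of 16 invisible characters (index = hex digit).
const TABLE = [
  "\u200B", "\u200C", "\u200D", "\u200E", "\u200F",
  "\u202A", "\u202B", "\u202C", "\u202D", "\u202E",
  "\u2060", "\u2061", "\u2062", "\u2063", "\u2064", "\uFEFF",
];

// Split into groups of four symbols, map each symbol back to a hex digit,
// and turn every four-digit hex number into a character.
function revealWithTable(hidden, table) {
  const groups = hidden.match(/[\s\S]{4}/g) || [];
  return groups
    .map((group) => {
      const hex = group
        .split("")
        .map((symbol) => {
          const digit = table.indexOf(symbol);
          if (digit === -1) throw new Error("symbol is not in the table");
          return digit.toString(16);
        })
        .join("");
      return String.fromCharCode(parseInt(hex, 16));
    })
    .join("");
}

// "H" is 0x0048 and "i" is 0x0069, spelled with the table symbols above.
const hiddenHi = "\u200B\u200B\u200F\u202D" + "\u200B\u200B\u202B\u202E";
console.log(revealWithTable(hiddenHi, TABLE)); // "Hi"
```

Passing the table in as a parameter keeps the sketch independent of any particular set of characters; the original hard-codes its table inside the function.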
var helloworld = "";
revealJS(helloworld);
// "alert("Hello world!")"
eval(revealJS(helloworld));
// Ta-da!
You can copy it straight from here into the console, or see it here.
(The "developer" code given as an example is tailored to the specific characters used in the "disappearing" function. By swapping those characters around and/or using others, the number of possible "code tables" soars off toward infinity.)
All this has one big drawback: code executed with eval() can be peeked at. Moreover, the console will even show the file and line from which the code was launched.
This can be fixed. If you use all four characters for encoding (as I mentioned above), obfuscation inside obfuscation becomes possible. Xzibit would be thrilled:
// The trousers turn...
alert("Hello world!");
// Turn...
window["alert"]("Hello world!");
// The trousers...
window[revealJS("")](revealJS(""))
// ...turn into invisible code:
""
You can see it here. By the way, nothing stops you from obfuscating the code three or even four times over.
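As a rough illustration of repeated obfuscation, the sketch below encodes a string twice and decodes it twice. The helpers hide and reveal and the 16-character table are assumptions made for this example, not the article's obfuscator.

```javascript
// Assumption: an illustrative table of 16 invisible characters (index = hex digit).
const TABLE = Array.from(
  "\u200B\u200C\u200D\u200E\u200F\u202A\u202B\u202C\u202D\u202E\u2060\u2061\u2062\u2063\u2064\uFEFF"
);

// Hypothetical encoder: four table symbols per UTF-16 code unit.
const hide = (text) =>
  text
    .split("")
    .map((ch) =>
      ch.charCodeAt(0).toString(16).padStart(4, "0")
        .split("")
        .map((digit) => TABLE[parseInt(digit, 16)])
        .join("")
    )
    .join("");

// Hypothetical decoder: the inverse of hide().
const reveal = (hidden) =>
  (hidden.match(/[\s\S]{4}/g) || [])
    .map((group) =>
      String.fromCharCode(
        parseInt(group.split("").map((s) => TABLE.indexOf(s).toString(16)).join(""), 16)
      )
    )
    .join("");

const source = 'alert("Hello world!");';
const once = hide(source);
const twice = hide(once);

// Every pass multiplies the size by four.
console.log(source.length, once.length, twice.length); // 22 88 352
console.log(reveal(reveal(twice)) === source);         // true
```

Each extra layer costs another factor of four in size, so three or four passes are possible but quickly become expensive.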
Now the resulting code looks like this:
Better already, but something else can be done. At the start of the article I set the task of hiding the code as if it did not exist. As things stand, you can simply open the console and immediately see that something is wrong. This turned out to be quite simple to fix: instead of eval(), use:
var script = document.createElement("script");
script.innerHTML = revealJS("");
document.getElementsByTagName('body')[0].appendChild(script);
// Then immediately remove the tag so that it does not hang around in plain sight
document.getElementsByTagName('body')[0].removeChild(script);
Unfortunately, when this code is run on jsFiddle, the code still shows up. I suspect this is because jsFiddle also wraps the contents of its JavaScript pane in eval() or something similar. In tests on a local project, where there are no such "miracles", everything works as it should and the executed function does not reveal itself:
Areas of use
1. For fun.
2. A code obfuscation tool.
3. Use together with other ways to minify or obfuscate code, for example Google Closure Compiler or UglifyJS.
4. Covert communication in the open.
5. "Sleeper" scripts and planted payloads in articles, forum posts, message boards, contextual ads, and generally anywhere that lets people write something that then ends up in users' browsers.
With the last two points things are not so simple, but I could not leave them out. Let me expand on the idea a little. A quick check showed that, for example, Gmail and Yandex.Mail do not strip such characters. Some are converted into a visible form, but some remain invisible. I think the situation in email clients is similar (I checked Thunderbird): nothing is visible. This means a hidden message can be sent in an email that will not give itself away during a "visual inspection", and even if the strange hidden characters are discovered, only someone who has the decoding algorithm can decode it (which, by the way, you can keep in your head and type straight into the browser console):
Hi, how are you?
But in reality (you need to apply the revealJS() function to the part between the last letter and the question mark):
Hi, how are you/*, today at five, come alone*/?
Thus covert communication in the open is possible, and nobody will suspect a thing (unless they search deliberately, but that is another conversation). In general, one of the main advantages (and maybe the only one) is that the content passes "visual control" (though not everything is smooth here either: change the encoding and everything is exposed). It is like an invisible man and a video surveillance system. Who knows, maybe the whole Internet has been stuffed with such messages for a long time? :)
As for hidden scripts, everything is obvious: if malicious JS code containing a "developer" and a "launcher" gets into your browser in any way (for example, many libraries are loaded from external sources; take jQuery, which was hacked not so long ago, or some popular plugin such as AdBlock), then code hidden on the page will, with some probability, be launched. In other words, the scheme is this: many different "vectors", each aimed in its own direction, and one single, very tiny "activator".
Thanks for your attention.
P.S. A little goodie: if you put the character U+202E (Right-To-Left Override), in quotation marks, at the start of a line in the script, things get fun. The code keeps working:
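Searching deliberately is not hard, by the way. Below is a minimal sketch of such a check; the list of code points in the regular expression is an assumption and is far from exhaustive, and findInvisible is a hypothetical helper name.

```javascript
// Assumption: a small, non-exhaustive set of invisible/format characters.
const INVISIBLE = /[\u200B-\u200F\u202A-\u202E\u2060-\u2064\uFEFF]/g;

// Report the position and code point of every invisible character found.
function findInvisible(text) {
  return Array.from(text.matchAll(INVISIBLE), (match) => ({
    index: match.index,
    code: "U+" + match[0].charCodeAt(0).toString(16).toUpperCase().padStart(4, "0"),
  }));
}

const suspicious = "a\u200Bb\u202Ec";
console.log(findInvisible(suspicious));
// [ { index: 1, code: 'U+200B' }, { index: 3, code: 'U+202E' } ]
console.log(findInvisible("plain text").length); // 0
```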
"";var revealJS = function(s){return s.match(/(.{4})/g).map(function(b){return b.split('').map(function(i){return Array.apply(null,{length:10}).map(Number.call,Number).concat('abcdef'.split(''))[''.split('').indexOf(i)]})}).map(function(c){return String.fromCharCode(0+"x"+c.join(''))}).join('')}
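Why the code keeps working can be seen in a small sketch: the override only affects how text is rendered, not the order in which characters are stored. The variable names here are illustrative.

```javascript
// U+202E (Right-To-Left Override), written as an escape.
const RLO = "\u202E";
const label = RLO + "abc";

// A renderer that honors the override may draw the text as "cba",
// but the stored (logical) order of the characters does not change.
console.log(label.length);           // 4
console.log(label.charAt(1));        // "a"
console.log(label.slice(1) === "abc"); // true
```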