Why can't JavaScript security analysis be truly automated?

    Why, in the case of JavaScript, do I have to make do with simple static analysis approaches when more interesting approaches to automatic code analysis exist?

    In response to this question, my colleague Alexey Goncharov (kukumumu) answered succinctly: “JavaScript is a punk language”, and sent me a link to Jasper Cashmore's article “A Javascript journey with only six characters”, which takes us on a journey into the esoteric world of JSFuck and immediately puts everything in its place.
    I liked it so much that I decided to translate the article into Russian.


    Translation of “A Javascript journey with only six characters”

    Javascript is a weird and wonderful language that lets you write crazy code that still works. It tries to help us by converting data to specific types based on how we use it.

    If we add a string to something, JS will assume that we want it as text and convert it to the type String.
    If we put a plus or minus sign in front of something, JS will assume that we want it as a number and convert it to the type Number (if possible).

    If we negate something, JS will convert it to the type Boolean.
    We can use JS to do all sorts of magical things using only the symbols [, ], (, ), ! and +.

    If you are not reading this on a mobile device, you can open the JS console to follow along and verify the examples by pasting them in.

    Let's start with the basics. A few golden rules that you need to remember:

    Prefixing with ! gives the type Boolean
    Prefixing with + gives the type Number
    Adding [] gives the type String

    These are the rules in action:

    ![] === false
    +[] === 0
    []+[] === ""
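
    As a quick check of the three rules, the following sketch prints the type each expression produces. The `examples` object is only an illustrative name, not something from the original article.

```javascript
// Each golden rule applied to an empty array, with the resulting type.
const examples = {
  "![]": ![],        // negation    -> Boolean
  "+[]": +[],        // unary plus  -> Number
  "[]+[]": [] + [],  // adding []   -> String
};

for (const [source, value] of Object.entries(examples)) {
  console.log(source, "->", typeof value, JSON.stringify(value));
}
```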

    Another important thing to know is that you can pick individual characters out of a string using square brackets, like this:

    "hello"[0] === "h"

    In addition, remember that you can build multi-digit numbers by concatenating their digits as strings and then converting the result to the type Number using rule No. 2:

    +("1" + "1") === 11
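
    The same trick can be generalized. Below is a minimal sketch of how a non-negative integer might be written with the six characters alone; `encodeDigit` and `encodeNumber` are hypothetical helper names, and real tools such as JSFuck may use a different encoding.

```javascript
// Hypothetical helpers: build an expression for a non-negative integer
// out of the characters [ ] ( ) ! + only.
function encodeDigit(d) {
  if (d === 0) return "+[]";               // +[] === 0
  if (d === 1) return "+!![]";             // +true === 1
  return Array(d).fill("!![]").join("+");  // true + true + ... === d
}

function encodeNumber(n) {
  const digits = String(n).split("").map(Number);
  if (digits.length === 1) return encodeDigit(digits[0]);
  // [d] + [e] concatenates the digits as strings;
  // the leading + converts the result back to a Number.
  return "+(" + digits.map((d) => "[" + encodeDigit(d) + "]").join("+") + ")";
}

const source = encodeNumber(42);
console.log(source);        // the six-character expression
console.log(eval(source));  // the number it evaluates to
```

    The generated expressions grow linearly with each digit, which is one reason encoded programs become so long.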

    That's it. Now let's combine all of the above to get the character a.

    ![] === false
    ![]+[] === "false"
    +!![] === 1
    (![]+[])[+!![]] === "a" // same as "false"[1]


    It turns out that with relatively simple combinations we can get any of the letters that make up the words true and false: a, e, f, l, r, s, t, u. How can we get the rest of the letters?

    Well, there is, for example, undefined, which we can get by writing nonsense like [][[]]. We convert it to the type String using one of our golden rules, and gain the letters d, i and n.

    [][[]] + [] === "undefined"
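
    To take stock, this short sketch collects every distinct letter reachable so far. The names `words` and `alphabet` are illustrative only.

```javascript
// Strings reachable so far, each built from the six characters only.
const words = [
  ![] + [],     // "false"
  !![] + [],    // "true"
  [][[]] + [],  // "undefined"
];

// De-duplicate the characters and sort them for readability.
const alphabet = [...new Set(words.join(""))].sort().join("");
console.log(alphabet); // "adefilnrstu"
```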

    From the letters we already have, we can spell words like fill, filter and find. Of course, there are others, but these are noteworthy because they are array methods. This means that they are properties of Array objects and can be called directly on arrays, for example [2,1].sort().

    Another important thing to know about JS is that the properties of an object can be accessed using dot or bracket notation. Since the array methods mentioned above are properties of the array itself, we can call these methods using square brackets instead of dot notation.

    So [2,1]["sort"]() is the same thing as [2,1].sort().
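
    Bracket notation matters because the property name can be an assembled string. The sketch below spells "fill" from letters obtained earlier; numeric literals are used for the indexes to keep it readable, although each could itself be written with the six characters.

```javascript
// Pick letters out of strings we can already build.
const f = (![] + [])[0];     // "false"[0]
const i = ([][[]] + [])[5];  // "undefined"[5]
const l = (![] + [])[2];     // "false"[2]

const name = f + i + l + l;  // "fill"

// Bracket access with the assembled name reaches the real array method.
console.log(name, [][name] === Array.prototype.fill);
```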

    Let's go ahead and see what happens if we take one of our array methods, spelled with the current collection of letters, without calling it:

    []["fill"]

    The result is function fill() { [native code] }. We can turn this method into a string using our golden rule:

    []["fill"]+[] === "function fill() { [native code] }"

    That's how we get the following characters: c, o, v, (, ), {, [, ], }, and a space.
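
    One caveat worth checking in your own console: the exact whitespace in the string form of a native function can differ between engines, so character positions are not guaranteed to be portable. The sketch below, with illustrative variable names, looks the positions up at run time instead of hard-coding them.

```javascript
// String form of a native method; spacing may vary between JS engines.
const text = []["fill"] + [];
console.log(JSON.stringify(text));

// Locate each newly available character instead of assuming its index.
for (const ch of ["c", "o", "v", "(", ")", "{", "}"]) {
  const index = text.indexOf(ch);
  console.log(ch, index, text[index] === ch);
}
```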

    With the newfound c and o we can spell the word constructor. constructor is a property that every JS object has; it refers to the function that constructed the object.

    Let's get the string representation of the constructor functions for the types we have dealt with so far:

    true["constructor"] + [] === "function Boolean() { [native code] }"  
    0["constructor"] + []    === "function Number() { [native code] }"  
    ""["constructor"] + []   === "function String() { [native code] }"
    []["constructor"] + []   === "function Array() { [native code] }"

    This adds the following characters to our arsenal: B, N, S, A, m, g, y.

    Now we can spell "toString", a function that can be called using square brackets.

    (10)["toString"]() === "10"

    But we can turn anything into a string using our golden rule, so how can this be useful?

    What if I told you that the toString method of the type Number has a secret argument called radix, which changes the base of the number system used when converting the number to a string? Take a look:

    (12)["toString"](10) === "12" // base 10 - normal to us
    (12)["toString"](2) === "1100" // base 2, or binary, for 12
    (12)["toString"](8) === "14" // base 8 (octal) for 12
    (12)["toString"](16) === "c" // hex for 12

    But why stop at 16? The maximum is 36, which covers all the characters 0-9 and a-z. So we can now get any digit or lowercase letter:

    (10)["toString"](36) === "a"
    (35)["toString"](36) === "z"
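
    Going the other way, `parseInt` with radix 36 tells us which number yields a given character. This is a minimal sketch; `numberFor` is a hypothetical helper name. Note that `parseInt` is case-insensitive here, so an uppercase letter maps to the same number as its lowercase form, and `toString(36)` only ever produces lowercase.

```javascript
// Hypothetical helper: which number yields a given digit or
// lowercase letter when converted with toString(36)?
function numberFor(ch) {
  const n = parseInt(ch, 36);
  if (ch.length !== 1 || Number.isNaN(n)) {
    throw new RangeError("expected one character in 0-9 or a-z");
  }
  return n;
}

const word = "punk";
const numbers = [...word].map(numberFor);
// Convert each number back to its base-36 character and join them.
const spelled = numbers.map((n) => n["toString"](36)).join("");
console.log(numbers, spelled);
```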

    Wonderful! But what about other characters, such as capital letters and punctuation? Let's dig deeper.

    Depending on where your code runs, you may have access to predefined objects or data. If you run the code in a browser, chances are good that you have access to some HTML wrapper methods.

    For example, bold is a String method that wraps the string in <b> tags.

    "test"["bold"]() === "<b>test</b>"

    This gives us the symbols <, > and /.
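
    The sketch below picks the three symbols out of that result. Keep in mind that bold is a deprecated legacy method; it is still available in current browsers and in Node.js, but that is an assumption about your environment rather than a guarantee.

```javascript
// Legacy HTML wrapper method: wraps the string in <b> tags.
const html = "test"["bold"]();
console.log(html); // <b>test</b>

// Indexes of the three new symbols inside "<b>test</b>".
console.log(html[0], html[2], html[8]); // < > /
```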

    You may have heard of the escape function. It converts a string to a URI-friendly format that even simple browsers can digest. This function is an important part of our quest, so we need access to it. We can spell it, but how do we execute it? Unlike all the previous functions, it is not a method of any type we have; it is a global function.

    What is the constructor of a function?

    The answer is function Function() { [native code] }: the Function object itself.

    []["fill"]["constructor"] === Function

    Using this, we can pass in a string of code to create a function.


    The result is an anonymous function whose string form begins:

    function anonymous() {

    We can call this function right away by appending () at the end.
    So now we use the escape function as follows:

    []["fill"]["constructor"]("return escape(' ')")() === "%20"

    If we pass the < character we obtained earlier to escape, we get %3C. This capital C is very important for getting the rest of the characters that we are missing.

    []["fill"]["constructor"]("return escape('<')")()[2] === "C"
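
    Here is a small sketch of the Function constructor at work, using readable strings instead of six-character encodings. The variable names are illustrative. Two assumptions apply: escape is a deprecated global that current browsers and Node.js still provide, and a page with a strict Content Security Policy may refuse to build functions from strings.

```javascript
// Reach the Function constructor through any function we already have.
const FunctionConstructor = []["fill"]["constructor"];
console.log(FunctionConstructor === Function); // true

// The last argument is the body; earlier arguments are parameter names.
const add = FunctionConstructor("a", "b", "return a + b");
console.log(add(2, 3)); // 5

// Code built this way runs in the global scope, so it can see escape.
const escaped = FunctionConstructor("return escape('<')")();
console.log(escaped, escaped[2]); // %3C C
```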

    With it, we can spell fromCharCode, a function that returns the Unicode character for a given decimal code. It is a property of the String object, which we can reach the same way as before.

    ""["constructor"]["fromCharCode"](65) === "A"
    ""["constructor"]["fromCharCode"](46) === "."

    We can look up the decimal code of any Unicode character here: Unicode lookup.
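
    The codes can also be looked up in the console. The following sketch, with illustrative names, round-trips a short string through its character codes. Note that fromCharCode works on UTF-16 code units, so characters outside the Basic Multilingual Plane need two codes each.

```javascript
// Reach String through the constructor of an empty string.
const StringConstructor = ""["constructor"];

const message = "Hi!";
// Decimal code of every character in the message.
const codes = [...message].map((ch) => ch.charCodeAt(0));
// Rebuild the message from the codes alone.
const rebuilt = StringConstructor["fromCharCode"](...codes);

console.log(codes, rebuilt); // [ 72, 105, 33 ] 'Hi!' in Node.js
```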

    Phew. Looks like that's it!

    Now we can produce almost any character in the world, compose code from them, and even execute it. This means that we have Turing completeness in Javascript using a total of six characters: [, ], (, ), + and !.

    Want proof? The original article ends with a snippet written in nothing but these six characters, meant to be run in your browser. It is equivalent to alert("wtf").

    There is even a tool called JSFuck that automates the conversion, and here you can see how it translates each character.

    What is the practical use?

    None. Not really, anyway. True, eBay recently did some bad things that allowed sellers to embed JS code in their pages using only these symbols, but this is a rather unusual attack vector. Some people will bring up obfuscation, but, to be honest, there are better ways to obfuscate code.
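
    Still, the trick helps explain the question this post opened with. The sketch below uses a deliberately naive, hypothetical check (`looksDangerous`) that scans source text for suspicious names. Real static analyzers are far more sophisticated, but the example shows why text that is assembled at run time is hard to reason about without executing it.

```javascript
// Hypothetical, deliberately naive check: flag source text that mentions
// well-known dynamic-execution entry points by name.
function looksDangerous(source) {
  return /\b(eval|Function|alert)\b/.test(source);
}

const plain = "Function('return 6 * 7')()";
// Same behaviour, but the property name is assembled at run time.
const disguised = "[]['fill']['const' + 'ructor']('return 6 * 7')()";

for (const source of [plain, disguised]) {
  // First value: was the source flagged? Second value: what it computes.
  console.log(looksDangerous(source), eval(source));
}
```

    Both strings compute the same value, yet only the first is flagged. A full six-character encoding hides the names even more thoroughly.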


    I hope you enjoyed the trip!

    Thanks to VladimirKochetkov and Ksenia Kirillov for help with the translation.
