Maniacal minimization (in pursuit of bytes)

    Hello World,

    This post is about refactoring code in advance so that it minimizes better. Before the release I minimized the Helios Kernel library (which I wrote about the day before yesterday). The library source weighs 28,112 bytes; it is generously commented, so the YUI Compressor squeezes it down to 7083 bytes without breaking a sweat. Not that 7 kilobytes struck me as outrageous, but just by looking through the minimized code I could spot a bunch of places where more could be saved.



    Let's see what can be done with the code to turn 7083 bytes into 3937 (it was 4009 before the update at the end of the article).

    But before you begin, two caveats:
    • We will not use "dirty" tricks (like var a = this or var f = false) that could, in theory, slow the code down. Performance is assumed to matter more than file size.
    • I ran the code through a test suite after every step. It often happens that some change breaks everything. If you don't test during manual optimization (or have no tests at all), the code you end up with will most likely not work.


    Minimizer Selection


    This article is not about comparing minimizers, but along the way I noticed a bug-like "feature" of the YUI Compressor: it does not remove the curly braces around single-statement blocks. Worse, it adds curly braces even where the original had none (marked with the WTF tag in the first picture). I took that as an insult and promptly switched to the online minimizer http://jscompress.com/ . Everything below, however, applies to any minimizer of your choice.

    Big Anonymous Function


    To get started, let's wrap all the code in one big anonymous function that is called immediately (if it isn't wrapped already). This gives us a local scope to work with; how that saves bytes is shown below. The most compact way to wrap code in an anonymous function is this:
    Before:
    // code

    After:
    !function(){
        // code
    }()
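
    To see where the byte comes from, here is a small sketch (the variable names are made up for illustration). It compares the length of this wrapper with the more common parenthesized form and checks that a variable declared inside stays invisible outside.

```javascript
// Two common ways to write an immediately invoked function, as source text.
var bangForm = "!function(){}()";
var parenForm = "(function(){})()";

// The "!" form is one byte shorter: a single "!" replaces a pair of parentheses.
var savedBytes = parenForm.length - bangForm.length;

// Variables declared inside the wrapper do not leak into the outer scope.
!function () {
    var hiddenHelper = 42; // hypothetical local, invisible outside
}();
var leaked = typeof hiddenHelper !== "undefined";

console.log(savedBytes, leaked); // 1 false
```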
    

    "Private" objects


    The code surely contains plenty of auxiliary objects that are not part of the public API. JavaScript has no native way to mark an object as private, so a naming convention is normally used; most often such names start with an underscore: " _ ". A minimizer usually replaces local variable names with single letters but leaves the names of “private” properties untouched, because it makes no bold assumptions about how we mark things as private. We, however, do not care what these objects are called in the minimized code, so we can rename them by hand:
    Before:
    myObject._somethingPrivate = {
        // ...
    }

    After:
    myObject.a = {
        // ...
    }

    Before:
    MyObj = function() {
        this.somePublicProperty = ...;
        this._somePrivateProperty = ...;
        this._anotherPrivateProperty = ...;
    }

    After:
    MyObj = function() {
        this.somePublicProperty = ...;
        this.a = ...;
        this.b = ...;
    }

    Before:
    MyObj.prototype._privateMethod = function() {
        // ...
    }

    After:
    MyObj.prototype.c = function() {
        // ...
    }
    

    Be careful here. First, remember to replace the names of private functions and variables not only where they are declared but also everywhere they are used. Second, keep track of the code's logic and avoid name collisions. For example, if the prototype of some type already has a method a, a private property of those objects cannot also be called a. This is obvious, but easy to miss unless you pay special attention to it.
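
    A minimal sketch of such a collision, using a hypothetical Counter type that is not part of the library: once a private property and a private method are both renamed to a, the instance property shadows the prototype method.

```javascript
// Hypothetical type, used only to illustrate the collision.
function Counter() {
    this.a = 0; // was this._count; the short name "a" is now taken on instances
}

// Renaming the private method _reset to "a" as well is the mistake:
Counter.prototype.a = function () { // was Counter.prototype._reset
    return "reset";
};

var counter = new Counter();

// The instance property shadows the prototype method of the same name,
// so calling counter.a() would throw a TypeError.
var shadowed = typeof counter.a;
console.log(shadowed); // number
```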

    Besides, private properties are not only declared in constructors and initializers: JavaScript lets you extend objects on the fly. In principle, every private identifier in the code can be carefully replaced with a single-letter one:
    Before:
    MyObj.prototype.getSomething = function() {
        if ( typeof this._prop == "undefined" ) {
            this._prop = 0;
        }
        return this._prop;
    }

    After:
    MyObj.prototype.getSomething = function() {
        if ( typeof this.x == "undefined" ) {
            this.x = 0;
        }
        return this.x;
    }
    

    "Public" objects


    “Public” objects are the ones that belong to the API, so they must keep exactly their original names. But if a “public” object is used inside the code too often (say, at least once) and its name is too long (say, more than two bytes), it makes sense to give it an alias:
    Before:
    myObject = { ... }

    After:
    var a = myObject = { ... }
    

    After this change, the variable a is declared as local, while myObject is global (provided this is the first time the identifier myObject is used).
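
    One thing worth knowing: assigning to an undeclared identifier creates a global only in sloppy mode, while in strict mode it throws a ReferenceError. The sketch below is therefore an illustration rather than the article's exact form. It names the global object explicitly via globalThis (an assumption of this example, and a few bytes longer) so that it runs in either mode, and then checks that the alias stays local while the public name is visible outside.

```javascript
!function () {
    // Sloppy-mode form from the text: `a` is local, `myObject` becomes global.
    //   var a = myObject = { ready: true };
    // Strict-mode-safe variant used here: the global is assigned explicitly.
    var a = globalThis.myObject = { ready: true };
    a.version = 1; // writes through the alias reach the public object
}();

var aliasIsLocal = typeof a === "undefined";
var publicVisible = typeof myObject === "object" && myObject.version === 1;

console.log(aliasIsLocal, publicVisible); // true true
```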

    Now go through the code, find every object that is not only declared but also used, and give each one an alias:
    Before:
    MyObj = function() {
        this.somePublicProperty = ...;
        this.a = ...;
        this.b = ...;
    }

    After:
    var b = MyObj = function() {
        this.somePublicProperty = ...;
        this.a = ...;
        this.b = ...;
    }

    Before:
    MyObj.prototype.someMethod = function() {
        // ...
    }

    After:
    b.prototype.d = b.prototype.someMethod = function() {
        // ...
    }

    Before:
    someStorage.someMethod = function() {
        // ...
    }

    After:
    var c = someStorage.someMethod = function() {
        // ...
    }
    

    Again, the main thing is not to get lost in the scopes and not to give two variables in the same scope the same name. In the examples above, objects of type MyObj already have a private property b and a private method c, while the new local variables b and c live in the scope of the Big Anonymous Function in which we wrapped all the code at the very beginning (we did wrap it, right?).

    We can also create aliases for some public properties, but only for those that hold complex objects:
    Before:
    AnotherObj = function() {
        this.someProperty = [ 0, 0, 0 ]; // array
        this.secondProperty = { a: 1 }; // hash
        this.thirdProperty = 0; // number
        this.fourthProperty = true; // boolean
        this.fifthProperty = "hello"; // string
    }

    After:
    AnotherObj = function() {
        this.a = this.someProperty = [ 0, 0, 0 ];
        this.b = this.secondProperty = { a: 1 };
        this.thirdProperty = 0;
        this.fourthProperty = true;
        this.fifthProperty = "hello";
    }
    

    If we alias a primitive value (a number, boolean or string), the value is copied, so the alias and the original property are independent: changing one does not change the other.
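
    The difference can be checked with a short sketch. The property n is a hypothetical "alias" of a number, added only for this illustration. The last lines show a related caveat: an object alias holds only as long as nobody reassigns the public property itself.

```javascript
function AnotherObj() {
    this.a = this.someProperty = [0, 0, 0]; // alias: both names refer to one array
    this.n = this.thirdProperty = 0;        // "alias" of a number: an independent copy
}

var obj = new AnotherObj();

obj.a[0] = 7; // mutate the array through the short name
obj.n = 5;    // assign the number through the short name

console.log(obj.someProperty[0]); // 7: same array, so the change is visible
console.log(obj.thirdProperty);   // 0: the public property did not change

// Caveat: reassigning the public property breaks even an object alias.
obj.someProperty = [1, 1, 1];
console.log(obj.a === obj.someProperty); // false
```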

    Merging vars


    Now let's take advantage of the fact that you can declare variables separated by commas, using the word var once. In the simplest case, it looks like this:
    Before:
    someFunction = function() {
        var a = 0;
        var b = something();
        // ...
    }

    After:
    someFunction = function() {
        var a = 0, b = something();
        // ...
    }

    Before:
    anotherFunction = function() {
        var c;
        // some code
        var d = something();
        // some more code
        for ( var i = 0; i < ...
        // and yet more code
    }

    After:
    anotherFunction = function() {
        var c, d = something(), i = 0;
        // some code
        // some more code
        for ( ; i < ...
        // and yet more code
    }
    

    In general, pull all the declarations up to the top of the function and write them under a single var. (Optimizing the for () loop is covered below.) Likewise, collect all the local declarations inside our Large Hadron Function, which are exactly the aliases created in the previous section, and put them under a single var at its beginning. The whole code should be transformed like this:
    Before:
    !function(){
        // some code
        var b = MyObj = function() {
            this.somePublicProperty = ...;
            this.a = ...;
            this.b = ...;
        }
        // some more code
        var c = b.prototype.someMethod = function() {
            // ...
        }
        // and yet more code
    }()

    After:
    !function(){
        var b = MyObj = function() {
            this.somePublicProperty = ...;
            this.a = ...;
            this.b = ...;
        },
        c = b.prototype.someMethod = function() {
            // ...
        },
        // ...and so on for all the other aliases (the last one ends with ";")
        // some code
        // some more code
        // and yet more code
    }()
    

    Note that in this example the variables b, c and the like are still local to the Big Function. This saves as many vars as the function had (well, all but one).

    Once again, make sure the logic of the code has not changed. We are reordering lines, so in theory an object may end up being used before it is initialized; that must not be allowed to happen.
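
    A small sketch of how reordering can go wrong, using a hypothetical something() with a side effect. Moving the whole initializer to the top changes the result, whereas moving only the declaration and leaving the assignment in place does not (at the cost of a few of the saved bytes).

```javascript
// Hypothetical helper with a side effect, standing in for real initialization.
var log = [];
function something() {
    log.push("something");
    return log.length;
}

// Original order: setup runs first, then the variable is initialized.
function original() {
    log.push("setup");
    var d = something();
    return d;
}

// Careless hoisting: the initializer moved above the setup code with its var.
function hoistedCarelessly() {
    var d = something();
    log.push("setup");
    return d;
}

// Safe hoisting: only the declaration moves; the assignment stays in place.
function hoistedSafely() {
    var d;
    log.push("setup");
    d = something();
    return d;
}

log = []; var r1 = original();
log = []; var r2 = hoistedCarelessly();
log = []; var r3 = hoistedSafely();
console.log(r1, r2, r3); // 2 1 2
```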

    Prototypes


    For every declared type and its constructor you can save a lot on the word prototype, which is just too long. To do so, describe the whole prototype for future objects of that type as a single hash:
    Before:
    MyObj = function() {
        // ...
    }
    MyObj.prototype.someMethod = function() {
        // ...
    }
    MyObj.prototype.anotherMethod = function() {
        // ...
    }
    MyObj.prototype.thirdMethod = function() {
        // ...
    }

    After:
    MyObj = function() {
        // ...
    }
    MyObj.prototype = {
        someMethod : function() {
            // ...
        },
        anotherMethod : function() {
            // ...
        },
        thirdMethod : function() {
            // ...
        }
    }
    

    As you can see, you have to remember to replace " = " with " : " and to separate the method declarations with commas. This approach does not work when you need to extend the prototype of a constructor declared somewhere else, because this notation replaces the prototype completely.
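
    What "replaces the prototype completely" means in practice can be seen in a short sketch; earlyMethod is a hypothetical method added only for the illustration. Anything attached to the old prototype disappears, and so does the default constructor link, which matters only if your code reads obj.constructor.

```javascript
function MyObj() {}

// A method added before the prototype is replaced...
MyObj.prototype.earlyMethod = function () { return "early"; };

// ...is lost, because the assignment installs a brand-new prototype object.
MyObj.prototype = {
    someMethod: function () { return "some"; }
};

var obj = new MyObj();
var lostEarly = typeof obj.earlyMethod === "undefined";

// The default `constructor` link is gone too: it now resolves to Object.
var constructorLost = obj.constructor === Object;

console.log(lostEarly, constructorLost); // true true
```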

    Optimizing loops and conditions


    Almost all loops and many conditions can be optimized:
    Before:
    a--;
    if ( a == 0 ) {
        // ...
    }

    After:
    if ( --a == 0 ) {
        // ...
    }

    Before:
    if ( --a == 0 ) {
        // ...
    }

    After:
    if ( !--a ) {
        // ...
    }

    Before:
    for ( var i = 0; i < a; i++ ) {
        b( c[ i ] );
    }

    After:
    for ( var i = 0; i < a; ) {
        b( c[ i++ ] );
    }
    

    But here, too, you need to be careful so as not to violate the logic of the code.
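
    One case where the logic is easy to break is a loop body that uses the index more than once or contains continue. The sketch below (function names are hypothetical) folds the increment into the first use of the index and caches the element. If the increment were instead folded into a later use, a continue before it would skip the increment and loop forever.

```javascript
var items = [1, 2, 3, 4];

// Original loop: skips even items and sums the rest.
function sumOddOriginal(c) {
    var total = 0;
    for (var i = 0; i < c.length; i++) {
        if (c[i] % 2 === 0) continue;
        total += c[i];
    }
    return total;
}

// Increment folded into the first use of the index. Every later use of `i`
// in the body would see the next index, so the element is cached in `item`.
function sumOddFolded(c) {
    var total = 0, item;
    for (var i = 0; i < c.length; ) {
        item = c[i++];
        if (item % 2 === 0) continue; // safe: `i` has already advanced
        total += item;
    }
    return total;
}

console.log(sumOddOriginal(items), sumOddFolded(items)); // 4 4
```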

    Frequently Used Values


    Some values are used more than once. They can be put into variables too:
    Before:
    // ...
    if ( typeof a == "undefined" ) ...
    // ...
    if ( typeof b == "undefined" ) ...
    // ...

    After:
    var z = "undefined";
    // ...
    if ( typeof a == z ) ...
    // ...
    if ( typeof b == z ) ...
    // ...

    Before:
    if ( typeof a != "function" ) {
        a = function(){}
    }
    // ...
    if ( typeof b != "function" ) {
        b = function(){}
    }

    After:
    var f = "function", g = function(){}
    // ...
    if ( typeof a != f ) {
        a = g;
    }
    // ...
    if ( typeof b != f ) {
        b = g;
    }

    Before:
    el = document.createElement( "script" );
    el.type = "text/javascript";

    After:
    var x = "script";
    el = document.createElement( x );
    el.type = "text/java" + x;
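
    Whether such a variable pays off depends on how long the value is and how often it occurs. The following back-of-the-envelope sketch estimates the saving for a string literal under stated assumptions: the variable gets a one-letter name, it is appended to an existing var list, and the text is plain ASCII. It ignores gzip, which already compresses repeated literals well, so real savings after compression may be smaller.

```javascript
// Estimated bytes saved when `uses` occurrences of a string literal are
// replaced by a one-letter variable appended to an existing var list.
// Assumptions of this sketch: plain ASCII literal, no escapes, no gzip.
function bytesSaved(literal, uses) {
    var quoted = literal.length + 2; // the literal plus its two quotes
    var declaration = 3 + quoted;    // e.g. `,z="undefined"`
    var references = uses * 1;       // each use shrinks to a single letter
    return uses * quoted - declaration - references;
}

console.log(bytesSaved("undefined", 2)); // 6
console.log(bytesSaved("undefined", 1)); // -4
console.log(bytesSaved("function", 2));  // 5
```

    A negative result means the variable costs more than it saves, which is why a value used only once is not worth extracting.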
    


    Throw away everything unnecessary


    Code often contains redundant information kept "for clarity" that you can get rid of. But here, as everywhere else, watch carefully what you remove:
    Before:
    if ( a.length > 0 ) {
        b = a.pop()
    }

    After:
    if ( a.length ) {
        b = a.pop()
    }

    Before:
    var someEnum = { foo : 0, bar : 1, buz : 2 }
    // ...
    var a = [];
    for ( var i in someEnum ) {
        a[ someEnum[ i ] ] = 0;
    }
    // ...
    a[ someEnum.bar ] = getSomething();
    // ...
    if ( c.state == someEnum.foo ) {
        // ...
    }

    After:
    var someEnum = { foo : 0, bar : 1, buz : 2 }
    // ...
    var a = [ 0, 0, 0 ];
    // ...
    a[ 1 ] = getSomething();
    // ...
    if ( !c.state ) {
        // ...
    }
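
    The last replacement is safe only if c.state always holds one of the enum numbers. A quick sketch (function names are hypothetical) lists the values for which the explicit comparison and the shortened check disagree:

```javascript
// `!state` is true for every falsy value, not only for the number 0.
function isFooExplicit(state) { return state == 0; }
function isFooShort(state) { return !state; }

var samples = [0, 1, 2, undefined, null, NaN];
var disagreements = [];
for (var i = 0; i < samples.length; i++) {
    if (isFooExplicit(samples[i]) !== isFooShort(samples[i])) {
        disagreements.push(String(samples[i]));
    }
}

console.log(disagreements.join(",")); // undefined,null,NaN
```

    So if state can ever be unset, the short form treats it as foo, while the original comparison did not.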
    

    Bonus: removing vars


    This is an interesting trick that comes in handy when a function declares just one local variable (or declares variables without initializing them). We save one var, though an initialized variable has its name written twice:
    Before:
    doSomething = function( param1, param2 ) {
        var i = 0;
        // ....
    }

    After:
    doSomething = function( param1, param2, i ) {
        i = 0;
        // ....
    }

    Before:
    doSomething = function( param1, param2 ) {
        var a, b, c;
        // ....
    }

    After:
    doSomething = function( param1, param2, a, b, c ) {
        // ....
    }
    

    Here we use parameters instead of local variables; inside the function they behave exactly the same. The trick is not suitable when the function accepts a number of arguments that is not known in advance. In most cases it lets you get rid of almost every var in the code.
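
    A runnable sketch of the trick with two hypothetical summing functions. It also shows two side effects to keep in mind: a caller can pass a value into the "local" parameter, so an initialized variable still has to be assigned, and the function's length (its declared arity) changes.

```javascript
// Before: conventional local variables.
function sumWithVar(list) {
    var total = 0;
    for (var i = 0; i < list.length; i++) total += list[i];
    return total;
}

// After: `total` and `i` are declared as extra parameters instead.
function sumWithParams(list, total, i) {
    total = 0; // must be reset: a caller might pass something here
    for (i = 0; i < list.length; i++) total += list[i];
    return total;
}

console.log(sumWithVar([1, 2, 3]), sumWithParams([1, 2, 3])); // 6 6
console.log(sumWithParams([1, 2, 3], 100));                   // 6
console.log(sumWithVar.length, sumWithParams.length);         // 1 3
```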

    The result


    After reworking the code in the ways described above, I fed the script to jscompress.com . After a little thought, it handed me this 4009-byte mess. Bon appétit!



    By the way, I'll hand out karma pluses to anyone who spots, and describes in the comments, what else can be trimmed from this mess :-)

    Update

    nano_freelancer suggested some good ideas:
    • replace every true and false with 1 and 0, respectively
    • in for (initial; condition; loop statement) {statements}
      you can put a comma after the loop statement and append all of the statements from the body, separated by commas instead of semicolons, which saves 2 bytes (the curly braces). This only works when the body statements are plain expressions (no if, for, var and the like). Keep in mind that whatever is written after the loop statement also runs after it, so a body that depends on the loop variable may have to go before it instead.

    In addition, most nulls can be replaced with 0 as well (but not all of them).
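
    Both suggestions can be checked with a short sketch (variable names are made up for the example). It shows that 1 and 0 behave like true and false in conditions but are not identical to them, and it folds a loop body into the for header. Here the former body is placed before the increment, so that it still sees the same index as in the block form.

```javascript
// 1 and 0 are truthy/falsy like true and false, but they are not identical.
var sameInCondition = (1 ? "yes" : "no") === (true ? "yes" : "no");
var strictlyEqual = 1 === true;
var typeChanged = typeof 1 !== typeof true;

// Loop body folded into the for header, comma-separated, with an empty body.
// The former body goes before the increment so the order of evaluation
// stays the same as in the block form.
var c = [1, 2, 3], seen = [], i;
for (i = 0; i < c.length; seen.push(c[i] * 2), i++);

console.log(sameInCondition, strictlyEqual, typeChanged, seen.join(","));
// true false true 2,4,6
```

    Code that compares with === or inspects typeof will notice the replacement of true and false, which is one reason to rerun the tests after this step.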

    Code size reduced to 3937 bytes :-)

    Off-topic: the source and minimized code I worked with can be downloaded from the project home page: http://home.gna.org/helios/kernel/
