Statistics on the use of JavaScript libraries and CDNs

    Have you ever thought about these questions:
    • How does the world feel about CDN technology for loading libraries?
    • How many successful sites are built with WordPress?
    • Which scripts do developers most often load from Google CDN?
    • How popular is jQuery?


    I got to thinking about them too.
    And I didn't just think: I did a little research.
    I also wrote a small Chrome extension, which may make life better or break the Internet.
    The results are below.


    Conclusions for the lazy, or TL;DR


    1. About 10% of the 300,000 most popular sites use WordPress.
    2. Popular sites that use jQuery are increasingly loading it from a CDN; every year, more developers get this right.
    3. The most popular jQuery versions in the world are 1.7.x, 1.8.x, 1.9.1 and 1.10.2.
    4. jQuery 1.7.x leads by a wide margin: every fourth site that loads jQuery from Google CDN uses version 1.7.1 or 1.7.2.
    5. Google, jQuery and Cloudflare are the most popular CDNs.
    6. 89% of the sites that load scripts from Google CDN load jQuery from it.


    How it all began, or a prelude



    I kept wondering: why don't browsers ship popular JS libraries as part of their distributions? CDNs are great: one URL per resource, caching, and so on. But it would be even better not to download these static files at all and to have them already available in the browser.

    In response to this injustice of fate, I built a prototype extension designed to speed up the Internet.

    But you can't just put forward a couple of hypotheses, hack together a prototype, and rest on your laurels: the brain demands evidence, facts, and a bit of fun action (yes, that's how I feel about interesting research, although preparing this data involved less action than I'd hoped).

    Why research?


    So, there are a few ideas:
    • All static files served from a CDN can safely be bundled into the browser, since they are never modified and are essentially permanent.
    • If many users load these static files from the browser itself instead of sending requests to the CDN server, everyone benefits.
    • If you store all common static files (read: JS libraries) locally and assume that sites are written by good programmers who do not modify minified files such as jquery-1.7.2.min.js, then such files are permanent as well, and the first two points apply to them.
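    The core lookup such an extension could perform is sketched below in Python. This is an illustration, not the extension's actual code: the `LOCAL_LIBRARIES` table, its entries and the `normalize`/`local_copy_for` helpers are hypothetical. The sketch reduces a script URL to host plus path, so http, https and protocol-relative requests for the same file share one key, and it assumes that a query string such as WordPress's ?ver= does not change the file being served.

```python
from typing import Optional
from urllib.parse import urlsplit

# HYPOTHETICAL table of libraries shipped with the browser:
# normalized URL (host + path) -> name of the bundled local file.
LOCAL_LIBRARIES = {
    "ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js": "jquery-1.7.2.min.js",
    "ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js": "jquery-1.7.1.min.js",
    "code.jquery.com/jquery-1.10.2.min.js": "jquery-1.10.2.min.js",
}


def normalize(url: str) -> str:
    """Reduce a URL to host + path.

    http, https and protocol-relative ("//host/...") URLs of the same file
    get the same key. ASSUMPTION: the query string (e.g. WordPress's ?ver=)
    does not change the file that is served, so it is dropped.
    """
    parts = urlsplit(url)
    return parts.netloc.lower() + parts.path


def local_copy_for(url: str) -> Optional[str]:
    """Return the bundled file for url, or None to fall back to the network."""
    return LOCAL_LIBRARIES.get(normalize(url))


print(local_copy_for("https://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js"))
print(local_copy_for("//code.jquery.com/jquery-1.10.2.min.js?ver=3.8.1"))
print(local_copy_for("http://example.com/js/jquery.min.js"))  # None: not bundled
```

    Returning None rather than raising keeps the fallback trivial: anything not in the table is simply fetched from the network as usual.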


    These ideas needed confirmation. While implementing the extension, I also ran into further questions:
    • Is jQuery the most popular script?
    • What share of scripts do sites load from a CDN?
    • Which versions of jQuery do people use?
    • Do minified libraries that sites host on their own servers follow the expected naming pattern often enough?


    What are we exploring?


    Initially, I wanted to use the Common Crawl corpus. But since this beast weighs 81 TB, and given the time and money its analysis would require, I left it alone.

    A little later, I came across a wonderful article in which the author explored the Internet on exactly the topic I needed.
    The article didn't contain the answers I was looking for, but it pointed me to the right tools!

    The study


    To get the answers I needed, I used the httparchive (HTTP Archive) dataset. It is a crawler dataset covering the sites in Alexa's top 300,000. In other words, it is a huge collection of the most popular sites on the Internet.

    I downloaded the latest dataset available: the crawl of March 1, 2014.
    Below are the results of the study and the queries I used to obtain them.
    You can compare my results with those obtained a year earlier.

    Number of sites loading jQuery from Google CDN

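    -- Sites in the analysed pageid range that load jQuery from Google CDN;
    -- 290835 is the total number of sites in the dataset.
    SELECT "jquery" AS name,
    count(distinct(pageid)) AS count,
    (100*count(distinct(pageid))/290835) AS percent 
    FROM requests WHERE pageid <= 14802750 AND pageid >= 14489007
    AND url LIKE "%//ajax.googleapis.com/ajax/libs/jquery/%";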
    


    Name      Sites    % of all sites
    jquery    59977    20.6223

    Every year, more sites load jQuery from a CDN. Progress doesn't stand still, and people are recognizing how good this approach is.
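    The query counts distinct page IDs rather than requests, so a site that requests jQuery twice is counted once, and divides by the number of sites (290,835). A minimal sketch of the same logic with Python's built-in sqlite3 is shown below, on a tiny invented `requests` table (the rows are illustrative, not httparchive data), with the pageid range filter left out because the toy table holds only one crawl. One portability note: MySQL's `/` returns a decimal, whereas SQLite truncates integer division, so the sketch multiplies by `100.0`.

```python
import sqlite3

# Toy stand-in for the httparchive `requests` table: (pageid, url).
# Rows are invented for illustration only.
rows = [
    (1, "http://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js"),
    (1, "https://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js"),  # same page twice
    (2, "//ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js"),
    (3, "http://example.com/js/jquery.js"),  # self-hosted, not counted
    (4, "http://example.com/app.js"),
]
TOTAL_PAGES = 4  # plays the role of 290835 in the original query

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (pageid INTEGER, url TEXT)")
conn.executemany("INSERT INTO requests VALUES (?, ?)", rows)

count, percent = conn.execute(
    """
    SELECT COUNT(DISTINCT pageid),
           100.0 * COUNT(DISTINCT pageid) / ?  -- 100.0 avoids integer division
    FROM requests
    WHERE url LIKE '%//ajax.googleapis.com/ajax/libs/jquery/%'
    """,
    (TOTAL_PAGES,),
).fetchone()
print(count, percent)  # 2 50.0
```

    The leading `//` in the pattern is what makes the count independent of http, https or protocol-relative URLs.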

    Popularity of jQuery versions on Google CDN


    Here I modified the original query. My goal is to find the share of each jQuery version among all sites that load jQuery from Google CDN. Articles by other authors have teeeeny problems that blur the results:

    • Some sites use a short version alias, for example //ajax.googleapis.com/ajax/libs/jquery/1/jquery.min.js. At the time of writing, this alias corresponds to jquery-1.9.1. I take this into account in the totals.
    • WordPress appends a ?ver=wpversion parameter to static files, which breaks grouping by URL.
    • When studying version frequency, it makes no difference whether the file is loaded over http or https.
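    The three normalization rules above can be sketched in Python. The regular expression, the `ALIASES` table and the `jquery_version` helper are illustrative assumptions; only the mapping of the "1" alias to 1.9.1 comes from the text (it reflects the state at the time of the study). Unlike the SQL query below, the sketch does not require a `.min.js` suffix.

```python
import re
from typing import Optional

# Version directory in Google CDN jQuery URLs, e.g.
# ".../ajax/libs/jquery/1.7.2/jquery.min.js". Starting the pattern at "//"
# makes it independent of the protocol (http, https or protocol-relative).
JQUERY_URL = re.compile(r"//ajax\.googleapis\.com/ajax/libs/jquery/([^/]+)/jquery")

# Short aliases -> concrete version. Only "1" -> "1.9.1" comes from the
# article (state at the time of the study); extend as needed.
ALIASES = {"1": "1.9.1"}


def jquery_version(url: str) -> Optional[str]:
    """Return the jQuery version a Google CDN URL loads, or None."""
    path = url.split("?", 1)[0]  # drop WordPress-style ?ver=... parameters
    match = JQUERY_URL.search(path)
    if match is None:
        return None  # not a jQuery file on Google CDN (e.g. jqueryui)
    version = match.group(1)
    return ALIASES.get(version, version)


print(jquery_version("//ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js?ver=3.8"))
```

    Grouping by the returned value instead of the raw URL merges http/https variants, WordPress-versioned URLs and the short alias into a single bucket per version.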

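    -- Extract the version directory from URLs like
    -- //ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js;
    -- 13 is the length of "/libs/jquery/".
    select SUBSTRING(
        url FROM POSITION("/libs/jquery/" IN url) + 13 
        FOR 
            LOCATE("/jquery", url, POSITION("/libs/jquery/" IN url) + 13) - (POSITION("/libs/jquery/" IN url) + 13) 
        ) as version, 
    count(distinct(pageid)) as count,
    -- 59977: sites loading jQuery from Google CDN (previous query)
    (100*count(distinct(pageid))/59977) as percent 
    from requests where pageid >= 14489007 and pageid <= 14802750 
    -- URLs with a query string (e.g. ?ver=) do not end in .min.js and are not matched
    and url LIKE "%//ajax.googleapis.com/ajax/libs/jquery/%.min.js" 
    group by version order by count desc;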
    


    Version   Sites   %
    1.7.2     8938    14.9024
    1.7.1     6842    11.4077
    1.8.3     5670    9.4536
    1.9.1     5533    9.2252
    1.10.2    5244    8.7434
    1.8.2     3832    6.3891
    1.4.2     3673    6.1240
    1.3.2     2519    4.1999
    1.5.2     2297    3.8298
    1.6.4     1987    3.3129
    1.4.4     1985    3.3096
    1.6.2     1644    2.7411
    1.6.1     1395    2.3259
    1.5.1     1160    1.9341
    1.9.0     964     1.6073
    1.8.1     880     1.4672
    1.10.1    868     1.4472
    1.8.0     803     1.3388
    2.0.3     508     0.8470
    1.2.6     449     0.7486
    1.7.0     403     0.6719
    1.4.1     382     0.6369
    1.11.0    363     0.6052
    1.4.3     357     0.5952
    2.0.0     246     0.4102
    1.6.0     204     0.3401
    1.6.3     193     0.3218
    1.3.1     112     0.1867
    1.5.0     104     0.1734
    1.4.0     83      0.1384
    1.10.0    79      0.1317
    2.0.2     74      0.1234
    2.1.0     68      0.1134
    1.3.0     42      0.0700
    2.0.1     19      0.0317
    1.2.3     13      0.0217

    There is an interesting trend in the jQuery world: year after year, version 1.7.x leads by a huge margin.
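    To check the "1.7.x leads" observation, patch releases can be collapsed into branches. The short Python sketch below uses only the counts from the table above; the branch grouping itself is an illustrative addition, not one of the article's queries.

```python
from collections import Counter

# Site counts per version, copied from the table above.
COUNTS = {
    "1.7.2": 8938, "1.7.1": 6842, "1.8.3": 5670, "1.9.1": 5533, "1.10.2": 5244,
    "1.8.2": 3832, "1.4.2": 3673, "1.3.2": 2519, "1.5.2": 2297, "1.6.4": 1987,
    "1.4.4": 1985, "1.6.2": 1644, "1.6.1": 1395, "1.5.1": 1160, "1.9.0": 964,
    "1.8.1": 880, "1.10.1": 868, "1.8.0": 803, "2.0.3": 508, "1.2.6": 449,
    "1.7.0": 403, "1.4.1": 382, "1.11.0": 363, "1.4.3": 357, "2.0.0": 246,
    "1.6.0": 204, "1.6.3": 193, "1.3.1": 112, "1.5.0": 104, "1.4.0": 83,
    "1.10.0": 79, "2.0.2": 74, "2.1.0": 68, "1.3.0": 42, "2.0.1": 19,
    "1.2.3": 13,
}
TOTAL = 59977  # sites loading jQuery from Google CDN (denominator of the query)

# Collapse patch releases: "1.7.2" -> "1.7.x".
by_branch = Counter()
for version, count in COUNTS.items():
    major, minor, _patch = version.split(".")
    by_branch[f"{major}.{minor}.x"] += count

for branch, count in by_branch.most_common(5):
    print(f"{branch:8} {count:6} {100 * count / TOTAL:6.2f}%")

# Share of 1.7.1 + 1.7.2 alone: the "every fourth site" from the TL;DR.
print(f"{100 * (COUNTS['1.7.1'] + COUNTS['1.7.2']) / TOTAL:.2f}%")
```

    The 1.7.x branch adds up to 16,183 sites, well ahead of 1.8.x with 11,185, and 1.7.1 plus 1.7.2 alone account for slightly more than a quarter of the sites.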

    The most popular CDNs for JS libraries

    Parameter             Sites    % of all sites
    Total CDN requests    78160    26.8743

    select "Google" as name, count(distinct(pageid)) as count,
    (100*count(distinct(pageid))/78160) as percent 
    from requests where pageid >= 14489007 and pageid <= 14802750 
    and url LIKE "%//ajax.googleapis.com/ajax/libs/%" 
    UNION
    select "Yandex" as name,  count(distinct(pageid)) as count,
    (100*count(distinct(pageid))/78160) as percent 
    from requests where pageid >= 14489007 and pageid <= 14802750 
    and url LIKE "%//yandex.st/%" 
    UNION
    select "Microsoft" as name,  count(distinct(pageid)) as count,
    (100*count(distinct(pageid))/78160) as percent 
    from requests where pageid >= 14489007 and pageid <= 14802750 
    and url LIKE "%//ajax.aspnetcdn.com/ajax/%" 
    UNION
    select "JsDelivr" as name, count(distinct(pageid)) as count,
    (100*count(distinct(pageid))/78160) as percent 
    from requests where pageid >= 14489007 and pageid <= 14802750 
    and url LIKE "%//cdn.jsdelivr.net/%" 
    UNION
    select "Cloudflare" as name, count(distinct(pageid)) as count,
    (100*count(distinct(pageid))/78160) as percent 
    from requests where pageid >= 14489007 and pageid <= 14802750 
    and url LIKE "%//cdnjs.cloudflare.com/ajax/libs/%" 
    UNION
    select "jQuery" as name, count(distinct(pageid)) as count,
    (100*count(distinct(pageid))/78160) as percent 
    from requests where pageid >= 14489007 and pageid <= 14802750 
    and url LIKE "%//code.jquery.com/%"
    group by name order by count desc;
    


    CDN          Sites    % of CDN-using sites
    Google       67671    86.5801
    jQuery       9222     11.7989
    Cloudflare   3996     5.1126
    Yandex       2379     3.0438
    Microsoft    1300     1.6633
    JsDelivr     324      0.4145
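    Each CDN in the query is matched by a plain substring test on the URL, so a single site can be counted under several CDNs. This is consistent with the Percent column adding up to about 108.6% rather than 100%. A minimal Python sketch of the same classification is shown below; the `CDN_PATTERNS` fragments are copied from the LIKE patterns, while the `cdns_used` helper and the sample page are hypothetical.

```python
from typing import Iterable, Set

# URL fragments copied from the LIKE patterns of the CDN query.
CDN_PATTERNS = {
    "Google": "//ajax.googleapis.com/ajax/libs/",
    "jQuery": "//code.jquery.com/",
    "Cloudflare": "//cdnjs.cloudflare.com/ajax/libs/",
    "Yandex": "//yandex.st/",
    "Microsoft": "//ajax.aspnetcdn.com/ajax/",
    "JsDelivr": "//cdn.jsdelivr.net/",
}


def cdns_used(urls: Iterable[str]) -> Set[str]:
    """Return every CDN a page loads at least one file from.

    A substring test mirrors LIKE "%fragment%"; URLs are lower-cased because
    MySQL's LIKE is typically case-insensitive under default collations.
    """
    found = set()
    for url in urls:
        lowered = url.lower()
        for name, fragment in CDN_PATTERNS.items():
            if fragment in lowered:
                found.add(name)
    return found


page = [
    "https://ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js",
    "//cdnjs.cloudflare.com/ajax/libs/modernizr/2.7.1/modernizr.min.js",
]
print(sorted(cdns_used(page)))  # ['Cloudflare', 'Google']

# Per-CDN site counts from the table: their sum exceeds the 78160 sites
# that use any CDN, so some sites must use more than one.
counts = [67671, 9222, 3996, 2379, 1300, 324]
print(sum(counts), sum(counts) > 78160)  # 84892 True
```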

    As we can see, the lion's share of libraries is loaded from Google CDN.
    Let's now look at what exactly sites load from Google CDN. It's interesting, although the result is predictable.

    Profile of scripts loaded from Google CDN


    select "jquery" as name,count(distinct(pageid)) as count,
    (100*count(distinct(pageid))/67198) as percent 
    from requests WHERE pageid <= 14802750 AND pageid >= 14489007
    and url like "%//ajax.googleapis.com/ajax/libs/jquery/%"
    UNION
    select "jquerymobile" as name,count(distinct(pageid)) as count,
    (100*count(distinct(pageid))/67198) as percent 
    from requests WHERE pageid <= 14802750 AND pageid >= 14489007
    and url like "%//ajax.googleapis.com/ajax/libs/jquerymobile/%"
    UNION
    select "angularjs" as name,count(distinct(pageid)) as count,
    (100*count(distinct(pageid))/67198) as percent 
    from requests WHERE pageid <= 14802750 AND pageid >= 14489007
    and url like "%//ajax.googleapis.com/ajax/libs/angularjs/%"
    UNION
    select "chrome-frame" as name,count(distinct(pageid)) as count,
    (100*count(distinct(pageid))/67198) as percent 
    from requests WHERE pageid <= 14802750 AND pageid >= 14489007
    and url like "%//ajax.googleapis.com/ajax/libs/chrome-frame/%"
    UNION
    select "dojo" as name,count(distinct(pageid)) as count,
    (100*count(distinct(pageid))/67198) as percent 
    from requests WHERE pageid <= 14802750 AND pageid >= 14489007
    and url like "%//ajax.googleapis.com/ajax/libs/dojo/%"
    UNION
    select "ext-core" as name,count(distinct(pageid)) as count,
    (100*count(distinct(pageid))/67198) as percent 
    from requests WHERE pageid <= 14802750 AND pageid >= 14489007
    and url like "%//ajax.googleapis.com/ajax/libs/ext-core/%"
    UNION
    select "jqueryui" as name,count(distinct(pageid)) as count,
    (100*count(distinct(pageid))/67198) as percent 
    from requests WHERE pageid <= 14802750 AND pageid >= 14489007
    and url like "%//ajax.googleapis.com/ajax/libs/jqueryui/%"
    UNION
    select "mootools" as name,count(distinct(pageid)) as count,
    (100*count(distinct(pageid))/67198) as percent 
    from requests WHERE pageid <= 14802750 AND pageid >= 14489007
    and url like "%//ajax.googleapis.com/ajax/libs/mootools/%"
    UNION
    select "prototype" as name,count(distinct(pageid)) as count,
    (100*count(distinct(pageid))/67198) as percent 
    from requests WHERE pageid <= 14802750 AND pageid >= 14489007
    and url like "%//ajax.googleapis.com/ajax/libs/prototype/%"
    UNION
    select "scriptaculous" as name,count(distinct(pageid)) as count,
    (100*count(distinct(pageid))/67198) as percent 
    from requests WHERE pageid <= 14802750 AND pageid >= 14489007
    and url like "%//ajax.googleapis.com/ajax/libs/scriptaculous/%"
    UNION
    select "swfobject" as name,count(distinct(pageid)) as count,
    (100*count(distinct(pageid))/67198) as percent 
    from requests WHERE pageid <= 14802750 AND pageid >= 14489007
    and url like "%//ajax.googleapis.com/ajax/libs/swfobject/%"
    UNION
    select "webfontloader" as name,count(distinct(pageid)) as count,
    (100*count(distinct(pageid))/67198) as percent 
    from requests WHERE pageid <= 14802750 AND pageid >= 14489007
    and url like "%//ajax.googleapis.com/ajax/libs/webfont/%"
    order by count desc;
    


    Script          Sites    % of Google CDN sites
    jquery          59977    89.2541
    jqueryui        12437    18.5080
    webfontloader   4624     6.8812
    swfobject       2347     3.4927
    prototype       993      1.4777
    scriptaculous   787      1.1712
    mootools        445      0.6622
    angularjs       353      0.5253
    dojo            186      0.2768
    chrome-frame    75       0.1116
    ext-core        16       0.0238
    jquerymobile    1        0.0015

    jQuery really is the most popular script: 89% of these sites load it, almost five times as many as the runner-up, jQuery UI, and more than ten times as many as any other library!
    Did you notice an intriguing result? jQuery Mobile is loaded by only one site!
    This is not a mistake; I checked three times :)
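    The Google CDN query needs one UNION branch per library. An alternative is to read the library name from the path segment right after /ajax/libs/ and group by it, which also catches libraries missing from a hand-written list. A minimal Python sketch of that idea follows; the `google_libraries` helper and the `pages` mapping (page ID to requested URLs) are hypothetical stand-ins for the `requests` table. Note that the directory names are the CDN's own, for example `webfont` for the Web Font Loader.

```python
import re
from collections import Counter
from typing import Dict, List

# Library directory right after /ajax/libs/ on Google CDN,
# e.g. "jquery", "jqueryui", "webfont".
GOOGLE_LIB = re.compile(r"//ajax\.googleapis\.com/ajax/libs/([^/]+)/")


def google_libraries(pages: Dict[int, List[str]]) -> Counter:
    """Count, per library directory, the distinct pages that load it.

    pages maps a page id to the URLs that page requested (hypothetical
    in-memory input standing in for the requests table).
    """
    counter: Counter = Counter()
    for urls in pages.values():
        libs = set()
        for url in urls:
            match = GOOGLE_LIB.search(url.lower())
            if match:
                libs.add(match.group(1))
        counter.update(libs)  # a set, so each page counts once per library
    return counter


pages = {
    1: ["//ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js",
        "//ajax.googleapis.com/ajax/libs/jqueryui/1.10.3/jquery-ui.min.js"],
    2: ["http://ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js",
        "https://ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js"],
    3: ["//ajax.googleapis.com/ajax/libs/webfont/1.5.2/webfont.js"],
}
print(google_libraries(pages).most_common())
```

    Collecting each page's libraries into a set before counting plays the same role as `count(distinct(pageid))` in SQL.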

    Estimating WordPress's share

    During data analysis, I noticed a persistent pattern that adds noise to the results: a mysterious ?ver=xxx parameter in requests for static files.
    As it turned out, this is mostly WordPress's doing: it appends a version parameter to static files.
    There are other characteristic patterns too: some sites add cache busting to all resources, including static files from a CDN.

    Let's get back to WordPress. I found some interesting patterns that allow a simple heuristic for estimating how common WordPress is:
    • WordPress uses the jquery-migrate plugin. This plugin is otherwise fairly rare; it restores features of older jQuery versions that were removed in version 1.9+.
    • As mentioned above, WordPress adds a version parameter to resources.
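    The heuristic can be expressed per page in Python. The `looks_like_wordpress` function and its regular expression are illustrative, mirroring the LIKE patterns used in the query. Like the SQL version, it is only a heuristic: it misses WordPress sites that don't load jquery-migrate, and it counts any non-WordPress site whose URLs happen to have the same shape.

```python
import re
from typing import Iterable

# Mirrors the LIKE patterns "%jquery-migrate%.js?ver=%" and
# "%jquery-migrate%.js?v=%": a jquery-migrate script carrying a version
# parameter is taken as a sign of WordPress.
WP_MIGRATE = re.compile(r"jquery-migrate.*\.js\?(?:ver|v)=", re.IGNORECASE)


def looks_like_wordpress(urls: Iterable[str]) -> bool:
    """Heuristic only: True if any requested URL matches the pattern."""
    return any(WP_MIGRATE.search(url) for url in urls)


print(looks_like_wordpress(
    ["http://example.com/wp-includes/js/jquery/jquery-migrate.min.js?ver=1.2.1"]
))
```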

    Using this knowledge, we obtain the following.
    select count(distinct(pageid)) as count,
    (100*count(distinct(pageid))/290835) as percent 
    from requests where pageid >= 14489007 and pageid <= 14802750 
    -- parentheses keep both LIKE variants inside the pageid range (AND binds tighter than OR)
    and (url LIKE "%jquery-migrate%.js\\?ver=%"
    or url LIKE "%jquery-migrate%.js\\?v=%");
    


    Number of sites    % of total
    29819              10.2529

    As you can see, by this heuristic more than 10% of the most visited sites in the world use WordPress.
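    The two LIKE conditions in the WordPress query have to be grouped with parentheses: in SQL, AND binds more tightly than OR, so `a AND b AND c OR d` means `(a AND b AND c) OR d`, and the `?v=` branch would escape the pageid range. A minimal sketch with Python's sqlite3 on invented rows shows the difference (SQLite and MySQL both give AND higher precedence than OR; the page IDs and range are made up).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (pageid INTEGER, url TEXT)")
conn.executemany(
    "INSERT INTO requests VALUES (?, ?)",
    [
        (10, "http://a.example/wp-includes/js/jquery/jquery-migrate.min.js?ver=1.2.1"),
        (11, "http://b.example/js/jquery-migrate.min.js?v=2"),
        (99, "http://c.example/js/jquery-migrate.min.js?v=2"),  # outside the range
    ],
)

RANGE = "pageid >= 10 AND pageid <= 20"
VER = "url LIKE '%jquery-migrate%.js?ver=%'"
V = "url LIKE '%jquery-migrate%.js?v=%'"

# Parsed as (RANGE AND VER) OR V: page 99 slips through via the V branch.
unparenthesized = f"SELECT COUNT(DISTINCT pageid) FROM requests WHERE {RANGE} AND {VER} OR {V}"
# Both URL variants stay restricted to the range.
parenthesized = f"SELECT COUNT(DISTINCT pageid) FROM requests WHERE {RANGE} AND ({VER} OR {V})"

print(conn.execute(unparenthesized).fetchone()[0])  # 3
print(conn.execute(parenthesized).fetchone()[0])    # 2
```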

    P.S. No sites were harmed during this study. The extension, however, may break something. If you decide to use it anyway and run into such behavior, send me a private message.
    P.P.S. If you have interesting questions, ask them in the comments. I will update the article with the answers.
