Are “Top” Sites Safe: Exploring Alexa Rankings

    If you look at the Alexa Top-1000 sites in the .ru zone, the first three hundred include Yandex, Google, Mail.ru and other major projects and companies; after that come mainly entertainment and gaming resources, torrent trackers and specialized media (and occasionally domains of teaser affiliate networks).



    For sites in the first thousand that are not at the very top, traffic can be on the order of 10,000-80,000 unique hosts per day, and sometimes higher. Such resources are attractive to hackers because they combine high traffic with a low level of protection (usually none at all): their owners often pay little attention to security, whether because they are unaware of the risks, lack the skills, or simply do not care. For example, a resource with 85,000 unique hosts per day may run on WordPress with vulnerable plugin versions. The ratio of “profit” to the cost of hacking is therefore highest for this category of sites, which makes them a tempting target that at least one attacker will take advantage of.

    In theory this all sounds logical, but I wanted to see it in practice, so I decided to scan a large array of top resources (the first 50,000 Alexa sites in the .ru zone) for hacks, infections, redirects and other security problems. The results are described below.

    To save readers time, here are the results up front: approximately 2% of the sites (971 of the 50,000, to be precise) turned out to be “sick”. The problems covered the full range: redirects to downloads of infected .apk and .exe files, hidden mobile redirects to wap-click affiliate programs, phishing, and even two defacements. Although the percentage of problematic sites seems small, the danger is significant, since hidden redirects were found on popular sites with 60K hosts per day.

    To detect malicious code more reliably, I used several indicators of infection, since malware cannot always be identified by signatures alone or by behavioral analysis alone. The analysis took into account:
    • redirects when opening pages with various combinations of User-Agent and Referer,
    • signatures of malicious scripts (regular expressions),
    • the presence of a domain or URL fragment in databases of dangerous, suspicious or malicious resources,
    • the presence of the site in the blacklists of search engines (Safe Browsing) and antivirus services (via VirusTotal).
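
    As a rough illustration of how two of these indicators (script signatures and a domain blacklist) might be combined for a single page, here is a minimal Python sketch. The signature patterns, the `malicious.example` domain and the `scan_page` function are hypothetical placeholders; the actual rules used by the scanner are not given in the article.

```python
import re
from urllib.parse import urlparse

# Hypothetical signatures: illustrative patterns only, not the author's rule set.
SIGNATURES = {
    "eval_fromcharcode": re.compile(r"eval\s*\(\s*String\.fromCharCode", re.I),
    "write_unescape": re.compile(r"document\.write\s*\(\s*unescape\s*\(", re.I),
}

# Hypothetical blacklist of dangerous domains (placeholder value).
BAD_DOMAINS = {"malicious.example"}

# Extracts the src attribute of <script> tags (a simplification of real HTML parsing).
SCRIPT_SRC = re.compile(r"<script[^>]+src\s*=\s*[\"']([^\"']+)[\"']", re.I)


def scan_page(html):
    """Return a list of (indicator, detail) pairs found in one page."""
    findings = []
    # Indicator 1: regular-expression signatures of malicious scripts.
    for name, pattern in SIGNATURES.items():
        if pattern.search(html):
            findings.append(("signature", name))
    # Indicator 2: external scripts loaded from blacklisted domains.
    for src in SCRIPT_SRC.findall(html):
        host = (urlparse(src).hostname or "").lower()
        if host in BAD_DOMAINS:
            findings.append(("blacklisted_domain", host))
    return findings
```

    A page is reported only when at least one indicator fires; a clean page yields an empty list.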

    To minimize false positives, whitelists of domains and URL fragments were used. For example, redirects triggered by clicks on banners and banner plugins, redirects to mobile versions of sites, and so on were not counted. When checking each site, a series of requests was sent to the start page, then linked pages were visited and some actions were performed on them, as if real visitors with various parameters (browser, referrer, platform, etc.) were browsing the site. The results from all pages were then combined into a general report. This was repeated for each of the 50,000 monitored resources. Stock PhantomJS was not suitable for this task because of restrictions on some read-only DOM objects that I needed to redefine, so I had to modify it together with WebKit.
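
    The idea of visiting each page under several visitor profiles and filtering the observed redirects through a whitelist can be sketched as follows. This is only an assumption-laden illustration: the User-Agent strings, the `adserver.example` whitelist entry and both function names are made up for the example, and the real scanner drove a patched PhantomJS rather than plain HTTP headers.

```python
from itertools import product
from urllib.parse import urlparse

# Illustrative visitor parameters (assumed values, not the author's list).
USER_AGENTS = {
    "desktop": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "android": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5)",
}
REFERERS = {
    "direct": None,
    "search": "https://www.google.com/",
}

# Hypothetical whitelist of domains whose redirects are considered legitimate.
WHITELIST_DOMAINS = ("adserver.example",)


def request_profiles():
    """Build every User-Agent x Referer combination as (name, headers)."""
    profiles = []
    for (ua_name, ua), (ref_name, ref) in product(USER_AGENTS.items(), REFERERS.items()):
        headers = {"User-Agent": ua}
        if ref:
            headers["Referer"] = ref
        profiles.append((f"{ua_name}/{ref_name}", headers))
    return profiles


def is_suspicious_redirect(site_host, target_url):
    """Return True if a redirect leaves the site and is not whitelisted."""
    target = urlparse(target_url).hostname or ""
    if not target:
        return False  # relative URL: stays on the same site
    if target == site_host or target.endswith("." + site_host):
        return False  # same site or its subdomain, e.g. a mobile version
    if any(target == d or target.endswith("." + d) for d in WHITELIST_DOMAINS):
        return False  # known banner/ad domain
    return True
```

    With two User-Agents and two Referers this yields four profiles per page; each additional parameter multiplies the number of requests, which is part of why scanning 50,000 sites this way is expensive.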

    Analyzing a large array of sites revealed some common infection patterns and typical ways of injecting malware into scripts and page code: for example, using link shortening services such as goo.gl and vk.cc to hide script addresses, adding code to the beginning or end of jquery*.js files, a family of unscrupulous teaser/advertising affiliate networks, and so on.
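
    One of these patterns, script addresses hidden behind link shorteners, is easy to look for in page or script text. The sketch below is a simplified assumption of how such a check might look: it uses only the two shortener domains named above, and the `find_shortener_urls` helper is hypothetical.

```python
import re

# Shortener domains mentioned in the text; a real list would be longer.
SHORTENER_HOSTS = ("goo.gl", "vk.cc")

# Matches absolute or protocol-relative URLs pointing at a shortener.
SHORTENER_URL = re.compile(
    r"(?:https?:)?//(?:www\.)?(?:"
    + "|".join(re.escape(h) for h in SHORTENER_HOSTS)
    + r")/[A-Za-z0-9_-]+",
    re.I,
)


def find_shortener_urls(text):
    """Return all shortener URLs found in HTML or JavaScript text."""
    return [m.group(0) for m in SHORTENER_URL.finditer(text)]
```

    A hit is not proof of infection, since legitimate pages also link to shortened URLs, but a script loaded through a shortener is a reasonable trigger for closer inspection.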

    I will show the details using a couple of examples. The first is a resource at around the 750th position in the Alexa ranking for the .ru zone. It is a fairly popular torrent tracker with predictably high traffic: a little over 60,000 unique visitors per day (according to the LiveInternet counter).



    Let's try to open the site from a mobile device: we type the site address into the Google search bar, follow the result to the site, go to a subsection and tap the download button in the page text. Instead of the .torrent file being downloaded, we are redirected through third-party sites and eventually offered to download and install an .apk file called download.apk.



    To quickly check the file for malware, we upload it to VirusTotal and get the expected result: SMS.Agent (i.e., an SMS spy).



    If you repeat the experiment a few more times, you will sometimes be prompted to download a “download accelerator” which, as you have probably guessed, accelerates nothing except perhaps the interception of your SMS.



    To delay detection for as long as possible, the hacker goes to great lengths to disguise the malicious code: it is injected dynamically and guarded by multi-stage checks at runtime. For this reason, automated scanners find it extremely difficult to catch such a redirect without knowing all the nuances. In this case, the redirect had the following properties:
    • the malicious injection was in the scrolltop.js script,
    • the code fired for a given visitor only once every 4.5 months (a cookie was set),
    • the code was injected dynamically only if the visitor came from a search engine,
    • the redirect itself was triggered only by clicks on certain links,
    • the .apk download started only if the visitor came from an Android device (this was checked further along the redirect chain, on the malicious resource).
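
    To make clear why a scanner has to reproduce all of these conditions at once, the gating logic listed above can be modelled in a few lines of Python. This is a model of the described behavior under stated assumptions, not the attacker's code: the `Visit` structure, the cookie name `seen` and the list of search engines are hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Assumed set of search engines and a hypothetical marker cookie name.
SEARCH_ENGINE_LABELS = {"google", "yandex"}
COOKIE_NAME = "seen"


@dataclass
class Visit:
    referer: str
    user_agent: str
    cookies: dict
    clicked_trigger_link: bool


def came_from_search(referer):
    """True if the Referer host contains a known search engine label."""
    host = urlparse(referer).hostname or ""
    return any(label in SEARCH_ENGINE_LABELS for label in host.split("."))


def redirect_fires(visit):
    """All gating conditions must hold for the redirect to trigger."""
    return (
        came_from_search(visit.referer)       # visitor arrived from search
        and COOKIE_NAME not in visit.cookies  # not seen in the last 4.5 months
        and visit.clicked_trigger_link        # clicked one of the chosen links
    )


def apk_offered(visit):
    """The .apk is served only to Android visitors who were redirected."""
    return redirect_fires(visit) and "android" in visit.user_agent.lower()
```

    A scanner that opens the page directly, reuses cookies between runs, or never clicks anything fails at least one of these checks and sees a perfectly clean site.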


    The following is a snippet of the code that the loader injected:



    If at least one of the above conditions is not met, the redirect does not occur. Despite this composite trigger condition, the scanner was able to detect the malware by a characteristic signature and by the script injection from a suspicious domain:



    You can roughly estimate the “scale of the disaster” for this site. For a hacked and infected resource with traffic of about 50-60K, on average about 400-500 visitors come from mobile devices. Of those, on the order of a hundred visitors may download the infected .apk file.

    Consider another “top” example from the list of infected sites. This time it is a fitness portal ranked at approximately 450th place (that is, even more visited). This “patient” alternately redirected the visitor either to a paid subscription service or to an app that promised to speed up Android.



    At the next redirect, out of curiosity, I tapped the “Continue viewing” button, and then the magic happened: I was automatically subscribed to paid content and immediately notified of it via SMS. There were no questions, no confirmation that I actually wanted this, and no additional actions required.



    I checked the list of connected services in my MTS account: a paid subscription for 20 rubles had indeed appeared. I unsubscribed.

    Interestingly, at the time the page with the “Continue viewing” button is generated, the script already knows your phone number, since the SMS subscription is carried out with the “blessing” of the mobile operator itself, which provides the necessary interface and the subscriber information.



    So the subscription process is fully automated. If you do not keep an eye on SMS notifications, you can lose a few dozen rubles a day simply by visiting various portals (and that is without counting clickjacking, where you unknowingly click in the right place yourself and approve a subscription or a malware download).

    The injection mechanics on this site are simpler: the hacker added the following fragment to one of the JavaScript files:



    This code loaded a script from the yadro24.ru domain, which in turn redirected users to a wap-click portal, provided that the visitor was connected via 3G/LTE mobile Internet (the JavaScript code shows which redirects are possible).



    If a mobile visitor came via Wi-Fi, no redirects to the wap-click affiliate program occurred. Incidentally, this is a big obstacle to detecting wap-click redirects with online services, since all requests must be sent over the 3G/LTE networks of specific mobile operators.
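
    The article does not say how the malicious script decides that a visitor is on a mobile network. One plausible mechanism, shown here purely as an assumption, is a server-side check of the client IP address against the address ranges of mobile operators. The operator names and networks below are placeholders taken from documentation-only address blocks, and `mobile_operator_for` is a hypothetical helper.

```python
import ipaddress

# Placeholder operator ranges (RFC 5737 documentation networks, not real data).
MOBILE_OPERATOR_NETWORKS = {
    "operator-a": [ipaddress.ip_network("192.0.2.0/24")],
    "operator-b": [ipaddress.ip_network("198.51.100.0/24")],
}


def mobile_operator_for(ip):
    """Return the operator whose range contains the IP, or None."""
    addr = ipaddress.ip_address(ip)
    for name, networks in MOBILE_OPERATOR_NETWORKS.items():
        if any(addr in net for net in networks):
            return name
    return None
```

    Under this assumption, a scanner running from a data center or over Wi-Fi always gets `None` and is never shown the redirect, which matches the detection problem described above.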

    Similar infections were found on other sites from the top of the Alexa ranking. Sometimes the code is injected at the beginning of scripts, sometimes at the end.
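
    Code prepended or appended to a well-known library file can be spotted by comparing the served file with a known-good copy. The following is a minimal sketch of that comparison, assuming a clean reference copy of the same library version is available; the `find_injection` function is hypothetical and not part of the scanner described here.

```python
def find_injection(reference, served):
    """Compare a served script with a known-good copy.

    Returns a (prepended, appended) pair of strings when the reference is
    contained intact in the served file (both empty if the files match),
    or None when the file differs in some other way and needs a full diff.
    """
    idx = served.find(reference)
    if idx == -1:
        return None
    prepended = served[:idx]
    appended = served[idx + len(reference):]
    return prepended, appended
```

    For example, `find_injection(clean_jquery, served_jquery)` returning a non-empty first or second element points directly at the extra code added before or after the original library.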



    The examples above are clear cases of targeted site hacking with code injected by attackers. But among the checked sites there was often another variant: the voluntary placement of a widget with hidden redirects, or of a teaser affiliate network that redirects mobile users to dangerous sites or wap-click affiliate programs. In such cases, site owners are either unaware of what is happening or, in pursuit of high earnings, turn a blind eye to it.

    Summary statistics for various categories of infection turned out as follows:



    A curious fact: most of the resources on the “problematic” list (those that serve malicious files or redirect visitors to dangerous sites and wap-click affiliate programs) are not marked as “malware” or even “unwanted” in Yandex/Google Safe Browsing. Either the search engines have not had time to check them, or they do not detect the problem, although the latter is unlikely. True, some of the sites have nevertheless been removed from mobile search results.

    Finally, I would like to give two pieces of advice:

    1. If you visit such popular resources, do not assume that they are all safe. Even an installed antivirus will not save you from problems such as wap-click redirects to SMS subscriptions, because they are not viruses (you are simply subscribed to a paid service without noticing, and 20 rubles are charged every day). And, frankly, antiviruses are far from 100% effective against drive-by attacks, especially on Android. It is therefore good practice to check such sites yourself with specialized services whenever a suspicion arises.
    2. If you own such a resource and have not yet thought about its security, or do not believe you have been hacked, it is time to run a full site diagnosis: scan the files on the hosting for web shells and backdoors (utilities such as AI-BOLIT, ClamAV and Maldet exist for this), possibly even commission a pentest of the project, and check for security problems with available services (for example, rescan.pro, quttera.com, sitecheck.sucuri.net). Also, take the time to go to your website from search results on a mobile device over a 3G/LTE connection, click through some of the site's links, look at the code, and check for anomalies.


    It is worth noting that by the time this article was written, some resources had already dealt with the infection and currently pose no threat, but a large share of the high-traffic websites remained unchanged even after 10 days:

