How black-hat SEOs collect millions of visitors from time-sensitive queries in Yandex

Published on November 09, 2017


    I had assumed that search engines had long since defeated black-hat tactics with machine learning and other powerful technologies, and that doorway networks, if any survived, existed only on the fringes of the Internet, in marginal niches such as casinos or adult content.

    But recently I stumbled upon a whole cluster of spam sites that collect millions of visitors from Yandex and easily outrank high-quality, reputable projects, even in legitimate ("white") niches.

    image

    For queries where the freshness of information matters, Yandex mixes the newest documents into the regular search results. This sounds logical: not every site gets into Yandex News, and a blogger’s fresh post about an accident in Penza may answer the user’s question better than old news on a reputable website.

    But there are two strange points:

    • Such results appear for rather unexpected queries, where freshness is clearly not measured in hours or days. For example, “kefir pancakes recipe” or “homemade pasties”.
    • For these results Yandex uses ranking algorithms that differ significantly from those of the main search results. For example, they ignore the fact that the content is non-unique or machine-generated.

    Distinguishing signs


    The top positions for such queries usually go to pages published within the last few hours. Besides the document-age label to the right of the snippet, these pages can be recognized by the src=FT parameter in the URL of their saved (cached) copy. For example:
    http://hghltd.yandex.net/yandbtm?fmode=inject&url=https%3A%2F%2Fzakupka.tv%2Frecipe%2Fchebureki-7764&tld=ru&la=1510220416&tm=1510221945&text=%D0%B4%D0%BE%D0%BC%D0%B0%D1%88%D0%BD%D0%B8%D0%B5%20%D1%87%D0%B5%D0%B1%D1%83%D1%80%D0%B5%D0%BA%D0%B8&l10n=ru&isu=1&dsn=0&sg=vla1-0074.search.yandex.net%3A7301&sh=-1&d=4900&src=FT&mime=html&sign=287713794a48239813318f67a221cb09&keyno=0
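
    Checking for this marker by hand gets tedious, so here is a minimal sketch that parses a cached-copy URL and looks at its src parameter. It uses the URL quoted above; the helper names cache_params and has_ft_marker are made up for illustration, and the sketch assumes nothing about the meaning of the other parameters.

```python
from urllib.parse import urlparse, parse_qs

# The cached-copy URL quoted in the post.
CACHE_URL = (
    "http://hghltd.yandex.net/yandbtm?fmode=inject"
    "&url=https%3A%2F%2Fzakupka.tv%2Frecipe%2Fchebureki-7764"
    "&tld=ru&la=1510220416&tm=1510221945"
    "&text=%D0%B4%D0%BE%D0%BC%D0%B0%D1%88%D0%BD%D0%B8%D0%B5%20"
    "%D1%87%D0%B5%D0%B1%D1%83%D1%80%D0%B5%D0%BA%D0%B8"
    "&l10n=ru&isu=1&dsn=0&sg=vla1-0074.search.yandex.net%3A7301"
    "&sh=-1&d=4900&src=FT&mime=html"
    "&sign=287713794a48239813318f67a221cb09&keyno=0"
)


def cache_params(cache_url: str) -> dict[str, str]:
    """Return the query parameters of a cached-copy URL, percent-decoded."""
    query = urlparse(cache_url).query
    # parse_qs maps each key to a list of values; keep the first value of each.
    return {key: values[0] for key, values in parse_qs(query).items()}


def has_ft_marker(cache_url: str) -> bool:
    """True if the cached-copy URL carries src=FT (hypothetical helper)."""
    return cache_params(cache_url).get("src") == "FT"


params = cache_params(CACHE_URL)
print(has_ft_marker(CACHE_URL))  # True
print(params["url"])             # https://zakupka.tv/recipe/chebureki-7764
print(params["text"])            # домашние чебуреки
```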

    As they age, these documents sink lower in the results and blend into the main results; many drop out entirely.

    If you use Serpstat or Advodka to see which other queries the sites you find rank for, and then look at the results for those queries, you will see dozens of such projects. They specialize in capturing pseudo-news traffic, and some of them reach tens of millions of visits per month.

    Examples


    Let’s analyze a few pages from the top 5 for “homemade pasties” (see the screenshot at the beginning of the post). To determine whether the texts are really new, we will search Yandex and Google for fragments of these texts enclosed in quotation marks. This finds documents that contain the exact fragment.
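
    The post does not say how the fragments were chosen, so the following is only a sketch of one way to prepare such exact-phrase queries: take a few evenly spaced runs of words from the page text and wrap each in quotation marks. The function name, the fragment length and the sample sentence are all assumptions made for illustration.

```python
def quoted_queries(text: str, words_per_fragment: int = 8, count: int = 3) -> list[str]:
    """Build exact-phrase queries from evenly spaced fragments of a text."""
    words = text.split()
    if not words:
        return []
    if len(words) <= words_per_fragment:
        # The text is too short to cut up: use it whole.
        return ['"' + " ".join(words) + '"']
    last_start = len(words) - words_per_fragment
    # Spread the fragment start positions evenly from the beginning to the end.
    steps = max(count - 1, 1)
    starts = sorted({round(i * last_start / steps) for i in range(count)})
    return ['"' + " ".join(words[s:s + words_per_fragment]) + '"' for s in starts]


# Invented sample text, not taken from any of the sites discussed.
sample = ("Sift the flour into a bowl, add salt and hot water, knead the dough "
          "until smooth, then let it rest under a towel for half an hour")
for query in quoted_queries(sample, words_per_fragment=6, count=3):
    print(query)
```

    Each printed line can be pasted into the search box as is; fragments from the middle and the end of a text are useful because compiled articles often change only the opening.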

    No duplicates were found on the first site, but the second site
    lady-day .ru / chebureki-retsept-myaso-ochen-udachnoe-testo / immediately raised questions.

    A copy of this article was posted on the page liveinternet .ru / users / 5168383 / post329973643 / back in 2014. Google last indexed the article on November 4, and in the cached copy the page itself states that the article was published on November 4, 2017. In the current version, the publication date is November 9, 2017. The site has clearly been republishing the article repeatedly to manipulate Yandex’s results.

    The next site is ladiesvenue .ru / chebureki-s-myasom-recept-krymskij-ochen-udachnoe-xrustkoe-testo /. Yandex’s cache holds the same text on the same site, but published 4 days earlier, as the URL of the cached page shows: ladiesvenue .ru / 05-11-2017-sochnye-chebureki -recept-klassicheskij-samyj-vkusnyj-s-foto / . Moreover, that page also appears in the search results for “homemade pasties”. For some reason, Yandex fails to detect the duplicate even within a single site. A search for the quoted fragment also turns up several more similar sites.

    Next is poleznue-soveti .ru / chebureki-s-myasom-udacshnoe-testo.html. A search for a quoted fragment in Google finds a full copy of this article on another site, indexed 11 days ago. Yandex has indexed that page too, yet still treats the fresh duplicate as more relevant than the other sites.

    The situation with mywomenblog .ru / chebureki-s-myasom-recept-ochen-udachnoe-xrustkoe-testo-36187 / is similar: the cache contains the same text on another site, also indexed 11 days ago.
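
    The comparisons above were done by hand. As a rough illustration of how two page texts could be compared automatically, here is a sketch based on word shingles and Jaccard similarity, a standard near-duplicate technique. It is not a description of how Yandex or Google detect duplicates, and the shingle size of 5 is an arbitrary assumption.

```python
import re


def shingles(text: str, size: int = 5) -> set[tuple[str, ...]]:
    """Return the set of overlapping word n-grams (shingles) of a text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        # Too short for a full shingle: treat the whole text as one.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: str, b: str, size: int = 5) -> float:
    """Share of shingles two texts have in common (1.0 means identical sets)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa or not sb:
        return 0.0  # nothing to compare
    return len(sa & sb) / len(sa | sb)
```

    A score close to 1.0 for two pages with different publication dates is the situation described above: the same text reissued as new. Where to put the threshold is a judgment call that this sketch does not settle.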

    These sites republish their own and other people’s previously published content under new dates, or compile a new article from several existing ones. For other queries there are also completely pathological cases: pages with generated, meaningless text, such as these:
    healtherbal .ru / news / klassicheskaya-vozdushnaya-sharlotka-s-yablokami-b-retsept-bs-foto-vsyo-chto-izvestno.html
    jurnal24 .ru / vkusnaya-sharlotka-s-yablokami-prostoj-recept-vsyo-chsy -izvestno-na-dannyj-moment /

    image

    How do they do it?


    I could not find any common features in the markup of these sites. Some use only structured-data markup (micro-markup), some simply state the publication date explicitly, and some combine both methods.
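
    The post does not name the markup vocabulary these sites use. Assuming it is schema.org-style microdata with an itemprop="datePublished" attribute, which is one common way to declare a publication date, the declared dates can be pulled out of a page with the standard library alone. The sample HTML below is invented for illustration.

```python
from html.parser import HTMLParser


class PublishedDateParser(HTMLParser):
    """Collect values of itemprop="datePublished" attributes (microdata)."""

    def __init__(self) -> None:
        super().__init__()
        self.dates: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # itemprop may hold several space-separated property names.
        if "datePublished" not in (attributes.get("itemprop") or "").split():
            return
        # <meta> carries the value in content=, <time> in datetime=.
        value = attributes.get("content") or attributes.get("datetime")
        if value:
            self.dates.append(value)


def published_dates(html: str) -> list[str]:
    """Return every publication date declared in the page's microdata."""
    parser = PublishedDateParser()
    parser.feed(html)
    parser.close()
    return parser.dates


# Invented sample page, not copied from any of the sites discussed.
SAMPLE = """
<article itemscope itemtype="http://schema.org/Article">
  <meta itemprop="datePublished" content="2017-11-09T09:30:00+03:00">
  <time itemprop="datePublished" datetime="2017-11-09">November 9, 2017</time>
</article>
"""
print(published_dates(SAMPLE))  # ['2017-11-09T09:30:00+03:00', '2017-11-09']
```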

    Nor could I confirm that Yandex ranks these pages on the strength of links from other sites: most of the pages have no such links.

    The only pattern, apart from the current date, is that these are mostly sites dedicated exclusively to harvesting this kind of traffic. Perhaps having a large number of pages that match pseudo-news queries is itself a positive signal for Yandex.

    The scheme seems quite simple: collect suitable queries, pick matching articles from other projects, and publish them on several sites under different URLs, stating the current date and time of publication. Perhaps a single text can only be published a limited number of times: I did not see very many copies of any one text, and those I did find mostly turned up in Google, not Yandex. Most likely, to maximize the effect, the sites publish at the optimal time, just before the daily traffic peaks in the chosen niche.
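
    Seen from the detection side, this scheme leaves a simple trace: one text claimed under several publication dates. A minimal sketch of looking for that trace follows, assuming you have already collected (url, claimed date, text) records yourself. The records below are invented placeholders, and exact hashing only catches copies that are identical after normalization, not rewritten ones.

```python
import hashlib
import re
from collections import defaultdict
from datetime import date


def fingerprint(text: str) -> str:
    """Hash of the text with case, punctuation and spacing normalized."""
    normalized = " ".join(re.findall(r"\w+", text.lower()))
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def redated_texts(pages):
    """Group (url, claimed_date, text) records by text fingerprint and keep
    the groups where one text is claimed under more than one date."""
    groups = defaultdict(list)
    for url, claimed, text in pages:
        groups[fingerprint(text)].append((claimed, url))
    return [
        sorted(items)
        for items in groups.values()
        if len({claimed for claimed, _ in items}) > 1
    ]


# Invented placeholder records for illustration only.
pages = [
    ("https://a.example/old-post", date(2017, 11, 4), "Dough: flour, water, salt."),
    ("https://b.example/new-post", date(2017, 11, 9), "dough  flour water salt"),
    ("https://c.example/other", date(2017, 11, 9), "A different article."),
]
for group in redated_texts(pages):
    print(group)
```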

    For a number of queries, these sites even manage to fool Yandex News by passing recipes off as news:

    image

    I recalled that back in March a friend told me that pages with the current publication date were flooding recipe results, but I did not attach any importance to it at the time. Judging by the traffic trends of the sites I came across, the problem has existed for at least several years.

    Last week I filed a search spam complaint; I hope that Yandex employees will pay attention to it.