baevhd September 19, 2014 at 17:14

Identification of search engine ranking algorithms

Anyone who works in SEO long enough eventually starts to wonder what formulas search engines use to place one site higher or lower than another in the results. Everyone knows this is kept strictly secret, and we optimizers know only what is written in the webmaster guidelines and on a few resources devoted to website promotion. Now imagine for a second that you had a tool that showed, with 80-95% accuracy, exactly what needs to be done on a page of your site, or on the site as a whole, for it to reach the first position in the SERP for a specific query, or the fifth, or simply the first page. Better still, imagine the tool could predict with the same accuracy which position you would get after making particular changes. And as soon as the search engine changed its formula and adjusted the weight of one factor or another, you could immediately see exactly what had changed. And this is only a small fraction of the information such a tool could give you.

To be clear, this is not an advertisement for yet another promotion service, nor does it reveal a specific formula that search engines use to rank sites. I want to share a theory that I have neither the means, nor the time, nor sufficient knowledge of programming and mathematics to implement. But I know for sure that even for those who have all of this, implementing it would take far longer than a month, perhaps 1-1.5 years.

Theory

The theory is to find out, by trial and error, which factors affect positions more than others. It is very hard to explain everything in words alone, so I made a table that shows, more or less, what I want to convey.

Have you looked at the table? Now to the point. We take any key phrase, it doesn't matter which, enter it into the search engine, and take the first 10 sites from the results; these will be our test subjects. Then we write code that randomly changes the weights of the ranking factors (PF in the table) until our program arranges the sites in exactly the same order as the search engine does. In other words, we reproduce the search engine's ranking algorithm by trial-and-error fitting. For each site, all we can determine about a factor itself is whether it is positive, neutral, or negative.

Now, about the table and the factors. We provisionally assign each factor a value from 1 to roughly 800, since it is reliably known that Yandex, for example, has about that many ranking factors. Roughly speaking, the maximum value equals the number of ranking factors we know for sure. No two factors can have the same value; each factor's value is unique. Each factor has its own column in the table, and there are so many of them that I cannot physically fit everything into one picture.

Now the question is how to calculate a page's rank. It is very simple. To start with, use basic arithmetic: if a factor has a positive effect, add the factor's rank to the page rank; if its effect is negative, add 0. You can make this more sophisticated with three options, for example by subtracting the factor's rank from the page rank when the factor is critical, such as gross keyword stuffing.

This gives roughly the following algorithm for calculating page rank. Denote the page rank by PR and a factor by F. Then:

Start with PR = 0 and take the first factor. If F1 is positive, compute PR + F1; if F1 is negative, compute PR - F1; if F1 is neutral, do nothing. Then check F2, F3, F4, and so on in the same way until the factors run out.
The search should be arranged so that every factor is tried with every rank value.
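
A minimal Python sketch of the scoring rule and the random search described above. The three toy pages, their factor effects, and the names page_rank, order_by_rank, and random_search are illustrative assumptions, not part of any search engine's API. Effects are encoded as +1 (positive), 0 (neutral), and -1 (negative).

```python
import random

def page_rank(effects, weights):
    """Add a factor's weight if its effect is +1, subtract it if -1, skip it if 0."""
    return sum(effect * weight for effect, weight in zip(effects, weights))

def order_by_rank(pages, weights):
    """Return page names sorted from the highest computed rank to the lowest."""
    return sorted(pages, key=lambda name: page_rank(pages[name], weights), reverse=True)

def random_search(pages, target_order, max_tries=100_000, seed=0):
    """Shuffle unique weights 1..N until the computed order matches the SERP order."""
    n_factors = len(next(iter(pages.values())))
    weights = list(range(1, n_factors + 1))  # each factor gets a unique value
    rng = random.Random(seed)
    for _ in range(max_tries):
        rng.shuffle(weights)
        if order_by_rank(pages, weights) == target_order:
            return list(weights)
    return None  # no match found within the try budget

# Hypothetical toy data: three pages, three factors, one positive factor each.
pages = {
    "a.example": [1, 0, 0],
    "b.example": [0, 1, 0],
    "c.example": [0, 0, 1],
}
serp_order = ["a.example", "b.example", "c.example"]  # order observed in the SERP

found = random_search(pages, serp_order)
print(found)  # the only assignment that reproduces this order is [3, 2, 1]
```

With real data each page would have hundreds of effects, and the try budget would have to be far larger.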

The main difficulty is accounting for every influencing factor, down to the amount of text on the page and the TIC (Yandex's thematic citation index) of the site that links to our test page. And the difficulty lies not so much in accounting for this information as in collecting it. Collecting it all manually is unrealistic, so you need to write all sorts of parsers for the program to gather the data automatically.

The work is large and complex and requires a certain level of knowledge, but just imagine the opportunities it would open up once implemented. I will not go into all the subtleties of the calculations and the influence of the factors; I do not like writing at length, and it is easier for me to explain in person.

Some will now object that many different weight combinations will produce the same match. Yes, they will, but what if you take not just the first page but, say, the first 50 pages? How many times smaller does the chance of an accidental match become?
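
As a rough estimate, assume that a wrong formula orders the results essentially at random, so that every ordering is equally likely (a simplifying assumption, not something the theory guarantees). Then the chance of an accidental exact match is 1 divided by the number of possible orderings, which the Python sketch below computes for 10 results and for 50 pages of 10 results each.

```python
from math import factorial

def orderings(n_results):
    """Number of distinct ways to order n_results search results."""
    return factorial(n_results)

first_page = orderings(10)        # 10 results on the first page
fifty_pages = orderings(50 * 10)  # assuming 10 results per page

# How many times smaller the chance becomes (exact integer ratio).
reduction = fifty_pages // first_page

print(f"First page: 1 chance in {first_page:,}")
print(f"50 pages: 1 chance in a number with {len(str(fifty_pages))} digits")
```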

Another difficulty is that some factors are simply unavailable to us; for example, we cannot take behavioral factors into account. Even if every site in the SERP were under our control, we still could not do it, because the search engine most likely also takes into account how the user behaves on the SERP itself. This introduces a second unknown into our equation, in addition to the position itself.

What will such software give us once implemented? No, it will not give the exact search engine formula, but it will certainly show which factors influence ranking most strongly and which do not matter at all. And when promoting a site, we will be able to plug our page's parameters into this formula and see, before promotion even begins, what position the page will hold for a specific query once the search engine has processed all the changes.
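
Plugging a page into the fitted formula could look like the Python sketch below. The weights, the competitor pages, and the function names are hypothetical placeholders; the scoring rule is the simple add-or-subtract rule from the theory, with effects encoded as +1, 0, and -1.

```python
def score(effects, weights):
    """Weighted sum: +weight for a positive factor, -weight for a negative one."""
    return sum(effect * weight for effect, weight in zip(effects, weights))

def predicted_position(our_effects, competitor_effects, weights):
    """1-based position our page would take among the competitors."""
    ours = score(our_effects, weights)
    better = sum(1 for effects in competitor_effects if score(effects, weights) > ours)
    return better + 1

# Hypothetical fitted weights and competitor pages (+1, 0, -1 per factor).
weights = [5, 3, 1]
competitors = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # scores 5, 3, 1
our_page = [0, 1, 1]                             # score 3 + 1 = 4

position = predicted_position(our_page, competitors, weights)
print(position)  # 2: only the first competitor scores higher
```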

In general, this is a very complex topic and very good food for thought, because it raises questions such as: is the power of a single computer enough for such calculations? And if it is, how long will they take? If the result is unsatisfactory, the formula can be made more complex and adjusted until it gives a 100% accurate result across 100 pages of results. Moreover, for a clean experiment, you could set up about 100 different sites, place a non-existent key phrase on them, and then track the algorithm using that phrase. There are many options. It needs work.
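
To get a feel for the computing question, the Python sketch below estimates how long it would take to try every unique assignment of values 1..N to N factors. The rate of one million assignments per second is an arbitrary assumption chosen only for illustration, and the result is reported as a power of ten because the numbers are too large for ordinary floating-point values.

```python
from math import lgamma, log, log10

SECONDS_PER_YEAR = 365 * 24 * 3600

def log10_assignments(n_factors):
    """log10 of n_factors!, the number of ways to give n factors unique values 1..n."""
    return lgamma(n_factors + 1) / log(10)

def log10_years_to_try_all(n_factors, tries_per_second):
    """log10 of the years needed to try every assignment at the given rate."""
    return log10_assignments(n_factors) - log10(tries_per_second) - log10(SECONDS_PER_YEAR)

for n in (10, 20, 800):
    exponent = log10_years_to_try_all(n, tries_per_second=1_000_000)
    print(f"{n} factors: about 10^{exponent:.0f} years")
```

Under these assumptions, trying every assignment is feasible only for a handful of factors; for larger sets the program can only sample the space, as the random search does.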
