# Assessment of the variability of search results

One quiet summer night, while working on some urgent analytical problems, I ran into the question of how to measure the variability of search results. Looking for an answer, I found only one study on the subject: Koksharov, 2012.

It did not satisfy me. If anything, it raised more questions. Using the Oliver and Levenshtein algorithms simply because PHP has built-in functions for them seemed unjustified, and the rationale for the methods based on differences in positions was unconvincing.

Why this way and not another? Why treat the results as an array or a string rather than an ordered set or a tuple? Where can the underlying assumptions lead? And finally, is there a single best, most correct, most “final” method?

So I ended up reinventing the wheel, that is, sorting everything out, at least for myself, though I hope the result will interest others too.

#### Measure of rating variability

A search for ready-made mathematical machinery also came up empty. An ordered set? A string? An array? None of them quite fits. The closest is a tuple (vector), but the distance measures defined for tuples do not capture what a rating is. Either I am missing something, or too many years have passed since my student days. I hope that those who practice mathematics more often will correct me, or at least suggest which direction to look in. Meanwhile, let us introduce our own definitions, staying within the terms of the subject area.

To describe everyone's favorite Top3, Top10, Top100 and so on, we introduce the notion of a “rating N”: an ordered sequence of length $N$ containing the identifiers of ranked objects

$$R^N = (r_1, r_2, \ldots, r_N), \tag{1}$$

where an object identifier is a link (URL) to a ranked document.

The simplest and most natural assumption is that a measure of variability should be tied to changes in the positions of objects in the ratings. The greater the difference (distance) between an object's old and new positions, and the more objects change position, the greater the difference between the two ratings should be.

In this setting, we define the distance between two ratings as the sum of the differences in position of all objects included in the ratings. Let us state this definition more formally.

Let two ratings $R_1^N$ and $R_2^N$ be given. Their elements may coincide fully, partially, or not at all.

Let $U = R_1^N \cup R_2^N$ be the set of objects that appear in the compared ratings. The cardinality of this set (the number of its elements) ranges from $N$ (when both ratings contain exactly the same objects and differ only by a permutation) to $2N$ (when the two ratings have no elements in common).

The same object may occupy different or identical positions in the two ratings, or it may be absent from one of them altogether.

Let $p_{1i}$ denote the position of the $i$-th object in rating $R_1^N$, and $p_{2i}$ the position of **the same** object in rating $R_2^N$. Then the distance between the positions of the $i$-th object is the absolute value of their difference:

$$d_i = \left| p_{1i} - p_{2i} \right|. \tag{2}$$

Summing the position differences over all elements of $U$ gives the absolute distance between the two ratings:

$$D\left(R_1^N, R_2^N\right) = \sum_{i=1}^{|U|} d_i = \sum_{i=1}^{|U|} \left| p_{1i} - p_{2i} \right|. \tag{3}$$

Computing this distance is straightforward for objects present in both ratings. But what should we do when an object of one rating is missing from the other, that is, lies outside it? In that case it seems reasonable to take the position of the missing object to be the closest position outside the rating, $N + 1$.

Of course, in real life a site can drop out of the Top10 much further than 11th place. The accuracy of the variability estimate can be improved by considering longer ratings: 30, 50, 100, 1000. It is very likely that for large $N$ this assumption plays an ever smaller role. For now, the choice of the optimal rating length remains open, and we have to settle for the fact that estimates obtained under this assumption are lower bounds: the true distance between the ratings is no less than the computed one.
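The definition (3) together with the $N + 1$ rule fits in a few lines of Python. This is a minimal sketch, not the author's code: it assumes each rating is a list of distinct hashable identifiers (URLs, say) of the same length $N$, and the function name `absolute_distance` is hypothetical.

```python
def absolute_distance(r1, r2):
    """Absolute distance (3) between two ratings of the same length N.

    Positions are 1-based. An object missing from a rating is assumed to
    sit at position N + 1, the closest position outside the rating.
    """
    n = len(r1)
    if len(r2) != n:
        raise ValueError("both ratings must have the same length N")
    pos1 = {obj: k for k, obj in enumerate(r1, start=1)}
    pos2 = {obj: k for k, obj in enumerate(r2, start=1)}
    outside = n + 1
    union = set(pos1) | set(pos2)  # the set U of objects in either rating
    return sum(abs(pos1.get(obj, outside) - pos2.get(obj, outside)) for obj in union)


old = ["a", "b", "c", "d", "e"]
new = ["b", "a", "c", "d", "f"]  # 1-2 swap, "e" drops out, "f" enters
print(absolute_distance(old, new))  # 4
```

Here the swap contributes $1 + 1$, and the exit of `e` and entry of `f` contribute $1$ each, because both are measured against position 6.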

Absolute distances between ratings are hard to interpret and compare, so it is convenient to convert them to relative form. As the reference value we need the maximum possible distance between two ratings. Clearly, it corresponds to the case when the ratings share no elements at all: every object of $R_1^N$ has left the rating, and every object of $R_2^N$ has come from outside it. In other words, each object of $R_1^N$ has moved from its position $k$ to position $N + 1$, and each object of $R_2^N$, conversely, has moved from position $N + 1$ to its position $k$.

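Then for rating $R_1^N$ the maximum possible sum of distances is

$$\sum_{k=1}^{N} \left( (N + 1) - k \right) = N + (N - 1) + \ldots + 1 = \frac{N(N + 1)}{2}.$$

That is, we get the sum of an arithmetic progression with first term $N$, common difference $-1$, and last term $1$.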

Likewise, for the second rating, where each object moved from position $N + 1$ to position $k$, we get a similar arithmetic progression with first term $1$, common difference $1$, and last term $N$, whose sum is given by the same expression.

As a result, the total distance traveled by the objects of the first and second ratings is

$$D_{\max} = 2 \cdot \frac{N(N + 1)}{2} = N(N + 1). \tag{4}$$
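The claim that fully disjoint ratings give the maximum can also be checked. An object that appears in only one rating, at position $p$, contributes exactly $N + 1 - p$; an object present in both, at positions $p_1, p_2 \le N$, contributes $|p_1 - p_2| \le (N + 1 - p_1) + (N + 1 - p_2)$. Summing these bounds over all positions of both ratings gives exactly $N(N + 1)$, so no overlap can exceed the disjoint case. The hypothetical helper below brute-forces every rating of length $N$ drawn from a pool of $2N$ objects (enough to cover every possible overlap) as a sanity check for small $N$ only.

```python
from itertools import permutations


def absolute_distance(r1, r2):
    # Distance (3); an object missing from a rating is placed at N + 1.
    n = len(r1)
    pos1 = {obj: k for k, obj in enumerate(r1, start=1)}
    pos2 = {obj: k for k, obj in enumerate(r2, start=1)}
    return sum(abs(pos1.get(o, n + 1) - pos2.get(o, n + 1))
               for o in set(pos1) | set(pos2))


def max_distance_brute_force(n):
    """Largest distance from the fixed rating (0, ..., n-1) to any rating
    of length n drawn from a pool of 2n objects."""
    r1 = tuple(range(n))
    return max(absolute_distance(r1, r2) for r2 in permutations(range(2 * n), n))


for n in range(1, 5):
    # The brute-force maximum and N(N + 1) should coincide.
    print(n, max_distance_brute_force(n), n * (n + 1))
```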

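Therefore, the relative variability of the rating is

$$D_{\text{rel}}\left(R_1^N, R_2^N\right) = \frac{D\left(R_1^N, R_2^N\right)}{D_{\max}} = \frac{1}{N(N + 1)} \sum_{i=1}^{|U|} \left| p_{1i} - p_{2i} \right|. \tag{5}$$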

Readers who want more detail can work through a small example.

**Example for Top5**

Let , , and .

Then

Hence the absolute distance between the ratings is $D = 12$.

The maximum possible distance is $D_{\max} = N(N + 1) = 5 \cdot 6 = 30$.

So the relative distance is $D_{\text{rel}} = 12/30 = 0.4$, or 40%.

Then

Hence the absolute distance between the ratings is $D = 12$.

The maximum possible distance is $D_{\max} = N(N + 1) = 5 \cdot 6 = 30$.

So the relative distance is $D_{\text{rel}} = 12/30 = 0.4$, or 40%.

#### Weighted measure of rating variability

An attentive reader may notice that the estimates of rating change given by expressions (3) and (5) are only weakly sensitive to local changes in general and to transpositions in particular (a transposition is when two elements simply swap places). Swapping the first two elements gives the same difference as swapping the last two. For example, in a Top5 a transposition of 1st and 2nd places and a transposition of 4th and 5th places both give $D = 2$, that is, $D_{\text{rel}} = 2/30 \approx 6.7\%$.
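A short Python sketch makes this insensitivity concrete. It assumes, as above, that a missing object sits at position $N + 1$; the helper name `relative_distance` and the placeholder objects `a` to `e` are illustrative only.

```python
def relative_distance(r1, r2):
    """Relative distance (5): distance (3) divided by N(N + 1).

    Objects absent from a rating are placed at position N + 1.
    """
    n = len(r1)
    pos1 = {obj: k for k, obj in enumerate(r1, start=1)}
    pos2 = {obj: k for k, obj in enumerate(r2, start=1)}
    d = sum(abs(pos1.get(o, n + 1) - pos2.get(o, n + 1))
            for o in set(pos1) | set(pos2))
    return d / (n * (n + 1))


top5 = ["a", "b", "c", "d", "e"]
swap_1_2 = ["b", "a", "c", "d", "e"]  # leader and runner-up swap places
swap_4_5 = ["a", "b", "c", "e", "d"]  # 4th and 5th swap places
print(round(relative_distance(top5, swap_1_2), 3))  # 0.067
print(round(relative_distance(top5, swap_4_5), 3))  # 0.067
```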

From the point of view of the search engine and its ranking function, such changes may indeed be insignificant. But as a practicing marketer, I am primarily interested in the consequences for client sites, and these consequences can be very significant even for local changes. This is mainly because the click-through rate of search results depends strongly on the position in the ranking (in the SERP), so the organic traffic of sites affected by local changes can change substantially, sometimes several-fold.

So it is desirable to account for the fact that the difference between 1st and 2nd place in search results is much larger than the difference between 4th and 5th. To do this, we introduce a weight function over positions in the ranking. The natural choice, since it reflects changes in search traffic, is the dependence of the click-through rate (CTR) of search results on position.

In general, choosing a “good” approximating function for SERP click statistics is a topic for a separate study. Ideally, it depends on a great many parameters: the search engine, the type of keyword, snippet quality and, finally, the composition of the search results. But for our purposes, since we care less about absolute values than about relative ones (differences between positions), practically any of the known approximations will do. I usually use the following dependence from Samuilov, 2014, which approximates the data fairly well:

$$CTR(p) = \frac{a}{p}, \tag{6}$$

where $p$ is the position in the rating and $a$ is a parameter that depends on the search engine and takes the following values: …. The average value across all search engines is ….

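Taking (6) into account, the distance between the positions of the $i$-th object becomes

$$d_i^{w} = \left| CTR(p_{1i}) - CTR(p_{2i}) \right| = a \left| \frac{1}{p_{1i}} - \frac{1}{p_{2i}} \right|. \tag{7}$$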

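Accordingly, the absolute weighted distance between the ratings is

$$D^{w}\left(R_1^N, R_2^N\right) = \sum_{i=1}^{|U|} d_i^{w} = a \sum_{i=1}^{|U|} \left| \frac{1}{p_{1i}} - \frac{1}{p_{2i}} \right|. \tag{8}$$

The maximum possible weighted distance between the ratings is given by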

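$$D^{w}_{\max} = 2a \sum_{k=1}^{N} \left( \frac{1}{k} - \frac{1}{N + 1} \right) = 2a \left( \sum_{k=1}^{N} \frac{1}{k} - \frac{N}{N + 1} \right). \tag{9}$$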
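The maximum weighted distance can be checked numerically. This sketch assumes the weight in (6) has the form $CTR(p) = a/p$ and uses exact fractions, so that the direct sum of moves and the closed form $2a\left(H_N - N/(N + 1)\right)$, with $H_N$ the $N$-th harmonic number, can be compared without rounding. Both function names are hypothetical.

```python
from fractions import Fraction


def max_weighted_distance(n, a=1):
    """Direct sum behind the maximum weighted distance, assuming CTR(p) = a / p.

    Each object of the first rating drops from position k to N + 1 and each
    object of the second rises from N + 1 to k, so every k contributes twice.
    `a` should be an int or a Fraction to keep the arithmetic exact.
    """
    return 2 * sum(Fraction(a, k) - Fraction(a, n + 1) for k in range(1, n + 1))


def max_weighted_distance_closed_form(n, a=1):
    """Closed form 2a * (H_N - N / (N + 1)), with H_N the N-th harmonic number."""
    h_n = sum(Fraction(1, k) for k in range(1, n + 1))
    return 2 * a * (h_n - Fraction(n, n + 1))


print(max_weighted_distance(5))  # 29/10
print(max_weighted_distance(5) == max_weighted_distance_closed_form(5))  # True
```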

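Then the weighted relative distance is

$$D^{w}_{\text{rel}}\left(R_1^N, R_2^N\right) = \frac{D^{w}\left(R_1^N, R_2^N\right)}{D^{w}_{\max}} = \frac{\sum_{i=1}^{|U|} \left| \frac{1}{p_{1i}} - \frac{1}{p_{2i}} \right|}{2 \left( \sum_{k=1}^{N} \frac{1}{k} - \frac{N}{N + 1} \right)}. \tag{10}$$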

Note that the weighted relative distance ultimately does not depend on the parameter $a$, that is, on the search engine.

For the example above, the weighted relative distance is 61%, so this measure is more sensitive to the replacement of the rating's leader.

It is also much more sensitive to local changes: in a Top5 rating, transposing places 1 and 2 yields 34%, while transposing places 4 and 5 yields 3.4%.
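These transposition figures can be reproduced with a sketch of the weighted relative distance, again assuming $CTR(p) = a/p$ and the $N + 1$ rule for missing objects. With exact fractions, transposing places 1 and 2 in a Top5 gives $10/29 \approx 0.345$ and transposing places 4 and 5 gives $1/29 \approx 0.034$; changing `a` leaves the result unchanged, in line with the remark that the weighted relative distance does not depend on the search engine. The name `weighted_relative_distance` is hypothetical.

```python
from fractions import Fraction


def weighted_relative_distance(r1, r2, a=1):
    """Weighted relative distance, assuming CTR(p) = a / p.

    Ratings are lists of distinct identifiers of equal length N; an object
    absent from a rating is placed at position N + 1.
    """
    n = len(r1)

    def ctr(p):
        # Assumed weight of position p; Fraction keeps the arithmetic exact.
        return Fraction(a) / p

    pos1 = {obj: k for k, obj in enumerate(r1, start=1)}
    pos2 = {obj: k for k, obj in enumerate(r2, start=1)}
    # Absolute weighted distance over the union U of both ratings.
    d_w = sum(abs(ctr(pos1.get(o, n + 1)) - ctr(pos2.get(o, n + 1)))
              for o in set(pos1) | set(pos2))
    # Maximum weighted distance: each position k is left for, or entered
    # from, position N + 1 once in each rating.
    d_w_max = 2 * sum(ctr(k) - ctr(n + 1) for k in range(1, n + 1))
    return d_w / d_w_max


top5 = ["a", "b", "c", "d", "e"]
swap_1_2 = ["b", "a", "c", "d", "e"]
swap_4_5 = ["a", "b", "c", "e", "d"]
print(round(float(weighted_relative_distance(top5, swap_1_2)), 3))  # 0.345
print(round(float(weighted_relative_distance(top5, swap_4_5)), 3))  # 0.034
print(weighted_relative_distance(top5, swap_1_2, a=7))  # 10/29
```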

#### Variability of profile ratings

These measures can be applied to various problems of analyzing fluctuations in search results. Each task defines a specific profile for the analysis: the composition of the query set (by type, topic, length, frequency), the search area (by region; web, news, images, blogs), and so on.

**Analysis of search engine updates**. This has already become the classic task of search variability analysis. The more representative the keyword set, the better the assessment of global changes in the ranking algorithm or index.

**Reputation management tasks**. Here the keyword set consists of brand queries related to your company or product. By analyzing fluctuations in the news results, you can detect increased activity around the profile you are interested in.

**Niche competition analysis**. Increased variability of search results for thematic queries can be interpreted as a sign of low competition, where clear leaders have not yet emerged.

#### In conclusion

How can we determine which method of analyzing search variability is “the most final”? You can call your methods “correct”, “accurate”, “the most accurate”... But no matter how many times you say “halva”, your mouth won't get any sweeter.

The only option is a comparative analysis of different methods on historical data, assessing their sensitivity to already known changes in search engine ranking functions. Unfortunately, I do not have such statistics, but I would be glad to work with anyone who does.

**[UPD 1]** Case study for assessing the competitiveness of search queries