Assessment of the variability of search results
Once, on a quiet summer night, while dealing with urgent analytical tasks, the question arose of how to measure the degree of variability of search results. The search for an answer turned up only a single study on the subject: Koksharov, 2012.
It brought no satisfaction; if anything, it raised even more questions. Using the Oliver and Levenshtein algorithms simply because the corresponding functions exist in PHP seemed unjustified, and the rationale for the methods based on position differences was unconvincing.
Why this way and not another? Why an array or a string rather than an ordered set or a tuple? What might the assumptions lead to? And finally, is there a single best, most correct, most "final" method?
As a result, I had to reinvent the wheel - that is, sort everything out properly, at least for myself - but still with the hope that it will be of interest to others as well.
Measure of rating variability
The search for a ready-made mathematical apparatus also came up empty. An ordered set? A string? An array? None of these quite fit. The closest is a tuple/vector, but the distance measures used there do not reflect the essence of a rating. Either I am missing something, or too many years have passed since my student days. I hope that those who practice mathematics more often will correct me, or at least throw in an idea of which direction to look. In the meantime, we will try to introduce our own definitions, staying within the terms of the subject area.
To denote everyone's favorite Top3, Top10, Top100, etc., we introduce the concept of a "rating of length $N$" - an ordered sequence containing the identifiers of the ranked objects:

$$R_N = (r_1, r_2, \ldots, r_N), \qquad (1)$$

where by an object identifier $r_i$ we mean a link (URL) to a ranked document.
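As a concrete representation (an encoding of my own choosing, not prescribed by the article), a rating can be stored as an ordered list of URLs, with the 1-based index giving the position:

```python
# A Top-3 rating: the 1-based position i holds the URL of the i-th ranked document.
rating = [
    "https://example.com/page-a",  # position 1
    "https://example.org/page-b",  # position 2
    "https://example.net/page-c",  # position 3
]
position = {url: i + 1 for i, url in enumerate(rating)}  # URL -> position lookup
```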
The simplest and most natural assumption is that the measure of variability should be somehow related to the change in the position of objects in the ratings. The greater the difference (distance) between the new and old position of a particular object and the more objects that have changed their position, the greater should be the difference between the two ratings.
In this setting, we will call the distance between two ratings the sum of the differences in the positions of all objects included in them. Let us try to express this definition more formally.
Let two ratings $R_N^{(1)}$ and $R_N^{(2)}$ be given. Their elements may coincide fully, partially, or not at all.
Then let $M = R_N^{(1)} \cup R_N^{(2)}$ denote the set of objects appearing in the two compared ratings. The cardinality of this set (the number of elements it contains) varies from $N$ (when the objects of both ratings coincide completely and the ratings differ only by a permutation) to $2N$ (when the elements of the two ratings are completely different).
The same object may occupy different positions in the two ratings or the same position, or it may be absent from one of them altogether.
We will denote by $p_i^{(1)}$ the position of the $i$-th object of $M$ in the rating $R_N^{(1)}$, and by $p_i^{(2)}$ the position of the same object in the rating $R_N^{(2)}$. Then the distance between the positions of the $i$-th object is the absolute value of their difference:

$$d_i = \left| p_i^{(1)} - p_i^{(2)} \right|. \qquad (2)$$
Summing the position differences over all elements of the set $M$, we obtain the following expression for the distance between two ratings in absolute terms:

$$D = \sum_{i=1}^{|M|} d_i = \sum_{i=1}^{|M|} \left| p_i^{(1)} - p_i^{(2)} \right|. \qquad (3)$$
There is no problem calculating this distance when an object is present in both ratings. But what should we do when an object of one rating is not contained in the other, that is, lies outside it? In this case it seems quite reasonable to take, as the position of the absent object, the nearest position outside the rating, i.e. $N + 1$.
Of course, in real life a site can drop out of, say, the Top10 to a position far below 11th. The accuracy of the variability estimate can be increased by considering longer ratings - 30, 50, 100, 1000 positions - and it is very likely that for large $N$ this assumption plays an ever smaller role. For now, the question of choosing the optimal rating length remains open, and we have to settle for the statement that the variability estimates obtained under this assumption are lower-bound estimates, in the sense that the true distance between the ratings is no less than the value obtained.
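To make the definition concrete, here is a minimal Python sketch of formula (3) with the $N + 1$ convention for absent objects; the function name and the representation of a rating as a list of URLs are my own choices, not from the original.

```python
def rating_distance(r1: list[str], r2: list[str]) -> int:
    """Absolute distance between two ratings of equal length N, formula (3).

    An object missing from one of the ratings is assigned the nearest
    position outside it, i.e. N + 1.
    """
    n = len(r1)
    pos1 = {url: i + 1 for i, url in enumerate(r1)}  # positions are 1-based
    pos2 = {url: i + 1 for i, url in enumerate(r2)}
    objects = set(r1) | set(r2)  # the set M of all objects in either rating
    return sum(abs(pos1.get(url, n + 1) - pos2.get(url, n + 1)) for url in objects)
```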
Estimates of the absolute difference between ratings are difficult to interpret and compare. For convenience they should be reduced to a relative form, and for that we need, as a normalizing value, the maximum possible distance between ratings. Clearly it corresponds to the case when the ratings differ completely in the composition of their elements: all objects of the rating $R_N^{(1)}$ have ended up beyond its limits, and all objects of the rating $R_N^{(2)}$ have arrived from beyond its limits. In other words, each object of $R_N^{(1)}$ has moved from its position $i$ to position $N + 1$, and each object of $R_N^{(2)}$, on the contrary, has moved from position $N + 1$ to its position $i$.
Then for the rating $R_N^{(1)}$ the maximum possible sum of distances is

$$D_{max}^{(1)} = \sum_{i=1}^{N} \bigl((N + 1) - i\bigr) = N + (N - 1) + \ldots + 1 = \frac{N(N + 1)}{2}.$$

That is, we have obtained the sum of an arithmetic progression with first term $N$, common difference $-1$, and last term $1$.
Accordingly, for the second rating, where each of its objects has moved from position $N + 1$ to its own position, we get a similar arithmetic progression with first term $1$, common difference $1$, and last term $N$, whose sum is given by the same expression.
As a result, the total distance over which the objects of the first and second ratings have moved is determined by the expression

$$D_{max} = 2 \cdot \frac{N(N + 1)}{2} = N(N + 1). \qquad (4)$$
Therefore, for a relative estimate of rating variability we obtain the following expression:

$$V = \frac{D}{D_{max}} = \frac{1}{N(N + 1)} \sum_{i=1}^{|M|} \left| p_i^{(1)} - p_i^{(2)} \right|. \qquad (5)$$
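A sketch of formula (5), reusing the rating_distance helper above (again, the names are mine):

```python
def relative_distance(r1: list[str], r2: list[str]) -> float:
    """Relative rating variability, formula (5): D / (N * (N + 1))."""
    n = len(r1)
    return rating_distance(r1, r2) / (n * (n + 1))
```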
Those who wish can examine this in more detail with a small example.
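Example for Top5
(The specific ratings below are illustrative ones of my own choosing, picked so as to reproduce the figures quoted in the text - 40% here and 61% in the next section: the rating leader is replaced and the 4th and 5th places swap.)
Let $R_5^{(1)} = (a, b, c, d, e)$ and $R_5^{(2)} = (f, b, c, e, d)$. Then $M = \{a, b, c, d, e, f\}$, and the position differences (with position $6$ assigned to absent objects) are

$$d_a = |1 - 6| = 5, \quad d_b = 0, \quad d_c = 0, \quad d_d = |4 - 5| = 1, \quad d_e = |5 - 4| = 1, \quad d_f = |6 - 1| = 5.$$

From here the absolute distance between the ratings is $D = 12$. The maximum possible distance is $D_{max} = 5 \cdot 6 = 30$. So we get a relative distance of $V = 12/30 = 0.4$, or 40%.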
Weighted measure of rating variability
An attentive reader may notice that the estimates of the degree of rating change obtained from expressions (3) or (5) are weakly sensitive to local changes in general and to transpositions in particular (a transposition is when two elements simply swap places). Whether the first two or the last two elements are interchanged, we get the same difference: for example, swapping 1st and 2nd place or 4th and 5th gives the same contribution of $d = 2$ to the distance (3).
Perhaps, from the point of view of the search engine and its ranking function, such changes really are insignificant. But as a practicing marketer I am primarily interested in the consequences for the sites I look after, and those consequences can be very significant even for local changes. This is mainly because the click-through rate of search results depends strongly on the position in the SERP, so the organic traffic of sites located in the area of a local change shifts quite substantially (by a factor of several).
Thus, it would be desirable to take into account that the difference between 1st and 2nd place in the search results is much larger than the difference between 4th and 5th. To do this, we introduce a weight function over ranking positions, and the best such function, reflecting the change in search traffic, is the dependence of the click-through rate of search results on the position occupied.
In general, the choice of a "good" approximating function for SERP click statistics is a topic for a separate study. Ideally it depends on a great many parameters: the search engine, the type of keyword, snippet quality, and the composition of the sites themselves. But for our purposes, where we are interested not so much in absolute values as in relative (place-to-place) estimates, practically any of the known approximations can be used. I am most used to the following dependence, given in Samuilov, 2014, which demonstrates fairly good approximating capability:
$$\mathrm{CTR}(x) = \frac{a}{x}, \qquad (6)$$

where $x$ is the position in the rating and $a$ is a parameter that depends on the search engine (the values for individual search engines, and the average over all of them, are given in the cited source).
Taking (6) into account, the distance between the positions of the $i$-th object takes the form

$$d_i^{w} = \left| \frac{a}{p_i^{(1)}} - \frac{a}{p_i^{(2)}} \right| = a \left| \frac{1}{p_i^{(1)}} - \frac{1}{p_i^{(2)}} \right|. \qquad (7)$$
Accordingly, the absolute weighted distance between the ratings is

$$D^{w} = a \sum_{i=1}^{|M|} \left| \frac{1}{p_i^{(1)}} - \frac{1}{p_i^{(2)}} \right|. \qquad (8)$$

The maximum possible weighted distance between the ratings is determined by the expression

$$D^{w}_{max} = 2a \sum_{i=1}^{N} \left( \frac{1}{i} - \frac{1}{N + 1} \right). \qquad (9)$$
Then the weighted relative distance is determined by the expression

$$V^{w} = \frac{D^{w}}{D^{w}_{max}} = \frac{\sum_{i=1}^{|M|} \left| \dfrac{1}{p_i^{(1)}} - \dfrac{1}{p_i^{(2)}} \right|}{2 \sum_{i=1}^{N} \left( \dfrac{1}{i} - \dfrac{1}{N + 1} \right)}. \qquad (10)$$
It should be noted that, in the end, the weighted relative distance $V^{w}$ does not depend on the parameter $a$, that is, on the search engine: the parameter cancels in the ratio.
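Under the same assumptions as the earlier sketches (ratings as lists of URLs, the $a/x$ weight with the search-engine parameter dropped, function names of my own), formulas (8)-(10) can be sketched as follows:

```python
def weighted_distance(r1: list[str], r2: list[str]) -> float:
    """Weighted distance between two ratings, formula (8), with the
    search-engine parameter a dropped (it cancels in formula (10) anyway)."""
    n = len(r1)
    pos1 = {url: i + 1 for i, url in enumerate(r1)}
    pos2 = {url: i + 1 for i, url in enumerate(r2)}
    objects = set(r1) | set(r2)
    return sum(abs(1 / pos1.get(url, n + 1) - 1 / pos2.get(url, n + 1)) for url in objects)


def relative_weighted_distance(r1: list[str], r2: list[str]) -> float:
    """Weighted relative variability, formula (10)."""
    n = len(r1)
    d_max = 2 * sum(1 / i - 1 / (n + 1) for i in range(1, n + 1))  # formula (9), a dropped
    return weighted_distance(r1, r2) / d_max
```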
For the example above, the weighted relative distance is 61%; that is, the measure is more sensitive to a replacement of the rating leader.
It is also much more sensitive to local changes: a 1-2 transposition in a Top5 rating gives a value of 34%, while a 4-5 transposition gives only 3.4%.
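A quick check of the quoted figures with the sketches above, using the illustrative Top5 ratings from the example:

```python
top5_old = ["a", "b", "c", "d", "e"]
top5_new = ["f", "b", "c", "e", "d"]  # leader replaced, 4th and 5th swapped

print(relative_distance(top5_old, top5_new))           # 0.4    -> 40%
print(relative_weighted_distance(top5_old, top5_new))  # ~0.609 -> 61%

swap_1_2 = ["b", "a", "c", "d", "e"]
swap_4_5 = ["a", "b", "c", "e", "d"]
print(relative_weighted_distance(top5_old, swap_1_2))  # ~0.345 -> 34%
print(relative_weighted_distance(top5_old, swap_4_5))  # ~0.034 -> 3.4%
```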
Variability of profile ratings
The measures obtained can be used for various tasks in analyzing the fluctuation of search results. Each task defines a specific analysis profile: the composition of the search queries (by type, subject, length, frequency), the search area (by region; web / news / images / blogs), and so on.
Analysis of search engine updates. This has already become the classic task of search-variability analysis: the more representative the set of keywords, the better the assessment of global changes in the algorithm or the ranking base.
Reputation management tasks. Here the keyword set consists of brand queries related to your company or product. By analyzing fluctuations in the news results, you can detect increased activity in the profile you are interested in.
Niche competition analysis. Increased variability of search results for thematic queries can be interpreted as an indicator of low competition, where clear leaders have not yet emerged.
In conclusion
How do we determine which method of analyzing search result variability is "the most final"? You can call your update metrics "correct", "accurate", "the most accurate"... But no matter how many times you say "halva", your mouth will not get any sweeter.
The only option is a comparative analysis of the different methods on historical samples and an assessment of their sensitivity to already known changes in search engine ranking functions. Unfortunately, I do not have such statistics, but I would be glad to collaborate with those who do.
[UPD 1] Case study for assessing the competitiveness of search queries