Not all comments are equally helpful.

    All animals are equal, but some animals are more equal than others. Animal Farm, George Orwell (original).

    Quite a lot of articles on Habr collect a significant number of comments; for example, articles in the "best of the month" section usually have more than a hundred. Over the years of reading Habr, it seems that in about half the cases the picture for first-level comments looks roughly like this:

    (the picture is based on the Habr article "Skeptic's List").

    Under the cut: a story about how comments get sorted, where these methods are applied, and a brief discussion of how (and why) comments can be sorted.

    Generally speaking, the problem of sorting comments, posts and everything else is not new: Facebook infographics, comment sorting on reddit (here and here), and a brief description of the algorithm parameters from digg.

    Basic methods for sorting first level comments


    We will go from simple to complex, briefly describing and characterizing each method. Let's start with the simplest and most naive: averaging and its variations (details are described here).

    Hereinafter, unless stated otherwise, by "comment" we mean a top-level comment. The idea is roughly this: top-level comments address the article itself, while comments of the second, third and deeper levels form a discussion around a comment. Methods that take the entire comment branch into account are briefly discussed at the end of the article.

    The task is to assign each comment a certain number of points (its weight) and sort the entire list by this parameter.

    The number of upvotes minus the number of downvotes



    This method is used by the online dictionary urbandictionary.

    This method is the simplest, but far from the best match for user expectations. For example, in the picture above we see that a definition with 72% positive ratings is placed below a definition with 70% positive ratings.
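
    A minimal Python sketch with hypothetical vote counts, reproducing the same effect: the item with the lower share of positive votes wins simply because it has more votes overall.

        items = [
            ("A", 72, 28),    # 72% positive, score 72 - 28 = 44
            ("B", 700, 300),  # 70% positive, score 700 - 300 = 400
        ]
        # Sort by upvotes minus downvotes, descending.
        for name, ups, downs in sorted(items, key=lambda i: i[1] - i[2], reverse=True):
            print(name, ups - downs)
        # -> B 400, then A 44: the 70% item outranks the 72% item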

    Relative average rating: the number of upvotes over the total number of votes

    One of the sorting options on Amazon is the relative average rating. Suppose we are looking for a toaster and have turned on sorting by customer reviews:

    It turns out that an item with a single 5-star vote will always rank above any item that has at least one rating other than 5 stars (4, 3, 2 or 1 star), regardless of how many ratings it has. Simply put, if a product has 9,999 five-star ratings and a single four-star one, it will rank below a product with one lone five-star rating.
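
    A tiny sketch of the same pitfall (hypothetical ratings):

        ratings_a = [5]                # a single 5-star rating, mean 5.0
        ratings_b = [5] * 9999 + [4]   # 9,999 five-star and one 4-star, mean ~4.9999
        mean = lambda xs: sum(xs) / len(xs)
        print(mean(ratings_a) > mean(ratings_b))  # True: the lone vote wins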

    Expected probability of a positive rating

    Before elections there are always opinion polls, where one tries to reconstruct the overall picture from a small number of respondents. We are doing the same thing here: from the small number of measurements available to us (the observed upvotes and downvotes), what "true share" of positive votes can we claim with probability at least 0.85?

    How such things are usually calculated is explained in accessible language in the book:
    Probabilistic Programming & Bayesian Methods for Hackers

    In short, such exact calculations are too expensive, so in practice the formula for the lower bound of the Wilson confidence interval is used instead (i.e., we work with an estimate):

    \[ W = \frac{\hat{p} + \frac{z_{\alpha/2}^2}{2n} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z_{\alpha/2}^2}{4n^2}}}{1 + \frac{z_{\alpha/2}^2}{n}} \]

    Here p̂ is the observed share of upvotes, n is the total number of votes, and z_{α/2} is the (1 − α/2) quantile of the standard normal distribution, taken from tables (z ≈ 1.44 for 85% confidence, z ≈ 1.96 for 95% confidence).
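
    A short Python sketch of this bound (z defaults to 1.96, i.e. 95% confidence; the vote counts in the usage example are hypothetical and match Munroe's example further down):

        from math import sqrt

        def wilson_lower_bound(ups, downs, z=1.96):
            """Lower bound of the Wilson score interval for the share of upvotes."""
            n = ups + downs
            if n == 0:
                return 0.0
            p_hat = ups / n
            return ((p_hat + z * z / (2 * n)
                     - z * sqrt((p_hat * (1 - p_hat) + z * z / (4 * n)) / n))
                    / (1 + z * z / n))

        print(wilson_lower_bound(1, 0))    # ~0.21: one upvote is not enough data
        print(wilson_lower_bound(10, 1))   # ~0.62: ranks above...
        print(wilson_lower_bound(40, 20))  # ~0.54: ...this better-voted comment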

    If the formula above looks like cryptic scribbles to you, it is explained in plain language here, toward the end here, and best of all here. There you can also find short, understandable implementations of the described method (and many more clarifying pictures).

    In various variations this method is used by reddit, digg and yelp as part of their ranking algorithms.

    Explanation and example from Randall Munroe (the author of xkcd; he also introduced this algorithm at reddit):

    If a comment has one upvote and zero downvotes, its observed share of upvotes is 1.0, but since there is very little data [small n (author's note)], the system will keep it near the bottom of the list. If it has 10 upvotes and one downvote, the system will have enough confidence [in the sense of corroborating evidence] to place it above a comment with 40 upvotes and 20 downvotes, reasoning that by the time that comment reaches 40 upvotes it will almost certainly have fewer than 20 downvotes. And the main charm of the system is that if it is wrong (which happens in 15% of cases), it will quickly collect more data, since the comment with less data sits at the very top.


    In general, the intuition behind such Bayesian inference methods is as follows: we have a prior idea of how people vote on average, say an upvote with probability 0.7 and a downvote with probability 0.3, a kind of biased coin. When a new piece of evidence about a particular coin arrives, we update our belief about it according to Bayes' rule:

    \[ P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)} \]

    where H is the hypothesis, E is the observed data (evidence), and P(E | H) is the probability of observing E if hypothesis H is true; together these let us say how probable hypothesis H is given the observed evidence E.
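
    A toy numeric illustration (a minimal sketch; the Beta prior is a standard conjugate choice for this coin model, not something prescribed by the article): here is how the belief "an upvote happens with probability 0.7" shifts after observing a single downvote.

        # Beta(7, 3) encodes the prior belief "an upvote happens with probability ~0.7".
        alpha, beta = 7, 3
        ups, downs = 0, 1  # new evidence about this particular "coin": one downvote
        # With a Beta prior, the Bayes update reduces to adding the observed counts.
        posterior_mean = (alpha + ups) / (alpha + beta + ups + downs)
        print(posterior_mean)  # ~0.64: the belief shifts down, but only slightly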

    New parameters for sorting


    The algorithms discussed above use only the numbers of upvotes and downvotes. In this part we will look at what other parameters can affect the ranking of comments.

    The main sources will be the following articles:
    Reddit, Stumbleupon, Del.icio.us and Hacker News Algorithms Exposed!
    How Hacker News ranking algorithm works
    How Reddit ranking algorithms work

    Time

    Why would anyone need time for anything beyond chronological sorting? Imagine that we open an article via the link from "What is being discussed?": it is quite logical (although, like any value judgment, debatable) that the user expects a live discussion and fresh comments, since it is thanks to them that the article got into that section.

    What about other large sites with user-generated content? Let's look at reddit and Hacker News.

    Reddit algorithm for posts

    Reddit uses two different ranking algorithms: one for posts and another for comments. For comments it uses the variant described above, the lower bound of the Wilson confidence interval; for posts, time is also taken into account.

    Let us briefly describe the post algorithm.

    Let A be the time the post was published (in seconds) and B the creation time of reddit itself (in seconds); then
    t = A − B
    is the relative lifetime of the post in seconds. Further:
    x = the number of upvotes minus the number of downvotes,
    y = 1 if x > 0, 0 if x = 0, −1 if x < 0 (the sign of x),
    z = |x| if x ≠ 0, otherwise z = 1.

    The post's rating is then

    \[ f(t, y, z) = \log_{10} z + \frac{y\,t}{45000} \]
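
    A minimal Python sketch of this ranking, following the cited description in "How Reddit ranking algorithms work" (the epoch constant B below is the commonly cited value; treat it as an assumption):

        from math import log10

        REDDIT_EPOCH = 1134028003  # B: reddit's creation time in unix seconds (assumed)

        def hot(ups, downs, posted_at):
            """posted_at is the post's publication time A, in unix seconds."""
            t = posted_at - REDDIT_EPOCH              # t = A - B
            x = ups - downs
            y = 1 if x > 0 else (-1 if x < 0 else 0)  # sign of x
            z = abs(x) if x != 0 else 1               # z >= 1, so log10 is defined
            return log10(z) + y * t / 45000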


    Hacker news algorithm

    This resource uses the same algorithm to sort both comments and posts.
    Let t be the number of hours since publication, v the number of votes, and G a constant ("gravity"; it is said to be 1.8 by default). Then:

    \[ \text{Score} = \frac{v - 1}{(t + 2)^{G}} \]
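
    A minimal Python sketch of this formula (the vote counts in the example are hypothetical):

        def hn_score(votes, age_hours, gravity=1.8):
            # Score = (v - 1) / (t + 2)^G
            return (votes - 1) / (age_hours + 2) ** gravity

        # The same 100-vote post sinks as it ages:
        print(hn_score(100, 1))   # ~13.7 one hour after publication
        print(hn_score(100, 24))  # ~0.28 a day later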


    But why is time needed here at all?

    It would seem that for comments votes should be enough, and time should not play a significant role. On the other hand, when the number of comments grows, many users simply never reach the end (try scrolling through 300-400 comments). A possible alternative is to sort not by one parameter but by several, i.e. to take time into account all other things being equal. For example, if all comments have 0 votes, then fresh comments are preferable to old ones: the old ones have had time to collect some upvotes, and if they did not, it may be worth giving preference to the new ones (see the sketch below).
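
    A minimal sketch of such multi-key sorting in Python, with recency as a tie-breaker (the Comment record and the numbers are hypothetical):

        from dataclasses import dataclass

        @dataclass
        class Comment:
            text: str
            score: int         # upvotes minus downvotes
            created_at: float  # unix timestamp

        comments = [
            Comment("old, no votes", 0, 1_000),
            Comment("fresh, no votes", 0, 9_000),
            Comment("well rated", 5, 2_000),
        ]
        # Primary key: score (descending); tie-breaker: recency (newer first).
        comments.sort(key=lambda c: (-c.score, -c.created_at))
        print([c.text for c in comments])
        # -> ['well rated', 'fresh, no votes', 'old, no votes']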

    Alternatively, time can be used as a small decay factor inside the ranking function itself, as in reddit's post ranking or Hacker News' comment ranking.

    Branch-wide ranking

    A natural idea is to take the comments of a branch into account with some weight depending on their level/position in the branch (in the trivial case we can treat the whole branch as a set of peer comments, i.e. with equal weights). Suppose v_i^+ is the number of upvotes of comment i (same notation with a minus for downvotes), l_i is the nesting level of comment i, and I is the set of indices of all comments in the branch. Then a simple generalization of the first method ("upvotes minus downvotes") to the whole branch, with a level-dependent weight w(l), is:

    \[ W = \sum_{i \in I} w(l_i)\,\bigl(v_i^{+} - v_i^{-}\bigr) \]

    (here v_i^+, v_i^- and l_i are really functions that return a scalar given a comment index, which is why they are not listed among the input data).
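
    A naive Python sketch of this generalization; the weight function w(l) = 1/l is just an assumption for illustration (w(l) = 1 gives the trivial "peer comments" case):

        def branch_score(branch, w=lambda level: 1 / level):
            """branch: (ups, downs, level) for each comment, top level being 1."""
            return sum(w(level) * (ups - downs) for ups, downs, level in branch)

        thread = [(10, 2, 1), (4, 0, 2), (1, 3, 3)]  # hypothetical thread
        print(branch_score(thread))  # 8 + 2 - 2/3 = 9.33...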

    Other comment-ranking methods can be generalized in a similar (naive) manner. Why is this needed? If we believe that not only the first-level comment matters for ranking but, for example, the whole discussion does, then it really makes sense to rank the entire branch. Looking again at the picture at the beginning of the article: despite the heavily downvoted first comment (deservedly, I think), a rather interesting discussion grew around it in the first thread.

    Vote weight and user profiles

    The exact digg algorithm is not known to the general public, but at least we know roughly which parameters go into it (source of the algorithm):
    • The number of votes within a time window: many votes over a short period are better than many votes over a long period
    • Link source (digg-specific; a news item points to a particular source site): how often do articles from this source reach the top (collect upvotes)?
    • Author profile
    • Submission time: if a bunch of people all add an article at three in the morning, something may be fishy [a debatable parameter, isn't it? (author's note)]
    • Presence of similar links on digg itself (duplicates)
    • Voter profiles
    • Number of comments
    • Number of views
    • ...

    All in all, they have a complicated algorithm. But what can we take from it in principle? That comments and votes for posts can be weighted differently depending on various parameters, and a fair number of these parameters are already known from the algorithms of other resources.

    In many ways the idea is similar to time as an additional parameter: all other things being equal, comment ranking can take the user's weight into account (or even their contribution to a particular hub), so that if an article has a lot of comments, an "authoritative opinion" rises above the rest and gets read.
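
    A speculative sketch of weighting votes by the voter's "authority"; both the scheme and the logarithmic karma scale are assumptions for illustration, not part of any algorithm described above:

        from math import log10

        def weighted_score(votes):
            """votes: list of (direction, voter_karma); direction is +1 or -1."""
            return sum(direction * (1 + log10(1 + max(karma, 0)))
                       for direction, karma in votes)

        # One upvote from a 10,000-karma user outweighs a downvote from a newcomer:
        print(weighted_score([(+1, 10_000), (-1, 5)]))  # ~3.2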

    Conclusion


    Comment ranking is a popular and often useful feature of many resources. Whether Habr needs it is an open question: is there really a problem with an overabundance of comments on certain articles? If not, will it appear in the future? Are other sorting tools needed?

    But you should always remember that the main thing for any resource is not the sorting of comments, but the quality of the content.


    Do we need comment sorting on Habrahabr?

    • 36% (448 votes): needed (in one of the forms and/or variations from the article)
    • 5.9% (74 votes): needed, but only by votes
    • 2% (25 votes): needed, but in another variant
    • 32.6% (405 votes): possibly; a small/temporary experiment is needed
    • 21.9% (272 votes): no, not needed in any form
    • 1.3% (17 votes): other (answer in the comments)
