Sampling random documents from the MongoDB collection

Recently I faced a fairly trivial task: randomly selecting posts written by the site's users from the database. The project is written in Rails, uses MongoDB as the database, and works with it through the Mongoid gem. The task is not hard, but surprisingly there is no ready-made one-liner such as sort_by_random. Below are a couple of examples of how it can be solved.


First, let's look at a simple way to solve the problem. Mongoid has a method that lets you skip a number of records or, in other words, move the cursor to a starting point. The method is called skip, and you pass it the number of records to skip. If we have a collection with three entries, the second one can be fetched with Post.skip(1).first. Knowing the number of documents in the collection, we can skip a random number of them and start reading from there:
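proxy = Post.where(...)
# Largest offset that still leaves a full page; never negative
max_skip = [proxy.count - COUNT_OF_POSTS_TO_SHOW, 0].max
skip = rand(max_skip + 1) # random integer in 0..max_skip
@posts = proxy.skip(skip).limit(COUNT_OF_POSTS_TO_SHOW)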


If you have no special selection conditions, the code is even simpler, though usually some conditions, such as creation date or status, will still be present. This sample is random, but only partly: we pick a random starting point, and then the documents follow in a row. That kind of randomness may suit some cases, especially when only one record is needed. But it can be completely unacceptable when, say, we select products and end up showing items from the same category or with the same price (depending on the collection's indexes).
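To make the "random starting point, then a contiguous run" behavior concrete, here is a minimal sketch in plain Ruby. It assumes an in-memory array stands in for the ordered query result; random_window and its parameters are hypothetical names used for illustration, not part of Mongoid.

```ruby
# Plain-Ruby stand-in for the skip/limit approach. `records` plays the role
# of the ordered query result; `window_size` is COUNT_OF_POSTS_TO_SHOW.
# (random_window is a hypothetical helper, not a Mongoid method.)
def random_window(records, window_size, rng: Random.new)
  # Largest offset that still leaves a full window; never below zero.
  max_skip = [records.size - window_size, 0].max
  # rng.rand(n) returns an integer in 0...n, so add 1 to include max_skip.
  skip = rng.rand(max_skip + 1)
  # In-memory equivalent of .skip(skip).limit(window_size).
  records.drop(skip).first(window_size)
end

posts = (1..10).to_a
sample = random_window(posts, 3)
puts sample.inspect # three neighbouring ids, starting at a random position
```

Whatever offset is drawn, the result is always a run of neighbours in the underlying order, which is exactly the limitation described above.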
My solution for getting truly random records was a bit more complicated, but it gave better results. I added a new field, rand_order, to the collection being sampled and wrote a random floating-point number between 0 and 1 into it. The most reliable way to fill this field is a before_save callback on the model, which might look like this:
  before_save :set_rand_order

  # Assign a random sort key once; an existing value is kept
  def set_rand_order
    self.rand_order = rand(0.0..1.0).round(15) unless rand_order
  end


Thus, each time the object is saved, we check whether rand_order has a value and fill it in if it is empty. Random entries are now fetched like this:
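proxy = Post.where(...)
# Largest offset that still leaves a full page; never negative
max_skip = [proxy.count - COUNT_OF_POSTS_TO_SHOW, 0].max
skip = rand(max_skip + 1) # random integer in 0..max_skip
@posts = proxy.asc(:rand_order).skip(skip).limit(COUNT_OF_POSTS_TO_SHOW)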
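The effect of sorting by rand_order before skipping can be sketched without a database. The following is only an illustration in plain Ruby: FakePost, assign_rand_order and random_posts are hypothetical names, and an array stands in for the collection.

```ruby
# Hypothetical in-memory stand-in for a Post document.
FakePost = Struct.new(:id, :rand_order)

# Mirrors the callback: set the random key only when it is missing.
def assign_rand_order(post, rng: Random.new)
  post.rand_order = rng.rand.round(15) if post.rand_order.nil?
  post
end

# Mirrors asc(:rand_order).skip(skip).limit(count) on an array.
def random_posts(posts, count, rng: Random.new)
  ordered = posts.sort_by(&:rand_order)
  max_skip = [ordered.size - count, 0].max
  ordered.drop(rng.rand(max_skip + 1)).first(count)
end

posts = (1..20).map { |id| assign_rand_order(FakePost.new(id, nil)) }
picked = random_posts(posts, 5)
puts picked.map(&:id).inspect # ids are no longer neighbours in insertion order
```

Because the window now slides over a shuffled ordering, the selected posts are spread across the original collection rather than taken in a row.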


Keep in mind that if you apply this method to an existing collection that already contains documents, you need to generate rand_order values for them as well. This can be done in a migration, and since the value is set in the before_save callback, it is enough to call save on each object:
Post.all.each { |p| p.save }
