MongoDB: query performance on ranges

Original author: Eric
  • Transfer
If you traveled through the territory of MongoDB indexes, you may have heard the principle: If your queries contain sorting, then add the sorted field to the end of the index used in these queries.

In many cases, when queries contain equality conditions such as {"name": "Charlie"}, the principle above is very useful. But what about him can be said with the following example:

db.drivers.find({"country": {"$in": ["A", "G"]}).sort({"carsOwned": 1})
{"country": 1, "carsOwned": 1}

This bundle is not effective, although the principle is respected. Because there is a trap into which this principle can lead you.
Below we will consider the reasons for the occurrence of this trap and by the end of the article you will have a new rule that will help you with indexing.

Let's recall the basics from the MongoDB documentation:
* “Indexes early”
Indexes deserve consideration at the beginning of design. Historical, efficiency at the data access level was shifted to database administrators, this created an optimization layer after design.
With document-oriented databases, you can avoid this.
* "Indexes often"
Indexed queries work better by several orders of magnitude, even on small data. While without an index, a query can take 10 seconds, the same query can take 0 milliseconds with the corresponding index.
* "Full indexes"
Queries use indexes from left to right. An index can only be used provided that the query uses all the fields in the index without gaps.
* “Sorting the index”
If your query will contain sorting, then add the sorted field to your index.
* "Teams"
.explain () will show which index is used for this request.
.ensureIndex () creates indexes.
.getIndexes () and .getIndexKeys () will show which indexes you have.

Now back to our question. Given the basics of indexing, for the following query:
db.collection.find({"country": "A"}).sort({"carsOwned": 1})

We must create an index like this:
db.collection.ensureIndex({"country": 1, "carsOwned": 1})

What if most queries in the condition use range selection instead of comparison? As in this:
db.collection.find({"country": {"$in": ["A", "G"]}}).sort({"carsOwned": 1})

Here we used the $ in operator, but besides it there are also such as: $ gt, $ lt, etc.
If you use a similar request, you will see that it is not effective, while you remember the basics - you need to run .explain () and see which index is used and how.
As a result of executing .explain (), you will see {scanAndOrder: true}, which means MongoDB performs sorting operations, which is an expensive operation since MongoDB sorts documents in memory. Therefore, you should avoid large data sets. it is slow and resource intensive.

Do not forget why scanAndOrder is slow, why MongoDB sorts the result, although we already have an index with sorting? The answer is simple: we do not have a suitable index.

Why? The reason is simple, the point is the structure of the index that we created. For the example above, documents having {“country”: “A”} and documents having {“country”: “G”} are sorted in the index by {“carsOwned”: 1},
but they are sorted independently of each other. They are not sorted together! Consider the diagram below:

The left diagram shows the crawl order of documents by the index that we created. After all the documents are found, they will need to be sorted.
In the right diagram, the alternative index is {“carsOwned”: 1, “country”: 1}. In this case, the documents found will be already sorted.
This subtle moment of efficiency led to the following rules for indexing:

The order of the fields should be:
1. First, the fields that are selected by exact values.
2. Next, the fields by which sorting will go.
3. And at the end of the field for the range filter.

Is there a compromise? Yes. The request will visit several index nodes, which is technical, because sorted part will be traversed before filtering.
Thus, the new rule is pure net for many queries, but do not forget that the complexity of your data can lead to different results.

I hope this guide helps you. Good luck.

Also popular now: