MongoDB: $ or VS $ in - what works faster?

    There will be a very small comparison of MongoDB performance in cases of using $ or and $ in logical operations in queries. I hope that this article saves someone working time.

    Tests run on MongoDB 2.4.9.
    Suppose MongoDB has a collection of documents. To simplify the understanding of the essence - let it be documents with exactly two fields.
    $m = new MongoClient('mongodb://mongodb01,mongodb02,mongodba/?replicaSet=pkrs');
    $mdb = $m->selectDB('test');
    $collection = $mdb->selectCollection('test');
    $collection->drop();
    $collection->ensureIndex(array('i' => 1, 'j' => 1));
    for ($i = 0; $i < 100; ++$i) {
        for ($j = 0; $j < 100; ++$j) {
            $collection->insert(array('i' => $i, 'j' => $j));
        }
    }
    

    The collection will contain only 10K documents. Yes, here it was possible to use batchInsert, but I do not want to complicate the understanding of the main essence of the note.

    It is necessary to regularly (several times per second) select up to 1000 documents. The sampling condition is a set of unbound pairs i and j .
    Because Since I started working with MongoDB less than a month ago, the first thing that came to mind was a request like this:
    $orArray = array();
    for ($i = 0; $i < 10; ++$i) {
        for ($j = 0; $j < 100; ++$j) {
            $orArray[] = array('i' => $i, 'j' => $j);
        }
    }
    $query = array('$or' => $orArray);
    

    The fact that here the data goes in order is just an example, so as not to boggle the head with business logic. In reality, as I noted above, pairs i and j are not connected to each other in any way and go in a chaotic manner.
    Having tried to fulfill this request, my eyes widened from unpleasant surprise - the request was completed for more than 2 seconds ! Above in the code, you can see that the index is created.
    This was generally unacceptable.
    I decided to make sure that it is not the network that slows down, but the matter is in the request.
    For the test, he will make this request:
    $query = array('i' => array('$lt' => 10), 'j' => array('$lt' => 100));
    

    The result in the amount of data is the same, but the query is already starting to run in 0.01 seconds .
    It became clear that you need to look for a workaround. And he was found. By the logic of the request, the use of $ in instead of $ or was suggested. But I could not find how to use $ in immediately on pairs of values. If there is such a way, then I will be very grateful for the hint.
    Since I don’t know how to make $ in in two fields, we will introduce an artificial field as follows (we blind the values ​​of i and j through the underscore '_'):
    $collection->ensureIndex(array('ij' => 1));
    for ($i = 0; $i < 100; ++$i) {
        for ($j = 0; $j < 100; ++$j) {
            $collection->insert(array('i' => $i, 'j' => $j, 'ij' => $i.'_'.$j));
        }
    }
    

    And then our request becomes the following:
    $inArray = array();
    for ($i = 0; $i < 10; ++$i) {
        for ($j = 0; $j < 100; ++$j) {
            $inArray[] = $i.'_'.$j;
        }
    }
    $query = array('ij' => array('$in' => $inArray));
    

    And “oh miracle!” we get our data in just 0.01 seconds (and it all started with "more than 2 seconds").
    Googling a little, I found the following explanation of this phenomenon: when querying with the $ or MongoDB construct, it supposedly makes several queries and then “freezes” the results. I’m not sure that this statement is right, but I haven’t found another yet.

    PS Conclusion: do not abuse $ or
    PPS. The code below shows how I measured the time. If someone is not in the know, I’ll clarify that the call is not executed when find () is called! Only the MongoCursor object is created. And only when requesting the first document is the request itself. Therefore, time cutoffs are taken at the first iteration of the cycle of obtaining documents.
    PPPS If someone will be interested in driving tests at home, then here is the whole source:
    selectDB('test');
    $collection = $mdb->selectCollection('test');
    $collection->drop();
    $collection->ensureIndex(array('i' => 1, 'j' => 1));
    for ($i = 0; $i < 100; ++$i) {
        for ($j = 0; $j < 100; ++$j) {
            $collection->insert(array('i' => $i, 'j' => $j));
        }
    }
    $orArray = array();
    for ($i = 0; $i < 10; ++$i) {
        for ($j = 0; $j < 100; ++$j) {
            $orArray[] = array('i' => $i, 'j' => $j);
        }
    }
    $query = array('$or' => $orArray);
    testQuery('OR Query', $collection, $query);
    $query = array('i' => array('$lt' => 10), 'j' => array('$lt' => 100));
    testQuery('Range Query', $collection, $query);
    $collection->drop();
    $collection->ensureIndex(array('ij' => 1));
    for ($i = 0; $i < 100; ++$i) {
        for ($j = 0; $j < 100; ++$j) {
            $collection->insert(array('i' => $i, 'j' => $j, 'ij' => $i.'_'.$j));
        }
    }
    $inArray = array();
    for ($i = 0; $i < 10; ++$i) {
        for ($j = 0; $j < 100; ++$j) {
            $inArray[] = $i.'_'.$j;
        }
    }
    $query = array('ij' => array('$in' => $inArray));
    testQuery('IN Query', $collection, $query);
    function testQuery ($testName, $collection, $query) {
        $cursor = $collection->find($query);
        $cursor->batchSize(1000);
        $start = microtime(true);
        $first = true;
        foreach ($cursor as $doc) {
            if ($first) {
                $time1 = microtime(true);
                $first = false;
            }
        }
        $time2 = microtime(true);
        $resultFirst = $time1 - $start;
        $resultOther = $time2 - $time1;
        echo "{$testName} - First: {$resultFirst} Other: {$resultOther}
    \n"; }


    Instead of the composite index ( UPD 1 dim_s ), it was evident from the test above that it was the composite index that was used) to make two separate ones. Having done so, the request processing speed has accelerated by about 10 times (up to 0.2 seconds), but still loses to the option with $ in. I

    spread what explain gives:
    Explain ($ or at composite index)
    /* 0 */
    {
        "clauses" : [ 
            {
                "cursor" : "BtreeCursor i_1_j_1",
                "isMultiKey" : false,
                "n" : 1,
                "nscannedObjects" : 1,
                "nscanned" : 1,
                "nscannedObjectsAllPlans" : 1,
                "nscannedAllPlans" : 1,
                "scanAndOrder" : false,
                "indexOnly" : false,
                "nYields" : 0,
                "nChunkSkips" : 0,
                "millis" : 0,
                "indexBounds" : {
                    "i" : [ 
                        [ 
                            1, 
                            1
                        ]
                    ],
                    "j" : [ 
                        [ 
                            1, 
                            1
                        ]
                    ]
                }
            }, 
            {
                "cursor" : "BtreeCursor i_1_j_1",
                "isMultiKey" : false,
                "n" : 1,
                "nscannedObjects" : 1,
                "nscanned" : 1,
                "nscannedObjectsAllPlans" : 1,
                "nscannedAllPlans" : 1,
                "scanAndOrder" : false,
                "indexOnly" : false,
                "nYields" : 0,
                "nChunkSkips" : 0,
                "millis" : 0,
                "indexBounds" : {
                    "i" : [ 
                        [ 
                            2, 
                            2
                        ]
                    ],
                    "j" : [ 
                        [ 
                            2, 
                            2
                        ]
                    ]
                }
            }
        ],
        "n" : 2,
        "nscannedObjects" : 2,
        "nscanned" : 2,
        "nscannedObjectsAllPlans" : 2,
        "nscannedAllPlans" : 2,
        "millis" : 0,
        "server" : "mongodb01:27017"
    }
    


    Explain ($ or with two separate indices for i and j)
    /* 0 */
    {
        "clauses" : [ 
            {
                "cursor" : "BtreeCursor i_1",
                "isMultiKey" : false,
                "n" : 1,
                "nscannedObjects" : 100,
                "nscanned" : 100,
                "nscannedObjectsAllPlans" : 300,
                "nscannedAllPlans" : 300,
                "scanAndOrder" : false,
                "indexOnly" : false,
                "nYields" : 0,
                "nChunkSkips" : 0,
                "millis" : 1,
                "indexBounds" : {
                    "i" : [ 
                        [ 
                            1, 
                            1
                        ]
                    ]
                }
            }, 
            {
                "cursor" : "BtreeCursor i_1",
                "isMultiKey" : false,
                "n" : 1,
                "nscannedObjects" : 100,
                "nscanned" : 100,
                "nscannedObjectsAllPlans" : 300,
                "nscannedAllPlans" : 300,
                "scanAndOrder" : false,
                "indexOnly" : false,
                "nYields" : 0,
                "nChunkSkips" : 0,
                "millis" : 1,
                "indexBounds" : {
                    "i" : [ 
                        [ 
                            2, 
                            2
                        ]
                    ]
                }
            }
        ],
        "n" : 2,
        "nscannedObjects" : 200,
        "nscanned" : 200,
        "nscannedObjectsAllPlans" : 600,
        "nscannedAllPlans" : 600,
        "millis" : 2,
        "server" : "mongodb01:27017"
    }
    


    Explain ($ in with artificially entered index)
    /* 0 */
    {
        "cursor" : "BtreeCursor ij_1 multi",
        "isMultiKey" : false,
        "n" : 2,
        "nscannedObjects" : 2,
        "nscanned" : 3,
        "nscannedObjectsAllPlans" : 2,
        "nscannedAllPlans" : 3,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "millis" : 0,
        "indexBounds" : {
            "ij" : [ 
                [ 
                    "1_1", 
                    "1_1"
                ], 
                [ 
                    "2_2", 
                    "2_2"
                ]
            ]
        },
        "server" : "mongodb01:27017"
    }
    


    UPD 2 I ran the test on the latest version of MongoDB 2.6
    Indeed, the original version (composite index for two fields) works much faster there! Namely, in 0.07 seconds. But at the same time, the option with an index of the form i_j is still in 0.006 - 0.01 seconds (i.e., about 10 times faster)

    Also popular now: