New aggregation framework in MongoDB 2.1

    In release 2.1 , the implementation of such functionality as the new data aggregation framework was announced . I would like to talk about the first impressions of this very interesting thing. This functionality should allow in some places to abandon Map / Reduce and write JavaScript code in favor of fairly simple constructions designed to group fields almost like in SQL.



    Documentation on innovations is located in the corresponding section of the official website. First, let's look at how this works and which MongoDB constructs will help us.

    So, the main difficulty in fetching data from MongoDB is working with arrays and data contained inside some separate elements. Yes, we can select them as in SQL, but we cannot aggregate them directly when fetching. The new framework is a declarative way of working with such data, based on a chain of special operators (there are only 7 of them). Sample data is transferred from the output of one operator to the input of another, just like in unix. Partly with the help of new operators, you can repeat the existing ones. Let the test collection be a collection for storing data about people. Standard sample:

    db.test.find({name: "Ivan"});

    will be similar

    db.test.aggregate({$match: {name: "Ivan"}});

    But all is a little more interesting, because in the second example we can build a data processing chain by listing the operators with a comma. The $ sort operator is used for sorting, for example:

    db.test.aggregate({$match: {name: "Ivan"}}, {$sort: {age: 1}});

    So we web all the people with the name “Ivan” and sort the sample by age. And in order to choose the oldest Ivan, we need to cut the selection with one element:

    db.test.aggregate({$match: {name: "Ivan"}}, {$sort: {age: -1}}, {$limit: 1});

    You will say that this is a repetition of an existing functionalism. To some extent, yes, but we have not considered new operators that allow more flexible working with samples. We will analyze them in more detail.

    Operator $ project


    Designed to manipulate fields, can add new ones, delete and rename them in documents received at his input. The following construction will include in the document flow (filter out) only the names and age of users:

    {$project: {name: 1, age: 1}}

    All documents with only two fields will get to the input of the next operator, there will be no other fields in the stream (except for the _id field, to exclude it, you must specifically specify _id: 0). The number 1 includes, the number 0 excludes the transmission of the field. In addition, this operator allows you to rename fields, “get” fields from the embedded object of a field, or add new fields based on some calculations .

    The $ unwind operator


    In my opinion, this is the most interesting operator. It allows you to "expand" nested arrays for each element of the selection of documents. For example, let us have the following base of people:

    db.test.insert({name: "Ivan", likes: ["Maria", "Anna"]});
    db.test.insert({name: "Serge", likes: ["Anna"]});

    Let the likes field mean which girls like which boy. Apply the $ unwind operator:

    db.test.aggregate({$unwind: "$likes"});

    {
            "result" : [
                    {
                            "_id" : ObjectId("4f598de76a8f8bc74573e9fd"),
                            "name" : "Ivan",
                            "likes" : "Maria"
                    },
                    {
                            "_id" : ObjectId("4f598de76a8f8bc74573e9fd"),
                            "name" : "Ivan",
                            "likes" : "Anna"
                    },
                    {
                            "_id" : ObjectId("4f598e086a8f8bc74573e9fe"),
                            "name" : "Serge",
                            "likes" : "Anna"
                    }
            ],
            "ok" : 1
    }

    We see that the likes array has expanded and each document now has a likes field with every array value that it had before. If we want to find the most popular girl, just group the selection by the likes field. To group, use the following operator.

    $ Group operator


    For convenience, we supplement the selection with another field filled with the number 1 (it will be easier to summarize this way):

    db.test.aggregate({$unwind: "$likes"}, {$project: {name:1, likes:1, count: {$add: [1]}}});

    {
            "result" : [
                    {
                            "_id" : ObjectId("4f598de76a8f8bc74573e9fd"),
                            "name" : "Ivan",
                            "likes" : "Maria",
                            "count" : 1
                    },
                    {
                            "_id" : ObjectId("4f598de76a8f8bc74573e9fd"),
                            "name" : "Ivan",
                            "likes" : "Anna",
                            "count" : 1
                    },
                    {
                            "_id" : ObjectId("4f598e086a8f8bc74573e9fe"),
                            "name" : "Serge",
                            "likes" : "Anna",
                            "count" : 1
                    }
            ],
            "ok" : 1
    }

    This will allow us to use the $ sum aggregation operator. That is, now we simply add the value of the count field to the number field each time and group the entire sample by the likes field containing the name of the girl.

    db.test.aggregate({$unwind: "$likes"}, {$project: {name:1, likes:1, count: {$add: [1]}}}, {$group: {_id: "$likes", number: {$sum: "$count"}}});

    {
            "result" : [
                    {
                            "_id" : "Anna",
                            "number" : 2
                    },
                    {
                            "_id" : "Maria",
                            "number" : 1
                    }
            ],
            "ok" : 1
    }

    It remains to sort and limit the output to only one document:

    db.test.aggregate({$unwind: "$likes"}, {$project: {name:1, likes:1, count: {$add: [1]}}}, {$group: {_id: "$likes", number: {$sum: "$count"}}}, {$sort: {number: -1}}, {$limit: 1});


    { "result" : [ { "_id" : "Anna", "number" : 2 } ], "ok" : 1 }


    Our most popular girl is Anna.

    And now a concrete example.



    In order to purely specifically penetrate with new opportunities, suppose we have a collection that stores data about animals in the zoo and solve several problems of data aggregation. Here are our paws and tails:

    db.zoo.insert({name: "Lion", ration: [{meat: 20}, {fish: 1}, {water: 30}], holidays: [1,4], staff: {like:  ["Petrovich", "Mihalich"], dislike: "Maria"}});
    db.zoo.insert({name: "Tiger", ration: [{meat: 15}, {water: 25}], holidays: [6], staff: {like:  ["Petrovich", "Maria"]}});
    db.zoo.insert({name: "Monkey", ration: [{banana: 15}, {water: 10}, {nuts: 1}], holidays: [2], staff: {like:  ["Anna"], dislike: "Petrovich"}});
    db.zoo.insert({name: "Panda", ration: [{bamboo: 15}, {dumplings: 50}, {water: 3}], staff: {like:  ["Petrovich", "Mihalich", "Maria", "Anna"]}});

    The name field stores the name, the ration field is an array of objects storing how much and what kind of food the beast needs every day, holidays are days on which the beast rests and is not shown to visitors, staff.like are the caretakers that he likes (pandas, charming, love most of all ), staff.dislike - do not like it.

    Let's start with a simple selection - just the names of the animals, so that the director of the zoo does not forget who is called. Everything is simple here:

    db.zoo.aggregate({$project: {name: 1}});

    {
            "result" : [
                    {
                            "_id" : ObjectId("4f58b7f627f86b11258dc70c"),
                            "name" : "Lion"
                    },
                    {
                            "_id" : ObjectId("4f58b86027f86b11258dc70d"),
                            "name" : "Tiger"
                    },
                    {
                            "_id" : ObjectId("4f58b90c27f86b11258dc70e"),
                            "name" : "Monkey"
                    },
                    {
                            "_id" : ObjectId("4f58b98727f86b11258dc70f"),
                            "name" : "Panda"
                    }
            ],
            "ok" : 1
    }


    What kind of animals do you need a fighter?


    It is necessary to be afraid of predators. A predator is one who has meat in the diet. Let's find them. First, we filter the stream and select only two fields in the documents - the name and ration.

    db.zoo.aggregate({$project: {name: 1, _id: 0, ration: 1}});

    {
            "result" : [
                    {
                            "name" : "Lion",
                            "ration" : [
                                    {
                                            "meat" : 20
                                    },
                                    {
                                            "fish" : 1
                                    },
                                    {
                                            "water" : 30
                                    }
                            ]
                    },
                    {
                            "name" : "Tiger",
                            "ration" : [
                                    {
                                            "meat" : 15
                                    },
                                    {
                                            "water" : 25
                                    }
                            ]
                    },
                    {
                            "name" : "Monkey",
                            "ration" : [
                                    {
                                            "banana" : 15
                                    },
                                    {
                                            "water" : 10
                                    },
                                    {
                                            "nuts" : 1
                                    }
                            ]
                    },
                    {
                            "name" : "Panda",
                            "ration" : [
                                    {
                                            "bamboo" : 15
                                    },
                                    {
                                            "dumplings" : 50
                                    },
                                    {
                                            "water" : 3
                                    }
                            ]
                    }
            ],
            "ok" : 1
    }


    Then expand the ration array to the elements of the main array:

    db.zoo.aggregate({$project: {name: 1, _id: 0, ration: 1}}, {$unwind: "$ration"});

    {
            "result" : [
                    {
                            "name" : "Lion",
                            "ration" : {
                                    "meat" : 20
                            }
                    },
                    {
                            "name" : "Lion",
                            "ration" : {
                                    "fish" : 1
                            }
                    },
                    {
                            "name" : "Lion",
                            "ration" : {
                                    "water" : 30
                            }
                    },
                    {
                            "name" : "Tiger",
                            "ration" : {
                                    "meat" : 15
                            }
                    },
                    {
                            "name" : "Tiger",
                            "ration" : {
                                    "water" : 25
                            }
                    },
                    {
                            "name" : "Monkey",
                            "ration" : {
                                    "banana" : 15
                            }
                    },
                    {
                            "name" : "Monkey",
                            "ration" : {
                                    "water" : 10
                            }
                    },
                    {
                            "name" : "Monkey",
                            "ration" : {
                                    "nuts" : 1
                            }
                    },
                    {
                            "name" : "Panda",
                            "ration" : {
                                    "bamboo" : 15
                            }
                    },
                    {
                            "name" : "Panda",
                            "ration" : {
                                    "dumplings" : 50
                            }
                    },
                    {
                            "name" : "Panda",
                            "ration" : {
                                    "water" : 3
                            }
                    }
            ],
            "ok" : 1
    }


    Next, we filter the selection only for those fields where there is a ration.meat field

    db.zoo.aggregate({$project: {name: 1, _id: 0, ration: 1}}, {$unwind: "$ration"}, {$match: {"ration.meat": {$exists: true}}});

    {
            "result" : [
                    {
                            "name" : "Lion",
                            "ration" : {
                                    "meat" : 20
                            }
                    },
                    {
                            "name" : "Tiger",
                            "ration" : {
                                    "meat" : 15
                            }
                    }
            ],
            "ok" : 1
    }


    And the final conclusion is only the name of the predator

    db.zoo.aggregate({$project: {name: 1, _id: 0, ration: 1}}, {$unwind: "$ration"}, {$match: {"ration.meat": {$exists: true}}}, {$project: {name: 1, _id: 0}});

    {
            "result" : [
                    {
                            "name" : "Lion"
                    },
                    {
                            "name" : "Tiger"
                    }
            ],
            "ok" : 1
    }


    What days does at least one beast rest?


    To do this, “stratify” the holidays array into the entire array of animals (the panda, as usual, is accessible to everyone and always).

    db.zoo.aggregate({$project: {name: 1, holidays: 1}}, {$unwind: "$holidays"});

    {
            "result" : [
                    {
                            "_id" : ObjectId("4f58b7f627f86b11258dc70c"),
                            "name" : "Lion",
                            "holidays" : 1
                    },
                    {
                            "_id" : ObjectId("4f58b7f627f86b11258dc70c"),
                            "name" : "Lion",
                            "holidays" : 4
                    },
                    {
                            "_id" : ObjectId("4f58b86027f86b11258dc70d"),
                            "name" : "Tiger",
                            "holidays" : 6
                    },
                    {
                            "_id" : ObjectId("4f58b90c27f86b11258dc70e"),
                            "name" : "Monkey",
                            "holidays" : 2
                    },
                    {
                            "_id" : ObjectId("4f58b98727f86b11258dc70f"),
                            "name" : "Panda"
                    }
            ],
            "ok" : 1
    }


    And we filter out only those where the holidays field is a number greater than -1 (well, or 0, to whom it is more convenient)

    db.zoo.aggregate({$project: {name: 1, holidays: 1}}, {$unwind: "$holidays"},{$match: {holidays : {$gt: -1}}});

    {
            "result" : [
                    {
                            "_id" : ObjectId("4f58b7f627f86b11258dc70c"),
                            "name" : "Lion",
                            "holidays" : 1
                    },
                    {
                            "_id" : ObjectId("4f58b7f627f86b11258dc70c"),
                            "name" : "Lion",
                            "holidays" : 4
                    },
                    {
                            "_id" : ObjectId("4f58b86027f86b11258dc70d"),
                            "name" : "Tiger",
                            "holidays" : 6
                    },
                    {
                            "_id" : ObjectId("4f58b90c27f86b11258dc70e"),
                            "name" : "Monkey",
                            "holidays" : 2
                    }
            ],
            "ok" : 1
    }


    We remove all unnecessary.

    db.zoo.aggregate({$project: {name: 1, holidays: 1}}, {$unwind: "$holidays"},{$match: {holidays : {$gt: -1}}}, {$project: {holidays: 1, _id: 0}});

    {
            "result" : [
                    {
                            "holidays" : 1
                    },
                    {
                            "holidays" : 4
                    },
                    {
                            "holidays" : 6
                    },
                    {
                            "holidays" : 2
                    }
            ],
            "ok" : 1
    }


    How many products per day need to be purchased.


    The most interesting, in my opinion, task. To implement it, remember that $ project can create fields and create a meat field with the value of the meat property .

    db.zoo.aggregate({$project: {ration: 1, _id: 0}}, {$unwind: "$ration"}, {$project: {ration: 1, meat: "$ration.meat", _id: 0}});
    


    If this field is not in the properties of the animal’s diet, then it will not be created. Here is an example of a sample part:

    {
            "result" : [
                    {
                            "ration" : {
                                    "meat" : 20
                            },
                            "meat" : 20
                    },
                    {
                            "ration" : {
                                    "fish" : 1
                            }
                    },
                    {
                            "ration" : {
                                    "water" : 30
                            }
                    },
    ...
    }


    We do this for all types of food and remove the output of the ration object itself:

    db.zoo.aggregate({$project: {ration: 1}}, {$unwind: "$ration"}, {$project: {ration: 0, _id: 0, meat: "$ration.meat", fish: "$ration.fish", water: "$ration.water", banana: "$ration.banana", bamboo: "$ration.bamboo", nuts: "$ration.nuts", dumplings: "$ration.dumplings", _id: 0}});


    as a result we get

    {
            "result" : [
                    {
                            "_id" : ObjectId("4f58e58227f86b11258dc713"),
                            "meat" : 20
                    },
                    {
                            "_id" : ObjectId("4f58e58227f86b11258dc713"),
                            "fish" : 1
                    },
                    {
                            "_id" : ObjectId("4f58e58227f86b11258dc713"),
                            "water" : 30
                    },
                    {
                            "_id" : ObjectId("4f58e5e127f86b11258dc714"),
                            "meat" : 15
                    },
                    {
                            "_id" : ObjectId("4f58e5e127f86b11258dc714"),
                            "water" : 25
                    },
                    {
                            "_id" : ObjectId("4f58e60027f86b11258dc715"),
                            "banana" : 15
                    },
                    {
                            "_id" : ObjectId("4f58e60027f86b11258dc715"),
                            "water" : 10
                    },
                    {
                            "_id" : ObjectId("4f58e60027f86b11258dc715"),
                            "nuts" : 1
                    },
                    {
                            "_id" : ObjectId("4f58e64a27f86b11258dc716"),
                            "bamboo" : 15
                    },
                    {
                            "_id" : ObjectId("4f58e64a27f86b11258dc716"),
                            "dumplings" : 50
                    },
                    {
                            "_id" : ObjectId("4f58e64a27f86b11258dc716"),
                            "water" : 3
                    }
            ],
            "ok" : 1
    }


    It remains only to add / group this whole thing using the $ group function. Indication of the _id field in the grouping is mandatory here, but we basically do not need it, so let it be some kind of nonsense. For each type of food, create an appropriate field to summarize the individual diets of each animal:

    db.zoo.aggregate({$project: {ration: 1}}, {$unwind: "$ration"}, {$project: {ration: 0, _id: 0, meat: "$ration.meat", fish: "$ration.fish", water: "$ration.water", banana: "$ration.banana", bamboo: "$ration.bamboo", nuts: "$ration.nuts", dumplings: "$ration.dumplings"}}, {$group: {_id: "s", sum_meat: {$sum: "$meat"}, sum_fish: {$sum: "$fish"}, sum_water: {$sum: "$water"}, sum_banana: {$sum: "$banana"}, sum_nuts: {$sum: "$nuts"}, sum_bamboo: {$sum: "$bamboo"}, sum_dumplings: {$sum: "$dumplings"}}});

    {
            "result" : [
                    {
                            "_id" : "s",
                            "sum_meat" : 35,
                            "sum_fish" : 1,
                            "sum_water" : 68,
                            "sum_banana" : 15,
                            "sum_nuts" : 1,
                            "sum_bamboo" : 15,
                            "sum_dumplings" : 50
                    }
            ],
            "ok" : 1
    }


    Favorite Caretaker


    Filter by fields and unwind the staff.like array:

    db.zoo.aggregate({$project: {name: 1, _id: 0, "staff.like": 1}}, {$unwind: "$staff.like"});

    We recall that $ project can raise the field one level up:

    db.zoo.aggregate({$project: {name: 1, _id: 0, "staff.like": 1}}, {$unwind: "$staff.like"}, {$project: {_id: 0, name: "$staff.like"}});

    So we selected all the caretakers who at least someone like and someone like two animals, then he is present in the sample twice.

    {
            "result" : [
                    {
                            "name" : "Petrovich"
                    },
                    {
                            "name" : "Mihalich"
                    },
                    {
                            "name" : "Petrovich"
                    },
                    {
                            "name" : "Maria"
                    },
                    {
                            "name" : "Anna"
                    },
                    {
                            "name" : "Petrovich"
                    },
                    {
                            "name" : "Mihalich"
                    },
                    {
                            "name" : "Maria"
                    },
                    {
                            "name" : "Anna"
                    }
            ],
            "ok" : 1
    }


    Now you need to sum these fields. But it’s not so easy to do this, since we don’t have a field for summing, so we create this field with the already known chip.

    db.zoo.aggregate({$project: {name: 1, _id: 0, "staff.like": 1}}, {$unwind: "$staff.like"}, {$project: {_id: 0, name: "$staff.like", count: {$add: [1]}}});

    As a result, another count field with value 1 will be added to each object. Group and summarize:

    db.zoo.aggregate({$project: {name: 1, _id: 0, "staff.like": 1}}, {$unwind: "$staff.like"}, {$project: {_id: 0, name: "$staff.like", count: {$add: [1]}}}, {$group: {_id: "$name", num: {$sum: "$count"}}});

    Sort and limit the output to the very first element

    db.zoo.aggregate({$project: {name: 1, _id: 0, "staff.like": 1}}, {$unwind: "$staff.like"}, {$project: {_id: 0, name: "$staff.like", count: {$add: [1]}}}, {$group: {_id: "$name", num: {$sum: "$count"}}}, {$sort: {num: -1}}, {$limit: 1});

    And we get the following:

    { "result" : [ { "_id" : "Petrovich", "num" : 3 } ], "ok" : 1 }


    That's all. For those interested, there are two simple reports in English on this topic: one and two .

    To be honest, I really like MongoDB, although we used it only on a part of the project to store disparate data. The same Map / Reduce for me has always been something scary and incomprehensible, but the new thing of data aggregation allows you to partially exclude JavaScript, because somehow it is interpreted, and therefore slow and replace it with ready-made, and therefore fast, elements of the language .

    PS It is worth noting that version 2.1 is still quite crude. I constantly got all sorts of exceptions for assertion failed. But I think that in 2.2 it will finally be stable and cool.

    Also popular now: