ewgRa December 14, 2012 at 14:32

MongoDB for Developers and DBA

The MongoDB courses for developers and DBAs from 10gen, the company that develops MongoDB, are coming to an end.
The final exam has been submitted for grading, and I would like to share my impressions of the courses and of what I learned, and to talk about the pros and cons of MongoDB.

General impression of the course

I signed up for both courses at once, the DBA one and the developer one. The workload is not heavy: at a very leisurely pace, it took 3-4 hours a week to watch the videos and 1-2 hours for homework. If you want, I think that time could be cut in half.
Overall, my impressions are positive. Although most of the material in the courses can be found in the official documentation, learning it through the courses is much more interesting. I had heard of MongoDB before and had tried it out a couple of times, but the courses gave me a deeper understanding of what this database can do and where it fits. There were some hiccups at first. Several were related to the wording of questions, which allowed ambiguous answers, though perhaps that was a problem with my understanding of English. Then there were a couple of problems with results being reset, and some disruptions due to Hurricane Sandy. There was also a fun question, which I dubbed “Mission Impossible”: I had to choose one of three possible answers, and questions like this usually allow three attempts.

Strengths

Replication (manual)

Replication is easy to configure: the servers are started with the name of the replica set, the configuration is applied, the members elect a PRIMARY server, and the rest become SECONDARY servers. You can set a priority for each server, or prevent a server from ever becoming PRIMARY. The PRIMARY server accepts both reads and writes; SECONDARY servers can only be read from.
When a server crashes, there are many nuances, depending on whether a SECONDARY or the PRIMARY went down, whether the old PRIMARY accepted writes after it became unreachable and a new PRIMARY was elected, and so on. But in most cases, replica set recovery is automated and requires no manual intervention, other than bringing the failed server back up.
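As a rough illustration of the priorities mentioned above, here is a sketch of a replica set configuration document. The set name "rs0" and the host names are hypothetical; the `_id`, `members`, `host` and `priority` fields are the ones the replica set configuration uses, and a priority of 0 keeps a member from becoming PRIMARY.

```javascript
// Sketch of a replica set configuration; set name and hosts are made up.
// Each server would be started with the same set name, e.g. --replSet rs0.
const replicaSetConfig = {
  _id: "rs0",
  members: [
    { _id: 0, host: "db1.example.net:27017", priority: 2 }, // preferred PRIMARY
    { _id: 1, host: "db2.example.net:27017", priority: 1 },
    { _id: 2, host: "db3.example.net:27017", priority: 0 }, // never PRIMARY
  ],
};

// In the mongo shell this document would be passed to rs.initiate(...).
// Here we only list the members that are allowed to become PRIMARY.
const electableHosts = replicaSetConfig.members
  .filter((member) => member.priority > 0)
  .map((member) => member.host);

console.log(electableHosts);
```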

Sharding (manual)

If a collection holds so much data that it does not fit on one server, it can be distributed across several servers, and working with the collection at the application level requires no changes. A shard key is chosen, and each shard is assigned ranges of that key. Moreover, if I remember correctly, data is rebalanced automatically when the ranges change, without downtime. There is a catch, though: once the collection has been distributed across the servers, the shard key cannot be changed automatically.
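To make the idea of key ranges concrete, here is a toy model of range-based routing. It is not MongoDB's implementation; the range boundaries, the shard names and the `routeToShard` helper are all invented for illustration.

```javascript
// Toy model of range-based sharding (not MongoDB's actual code).
// Each range covers shard-key values in [min, max) and lives on one shard.
const keyRanges = [
  { min: -Infinity, max: 1000, shard: "shard0" },
  { min: 1000, max: 2000, shard: "shard1" },
  { min: 2000, max: Infinity, shard: "shard2" },
];

// Find the shard that owns a given shard-key value.
function routeToShard(key) {
  const range = keyRanges.find((r) => key >= r.min && key < r.max);
  return range.shard;
}

console.log(routeToShard(42));   // shard0
console.log(routeToShard(1000)); // shard1
console.log(routeToShard(5000)); // shard2
```

The application only ever asks for a document by key; which server holds it is decided by the range lookup, which is why the application code does not need to change.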

Geospatial Indexes (manual)

Many startups and social networks now offer functionality of the form: find something no further than X km from the user. In MongoDB you can use geospatial indexes for this.
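The question such a query answers can be sketched in plain JavaScript. This is only a model: it scans every place and computes a great-circle distance, whereas an index exists precisely to avoid checking every document. The place names and coordinates are made up, and `haversineKm` and `findWithin` are hypothetical helpers.

```javascript
// Model of "find everything within X km of the user" (linear scan, no index).
const EARTH_RADIUS_KM = 6371;

// Great-circle distance between two {lat, lon} points, in kilometres.
function haversineKm(a, b) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLon = toRad(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

// Keep only the places no further than maxKm from the center.
function findWithin(places, center, maxKm) {
  return places.filter((place) => haversineKm(center, place) <= maxKm);
}

const user = { lat: 55.75, lon: 37.62 };
const places = [
  { name: "cafe", lat: 55.76, lon: 37.63 },     // about a kilometre away
  { name: "far-away", lat: 59.93, lon: 30.34 }, // hundreds of kilometres away
];

console.log(findWithin(places, user, 5).map((p) => p.name));
```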

Schemaless

In my opinion, the absence of a schema in MongoDB speeds up the development of a project. With a fixed schema, you have to work out the database structure in detail at the very start and take care of implementing the relationships, and later, as the project expands and becomes more detailed, draw up a data migration plan, and so on. With MongoDB, by contrast, you can launch a project quickly at the “prototype” stage and tolerate some disorder in the database. Later, when the project starts to grow and takes a more definite shape, it becomes necessary to bring the database into a more normalized form.

Capped Collections (manual)

When creating such a collection, you specify its maximum size (in bytes, and optionally a maximum number of records). On insert, if the collection is already full, the oldest record is overwritten. It can be compared to writing clockwise around a circle divided into a fixed number of segments. It is a useful kind of collection when you need to keep the latest, relevant information and the old information is of no interest.
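The circle analogy corresponds to a ring buffer. The sketch below models only the overwrite behavior; the `CappedBuffer` class is hypothetical and is bounded by a record count purely to keep the example small.

```javascript
// Ring-buffer model of a capped collection (illustrative, not MongoDB code).
class CappedBuffer {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.slots = [];
    this.next = 0; // position the next insert will write to
  }

  insert(doc) {
    if (this.slots.length < this.maxDocs) {
      this.slots.push(doc); // still filling up
    } else {
      this.slots[this.next] = doc; // full: overwrite the oldest record
    }
    this.next = (this.next + 1) % this.maxDocs;
  }

  // Records in insertion order, oldest first.
  toArray() {
    if (this.slots.length < this.maxDocs) return this.slots.slice();
    return this.slots.slice(this.next).concat(this.slots.slice(0, this.next));
  }
}

const recent = new CappedBuffer(3);
for (const n of [1, 2, 3, 4, 5]) recent.insert({ n });

console.log(recent.toArray().map((d) => d.n)); // [ 3, 4, 5 ]
```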

Aggregation framework (manual)

With this framework you can build result sets from the source data with grouping, summing, counting of records, and so on. In essence, it implements the GROUP BY, COUNT, HAVING, etc. constructs of SQL. The source data passes through an array of stages, the so-called pipeline; each stage transforms the data and hands it to the next one. It is very similar to console commands like: "cat file | grep boobs | grep -v small".
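The pipeline idea can be modeled with ordinary functions, each taking an array of documents and returning a new one. This is a sketch of the concept, not the framework's API: the `match`, `groupSum` and `runPipeline` helpers and the sample orders are invented here.

```javascript
// Model of an aggregation pipeline: every stage maps documents to documents.
const orders = [
  { customer: "ann", amount: 10 },
  { customer: "bob", amount: 5 },
  { customer: "ann", amount: 7 },
  { customer: "bob", amount: 1 },
  { customer: "cat", amount: 2 },
];

// Stage factory: keep documents that satisfy a predicate (like WHERE / HAVING).
const match = (predicate) => (docs) => docs.filter(predicate);

// Stage factory: group by one field and sum another (like GROUP BY + SUM).
const groupSum = (keyField, valueField) => (docs) => {
  const totals = new Map();
  for (const doc of docs) {
    totals.set(doc[keyField], (totals.get(doc[keyField]) || 0) + doc[valueField]);
  }
  return [...totals].map(([key, total]) => ({ _id: key, total }));
};

// Feed the output of each stage into the next one.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

const bigSpenders = runPipeline(orders, [
  match((doc) => doc.amount >= 2),   // filter rows first
  groupSum("customer", "amount"),    // then group and sum
  match((group) => group.total > 5), // then filter the groups
]);

console.log(bigSpenders); // [ { _id: 'ann', total: 17 } ]
```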

Map-Reduce (manual)

If the capabilities of the Aggregation Framework are not enough, you can use the MapReduce functionality. The data is fed to the map function, which transforms it and passes its output to the reduce function.
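The data flow can be sketched as follows. This is a simplified in-memory model, not MongoDB's API: here `map` receives the document and an `emit` callback as arguments, and the `mapReduce` helper and sample posts are made up for the example.

```javascript
// In-memory model of map-reduce: map emits (key, value) pairs,
// reduce folds all values emitted for the same key into one result.
function mapReduce(docs, map, reduce) {
  const emitted = new Map();
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };

  for (const doc of docs) map(doc, emit);

  const results = {};
  for (const [key, values] of emitted) {
    // With a single emitted value there is nothing to reduce.
    results[key] = values.length === 1 ? values[0] : reduce(key, values);
  }
  return results;
}

const posts = [
  { title: "first", tags: ["mongodb", "nosql"] },
  { title: "second", tags: ["mongodb"] },
];

// Count how many posts carry each tag.
const tagCounts = mapReduce(
  posts,
  (post, emit) => post.tags.forEach((tag) => emit(tag, 1)),
  (tag, counts) => counts.reduce((sum, n) => sum + n, 0)
);

console.log(tagCounts); // { mongodb: 2, nosql: 1 }
```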

Pitfalls

Compound Index Constraint

If you have a document like {a: [1, 2], b: [1, 2]}, creating the index {a: 1, b: 1} will fail. So will inserting such a document when that index already exists. For details, look for “Compound Multikey Indexes May Only Include One Array Field” in the documentation.

Sparse index and uniqueness (manual, Sparse Indexes)

Let's say we have these documents in the collection:
{"_id": ObjectId("50caeec479705c3852e9e61b"), "a": "1"}
{"_id": ObjectId("50caeeeb79705c3852e9e61d"), "a": "2", "b": 1}
{"_id": ObjectId("50caefb179705c3852e9e621"), "a": "3"}
and we want the "b" property of the documents to be unique. A regular unique index cannot be created: the first and third documents are treated as having b: null, which violates uniqueness.
But we can create a unique sparse index, and then documents that have no "b" are not included in the index. It would seem that everything is fine: the index is created and uniqueness is enforced. But! If we select all the documents in the collection and sort them by field b, MongoDB uses the sparse index we created, which has no entries for documents without "b". As a result, we get only one document back.
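The effect described above can be reproduced with a small model. It is not MongoDB code: `buildSparseIndex` and `sortUsingIndex` are hypothetical helpers that mimic an index containing only the documents that have the field, and a sort that walks that index.

```javascript
// Model of sorting through a sparse index (illustrative only).
const docs = [
  { _id: 1, a: "1" },
  { _id: 2, a: "2", b: 1 },
  { _id: 3, a: "3" },
];

// A sparse index holds entries only for documents that have the field.
function buildSparseIndex(collection, field) {
  return collection
    .filter((doc) => field in doc)
    .map((doc) => ({ key: doc[field], doc }))
    .sort((x, y) => x.key - y.key);
}

// Sorting "via the index" returns only what the index knows about.
function sortUsingIndex(index) {
  return index.map((entry) => entry.doc);
}

const sparseIndexOnB = buildSparseIndex(docs, "b");
const sorted = sortUsingIndex(sparseIndexOnB);

console.log(docs.length);   // 3 documents in the collection
console.log(sorted.length); // 1 document comes back from the sorted query
```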

Application Interface Dependence

The course repeatedly noted that in MongoDB it is convenient to store documents in the form in which they are displayed. Say you have a blog with comments, and the author’s name and email are shown next to each comment. It is then convenient to store the author information in the same object that stores the comment. Accordingly, if the display changes, there is a chance that the way the data is stored will have to change too. Strictly speaking, this is not quite a pitfall, and the likelihood of it happening is small, but something about this approach did not sit well with me.

You cannot change the shard key

Once a collection has been sharded, the shard key cannot be changed automatically. Choosing the key is therefore a very important decision. More details: docs.mongodb.org/manual/faq/sharding/#faq-change-shard-key

No transactions across multiple documents

An operation on a single document is atomic, but for operations that span several documents the suggestion is to implement transactions at the application level: docs.mongodb.org/manual/tutorial/perform-two-phase-commits

Conclusion

Great courses. MongoDB wins you over with its flexibility and simplicity, and there are many situations where using it is well justified.
If anyone wants to take the courses, they start again on January 21. Courses for Java developers also start on February 25. https://education.10gen.com/
