Microsoft Azure DocumentDB, Part One: Introduction

    In August we launched a large number of new services on Microsoft Azure, and, quite naturally, one of the most interesting for our audience was DocumentDB, a NoSQL document database service. The time has come to start writing about it; as usual, the first article is an introduction.

    • What is it? Basic concepts.
    • How to create an account.

    What is Azure DocumentDB?

    Modern applications constantly produce, and often consume, large amounts of data. Like the applications themselves, the data evolves over time and its schema changes with it, which naturally suggests that schema-free NoSQL databases suit such scenarios well: they are fast, simple and flexible. However, many of these technologies do not support complex queries or transactional processing, which makes non-trivial data models hard to manage.

    Microsoft Azure DocumentDB is a document-oriented NoSQL database designed specifically for web and mobile applications. It offers guaranteed fast reads and writes, a flexible schema, and the ability to deploy the database quickly and scale it up and down. DocumentDB also supports complex queries in an SQL dialect, JavaScript, transactions spanning multiple documents, and much more. DocumentDB works natively with JSON, and deep JavaScript integration lets you execute business logic directly inside the database engine, transactionally.

    Azure DocumentDB offers:

    • Queries with SQL syntax: you store heterogeneous JSON documents in DocumentDB and query them with familiar SQL.
    • Highly concurrent, lock-free indexing that automatically indexes all document content, so you do not need to specify schema hints, secondary indexes or views.
    • JavaScript inside the database: application logic runs as stored procedures, triggers and user-defined functions (UDFs), so you can build logic on top of JSON without the risk of the application and the database schema getting out of sync.
    • Fully transactional execution of JavaScript logic inside the engine: inserts, replaces, deletes and selects performed from JavaScript run as one isolated transaction.
    • Four tunable consistency levels: Strong, Bounded Staleness, Session and Eventual.
    • Fully managed service: you do not need to manage databases or machine resources, since DocumentDB is provided as a service. Each database is automatically backed up and protected against regional failures.
    • Easy scaling in units of storage and throughput.
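    SQL queries over stored JSON can be issued over plain HTTPS. The sketch below builds (but does not send) a parameterized query request, assuming the REST conventions documented for DocumentDB: a POST to a collection's docs feed with the x-ms-documentdb-isquery header and an application/query+json body. The account, database, collection and property names (userId, type) are hypothetical, and the authorization, date and version headers a real request needs are omitted.

```python
import json


def build_query_request(account, db_id, coll_id, user_id, doc_type):
    """Build URL, headers and body for a parameterized DocumentDB SQL query.

    All names are hypothetical; auth, x-ms-date and x-ms-version headers are omitted.
    """
    url = f"https://{account}.documents.azure.com/dbs/{db_id}/colls/{coll_id}/docs"
    headers = {
        # Tells the service that the POST body is a query, not a new document.
        "x-ms-documentdb-isquery": "True",
        "Content-Type": "application/query+json",
    }
    body = {
        # Parameters keep user input out of the query text itself.
        "query": "SELECT * FROM c WHERE c.userId = @userId AND c.type = @type",
        "parameters": [
            {"name": "@userId", "value": user_id},
            {"name": "@type", "value": doc_type},
        ],
    }
    return url, headers, json.dumps(body)


url, headers, body = build_query_request("myaccount", "uds", "health", "user42", "diet")
print(url)
print(body)
```

    Because the documents have no fixed schema, the query simply filters on whatever properties happen to exist; documents lacking userId or type just do not match.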

    Azure DocumentDB Resources

    In Azure DocumentDB, data is replicated, every resource is addressable by a URI, and all resources are accessible through a simple RESTful interface. Your database account is a unique, global namespace. All resources inside it are represented as JSON documents with metadata and are grouped into collections (feeds). The picture shows the relationships between DocumentDB resources.

    An account contains a set of databases; each database contains several collections, and each collection contains stored procedures, triggers, UDFs, documents and their attachments. A database can also have users, each granted specific permissions to access collections, stored procedures, triggers, UDFs, documents and so on.
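    The hierarchy above maps directly onto URI paths. This is a minimal sketch, assuming the path segments used by the DocumentDB REST API (dbs, colls, docs, sprocs, triggers, udfs, attachments, users, permissions) and the account host form <name>.documents.azure.com; the function name and the ids in the example are hypothetical. It refuses nestings the resource model does not allow.

```python
from urllib.parse import quote

# Which resource types may appear directly under which parent (None = the account).
CHILDREN = {
    None: {"dbs"},
    "dbs": {"colls", "users"},
    "colls": {"docs", "sprocs", "triggers", "udfs"},
    "docs": {"attachments"},
    "users": {"permissions"},
}


def resource_uri(account, *path):
    """Build a resource URI from (resource_type, id) pairs, outermost first."""
    parent = None
    parts = []
    for kind, rid in path:
        if kind not in CHILDREN.get(parent, set()):
            raise ValueError(f"{kind!r} cannot be nested under {parent!r}")
        parts.extend([kind, quote(rid, safe="")])  # ids are percent-encoded
        parent = kind
    return f"https://{account}.documents.azure.com/" + "/".join(parts)


print(resource_uri("myaccount", ("dbs", "uds"), ("colls", "health"), ("docs", "user-42")))
```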

    Development with Azure DocumentDB

    Since Azure DocumentDB exposes operations on resources through a REST API, requests can be made from any language that can speak HTTP/HTTPS. There are also client libraries for several languages that simplify working with DocumentDB:
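    Because every call is an ordinary HTTPS request, the only non-obvious part of going without a library is authentication. The sketch below follows the master-key scheme as documented for the DocumentDB (now Cosmos DB) REST API: an HMAC-SHA256 signature over the lowercased verb, resource type and date plus the resource link. Treat the payload layout as an assumption to verify against the current REST reference; the key and resource names are placeholders, and the x-ms-version header a real request needs is omitted.

```python
import base64
import hashlib
import hmac
from email.utils import formatdate
from urllib.parse import quote


def master_key_auth_token(verb, resource_type, resource_link, date, master_key):
    """Sign a request with the account's base64-encoded master key."""
    # Payload: lowercased verb, resource type and date; resource link as-is;
    # the trailing empty line is part of the scheme.
    payload = f"{verb.lower()}\n{resource_type.lower()}\n{resource_link}\n{date.lower()}\n\n"
    key = base64.b64decode(master_key)
    digest = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("ascii")
    # The whole token is URL-encoded before it goes into the Authorization header.
    return quote(f"type=master&ver=1.0&sig={signature}", safe="")


# Placeholder key; a real one is shown for the account in the portal.
fake_key = base64.b64encode(b"not-a-real-key").decode("ascii")
date = formatdate(usegmt=True)  # RFC 1123 date, the format x-ms-date expects
headers = {
    "x-ms-date": date,
    # Listing documents: resource type "docs", link of the parent collection.
    "authorization": master_key_auth_token("GET", "docs", "dbs/uds/colls/health", date, fake_key),
}
print(headers["authorization"])
```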

    JavaScript transactions and execution

    As mentioned above, in Azure DocumentDB you can write logic in JavaScript: these "programs" are registered on a collection and can perform document operations within that collection. JavaScript code can be registered as triggers, stored procedures and UDFs; triggers and stored procedures can perform CRUD operations, while UDFs have no write access. All JavaScript logic runs inside an ambient ACID transaction with snapshot isolation, and JavaScript is positioned as a modern replacement for T-SQL. If the script throws an exception during execution, the whole transaction is rolled back.
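    Real stored procedures are JavaScript executed inside DocumentDB, so they cannot be reproduced here. The Python sketch below only models the all-or-nothing behaviour just described: the procedure works on a snapshot, its writes become visible together if it finishes, and an exception discards all of them. Collection, run_stored_procedure and create_pair are hypothetical names, not the DocumentDB API.

```python
import copy


class Collection:
    """Toy in-memory collection; not the DocumentDB server-side API."""

    def __init__(self):
        self.docs = {}

    def run_stored_procedure(self, procedure, *args):
        # Snapshot isolation, simplified: the procedure sees a private copy.
        snapshot = copy.deepcopy(self.docs)
        result = procedure(snapshot, *args)  # an exception propagates and the copy is dropped
        self.docs = snapshot  # commit: all writes become visible together
        return result


def create_pair(docs, first, second):
    """Insert two documents atomically; fail if either id already exists."""
    for doc in (first, second):
        if doc["id"] in docs:
            raise ValueError(f"document {doc['id']!r} already exists")
        docs[doc["id"]] = doc
    return 2


coll = Collection()
coll.docs["b"] = {"id": "b"}
try:
    coll.run_stored_procedure(create_pair, {"id": "a"}, {"id": "b"})
except ValueError as err:
    print("rolled back:", err)
print(sorted(coll.docs))  # "a" was only written to the discarded snapshot
```

    Copying the whole collection keeps the model obviously correct but is O(n) per call; the point is the visibility rule, not the mechanism.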

    Let's look at an example! MSN is a huge portal visited by half a billion users per month, hence the need for large, scalable, distributed, schema-free storage. At some point the development team decided to move everything to Azure and build a unified distributed storage system, the User Data Store (UDS), with the following requirements:

    1. Scaling to 425+ million unique users, of whom 100+ million are already authenticated
    2. 20 TB of storage
    3. Write latency of at most 15 ms
    4. No fixed schema
    5. Transaction support
    6. Hadoop analytics on top of the data
    7. Geographic distribution and availability
    The team chose Azure DocumentDB. One part of the system, Health and Fitness, consists of the following components:
    • Diet Tracker: daily diet tracking; each entry contains calories, fat, protein, etc.
    • Exercise Tracker: exercise tracking.
    • GPS Tracker: GPS tracking; metadata about each activity is stored in DocumentDB.
    • Pedometer: step counting.
    • Weight Tracker: weight tracking.
    • Analysis: historical data on diet, exercise, GPS, etc.
    • Favorites and Custom: bookmarks for favorite foods, exercises, metadata, etc.

    The new MSN portal stores user data in DocumentDB using 150 SSD-backed capacity units across three geographic regions.

    Documents range from 1 to 10 KB in size and share no common schema. Most collections are tuned for optimal throughput and minimal indexing overhead.
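    To make "no common schema" concrete, here are two hypothetical documents of the kind the Health and Fitness components might keep side by side in one collection. The field names are invented for illustration and are not MSN's actual format; the helper measures each document's serialized JSON size in kilobytes.

```python
import json

# Two hypothetical documents for the same user; they share only id, userId and type.
diet_entry = {
    "id": "diet-2014-08-25-user42",
    "userId": "user42",
    "type": "diet",
    "meal": "breakfast",
    "calories": 420,
    "fat_g": 12.5,
    "protein_g": 18,
}
weight_entry = {
    "id": "weight-2014-08-25-user42",
    "userId": "user42",
    "type": "weight",
    "kg": 81.3,
}


def size_kb(doc):
    """Size of the document serialized as UTF-8 JSON, in kilobytes."""
    return len(json.dumps(doc).encode("utf-8")) / 1024


for doc in (diet_entry, weight_entry):
    print(doc["type"], sorted(doc), f"{size_kb(doc):.2f} KB")
```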

    UDS distributes user information across collections, and each user's data is stored in documents. The data is scaled out horizontally and partitioned by user ID.
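    Partitioning by user ID is commonly done with a stable hash. The sketch below shows plain hash partitioning over a fixed list of collections; the collection names are hypothetical, and this is one common technique rather than the documented UDS scheme.

```python
import hashlib


def collection_for_user(user_id: str, collections: list) -> str:
    """Map a user ID to one collection, the same one on every call and machine."""
    # hashlib rather than hash(): Python randomizes str hashes per process.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(collections)
    return collections[index]


collections = ["users-0", "users-1", "users-2", "users-3"]
print(collection_for_user("user42", collections))
```

    With a plain modulo, changing the number of collections moves most users to a different collection; a lookup table or consistent hashing avoids that at the cost of extra bookkeeping.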

    Create a DocumentDB Account

    Go to the new Microsoft Azure management portal.
    Click New -> DocumentDB Account.

    Alternatively, open the “Data, storage, + backup” category and select DocumentDB.

    In New DocumentDB (Preview), choose the configuration you need.

    In Name, enter a name; it becomes part of the account's address (the host name). The Pricing Tier cannot be changed yet, because the service is in preview and only one pricing mode is available (more about prices here). In the optional settings you can specify the capacity allocated to the account. Capacity is measured in units, and adding or removing units lets you scale the solution quickly. Each unit combines a set amount of storage and throughput, and one unit is allocated to the account by default. Read more about performance and throughput here.

    Creating an account takes a few minutes.


    The account is created and ready to use. The default consistency level is Session.
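    A rough mental model of the four consistency levels is the question "may this replica answer the read?" The Python sketch below is a conceptual simplification, not DocumentDB's implementation: LSN (a per-write sequence number), max_lag and session_lsn are illustrative names, and staleness is counted in writes only. Under Session, the default, a client reads its own writes because it only accepts replicas that have caught up with its last write.

```python
from enum import IntEnum


class Consistency(IntEnum):
    """The four DocumentDB levels, ordered from weakest to strongest."""
    EVENTUAL = 0
    SESSION = 1
    BOUNDED_STALENESS = 2
    STRONG = 3


def replica_can_serve(level, replica_lsn, latest_lsn, session_lsn=0, max_lag=0):
    """Conceptual rule: may a replica that has applied writes up to replica_lsn answer?"""
    if level is Consistency.STRONG:
        return replica_lsn == latest_lsn  # must reflect every committed write
    if level is Consistency.BOUNDED_STALENESS:
        return latest_lsn - replica_lsn <= max_lag  # may lag, but by at most max_lag writes
    if level is Consistency.SESSION:
        return replica_lsn >= session_lsn  # must include this client's own writes
    return True  # EVENTUAL: any replica; replicas converge over time


# A replica 3 writes behind the primary; this client's last write was LSN 95.
print(replica_can_serve(Consistency.SESSION, replica_lsn=97, latest_lsn=100, session_lsn=95))
```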

    You can monitor your DocumentDB accounts in the Browse window.

    In summary, we looked at what DocumentDB is, its basic concepts and a real-world example, and created an account. The next part covers the concepts and their use in more detail.
