Building real-time web applications with RethinkDB

Original author: Slava Akhmechet
From the translator: I recently learned about this rather interesting database and then came across a fresh article about it. Habr has almost nothing about RethinkDB, so I decided to translate the article. Read on!


RethinkDB simplifies the development of web applications that update in real time.

RethinkDB is an open source database for real-time applications. It has an integrated change notification system that continuously pushes updates to your application. Instead of constantly polling for new data, you let the database send you the latest changes. The ability to "subscribe" to a stream of updates can greatly simplify the architecture of your application and makes it easier to serve clients that keep a persistent connection to your backend.

RethinkDB is a schemaless store for JSON documents, but it also supports some features of relational databases. It supports clustering as well, which makes it easy to scale. You can configure sharding and replication through the built-in web interface. The latest version of RethinkDB also includes automatic failover for clusters with three or more servers. (Translator's note: this means the database keeps working if one of the servers goes down.)

RethinkDB's query language, ReQL, is embedded natively in the programming language your application is written in. If you work in Python, for example, you write database queries using ordinary Python syntax. Each query is composed of functions that the developer chains together to describe the desired operation.

A few words about ReQL
RethinkDB stores ordinary JSON documents in tables. The JSON objects can be deeply nested. Each document has a primary key, the "id" property, whose value is unique within its table. By referring to the primary key in a query, you can fetch a specific document.

Writing ReQL queries in an application is much like using an SQL query builder API. Here is a simple ReQL query, written in JavaScript, that counts the unique last names in the users table:

r.table("users").pluck("last_name").distinct().count()

In a ReQL query, each function in the chain operates on the output of the previous function. More precisely, this query executes as follows:

  • table selects a specific table in the database
  • pluck extracts a specific property (or several properties) from each document
  • distinct removes duplicate values, leaving only the unique ones
  • count counts the remaining items and returns their number
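
To make the data flow concrete, here is a rough plain-JavaScript analogy of what each step contributes, run against a small in-memory array instead of a database. The sample users array and the helper functions are made up for illustration; real ReQL is evaluated on the server and is not implemented this way.

```javascript
// Hypothetical sample data standing in for the "users" table.
const users = [
  { id: 1, first_name: "Bilbo", last_name: "Baggins" },
  { id: 2, first_name: "Frodo", last_name: "Baggins" },
  { id: 3, first_name: "Samwise", last_name: "Gamgee" }
];

// pluck: keep only the named properties of every document.
function pluck(docs, ...fields) {
  return docs.map(function (doc) {
    const picked = {};
    fields.forEach(function (field) {
      if (field in doc) picked[field] = doc[field];
    });
    return picked;
  });
}

// distinct: drop duplicates, comparing documents by their JSON form.
function distinct(docs) {
  const seen = new Set();
  return docs.filter(function (doc) {
    const key = JSON.stringify(doc);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// count: number of items left at the end of the chain.
function count(docs) {
  return docs.length;
}

const uniqueLastNames = count(distinct(pluck(users, "last_name")));
console.log(uniqueLastNames); // 2
```

Each helper only sees what the previous one returned, which is the same left-to-right reading order as the chained ReQL query.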

Traditional CRUD operations are also simple. ReQL includes an insert function that you can use to add new JSON documents to the table:

r.table("fellowship").insert([
   { name: "Frodo", species: "hobbit" },
   { name: "Sam", species: "hobbit" },
   { name: "Merry", species: "hobbit" },
   { name: "Pippin", species: "hobbit" },
   { name: "Gandalf", species: "istar" },
   { name: "Legolas", species: "elf" },
   { name: "Gimli", species: "dwarf" },
   { name: "Aragorn", species: "human" },
   { name: "Boromir", species: "human" }
])

The filter function retrieves documents that match certain parameters:

r.table("fellowship").filter({species: "hobbit"})

You can append functions like update or delete to the chain to operate on the documents returned by filter:

r.table("fellowship").filter({species: "hobbit"}).update({species: "halfling"})

ReQL includes over 100 functions that can be combined to achieve the desired result. There are functions for control flow, document manipulation, aggregation, writes, and more. There are also functions tailored to common operations on strings, numbers, timestamps, and geospatial coordinates.

There is even an http command that can be used to retrieve data from third-party Web APIs. The following example shows how you can use http to retrieve posts from Reddit:

r.http("http://www.reddit.com/r/aww.json")("data")("children")("data").orderBy(r.desc("score")).limit(5).pluck("score", "title", "url")

Once the posts are fetched, they are sorted by score, and then selected properties of the top five posts are returned. Used to its full potential, ReQL lets developers perform quite complex data manipulations.

How ReQL works

RethinkDB client libraries (hereinafter "drivers") are responsible for integrating ReQL into the programming language in which the application is written. Drivers implement functions for every kind of query the database supports. Under the hood, ReQL expressions are represented as structured objects resembling an abstract syntax tree. To execute a query, the driver translates these query objects into RethinkDB's JSON wire protocol format and sends the result to the database.
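
The idea can be illustrated with a toy query builder. This is not the real driver: the Term class, the term names, and the nested-array layout below are invented for illustration and do not reproduce RethinkDB's actual wire protocol.

```javascript
// A toy term: an operation name plus its arguments (which may be other terms).
class Term {
  constructor(name, args) {
    this.name = name;
    this.args = args;
  }
  // Each chained call wraps the previous term as its first argument,
  // so the chain grows into a tree rooted at the last call.
  pluck(field) { return new Term("pluck", [this, field]); }
  distinct() { return new Term("distinct", [this]); }
  count() { return new Term("count", [this]); }
  // Serialize the tree into nested arrays of the form [name, [args...]].
  build() {
    return [this.name, this.args.map(function (arg) {
      return arg instanceof Term ? arg.build() : arg;
    })];
  }
}

// Entry point of the toy API, playing the role of r.table(...).
function table(name) {
  return new Term("table", [name]);
}

const toyQuery = table("users").pluck("last_name").distinct().count();
const wire = JSON.stringify(toyQuery.build());
console.log(wire);
// ["count",[["distinct",[["pluck",[["table",["users"]],"last_name"]]]]]]
```

Nothing is executed while the chain is being written; the calls only build a description of the query, and that description is what gets serialized and sent.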

The run function at the end of the chain translates the query, executes it on the server, and returns the result. You normally pass it a server connection to use for the operation. In the official drivers, connections are managed manually: you have to create the connection yourself and close it when the operation is done.

The following example shows how to run a RethinkDB query from Node.js with the ReQL driver for JavaScript installed. The query fetches all halflings from the fellowship table and prints them to the console:

var r = require("rethinkdb");

r.connect().then(function(conn) {
   return r.table("fellowship")
      .filter({species: "halfling"})
      .run(conn)
      .then(function(cursor) {
         // Drain the cursor before the connection is closed
         return cursor.toArray();
      })
      .finally(function() { conn.close(); });
})
.then(function(output) {
   console.log("Query output:", output);
});

The rethinkdb module provides the RethinkDB driver. You use it to build database queries and send them to the server. The example above uses promises for asynchronous control flow, but the driver also supports ordinary callbacks.

The connect method establishes a connection, which the run function then uses to execute the query. The query itself returns a cursor, which is something like an open window onto the contents of the database. Cursors support lazy fetching and offer efficient ways to iterate over large amounts of data. In the example above, I simply converted the contents of the cursor to an array, since the result is relatively small.

Although ReQL queries are written in your application as regular code, they are executed on the database server, which returns the results. The integration is so seamless that beginners often have trouble seeing where their application ends and the database work begins.

ReQL's chaining and its integration into the host language make it much easier to reuse code and factor out common operations. Since queries are written in the application's language, encapsulating query subexpressions in variables and functions is simple and convenient. For example, this JavaScript function generalizes pagination by returning a ReQL expression with the given values already filled in:

function paginate(table, index, limit, last) {
   return (!last ? table : table
      .between(last, null, {leftBound: "open", index: index}))
   .orderBy({index: index}).limit(limit)
}

Another noteworthy advantage of ReQL is that it is well protected against the injection attacks familiar from SQL. You can safely include external data in your queries without resorting to risky string concatenation.

Many of the more advanced features of ReQL, such as secondary indexes, table joins, and anonymous functions, are beyond the scope of this article. If you are interested, you can read about them on the ReQL API documentation page.
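
The difference can be sketched without a database. In the snippet below, the same hostile input is first concatenated into an SQL string, where it changes the statement, and then placed into a structured query object, where it stays an ordinary string value. The buildFilter helper and the shape of its result are hypothetical stand-ins for what a driver builds from a call such as r.table("users").filter({name: userInput}).

```javascript
// Input supplied by a user; the value is a classic injection attempt.
const userInput = "x' OR '1'='1";

// String concatenation: the input becomes part of the statement itself.
const sql = "SELECT * FROM users WHERE name = '" + userInput + "'";
console.log(sql);
// SELECT * FROM users WHERE name = 'x' OR '1'='1'

// Structured query (hypothetical shape): the input is only ever a value.
function buildFilter(tableName, criteria) {
  return { table: tableName, filter: criteria };
}

const safeQuery = buildFilter("users", { name: userInput });
console.log(JSON.stringify(safeQuery));
```

In the structured form there is no query text for the input to break out of: the quote characters are just part of the string that is compared against the name property.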

Creating real-time web applications using changefeeds

RethinkDB has a built-in change notification system, which greatly simplifies the development of real-time applications. If you append the changes function to the end of a query chain, the query returns a continuous stream that reflects every change to its results as it happens. Such streams are called changefeeds.

Ordinary database queries fit the traditional request/response model of the web well. However, constantly polling the server is impractical for real-time applications that keep a persistent connection to the server or stream data. Changefeeds offer an alternative to polling: the database continuously pushes updated results to the application.

You can attach a changefeed directly to a table to track any change to its contents. You can also use changefeeds with more complex queries to receive updates for only the data you need. For example, you can attach a changefeed to a query that uses the orderBy and limit functions to build a live high-score table for a multiplayer game:

r.table("players").orderBy({index: r.desc("score")}).limit(5).changes()

Players are sorted by score, and the top five are returned. Whenever anything in this top five changes, the changefeed sends you the updated data. Even if a player who was not originally in the top five scores enough points to push another player out, the changefeed reports this and delivers everything needed to update the list.

A changefeed sends not only the new value of a document but also the previous one, which lets you compare the two. If a document is deleted, its new value is null. Likewise, for a newly inserted document, the old value is null. You can also chain further operations after changes if the incoming data needs additional processing.

When you run a query with the changes command, it returns a cursor that stays open indefinitely (remember the window?). The cursor delivers new changes as soon as they become available. The example below shows how to receive updates from a changefeed in Node.js:
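
In the official drivers, each notification arrives as an object with old_val and new_val fields. A small helper can turn that pair into an event type. The classifyChange name and the labels it returns are hypothetical and not part of the driver.

```javascript
// Map a changefeed notification to a human-readable event type.
function classifyChange(change) {
  const hasOld = change.old_val != null;
  const hasNew = change.new_val != null;
  if (!hasOld && hasNew) return "insert"; // the document did not exist before
  if (hasOld && !hasNew) return "delete"; // the document no longer exists
  return "update";                        // both versions are present
}

console.log(classifyChange({ old_val: null, new_val: { id: 1, score: 10 } })); // insert
console.log(classifyChange({ old_val: { id: 1, score: 10 }, new_val: null })); // delete
console.log(classifyChange({
  old_val: { id: 1, score: 10 },
  new_val: { id: 1, score: 25 }
})); // update
```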

r.connect().then(function(conn) {
   return r.table("data").changes().run(conn);
})
.then(function(cursor) {
   cursor.each(function(err, item) {
      // Surface feed errors instead of logging an undefined item
      if (err) throw err;
      console.log(item);
   });
});
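
A callback like the one passed to cursor.each is also where an application would keep its own copy of the data in sync. The sketch below applies notifications to a local leaderboard array. It assumes each notification carries old_val and new_val fields, as in the official drivers, and that player documents have id and score properties; the applyChange helper and the sample players are hypothetical.

```javascript
// Apply one changefeed notification to a local copy of the leaderboard.
// Returns a new array sorted by score, highest first.
function applyChange(board, change) {
  let next = board;
  if (change.old_val) {
    // The old version left the result set or is about to be replaced.
    next = next.filter(function (p) { return p.id !== change.old_val.id; });
  }
  if (change.new_val) {
    // Drop any stale copy, then add the new version.
    next = next.filter(function (p) { return p.id !== change.new_val.id; });
    next = next.concat([change.new_val]);
  }
  return next.slice().sort(function (a, b) { return b.score - a.score; });
}

let board = [
  { id: "ann", score: 50 },
  { id: "bob", score: 40 }
];

// "cid" overtakes "bob" and pushes him off a two-player board.
board = applyChange(board, {
  old_val: { id: "bob", score: 40 },
  new_val: { id: "cid", score: 45 }
});
console.log(board.map(function (p) { return p.id; })); // [ 'ann', 'cid' ]
```

Because the function returns a new array instead of mutating its input, it can be dropped into a rendering loop or a state container without extra copying.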

The changefeed cursor runs in the background, so your application is not blocked. In natively asynchronous environments such as Node.js, no additional measures are needed. In other languages, you will probably need an asynchronous framework or your own threading. The official RethinkDB drivers for Python and Ruby support the popular Tornado and EventMachine frameworks, respectively.

Currently, the changes command works with the get, between, filter, map, orderBy, min, and max functions. Support for other query types is planned.

When building a real-time web application with RethinkDB, you can use WebSockets to broadcast updates to the front end. Libraries like Socket.io are easy to use and simplify this process.

Changefeeds are especially useful for applications designed to scale horizontally. When you spread the load across multiple instances of your application, you usually have to rely on additional mechanisms, such as message queues or an in-memory database, to propagate updates to all servers. RethinkDB moves this functionality into the database itself, flattening your architecture and eliminating the need for additional infrastructure. Each application instance connects directly to the database to receive new changes. As soon as updates are available, each server broadcasts them to its own WebSocket clients.

Beyond real-time applications, changefeeds can greatly simplify the implementation of mobile push notifications and similar functionality. Changefeeds provide an event-driven model of interaction with the database, and that model is useful in many cases.
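
The per-instance fan-out step can be sketched as follows. The broadcast helper and the stand-in clients are hypothetical. In a real application, broadcast would be called from the changefeed's cursor.each callback, and the set would hold this instance's open WebSocket connections, which expose a send method.

```javascript
// Forward one change to every client connected to this app instance.
// A "client" is anything with a send(string) method, such as a WebSocket.
function broadcast(clients, change) {
  const message = JSON.stringify(change);
  clients.forEach(function (client) {
    client.send(message);
  });
  return clients.size; // number of clients notified
}

// Stand-in client that just records what it was sent.
function fakeClient() {
  const received = [];
  return {
    received: received,
    send: function (msg) { received.push(msg); }
  };
}

const alice = fakeClient();
const bob = fakeClient();
const clients = new Set([alice, bob]);

const notified = broadcast(clients, {
  old_val: null,
  new_val: { id: 1, text: "hi" }
});
console.log(notified, alice.received.length, bob.received.length); // 2 1 1
```

Every application instance runs the same loop against its own set of clients, so no queue or shared cache is needed between instances.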

RethinkDB Scaling and Cluster Management

RethinkDB is a distributed database designed for clustering and easy scaling. To add a new server to a cluster, simply start it from the command line with the --join option and the address of an existing server. Once you have a cluster of several servers, you can configure sharding and replication individually for each table. Any settings and features that work on a single database instance work the same way on a cluster.

The RethinkDB server also includes a web admin interface, which you can open directly in the browser. Using this interface, you can easily manage and monitor the cluster. You can even set up sharding and replication in a few clicks.

RethinkDB also lets you configure the cluster through ReQL, which is ideal for fine-tuning and automation. ReQL includes a simple reconfigure function that you can call on a table to set its sharding and replication settings. The cluster also exposes most of the internal information about its state and settings through a set of special system tables. You can query these tables to change settings or to retrieve monitoring information. Almost all the functionality offered by the web interface is built on the ReQL API.

You can even use changefeeds together with the ReQL monitoring API to receive a stream of server data. For example, you could build your own monitoring tool that attaches a changefeed to the system table with statistics and streams the data in real time to plot a graph of the read/write load.

RethinkDB 2.1, released recently, has built-in support for automatic failover. The new functionality improves cluster availability and reduces the risk of downtime when a database server fails. If the primary server fails, the remaining working secondary servers "elect" a new primary, which fills that role until the failed server comes back up or is removed from the cluster.
Hardware failures and network outages no longer affect data availability as long as a majority of the servers remain online.
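
As a minimal sketch, a reconfigure call can be wrapped in a small helper for use in automation scripts. The reshard name and its parameter list are hypothetical. The helper expects r to be the rethinkdb driver module and conn an already open connection, both passed in by the caller.

```javascript
// r: the rethinkdb driver module; conn: an open connection.
// Both are passed in so the helper does not manage connections itself.
function reshard(r, conn, tableName, shards, replicas) {
  return r.table(tableName)
    // Ask the cluster to redistribute the table across its servers.
    .reconfigure({ shards: shards, replicas: replicas })
    .run(conn);
}
```

For example, reshard(r, conn, "fellowship", 2, 3) would request two shards with three replicas each, which only makes sense on a cluster of at least three servers.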

Installing RethinkDB

RethinkDB runs on Linux and Mac OS X. The Windows version is under active development and is not yet available for download. The RethinkDB documentation describes the installation process in detail. We provide APT and Yum repositories for Linux users, as well as an installer for OS X. You can also install RethinkDB using Docker or compile it from the source code on GitHub. Our ten-minute guide will help you get started.

