ru_vds March 27, 2017 at 14:15

SQL or NoSQL - that's the question

Transfer

We all know that in the world of database technologies there are two main areas: SQL and NoSQL, relational and non-relational databases. The differences between them are how they are designed, what types of data support, how they store information.

Relational databases store structured data that typically represents real-world objects. Say, this can be information about a person, or about the contents of the basket for goods in the store, grouped in tables, the format of which was set at the design stage of the warehouse.

Non-relational databases are structured differently. For example, document-oriented databases store information in the form of hierarchical data structures. We can talk about objects with an arbitrary set of attributes. The fact that a relational database will be divided into several interrelated tables can be stored in a non-relational database as an integral entity.

The internal structure of various database management systems affects the features of working with them. For example, non-relational databases are better scalable. What technology to choose? The answer to this question depends on the features of the project in question.

About choosing SQL databases

There are no databases that are suitable for absolutely everyone. That is why many companies use both relational and non-relational databases to solve various problems. Although NoSQL databases have become popular due to their speed and scalability, structured SQL repositories may be preferable in some situations. Here are two reasons that may serve as a reason for choosing an SQL database:

The need for the database to meet ACID requirements (Atomicity, Consistency, Isolation, Durability - atomicity, consistency, isolation, durability). This reduces the likelihood of unexpected system behavior and ensures database integrity. This is achieved by rigorously determining exactly how transactions interact with the database. This differs from the approach used in NoSQL databases, which focus on flexibility and speed, rather than 100% data integrity.
The data you work with is structured, and the structure is not subject to frequent changes. If your organization is not in the stage of exponential growth, there are probably no convincing reasons to use a database that allows you to freely manipulate data types and is aimed at processing huge amounts of information.

About choosing NoSQL databases

If there is a suspicion that the database may become the bottleneck of a certain project based on working with large volumes of information, it is worth looking in the direction of NoSQL databases that allow something that relational databases do not know how to.

Here are the features that have become the reason for the popularity of NoSQL databases such as MongoDB, CouchDB, Cassandra, HBase:

Storage of large volumes of unstructured information. The NoSQL database does not impose restrictions on the types of stored data. Moreover, if necessary, in the process, you can add new data types.
Using cloud computing and storage. Cloud storage is a great solution, but it requires data to be easily distributed across multiple servers for scalability. Using, for testing and development, local equipment, and then moving the system to the cloud, where it works, is exactly what NoSQL databases are created for.
Fast development. If you are developing a system using agile methods, using a relational database can slow down the work. NoSQL databases do not need the same amount of preparatory actions that are usually needed for relational databases.

In the next section, we’ll look at some of the differences between SQL and NoSQL. Namely, first we look at a simple example showing the fundamental difference between the two approaches to organizing databases, then we'll talk about scalability and indexing of data. In the end, let us dwell on the example of a large CRM system that needs high performance data warehouse.

SQL and NoSQL

Let's start with some key concepts of relational and non-relational databases. The following is a database of people’s relationships. Variant a is a diagram-free structure constructed in the form of a graph characteristic of NoSQL solutions. Option b shows how the same data can be represented in a structured form typical of SQL.

Two options for presenting data.

Schemeless means that two documents in a NoSQL data structure do not have to have the same fields and can store data of different types. Here, for example, is an array of objects whose set of fields does not match.

var cars = [
{ Model: "BMW", Color: "Red", Manufactured: 2016 },
{ Model: "Mercedes", Type: "Coupe", Color: "Black", Manufactured: "1-1-2017" }
];

In a relational approach, data must be stored in a pre-designed structure from which this data can then be retrieved. For example, using an operator JOINwhen fetching from two tables:

SELECT Orders.OrderID, Customers.Name, Orders.Date
FROM Orders
INNER JOIN Customers
ON Orders.CustID = Customers.CustID

As a more advanced example, to demonstrate when SQL is preferable to NoSQL, we consider the features of applying compression algorithms in NoSQL databases . The problem is that in some NoSQL databases (for example, in CouchDB and HBase) you constantly have to form the so-calledsstables- string tables in key-value format, sorted by key. In such tables that are saved to disk, data falls from tables stored in memory when they are full and in other situations. With intensive work with the database, creating tables over time leads to the fact that the input / output subsystem of the data storage device becomes a bottleneck for data reading operations. As a result, reading in a NoSQL database is slower than writing, which negates one of the main advantages of non-relational databases. It is in order to reduce this effect that NoSQL systems use, in the background, data compression algorithms, trying to combine many tables into one. But this operation in itself is very resource intensive, the system works under increased load.

Scalability

One of the main differences between the technologies under consideration is that NoSQL databases are better scalable. For example, MongoDB has built-in support for replication and sharding (horizontal data sharing) for scalability. Although scaling is supported in SQL databases, it requires much more human and hardware resources.

Data Warehouse Type	Use case	Example	Recommendations
Key-Value Storage	Suitable for simple applications, with one type of object, in situations where the search for objects is performed by only one attribute.	Interactively updating the user's homepage on Facebook.	Familiarity with memcached technology is recommended. If you have to search for objects by multiple attributes, consider moving to a document-oriented repository.
Document Oriented Storage	Suitable for storing objects of various types.	A transport application that operates on data about drivers and cars, working with which you need to search for objects in different fields, for example, the name or date of birth of the driver, license number, and the vehicle that he owns.	Suitable for applications in the course of work with which the implementation of the principle of “consistency in the final analysis” with limited atomicity and isolation is allowed. A quorum reading mechanism is recommended to ensure timely atomic consistency.
Extensible Record Storage	Higher throughput and better parallel processing at the cost of slightly higher complexity than document-centric storage.	Applications similar to eBay. Vertical and horizontal separation of data to store customer information.	To simplify data separation, HBase or Hypertable are used.
Scalable RDBMS	The use of ACID semantics frees programmers from the need to work at a sufficiently low level, namely, to be responsible for data blocking and consistency, to process obsolete data, collisions.	Applications that do not require updates or data merges spanning multiple sites.	It is worth paying attention to such systems as MySQL Cluster, VoltDB, Clustrix, focused on improved scaling.

A more detailed comparison of SQL and NoSQL can be found in this article. Here are its main points. Namely, three main characteristics of the systems were tested: parallel data processing, work with information storages, data replication. Parallel processing capabilities were evaluated by analyzing locking mechanisms, concurrent access control based on multi-versioning, and ACID. Storage testing covered both physical media and storage using RAM. Replication was tested in synchronous and asynchronous modes.

Using the data obtained during the tests, the authors conclude that SQL databases with the possibility of clustering showed promising performance results per node, and in addition, they have scalability, which gives RDBMS systems an advantage over NoSQL due to the full compliance with ACID principles.

Indexing

RDBMS systems use indexing to speed up data retrieval from databases. The absence of an index means that the whole table must be scanned in order to fulfill a read request.

In both SQL and NoSQL databases, indexes serve the same purpose - to speed up and optimize data extraction. But the way they work is different due to different database architectures and the peculiarities of storing information in the database. While SQL indexes are represented as B-trees that reflect the hierarchical structure of relational data, in NoSQL databases they point to documents, or to parts of documents, between which, basically, there is no relationship. Here is the detailed material on this topic.

CRM systems

CRM applications are one of the best examples of systems that are characterized by huge volumes of daily processed data and a very large number of transactions. All developers of such applications use both SQL and NoSQL databases. And, although most of the transaction data is still stored in SQL databases, publicly available systems of the DBaaS class (data-base-as-a-service, database as a service), like AWS DynamoDB and Azure DocumentDB, are used, as a result, a serious load data processing can be transferred to cloud NoSQL-database.

While the use of such services frees the developer from solving storage maintenance tasks, this is also the area where NoSQL databases are used for what they were mainly created for, for example, for data mining. The amount of information stored in the huge CRM systems of financial and telecommunication companies would be practically impossible to analyze using tools like SAS or R. This would require huge hardware resources.

The main advantage of such systems is the use of unstructureddata similar to documents. Such data can be fed into the input of statistical models, which enable companies to perform various types of analysis. CRM applications, in addition, are a very good example in which two database systems are not competitors, but exist in harmony, playing each role in a large data management architecture.

Summary

When searching for a database management system, you can choose one technology, and later, specifying the requirements, switch to something else. However, smart planning will save you a lot of time and money.

Here are the signs of projects for which SQL databases are ideal:

There are logical data requirements that can be predefined.
Data integrity is very important.
We need a technology based on well-established standards that has proven itself, using which we can rely on extensive experience of developers and technical support.

And here are the properties of projects for which something from the scope of NoSQL is suitable:

Data requirements are fuzzy, vague, or developing with the development of the project.
The purpose of the project can be adjusted over time, while the ability to immediately begin development is important.
One of the basic requirements for a database is data processing speed and scalability.

In the end, I want to say that in the modern world there is no confrontation between relational and non-relational databases. Instead, it’s worth talking about their joint use to solve problems in which a particular technology performs best. In addition, the integration of these technologies into each other is increasingly observed. For example, Microsoft, Oracle, and Teradata now offer some form of Hadoop integration for connecting SQL-based analytic tools to the world of unstructured big data.

Dear readers, did you have to choose database management systems for your own projects? If so, please share your experience, tell us what and why you ultimately chose.

Tags: