KatoProject January 15, 2015 at 13:04

Why relational DBMSs are great for startups: An example from the development history of the Kato messenger

Recovery mode

Historically, the development of computers went along with progress in the field of database management - this was due to the fact that among the tasks solved by researchers and practitioners, the huge role was played by (and continues to play at the moment) the tasks of processing the received data, compact storage and fastest search.

Accordingly, technological progress went not only in the areas of increasing processor power, memory or reducing device sizes, but also in the field of improving data efficiency. As a result, a large number of different database management systems (DBMS) appeared.

In our product, the messenger for corporate communications Kato, PostgreSQL DBMS is used. Today we would like to recall the history of this wonderful tool and show the advantages of its use for startups in the field of information technology.

Relational Model and SQL

Three main directions of the DBMS are traditionally distinguished according to various data models on which these directions are based - network, hierarchical, and relational.

The relational data model, now the most popular and advanced of the three named, was originally developed in the late 60s of the last century by a British scientist, an employee of IBM, Edgar Codd . In 1970, he published his first work on a relational data model, A Relational Model of Data for Large Shared Data Banks .

The creator of the relational data model, Edgar Codd

Codd, proposed a set of 8 basic operations that can be performed on data, and this set laid the foundation for relational algebra. On the basis of the algebra created by Codd in the mid-70s of the last century, a programming language for working with data in relational databases called SQL, which was standardized in 1986, began to develop (among many others).

Research at Berkeley: The Emergence of Postgres

One of the first relational database management systems was the open Ingres system - it was created by researchers from Berkeley who became interested in IBM publications about their project R, the relational database, and tried to develop their own system. Ingres used a language other than SQL for querying — it was called QUEL .

Subsequently, Michael Stonebreaker, previously engaged in the creation of Ingres, together with his students at Berkeley launched a new project - Post In gres (Postgres). The new system was developed from 1986 to 1995 and used the QUEL successor, the POSTQUEL query language.

Michael Stonebreaker

Later, Stonebreaker founded several DBMS companies (examples: Illustra , purchased by Informix ; StreamBase Systems ; Vertica , purchased by HP; VoltDB ).

His students who worked on Postgres created their own version of the database in which POSTQUEL was replaced with SQL. The project was originally called Postgres95, and got its current name - PostgreSQL - after the transfer of this development by the University of California, Berkeley in the hands of a team of enthusiasts.

DBMS Issues: The Birth of NoSQL

In the late nineties and early 2000s, there was a situation in the DBMS market in which there were a considerable number of popular databases, but each of them had serious disadvantages. In the case of commercial Oracle, IBM DB2, and Microsoft SQL Server, these drawbacks were quite substantial, and the most popular free MySQL project had limited functionality (for example, stored procedures, triggers, and views only appeared in this DBMS in 2005).

At the same time, PostgreSQL, despite the fact that its developers did an enormous volume and generally very high quality work, could not boast of the high speed and ease of administration, which limited its use in commercial projects.

The problems of existing products that use SQL and the relational model in general have prompted enthusiasts to create databases that work using other standards - this is how many projects were born that can be combined into a common NoSQL category .

A number of NoSQL databases have emerged (some well-known examples: MongoDB, Redis, Riak). The development of this direction went along the path of fragmentation and the creation of highly specialized products.

DBMS and startups

The emergence of a large number of new NoSQL-developments at some point changed the attitude of startups to traditional SQL-systems - they began to be perceived by the creators of IT projects as too complex, old-fashioned and difficult to work in modern dynamic applications.

However, it gradually became clear that DBMS from the NoSQL category have the following critical (and very unpleasant) property - they are good for solving only very narrowly defined problems. This property automatically made the use of conditional MongoDB in a startup a very risky step - at the initial stage, conditional MongoDB may be an ideal choice for a given range of tasks, however, at the time when a startup changes its strategy somewhat (and it happens almost always), some other DBMS may more suitable for solving problems in the new formulation. Most likely, “moving” to this other DBMS will be too complicated and expensive operation, which a beginner’s business cannot do.

On the other hand, during the rapid development of DBMS from the NoSQL category, developers of traditional relational DBMSs were also not idle. In particular, the creators of PostgreSQL worked on the productivity, ease of administration and documentation of their project, as a result of which, at the end of the 2000s, from a "boring and incomprehensible antique tool for elderly bearded, pot-bellied and bald uncles" he turned into an accurate, fast and modern weapon, necessary in the arsenal of any “technology hipster”.

PostgreSQL and Kato messenger

The developers from the Kato team were employed in various technology companies and startups (for example, in the Rdio project ) and from their own experience felt the pros and cons of working with many existing DBMSs (and simultaneously stepped on almost all possible rakes related to building the system from scratch). As a result, starting work on our project, we chose PostgreSQL.

For commercial projects, it is very important to have good opportunities for scaling various aspects. Each project has its own aspects - for example, in Kato we need to scale the base of the message history in the rooms.

The immutability of the data schema is one of the popular advantages of NoSQL. Hstore module(by the way, made by Muscovites) from PostgreSQL allows you to write keys and values in the table columns, which eliminates the need for developers to constantly change the data scheme in the process of adding new product functionality. At the same time, it remains possible to create indexes.

PostgreSQL 9.2 introduces a new type - JSON . Unlike hstore, the JSON type supports nested structures, which makes PostgreSQL a convenient tool for working with documents. It is also important that for the jsonb type, you can create GIN indexes, which makes it possible to quickly search through JSON objects.

The implementation of hstore and the JSON type made it possible to create NoSQL-style databases within PostgreSQL tables, which allows you to use the advantages of NoSQL and SQL at the same time.

Here are a few typical operations to illustrate the capabilities of the hstore module.

Create an hstore extension and a table with a column of type hstore:

postgres=# create extension hstore;
WARNING:  => is deprecated as an operator name
DETAIL:    This name may be disallowed altogether in future versions of QL.
CREATE EXTENSION
postgres=# create table hstore_test (data hstore);
CREATE TABLE

Add the hstore entry, where the two keys are 'a' with the value 'hello' and 'b' with the value 'world':

postgres=# insert into hstore_test values (hstore(array['a', 'hello', 'b', 'world']));
INSERT 0 1
postgres=#

We look at the value of the key 'a':

postgres=# SELECT data->'a' FROM hstore_test;
 ?column?
----------
 hello
(1 row)
postgres=# ▄

Find out if the keys 'a' and 'c' exist:

postgres=# select data ? 'a', data ? 'c' from hstore_test;
 ?column? | ?column?
----------+--------------
 t        | f
(1 row)

Change the value of the key 'b':

postgres=# update hstore_test set data = data || ('b' => 'world!');
UPDATE 1
postgres=# select data->'b' from hstore_test;
 ?column?
--------------
 world!
(1 row)

All operations with the hstore type are described in the corresponding section of the PostgreSQL project documentation .

In the Kato messenger, hstore tables are used to store the settings and attributes of various objects: accounts, rooms, teams and organizations.

Rings of history

The story goes in circles, and very often the phrase “everything is new - it's well forgotten old” is true - many modern trends in the construction of a DBMS and working with data were comprehended and anticipated by the creators of the relational model and SQL developers.

PostgreSQL is a canonical example of a project that constantly incorporates the research results of leading world experts. As a result, this DBMS acts as a kind of universal designer, and startups, using its details, can very quickly create working commercial products, without fear of being stumped due to an unexpected expansion of the range of tasks.

Tags: