Postgres Shrugs

    From February 6 to 7, PGCONF.RUSSIA 2015 conference will be held at the Digital October business center in Moscow, one of the organizers of which I am. PostgreSQL is one of the most promising modern freely distributed DBMSs, actively developing and in many cases no longer inferior to the flagship of commercial Oracle DBMSs, but in some ways surpasses it. Moreover, Postgres is distributed under a very free license, close to BSD and MIT-licenses, which allows you to do anything with it - even sell on your own behalf. Therefore, there are no obstacles to creating commercial DBMSs and application systems based on postgres, and many use it. This, in turn, makes it possible to participate in the development of more people, and actively fueled by new ideas. On the pages of this post we will talk about how this DBMS originated and developed, what are its strengths and weaknesses, including from the point of view of wide distribution.

    What is Postgres

    There are many database management systems in the world. The most famous, of course, is Oracle, today it is the universally recognized DBMS No. 1 in the world. Of the free databases, MySQL is the most famous and popular. However, in recent years, PostgreSQL, the freely distributed object-relational database management system (ORDBMS), the most developed of the open database systems, has been causing increasing interest in the world. The growth of migration requests from proprietary systems to PostgreSQL proves that it is a real alternative to commercial DBMS.

    What is the reason for the growing popularity of Postgres? First of all, it is very carefully thought out, flexible and due to this has a number of advantages.

    Postgres was created in 1986 by the professor of the University of California at Berkeley, Michael Stonebreaker, as the next step after the Ingres DBMS. The foundations of its device and mechanisms for ensuring extensibility were laid already then. Interestingly, at that time, the use of the SQL language was not yet considered mandatory for relational databases. And when in 1996 Postgres, having already learned the SQL language, became a freely distributed product, it became called the intricate word PostgreSQL ("postgres ескquel"), but for the convenience of pronunciation, the old name "postgres" is also preserved and used.

    In recent years, PostgreSQL has experienced rapid growth due to its penetration into the markets of industrial solutions. Therefore, the capabilities necessary for use in serious projects are purposefully added to it - replication, means for storing unstructured data, row level security. Performance increases with each new version.

    PostgreSQL shows great interest abroad. Many companies provide commercial support and customization services - in particular, the American company EnterpriseDB, the British The Second Quandrant and the French Dalibo. By the way, the director of the latter, Jean-Paul Argudo, in his reportat PGCONF.RUSSIA 2015 will share its experience in providing commercial services regarding a free redistributable product and supporting French government agencies in the migration from Oracle to Postgres. Also at PGCONF.RUSSIA 2015, quite famous PostgreSQL developers from Japan and China will speak, who will talk about the reasons for its high popularity in these countries.

    The most important performance characteristics of PostgreSQL

    • Reliability and stability. Reliability PostgreSQL is a well-known fact, proven by the example of many projects in which PostgreSQL has been working without failures under high loads for several years.
    • Excellent support. The PostgreSQL community provides expert and fast help. Commercial companies offer their services worldwide.
    • Competitive work under heavy load. PostgreSQL utilizes multi-versioning ( MVCC ) to provide reliable and fast performance in competitive conditions under high load.
    • Scalability. PostgreSQL makes excellent use of the modern architecture of multi-core processors - its performance grows linearly up to 64 cores. Postgres-XC, Postgres-XL clustered solutions help with horizontal scalability.
    • Cross-platform. PostgreSQL works under all kinds of UNIX-like systems, including Linux, FreeBSD, Solaris, HP / UX, Mac OS X, and also under MS Windows.
    • Extensibility. PostgreSQL source codes are available, which makes it possible to add new functionality to your project without additional problems. Extensibility PostgreSQL allows you to create new data types and access methods.
    • Availability. PostgreSQL is distributed under a license close to BSD, which does not impose any restrictions on commercial use and does not require royalties.

    PostgreSQL Technical Details

    • High compliance ANSI SQL 92, ANSI SQL 99 and ANSI SQL 2003, 2011.
    • Interfaces for Tcl, Perl, C, C ++, PHP, JSON, ODBC, JDBC, Embedded SQL in C, Python, Ruby, Java and others.
    • Integration of data protection with the operating system (SE-Linux).
    • Views, sequences, inheritance, outer joins, subqueries, referential integrity, window functions, CTE (recursive queries).
    • User functions, stored procedures, triggers.
    • Procedural languages ​​PL / PgSQL, PL / Perl, PL / Python, PL / Java and others.
    • An extensible set of data types with index support (GiST, GIN, SP-GiST).
    • Built-in full-text search system with support for all European languages.
    • Native support for poorly structured data (xml, json, jsonb) with index support.
    • Hot backup and replication (synchronous, asynchronous, cascaded), PITR, bidirectional (BDR).
    • Full ACID support, isolation levels, efficient transaction serialization.
    • Functional and partial indexes.
    • Internationalization, support for Unicode and locale.
    • Downloadable extensions, e.g. PostGIS, hstore.
    • Support for SSL and Kerberos authentication.
    • Foreign Data Wrappers (writable), support for all major databases.

    Postgres in Russia

    In the wake of the current trend towards import substitution at Postgres, many state and near-state structures that today use Oracle DBMS and other proprietary systems have begun to pay attention. Specific migration steps are being taken by a number of ministries and state corporations. By the way, Europe has the same picture as ours: they also want there, oddly enough, to “get off” proprietary solutions. And they want a very long time. But many freely distributed solutions are still only approaching such a level of quality so that you can switch to them. PostgreSQL is one of the most serious systems.

    Russian developers are making a significant contribution to the development of Postgres. This is probably one of the reasons why Postgres has been so seriously considered in various ministries lately. The power structures, including the military, did not ignore him. Indeed, since it is an open source system, it is possible to check its code for various spy and sabotage bookmarks.

    Thanks to index support for working with spatial data, PostgreSQL is very widely used in geographic information systems. In particular, famous projects such as OpenStreetMap and Russian 2GIS were made on it. By the way, Postgres is at the heart of the Avito ad space.

    Unfortunately, in many cases, the openness of the system and the absence of an explicit developer company interferes with active distribution. After all, despite its free, like any other complex system, Postgres has a non-zero cost of ownership, requiring certain development and maintenance costs. We need qualified and competent specialists. And, if in the case of the same Oracle or Microsoft, there is a formal certification system, then Postgres does not have such a system. Therefore, organizations and government agencies that are accustomed or forced to work strictly formally do not understand who to hire. Well, they put Postgres. Where will they get certified engineers? In Russia, they are not formally. So now the idea is in the air that the Russian community of developers needs to acquire or create some kind of formal structure that will deal with certification, training and promotion of Postgres in Russia. Of course, there are private firms involved in postgres training. But in order for the state to recognize the certificates issued by them, support is required at a sufficiently high level. We need powerful support from potential customers, we need political will, PR. Perhaps in the future this will be realized. need political will, PR. Perhaps in the future this will be realized. need political will, PR. Perhaps in the future this will be realized.

    Who can recommend Postgres

    For any new business, Postgres is tempting compared to proprietary DBMSs with the absence of starting investments - you do not need to buy a license. However, in these cases, the more popular MySQL is often chosen. The advantages of Postgres become more visible and even decisive when you need a database of a more or less complex structure with internal logic, replication, interaction with other databases, that is, when it comes not to a multi-page website, but to an information system for business, large Internet portal, e-commerce system.

    Postgres can be recommended for those projects where high reliability is important, the ability to update the configuration of the database itself, stored procedures without stopping the system. In new versions of Postgres it will be possible (so far using the ingenious DBA) to even update the version of the postgres itself without stopping the service.

    Compared to other freely distributed relational DBMSs, Rostgres wins technologically. Initially, very competent technical solutions were put in it, ahead of time. Postgres has the highest degree of flexibility and extensibility: you can create your own data types, your own types of indexes, write built-in functions in a huge number of languages, which is nowhere else. For example, the dominant position of Postgres in GIS is a direct consequence of its flexibility and extensibility.

    Why do we organize PGCONF.RUSSIA 2015

    We hope that our conference will allow the community to organize, find some form of interaction, and create new structures. We also hope to demonstrate to the large domestic customers and the state the maturity of PostgreSQL itself and the community, and to the community members to demonstrate the breadth of the tasks open to them.

    We want to help build a community and market, bring together different stakeholders. Those who develop themselves Postgres, who develop on it, who use it. After all, if the market for services related to Postgres develops, then Postgres itself will develop.

    In general, such conferences are one of the forms of self-organization of the global community of developers and users. There are a series of international conferences that take place in Canada; American conferences are held in the USA; in various cities of Europe - annual pan-European. China and Japan also hold their national Postgres conferences. European conferences usually bring together 200-300 people. The first major Russian conference was organized by enthusiasts last summer in St. Petersburg and was attended by about 250 people. About the same we expect at PGCONF.RUSSIA 2015.

    In Russia, unlike many other countries, there are members of the international PostgreSQL kernel development team, that is, the key, fundamental developers of Postgres. One of them, Fedor Sigaev, works for us at Mail.Ru Group. At one time, he created important tools for Postgres that allowed working with arrays and unstructured data . Together with Oleg Bartunov, another Russian kernel developer, he created a full-text search module .

    Fedor will also speak at our conference (more than 40 speakers in three parallel sections have been announced in total). As an extended announcement, we want to talk a little about the contents of his report.

    PostgreSQL Extensibility - Issues and Prospects

    Historically, Postgres was created as a database that does not have a hard-wired set of rules, hard-wired types. It allows you to add your own types, create your own functions, etc. Of course, not everything was managed to be done right away; many of the ideas laid down are still developing. Say, only in the mid-2000s did Postgres finally get rid of the “knowledge” of the semantic meaning of operation signs. Then this knowledge was generalized, and now it is possible to explain to the Postgres core how to speed up the search by expressions with any signs of operations using the index. This made it possible to use a special type of indexes to speed up the search by the entry of an element into a set (array), by the intersection of sets, and then a full-text search.

    At one time, full-text search was born as a module consisting of plug-in types and plug-in indexes. There are many such extensions. You can mention LTree - this is a module and type that allows you to store trees in the database and work with them efficiently enough.

    So, Postgres allows you to create your own types and functions. In the report, Fedor will talk about how to do this best. The easiest way is to create a new type: for this it is enough to write two functions that turn a simple text string into an internal representation and vice versa, and run these functions in life with a special SQL command. With this, Postgres can already live and replicate. But in order to get some sense from the created data type, you will need to define some functions and operations on them. If you need index support for searching for some operation, you will also need to write several functions.

    There are operations on data types that are not too trivial, such as the proximity (similarity) of two lines. For example, the search task is to find a “black hole” in the database with the phrase “black hole”. In fact, it is not so easy to eliminate the alternation of terms without losing meaning, and at the same time find the necessary lines (and there are still typos!). No new types are introduced here; instead, two lines are compared, given Levenshtein’s similarity measure. Those who tried to work with her know that searching on it works only exhaustive search. Therefore, at one time, the pg_trgm module was created , which represents a string in the form of a set of trigrams, indexing these sets and searching quite efficiently.

    The report will also talk about the possibility of using this module for index search on some subsets of regular expressions - about how you can do something completely unusual with strings, using the existing Postgres features. Separately, foreign data wrappers (fdw), wrappers over data sources that are not Postgres itself, will be mentioned. Now this functionality has developed so much that even transactions are supported.

    The Achilles heel of Postgres is, oddly enough, connecting new types of indexes. This operation is quite rare, but the flexibility inherent in it is not yet fully implemented. In general, Postgres is already rich in indexes: it has BTree, Rtree, reverse index (GIN), generalized index (GIST), spatial generalized index (SP-GIST). However, sometimes tasks arise that do not fit into a traditional database, for example, Bloom filters. At first glance, it’s enough to create a new type of index and insert it into the system catalog into several labels.

    The problem is that such pluggable indexes do not have access to the write ahead log (WAL), which underlies recovery from failures and replication. As a result, this index is not replicated at all, because replication goes through the WAL. In order for the new index to be written in WAL, it is necessary to patch the kernel of the system, extensions alone are not enough. But we expect that in version 9.6 will be implemented the ability to connect indexes using WAL.

    Fedor will also discuss exclusive constraints in his talk. Everyone knows about unique constraints that are used if a field must be unique in a table. But, for example, imagine a schedule in which for each lecture the audience, time and duration are indicated. You can rent an audience for three hours, or for 15 minutes. Lectures are not required to intersect in spacetime. Unique constraints, of course, are not suitable for this. Exclusive constraints will help here, so that you can ensure that these two time intervals in the same audience do not overlap.

    Future plans

    In a future article, Oleg Tsarev, another speaker at PGCONF.RUSSIA 2015 from Mail.Ru Group, will talk about some of the fundamental differences between MySQL and Postgres and give some tips on migrating from one to another. This material will be useful to all those who are no longer satisfied with the capabilities and features of MySQL, as well as those who are taking their first steps in Postgres. If you have questions to Oleg or to me - ask them in the comments.

    Also popular now: