“Of course, I consider the multimaster to be the most serious feature” - Ivan Frolkov on the development of Postgres Pro EE

    Dear colleagues, we are pleased to bring to your attention the second issue of our new series, “Interviews with Database Developers”. We talked with Ivan Frolkov, a developer at Postgres Professional. Ivan has been building database applications for over 20 years. Today, Ivan will lift the veil of secrecy and tell us about new and interesting features of the “domestic Postgres”, Postgres Pro EE.



    PG Day: Tell us a little, please: how long have you been working with databases, how long have you been in the profession, what are your roles, and so on?

    IF: I started working with databases back in 1993, when we built such a terrible thing as a shareholder register. Back then, if anyone remembers, we had voucher privatization. We wrote it in Clipper. There were not really any other options at the time: FoxPro, Clipper, and, I think, that was about it.

    For legal reasons, our company consisted of several branches that duplicated each other, and I had to exchange data with them. They, too, had a shareholder register written in Clipper. There were a lot of individual shareholders, voucher privatization, all of that. What especially shocked me was that the first few screens consisted of nothing but binary garbage, as if the data files were corrupted; it was all junk. After the junk came empty fields, and only then did the names, surnames, and patronymics begin. That is when I asked myself: how much can we trust data that looks normal when it starts with garbage like that? That was my first sobering realization; it was 1994. Then I went to work for a company that served the Ministry of Finance of the Russian Federation. I was responsible for maintaining the register of auditors of the Russian Federation, as well as registers, certificates, and licenses for audit activities. We had SQL Server 6.0, 6.5, 7.0 and Delphi.

    After that I got a bit restless and ended up at Mail.Ru, where I worked on e-cards, mailings, and Answers. After Mail.Ru I was involved in startups. Everything was connected with databases in one form or another. In the end I landed in another startup of sorts, but a professional one. I work on PostgreSQL databases, mainly query optimization and writing all kinds of documents, which, frankly, I am quite tired of.

    PG Day: You say you do query optimization. What exactly does that involve? What are you responsible for at Postgres Pro? As far as I know, certain patches and features were conceived by you.

    IF: Conceptually, for a long time I went around complaining that Postgres had very poor partitioning. Finally partitioning was done, and I, of course, have been putting it through its paces since the early versions. Now it performs quite well. Only two very critical gaps remain: there are no global indexes and no separate partitioned indexes. And it does not cope well with more than 20-30 thousand partitions. Databases are large these days, and for extra-large volumes I would like hundreds of thousands, up to a million partitions. The guys keep trying, I keep walking around and whining, and they keep writing code. I hope I will influence it somehow.
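
    To make the discussion concrete, here is a minimal sketch using the standard PostgreSQL declarative partitioning syntax (table and column names are made up for illustration); the limitations mentioned above - no global indexes and degradation with tens of thousands of partitions - are exactly what you run into when scaling this pattern:

        -- A range-partitioned table; each partition holds one month of data.
        CREATE TABLE events (
            event_id   bigint,
            created_at timestamptz NOT NULL,
            payload    jsonb
        ) PARTITION BY RANGE (created_at);

        CREATE TABLE events_2017_05 PARTITION OF events
            FOR VALUES FROM ('2017-05-01') TO ('2017-06-01');

        -- Indexes exist per partition only; there is no global index
        -- spanning all partitions, which is one of the gaps noted above.
        CREATE INDEX ON events_2017_05 (created_at);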

    The second thing I influenced is alienated tables. I may have come up with the idea, and Nastya Lubennikova implemented it; many thanks to her for that. The idea is that on one server you can prepare a table with data, vacuum it, build indexes and statistics, then detach it and attach it to another server fully ready.
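
    The interview does not give the exact commands for detaching and attaching such a table, so the sketch below only illustrates the preparation step with standard PostgreSQL commands (names and paths are made up); the detach/attach operation itself is the Postgres Pro EE feature, and its syntax should be taken from the product documentation:

        -- On the staging server: load, index, and fully prepare the data offline.
        CREATE TABLE measurements_2017_05 (
            sensor_id   int         NOT NULL,
            measured_at timestamptz NOT NULL,
            value       numeric
        );

        COPY measurements_2017_05 FROM '/data/measurements_2017_05.csv' WITH (FORMAT csv);
        CREATE INDEX ON measurements_2017_05 (measured_at);
        VACUUM (FREEZE, ANALYZE) measurements_2017_05;   -- "vacuum it, build indexes, statistics"

        -- At this point the table is ready; with Postgres Pro EE it would be
        -- detached here and attached on the production server as-is.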

    I am directly involved in integration with applications. The only thing I will say is that I do not write .NET, fortunately or unfortunately. Java and PHP do come my way. And I still get requests like: “Here is a query, it needs to be sped up somehow.” So I speed it up. They have also given me helpers. Smart guys, I like that.

    PG Day: At the upcoming PG Day you are going to give a very detailed master class about the Enterprise edition of Postgres Pro that you are developing. How did it come about that you are the one giving this master class devoted to the functionality of this edition? And my second question: is it really so full of new features that it deserves a many-hour master class?

    IF: I have always considered myself an application programmer. Not the one who writes the DBMS, but the one who uses it. I think the master class will be aimed primarily at people who write code directly. Not the code that is inside the DBMS, but the code you deal with when Marya Ivanna comes in and says, “I clicked something, and everything disappeared.”

    How interesting is this edition to a developer? Yes, there are some very interesting features. I have already mentioned alienated tables. That task arose from the requirements of one client whose data poured in at such a rate that he simply did not have time to insert it into the database. Hence the requirement for a very large number of partitions in a table, for very large amounts of data.

    Also, again at my suggestion, we made a very small and very nice module, pg_variables. These are session variables: they live for the duration of the session and are local to it. On the one hand, there is nothing supernatural about it. On the other hand, for some reason it had not been done before. It is particularly critical for users migrating from Oracle, because Oracle has package variables, and in Postgres this is not handled so well.
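
    A minimal sketch of what session variables look like in practice. The first two statements show the custom-GUC workaround available in vanilla Postgres (values travel as text only); the pg_variables calls follow the typed getter/setter functions described in the extension's README, but treat the exact names and argument order as an assumption and check the version you install:

        -- Vanilla workaround: session-local settings, text-only.
        SELECT set_config('myapp.current_user_id', '42', false);
        SELECT current_setting('myapp.current_user_id')::int;

        -- pg_variables: typed, session-local variables grouped into packages
        -- (function names as per the extension README; verify for your version).
        CREATE EXTENSION IF NOT EXISTS pg_variables;
        SELECT pgv_set_int('myapp', 'current_user_id', 42);
        SELECT pgv_get_int('myapp', 'current_user_id');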

    The third thing: we are actively developing a multimaster. And it is a genuine multimaster. Generally speaking, I have handled quite a lot of money-related systems, not the notorious banking “operational day” specifically, but billing and the like. The owner or management really wants to know exactly what is happening with the money and how. So I deal with questions of transactional integrity and a consistent view of the data. In our multimaster that is genuinely guaranteed. It really is master-master. Everything works properly; there is no trickery there. To resolve conflicts, you have to roll back a transaction that failed to complete.

    It has been tightened up a lot recently and is improving right before our eyes. The story of one of our clients showed this: he has rather intricate logic in Hibernate and cannot say of certain transactions whether they are reads or writes. His workload ran fine, and on the multimaster it ran 2.5 times faster than on bare vanilla Postgres.

    PG Day: Yes, that sounds very impressive. You mentioned the problem of resolving conflicts between nodes, that a transaction gets rolled back, and so on. It always seemed to me that running a multimaster implies that there will be problems that have to be solved somehow: intervening manually to restore normal operation.

    IF: In this case, multimaster operation effectively involves synchronous replication followed by an honest, perfectly ordinary two-phase commit. Yes, it is heavier than a regular commit, no one argues with that. But, on the other hand, with a read-heavy workload it gives a gain, because you have at least three nodes in the multimaster and, accordingly, read throughput triples. I should also note that for such cases the JDBC driver's ability to list several servers in the connection string is very relevant: in case of a failure, it can reconnect automatically.
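
    For reference, the multi-host failover mentioned here is a standard capability of the PostgreSQL JDBC driver; a connection URL of roughly the following shape lists several nodes (host names are placeholders, and the parameter names should be checked against the driver version in use):

        jdbc:postgresql://node1:5432,node2:5432,node3:5432/appdb?targetServerType=any&loadBalanceHosts=true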

    It may lag behind vanilla in raw performance, since synchronization with the other servers is still required. Naturally, this imposes some limitations. But, on the other hand, we get a genuinely working, disaster-tolerant solution.

    Another feature is compression for tables that are primarily read-oriented. True, another kind of vacuum appears, called the garbage collector. Most of the huge tables I have come across are, in fact, append-only tables. Government agencies, for example: an event happens, they register it. There are many events, but once an event has happened, it never changes. In this case data compression gives a noticeable gain in input-output. It is not even about the size on disk, because disks are large these days; rather, we benefit greatly on reads and I/O.
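
    As far as the public Postgres Pro EE documentation describes it, compressed storage is enabled at the tablespace level; the statements below are a sketch under that assumption, with made-up names and paths, and the exact option syntax must be verified against the product documentation:

        -- Assumed syntax: a compressed tablespace for read-mostly, append-only data.
        CREATE TABLESPACE archive_cfs LOCATION '/var/lib/pgpro/archive'
            WITH (compression = true);

        CREATE TABLE event_log (
            event_id    bigint,
            happened_at timestamptz NOT NULL,
            details     jsonb
        ) TABLESPACE archive_cfs;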

    PG Day: Is there a feature of the Enterprise edition that is your personal favorite?

    IF: Oddly enough, from the point of view of sitting down and writing code against it, what I like most is the simple pg_variables. It is unpretentious but convenient. The most serious feature, of course, I consider to be the multimaster.

    PG Day: As far as I remember, the outline of the master class you announced had a point about nested and autonomous transactions. Tell us briefly what you have planned on this topic.

    IF: The first use of autonomous transactions that usually comes up is logging. A user performs some operation, suddenly a rollback occurs, and all the data is lost. Maybe he wanted to do something bad, and it would be nice to record that. With autonomous transactions we can log such an operation in a separate transaction, completely independent of the main one. There are also delicate situations where nothing can be done at all without an autonomous transaction or an external application. For example, I ran into this when I needed to send bitcoins.

    What is the situation with bitcoins? There is a global general ledger, a piece of bookkeeping called the blockchain, and transactions are written into it: from such-and-such an account to such-and-such, for such-and-such an amount. Completely standard. In reality it is somewhat trickier, but for simplicity let us assume it is so. The problem is that a transaction only becomes final when the other participants in the exchange confirm it. That is, at the moment of sending we do not know whether it went through or not. We sent the money, and it is possible that it will have to be re-sent; that may happen, say, once a month. So we perform the money-sending operation, and then the database transaction goes and rolls back. Yes, we are honest people, we will not do double-spending, but it is rather strange to rely on the code's word of honor: we can neither roll the send back nor check whether it went through.

    A two-phase commit is not available to us for obvious reasons, and if we could at least verify the send, we could implement idempotency: we have already sent it, so we will not send it again. As it is, we do not know the result of the execution. We can send only once and hope that everything will be fine. Within the DBMS alone we cannot handle this at all: when the transaction is rolled back, from the application programmer's point of view it is as if nothing happened. So, on the one hand, we must record the fact of an attempted payment, and on the other hand, everything is lost in the rollback.

    PG Day: No one can figure it out without outside help.

    IF: If we do this from the application program, we can open a second connection and register the send attempt in a separate transaction: first we record what we are about to send, and only then send it. Unfortunately, it is impossible to do this inside the DBMS without autonomous transactions. With autonomous transactions it is entirely feasible. What else? That is the second task that arises: avoiding the repeated execution of cancelled operations.
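
    As an illustration of the two-connection approach described above, the sketch below uses the standard dblink extension to write the payment-attempt record through a separate connection, so that it survives a rollback of the main transaction; in Postgres Pro EE an autonomous transaction serves the same purpose without the extra connection (the table and connection details are made up):

        CREATE EXTENSION IF NOT EXISTS dblink;

        BEGIN;
        -- Record the attempt through a second connection: that insert commits
        -- independently of the outer transaction and survives its rollback.
        SELECT dblink_exec(
            'dbname=appdb user=app',
            $$INSERT INTO payment_attempts (wallet, amount, attempted_at)
              VALUES ('destination-address', 0.5, now())$$
        );

        -- ... perform the external send here; if anything afterwards fails ...
        ROLLBACK;   -- the attempt row written via dblink is still there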

    I will also talk about an interesting feature, the scheduler, that is, scheduled tasks. It is available from PL/pgSQL, so you can run queries on a schedule that way, and it is also interesting for one-off jobs: you can submit a task for execution. What is interesting about our scheduler? Tasks can depend on one another; that is, a given task will not run until certain others have completed.

    PG Day: So some business logic can already be programmed this way.

    IF: Yes, if you remember, I had ordered messages in mbus. This is essentially an implementation of the same logic: we can carry out an operation in several steps, and if it fails, we have to roll it back in several steps as well. A user interface is being made for it now; we will see what comes of it. So far it is available only through the API.
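
    The interview does not show the scheduler's actual interface, so the sketch below uses hypothetical function names purely to illustrate the dependency idea described here (submit one task, have another wait for it); for the real API, consult the pgpro_scheduler documentation:

        -- Hypothetical calls, for illustration only (not the real pgpro_scheduler API).
        SELECT submit_task('load_raw_data',
                           $$COPY raw_events FROM '/data/events.csv' WITH (FORMAT csv)$$);

        -- This task is declared to depend on the previous one and will not
        -- start until 'load_raw_data' has completed.
        SELECT submit_task('build_daily_report',
                           $$INSERT INTO daily_report
                             SELECT date_trunc('day', ts), count(*) FROM raw_events GROUP BY 1$$,
                           depends_on => ARRAY['load_raw_data']);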

    PG Day: It sounds very impressive, a lot of interesting things. Some of our participants have expressed skepticism, wondering whether this will just be a marketing talk where you say all these features are very cool. Are you planning any hands-on practice or examples, so that participants can see for themselves that it really is cool? Can we expect a demonstration with illustrative examples?

    IF: I do. I do not plan to hand out Postgres Pro EE itself, but I will show everything on my computer. Of course I will talk about how cool the features are; why talk about them at all if they were bad?

    The master class will be especially interesting for people and organizations who are seriously thinking about using our product. I hope the seminar will be useful precisely for dispelling some fears, showing new possibilities, and perhaps even revealing problems people were not yet aware of...

    If I wanted to engage in marketing, I would first of all talk about the fact that we have an FSTEC license, and that for one customer we did a project at the “state secret” level of classification. Not everyone needs that - perhaps only a few - but those who need it really need it. In addition, we can process personal data at the highest level (“Category 1 - special categories of personal data, which include information about the subject's national and racial origin, religious, philosophical or political beliefs, and information about the subject's health and intimate life”). For medicine, this constraint may well be critical. If I were doing marketing, this is what I would talk about, because for a number of applications these are simply make-or-break requirements. But I will not.

    PG Day: What would you recommend to beginner DBAs: which technologies to look at, whose articles and blogs to read?

    IF:
    1. Carefully read the book Architecture of a Database System.
    2. At the very least, carefully study the documentation for the database you use.
    3. Learn the basics of accounting. Firstly, it is simply useful in everyday life, and secondly, it becomes clear where the requirements for a DBMS come from...
    4. Keep track of all the main DBMSs, since there are not many of them.
    5. I highly recommend getting to know them - primarily Oracle and DB2; these are very serious pieces of software, and that will be enough for a long time.

    I personally read the Postgres mailing lists, but I don't think this is a must-read.

    PG Day: And in conclusion: for work and in your personal life, what operating systems, applications, and hardware do you use?

    IF: There is nothing unexpected. At home I have an old laptop with Windows 10, otherwise my wife and daughter would not understand, and another one with Ubuntu for traveling, which is again a plus: at home nobody touches it because of Ubuntu, so everything on it is mine. At work, regular Debian. As for applications, the most important application for a programmer, as everyone knows, is Word (or LibreOffice); beyond that, vi, psql, pgAdmin3, and NetBeans - I am used to them.

    Developing a new fork is an enormous task. Unfortunately, we could not fit all the details of this domestic development into a 30-minute conversation. Friends, be sure to ask your questions; Ivan and his colleagues will be very glad to answer. And we hasten to invite everyone to PG Day'17 Russia. Ivan will hold a workshop, Postgres Pro Enterprise for Developers, in which he will talk in detail about the features of the fork mentioned in this interview, and many others. You will also find many interesting presentations from Ivan's colleagues: Oleg Bartunov, Alexander Korotkov, Ivan Panchenko, and other Postgres Professional specialists.

    See you at PG Day'17 Russia!
