Green light for developers - from startup to stars. Valentin Gogichashvili

    The PG Day is being held for the fourth time. During this time, we have accumulated a large database of useful materials from our speakers. The level of reports in the industry is getting higher and higher every year, but there are topics that, like good wine, do not lose their relevance.

    At a recent PG Day, Valentin Gogichashvili , head of Data Engineering at Zalando, described how PostgreSQL is used in a company with a large staff of developers, highly dynamic processes, and how they made this choice.

    It's no secret that Zalando is a regular guest of PG Day. At PG Day'17 Russia, we will present you three wonderful reports from German colleagues. Murat Kabilov and Alexey Klyukintalk about Zalando’s internal development for deploying highly available PostgreSQL clusters . Alexander Kukushkin talks about the practice of using PostgreSQL in AWS . Dmitry Dolgov will help to understand the insides and performance of the JSONB data type in the context of using PostgreSQL as a document-oriented storage.

    For the first time in my life, I will present in Russian. Do not blame me if some translations and terms are very funny for you. I'll start with myself. I'm Valentin, head of Data Engineering at Zalando.

    Zalando- This is a very famous online shoe store in Europe and everything related to clothing. That's what it looks like. Brand recognition of about 98% in Germany, our marketers did a very good job. Unfortunately, when marketers work well, technology departments become very bad. At the time when I came to Zalando 4 years ago, our department consisted of 50 people, and we grew 100% per month. And so it went on for a very long time. Now we are one of the largest online stores. We have three warehouse centers, millions of users and 8 thousand employees.

    Behind the screen, you can imagine that terrible things are happening, including such warehouses: we have three of these. And this is only one small room in which sorting is in progress. From a technological point of view, everything is much more beautiful, it is very good to draw diagrams. We have a wonderful person who can draw.

    Postgres occupies one of the most important niches in our structure, because Postgres records everything related to user data. In fact, everything except search data is written to Postgres. We are looking for in Solar'ah. Our technology office currently employs 700 people. We are growing very fast and are constantly looking for people. There is a big office in Berlin. Smaller offices in Dortmund, Dublin and Helsinki. Helsinki opened just last month, there is now hiring.

    What do we do as a technology company?

    We used to be a Java and Postgres company: everything was written in Java and written in Postgres. In March 2015, we announced the concept of radical agility , which gives our teams infinite autonomy with a choice of technology. Therefore, it is very important for us that Postgres still remains for our developers a technology that they will choose for themselves, and I will not come and say, “Well, let's write to Postgres”. Six terabytes is the base for "transactional" data. The largest database that is not included there is the event log, which we use to record our timeseries, business events (about 7 terabytes). It is interesting to work with this data. You learn a lot of new things about everything.

    What problems do we have?

    Continuous growth, fast, weekly development cycles: new features are rolled out every week. And downtime is not encouraged. Recently, we have had a problem - autonomous development teams are dragging databases into their Amazon AWS accounts, and DBA users do not have access to these databases.

    I’ll talk about how we change the data scheme so that the production does not stand idle. How we work with data (access to databases in Zalando is through a layer of stored procedures). I will explain very briefly why I consider this important. And how we shard, how we break databases, I will also tell.

    So, in Postgres, one of the most important abilities is the ability to change data schemes with virtually no locks.. The problems that exist in Oracle, in MySQL and in many other systems, and which essentially led to the fact that NoSQL databases have risen as full members of our friendly family of databases, are that other databases cannot so quickly and well change data schemas. PostgreSQL does not need locks in order to add a column or rename it, drop or add default values, or expand directories.

    All that is needed is a barrier lock, which must make sure that no one else is touching this table, and change the directory. Almost all operations do not require rewriting gigabytes of data. It's the most important. In particular, it is possible to create and delete CONCURRENTLY indexes (an index will be created without blocking your table).

    Problems Still Exist with Constraints. There is still no way to add constraint NOT NULL to a large gigantic table without having to verify that the table does not really have null values ​​in the columns. This is correct because we trust constraints. But, unfortunately, there are a couple of exceptions that must be applied in order for all this to work well, there were no scary locks that would stop the entire system.

    How did we organize this? When I started, at Zalando I was the only DBA. I wrote all the scripts that change the structure of the database. Then we realized that it was horrible, because you need to do this on every staging environment. I started dumping directories in different environments and comparing them with diffs. There was an idea to automate the creation of dbdiffs. I realized that it’s impossible to spend time on such a tool, and it’s easier to write scripts from one version to another with your hands. But the name dbdiff remains.

    With the growth that we had, it became impossible to write dbdiff ourselves. Therefore, we had to teach developers to write SQL, train them and certify PostgreSQL basics so that they understand how the database works at all, why locks arise, where regressions arise, etc. Therefore, we have introduced certification for "release". Only a person with such a certificate from our team receives administrative rights to the database and can stop the system. We, of course, come to the rescue and help, advise, do everything so that the guys have no problems.

    Here is an example of what a very simple dbdiff looks like: the order_address and foreign key tables are added. The problem is that if the table is changed during development, the source of this table must be changed every time. Since each object, each table is in a separate file in git, you need to go in and change dbdiff every time, you can use the excellent opportunity of pl / pgsql to upload files from a directory.

    Interestingly, the operation of adding foreign key constraint is problematic, because you need to lock the entire table, which can take a lot of time. In order to protect ourselves, we recommend that everyone set statement_timeout - the number of seconds during which it is permissible for your system to hold locks. If the table is stored in memory, 3 seconds is usually enough. If it does not fit, you can’t scan it in 3 seconds.

    What else helps us? The tools we wrote help us. pg_viewcollects all the information about the database that we need, including locks, partition occupancy with xlog. It looks approximately like top. This tool collects information about how much disk space is left so that you can stop migrations in time if something goes wrong, and shows the locks that appear in the database.

    nice_updater- this is a program that monitors the database, watches its load, load of xlog-partitions, slowly, 10-100 thousand records each, performs update statements, periodically starts the vacuum. Thus, we carry out large migrations. If you need to add some table or write new values ​​to additional columns, using nice_updater is very easy to do. They rolled out some operation, due to which several gigabytes of incorrect data were formed, they need to be rewritten - nice_updater helps us very well. In my opinion, it is already in open source. We try to “open source” all our tools: the quality of the code is growing very much, documentation is appearing.

    The biggest piece of advice I can give is to get programmers to write code that doesn't care if there is a database or not. Our biggest mistake was that Postgres worked too well, developers thought that the database is always like gravity. Therefore, any shutdown of the database for 2 seconds is regarded by our developers as horror and complete disaster. They do not write ROLLBACK, do not handle errors of this type, they have no tests.

    Having the ability to disable the database for 30 seconds in order to upgrade or do something else with the database is, in fact, the first thing that should be. Today, Andres and I [approx. Ed .: Andres Freund] said that in general you need to make a mode that will disconnect randomly connections so that developers learn to write normal code. We have a script that kills everything that takes more than five minutes. Statement timeout is set to 30 seconds by default. If someone writes a procedure that takes more than 30 seconds, he needs to give good reason.

    What do stored procedures give us?

    The biggest advantage is the ability to swap data on the go: add a new empty column by reading data from the old column. Then turn on the record in both columns and read from the new one, performing fallback to the old one using coalesce. Then migrate the data to a new column. And then you can already throw out the old one. A couple of times we did the normalization of tables so that the application code did not know about it at all. This feature is important for maintaining the system in working condition.

    With good training and with good tools, we managed in principle to avoid downtime caused by migrations or changes in the data structure. In order to understand the scale of the number of changes: we have about 100 dbdiffs per week being rolled out in our database. And they basically change tables. It is regularly said that relational databases lack the flexibility to change data structures. It is not true.

    We try to do dbdiff transactions, but unfortunately there are commands that are not transactional, like changing enum.

    How is data usually accessed?

    Here we have a classic structure - a hierarchy of objects: customer, he has a bank account. There are many different orders, items are listed in orders. What good is this hierarchy? Objects that are tied to a customer are associated only with him. In most cases, we do not need to go beyond this hierarchy. We are not at all interested, when placing an order with Customer A, what orders are available with Customer B, and vice versa. Everyone knows that there are a lot of advantages: you stay in your comfort zone, use the same language in which you write business logic.

    But we had big problems with hibernate, to teach developers to write code that will work well with transactions. Developers try to load the entire table into memory, then do something with it and commit some parts in a couple of minutes. There is a long transaction, and the longer it is, the more difficult it is to make migrations in the schemes. Tables must be constantly displayed in code. We do not have such a concept as a base separate from application. We call this data logic. In fact, these are constraints that are superimposed on the data, and it is convenient to keep them in stored procedures. This cannot be done through migrations. In essence, this is a separate data layer.

    If there are no stored procedures, it is better to have an abstract layer inside the application. Netflix, for example, does this too. They have a special library with which they completely abstract data access through the data layer. Thus, they migrated from Oracle to Cassandra: they split the logic into business and data, and then replace the business logic with another database. But changing the circuit in such a situation can be a nightmare.

    NoSQL- a wonderful thing, you can take this whole hierarchy together with all orders and create one document. Nothing needs to be initialized, everything is written directly to JSON. Transactions are not needed: what is written down is written down. Implicit schemes arise. How to work with it when the structure of the document begins to change and where to shove all this logic? This is scary. At the moment, unfortunately, there is no NoSQL database except PostgreSQL, which does not lose data in ACID.

    And, accordingly, NoSQL does not have SQL. SQL is a very powerful language for analytic operations; it transforms data very quickly. To do all this, for example, in Java, is also somehow scary.

    What are the alternatives to ORM?

    Direct SQL queries. You can pull aggregates from the database without using stored procedures. There are clear boundaries of the transaction - one SQL is launched, no time is wasted processing data between transactions. A very good example: YeSQL on Clojure works almost like a stored procedure. And Scala Slick - if you are engaged in Scala and have not yet seen Slick, then you should definitely look at the source code, this is one of the most impressive pieces of code that I have ever seen.

    Stored procedures. Clear transaction boundaries. Abstraction from the data layer. This is a recipe for a classic Java application.

    We have business logic, JDBC Driver, and data tables. What have we done? First, we implemented a stored procedure layer. Suppose we return row, for example, getFullCustomerInformation, in which there is an array, its orders are serialized by Posgres, inside there is also an array with positions, inside which there is also an array with bank data. It turns out a very well typed hierarchical structure. If all this is written in Java, at some point we have 50 members pages in classes. This led to such terrible consequences that we decided to write our own library. They called her Sproc Wrapper, which works as an APC Layer in a database. It essentially makes the application server database.

    What does it look like?

    A stored procedure is written, then such a small interface with annotations is written. The register_customer call is completely transparent to the application, the stored procedure in the database is called, as is the serialization / deserialization of all the terrible nested arrays, hash maps, etc. Including order address structures that map as list of orders.

    What problems?

    The problems with stored procedures are that you need to write too much code. If you need to do a lot of CRUD operations (you are writing a new Excel), I would not recommend using stored procedures. If you have tables of 100 columns, you have to enter the change in each column as a separate stored procedure, then you can fool around. We had people who wrote a bootstrapper that generated these stored procedures. But then we said that it is better to use hibernate in this situation and edit these tables. For example, we have a tool in the procurement team that drives product information, it is written in hibernate. 500 people use these tools, and 15 million use our main site.

    What is positive? Need to learn SQL. This helps developers a lot. Developers who have begun to learn Closure and Scala now periodically resort to me and say "Scala is almost like SQL, wow !!!". I guess, yes. Pipeline data that flows from bottom to top through functional filters is exactly what SQL has always done. Unfortunately, there is still no execution planner in Scala.

    Automate everything.

    Everything that is done by hands is most likely done poorly. It is very important to know how PostgreSQL works, how systems work so that nothing breaks.

    How do we version?

    At first, when we just started using stored procedures, when we changed the procedure, the type of data that it returned changed. You need to drop the old procedure, roll out a new one. All this in one transaction. If there are any dependencies between stored procedures, I had to look for them with handles. Drop'at completely and roll out again. When I was the only DBA in the company, I just did what I wrote about these scary dbdiffs that update stored procedures. Then we somehow sat down and came up with the idea that you can use the awesome PostgreSQL search_path feature, which controls the paths for finding objects in a session. If your application with version 15.01 opens and sets search_path, then the objects that it will find will be located in this scheme.

    Our deployment tool during the rollout of the application with this set of stored procedures creates the schema of the desired version and loads everything there. Application then uses them. What happens when we roll out a new version? The Deployment tool rolls out a new scheme with the whole set of stored procedures that we have, and, while the new version is being rolled out, both versions actually work for us, each with its own set of stored procedures. There is nothing related to data. These are the so-called API schemes that provide a data access layer and nothing more. And all the migrations that happen, they happen here. Therefore, when migration occurs, it must be compatible with the previous version, which is still working.

    Question from the audience :How much time did you teach developers to work on this flow? How to ensure that everyone does just that and does not do migrations that are not compatible with previous versions? Do you somehow test additionally that the migration is really correct, that the new API reads old data correctly and does not crash at the same time?

    Valentine : This, of course, is a question of how well test coverage works, and how well everything is tested. Our developers work on local bases, then we have integration staging, test staging, release staging and production.

    Question from the audience : Who writes the API, DBA or developers? How are access rights split?

    Valentine: Developers write. The option when DBA does this is not scalable. I know several small companies in which DBAs write all the APIs in general. When they called me, they also thought that I would write an API. But this is unreal. First, we hired five people a month, now we hire 40 people a month. Therefore, it is easier to spend time for developers to learn how to work with the database. It is very simple in fact, if you explain how everything is physically stored and arranged.

    Testing is very convenient because the entire API layer is tested and no migration is required. Everything can be automated.

    What are the good things about having one big base?

    If I were asked if everything can be done in one large base, then I would answer: stay in one large base for as long as you can afford it. If you have all the data of your business placed in the database RAM, you don’t have to do anything, stay in one database, it’s very fine. You can do analytics quickly, bind data between objects, strategies for accessing various data are trivial. It is enough to support one machine, and not a whole bunch of different nodes.

    But the problem arises when you have more data than RAM. Everything is getting slower: migrations, backups, maintenance, upgrades. The larger the base, the greater the headache. We share the data: take one large logical base and put it on many PostgreSQL instances.

    What is so good about it?

    Again, our bases are getting small. You can quickly work with them, but the problem, of course, is that it is already impossible to do joins. Analysts need more tools. Working with data requires more tools. If you think that you can work with large amounts of data without investing in the development of an infrastructure that automates your processes, then you are mistaken. This is impossible to do. It is necessary to write a lot of tools.

    We had an advantage. We already had a Sproc Wrapper, which provides us with a data layer. We just taught him how to use different databases.

    What does it look like? We have a call to the findOrders function with the runOnAllShards parameter= true. It calls the stored procedure on all shards that it has registered. Or we have a CustomerNumber , and we say that it is a shard key. In the configuration, you can specify which search strategy (Lookup Strategy) can be used: parallel search by shards, shard aware ID, and hashing, in my opinion, is also supported. The most widely used strategy for finding objects on shards is the so-called Virtual Shard ID .

    The idea is really simple. We have a partitioning key - in the hierarchy that I showed, it will be CustomerNumber. Partitioning key is a key that defines for each object the boundaries of the links between your objects.

    Key formation.

    The main thing is to understand what a partitioning key is. For example, we have users. The user has his orders, and many orders are tied to these orders. Partitioning key - a shared key that allocates a group of objects that belong to the same user. We will have this customer number, a unique user number. It needs to be dragged along with all objects of the Order type, the underlying objects in the hierarchy, in order to understand where my Customer is located. I should always be able to find out where the parent lies for objects from the hierarchy. I am afraid to say Customer ID, because ID is a technical key. We are not talking about technical keys. We are talking about logical keys. Because technical keys will not be unique within the logical framework.

    It is perfectly normal to use a UUID for a Customer ID. We distinguish between Customer Number and Customer ID. One Customer ID exists eight times in our system, in eight databases. And Customer Number is always one. We hash with MD5, but you can do better. The main thing is that hashes are distributed evenly. This is done at the level of sharding strategy. In fact, a hash needs to be implemented wherever an application needs to quickly find the location of a hierarchy of objects. In our situation with Sproc Wrapper, this will be just a Sharding Strategy for the Customer object.

    By the size of the hash of this key, we determine the number of virtual shards. What happens when we want to split the base? We just say that we are sharing the base and starting to use the first bit in our hash. Thus, when the database is divided, we can indicate it was master, it became a slave, and at such a moment we will have a very short system shutdown. At the moment, this is so. It could be fully automated to be transparent. We turn off the system, change the sharding strategy and say that from now on we have access to here and here, but we write the data in which the first bit is one, to another database in which there was already data. The only thing we need to do after this is to erase all objects that belong to one from this base, and from the other, erase all data that refers to zero. Etc. It can even be divided asymmetrically. A sharding strategy may know that if your hash starts from scratch, then there are a couple of shards. This is best not to do, because you can go crazy. In principle, we shared this two times already.

    We are now experimenting with the awesome PostgreSQL feature - logical replication . This is multi-master - the ability to upgrade to major versions without having to stop the system, do everything slowly and painfully. Partial replication - it will be possible to pull only one table or part of a table from databases. Do cache updates.

    We are working very hard to pull PostgreSQL into AWS with the great features that RDS now provides. Our AWS team has developed a system called STUPS. It allows you to roll out docker images in Spilo, traceable and monitorable way. Spilo with the help of three commands can roll out a PostgreSQL cluster on AWS, which will be high available, switch itself when one of the nodes is turned off, select the wizard. But this is a topic for another discussion.

    Also popular now: