PostgreSQL features not found in MySQL and vice versa
Many are afraid to switch from "muscle" to "stray" due to the fact that they only vaguely understand what it will give. Some people are stopped by the thought that Postgres is probably too complex and requires training. And also, that perhaps something will have to be lost in connection with the transition. I’ll try to clarify the situation a bit.
Generally speaking, if someone is afraid of complexity, then for starters you can do as everyone usually does: "dumb" switch from MySQL to PostgreSQL, without using new features. SQL - it is also in Africa SQL, it is not rocket science. With this transition, nothing complicated (with the so-called programming) will not be for you. Well, the quotation marks are different, the syntax is a little stricter. Those. use pg as mysql with other quotation marks to start, and then learn as the play progresses.
Now about what is in one, but not in the other and vice versa. To understand what kind of buns there are. Here is a list, of course, not complete and it is clearly not sorted by importance, but nonetheless.
Let's start with the shortcomings of the sin, which probably are not in the muscle.
- To work in production with progress, you need to set it up professionally. If you do not set correctly shared_buffers, auto-vacuum settings, etc., then everything will work slowly under serious loads. Particularly annoying is that for highload projects you almost always need a pgbouncer (this is a third-party development), which accumulates connections in the queue and ensures that there is a limited number of connections to the progress itself. It is strange that this is not built into postgres itself
- Auto vacuum. In simple terms, in order to achieve high write / delete speeds, the progression leaves heaps of garbage, which are then cleaned by specially trained demons. If it’s wrong to set up auto-vacuums or turn off the fool at all, especially on a very busy base, then the place occupied by the tables will swell, and sooner or later everything that can clog, or even without swelling, will clog, the base can just stand up and say that transaction id ended. At each conference, there are 3-4 reports on how someone heroically fought with auto-vacuum and won.
- Until recently, there was no INSERT IGNORE and INSERT ON DUPLICATE KEY UPDATE. Starting from 9.5, analogues appeared. It is very strange that they dragged on for so long with the implementation of such a functional that everyone needed.
- In Mysql, you can directly manipulate variables in a query
SELECT @x: = 0; SELECT @x: = @ x + 1 FROM table;
There is no such thing in progress, at least I did not find it (write if you made a mistake). Well, that is, you can of course make a storage where you can do anything at all, but to make it right in the request - it’s kind of like not. - There is no normal phpmyadmin counterpart. In fact, most of the cool co-workers I know work with SQL on the command line, which is hard to get used to at first. No, there are all sorts of pgMyAdmin, etc., but each of them has some kind of flaw. Maybe there are good paid ones, I did not check.
- Mysql everyone knows, postgresql no one knows. Therefore, new projects are often afraid to start on postgresql, because it will be necessary to support, and indeed the fear of the unknown. There is a whole class of php programmers for whom the word “database” and mysql are one and the same, synonyms. Those. it’s somehow difficult for them to get out of the shell or something.
- They say that cheap hosting does not really like postgresql, because it is more difficult to administer. For example, to create a user who can log in to postgres, you need to do this in two places: execute the sql query and write it in pg_hba.conf
Of the drawbacks compared to mysql for now. If you know something else specific, what is in mysql and what is not in postgresql - write in the comments. Now the goodies that postgresql has:
- CTE (Common Table Expression)
If to explain in a simple way, then subqueries can be written separately, giving them names, and all this within the framework of one query to the database. for instanceWITH subquery1 AS ( SELECT ... JOIN ... JOIN ... GROUP BY .... ), subquery2 AS ( SELECT ... WHERE .... ) SELECT * FROM subquery1 JOIN subquery 2 ON ...
An extremely useful thing for highly complex queries, where without named subqueries you can break your entire brain by conjuring with join-s and subquery brackets. There, of course, there are a lot of nuances in performance that you need to know, but still an incredibly useful thing. Which is not in MySQL. By the way, subqueries in CTE can be used recursively, for example, to get the whole subtree in a table of the form “id, parent_id”. - Work with ip addresses. For example, you need to quickly determine the city / country by ip-address.
Here I must say that in the progress there are custom data types and even operators that work with these types. Some can be done by oneself, some can be obtained by putting the extension on the line. For example, there is an ip4r extension that allows you to do something like this:- create a table with ip ranges create table ip_ranges ( ip_range ip4r ); insert into ip_ranges values ('2.2.3.4-2.2.3.10'), ('1.2.0.0/16');
Now we can get a list of ranges that intersect with the given ip using the && operator:test => select * from ip_ranges where ip_range && '1.2.1.1'; ip_range ------------ 1.2.0.0/16 (1 row)
There are other operators up to the heap: ranges between each other, etc. To make the search very fast, you can build a special GIST index:CREATE INDEX ip_ranges_idx ON ip_ranges USING GIST (ip_range);
And everything will just “fly” even on huge volumes of data. How to do this in mysql can not imagine, maybe there is some way? - Various CONSTRAINTS i.e. database constraints for integrity. MySQL also has UNIQUE, NOT NULL, FOREIGN KEY, etc. But what about this:
Modify the table from the previous example:ALTER TABLE ip_ranges ADD CONSTRAINT ip_ranges_exclude EXCLUDE USING GIST (ip_range WITH &&);
This entry ensures that only ip ranges in the table do not intersect with each other. When you try to insert a range, the ip of which is partially already in the table, there will be abuse:test => insert into ip_ranges values ('1.2.3.4/32'); ERROR: conflicting key value violates exclusion constraint "ip_ranges_exclude" DETAIL: Key (ip_range) = (1.2.3.4) conflicts with existing key (ip_range) = (1.2.0.0/16).
In the same way, you can use, for example, the circle data type and verify that disjoint circles are stored in the table. By the way, some geometric types and operations with them are built directly into the standard delivery: circle, box, polygon, etc.
Another useful constraint:create table goods ( id bigint, price decimal (11,2), ... check (price> = 0.01) )
And you will never accidentally insert a product with a zero price there. Of course, the conditions inside check can be any. - The killer feature of the latest versions of sgres is a jsonb type that allows you to search by jason very quickly. I will not dwell in detail, because in every second article about this, all ears buzzed.
- The so-called “window functions". For example, you need to give out for each employee his salary, and the average salary for the department in the same line, without using subqueries and group by.
SELECT depname, empno, salary avg (salary) OVER (PARTITION BY depname) FROM empsalary; depname | empno | salary | avg ----------- + ------- + -------- + --------------------- - develop | 11 | 5200 | 5020.0000000000000000 develop | 7 | 4200 | 5020.0000000000000000 develop | 9 | 4500 | 5020.0000000000000000 develop | 8 | 6000 | 5020.0000000000000000 develop | 10 | 5200 | 5020.0000000000000000 personnel | 5 | 3500 | 3700.0000000000000000 personnel | 2 | 3900 | 3700.0000000000000000 sales | 3 | 4800 | 4866.666666666666666667 sales | 1 | 5000 | 4866.666666666666666667 sales | 4 | 4800 | 4866.666666666666666667 (10 rows)
Through window functions, a whole class of tasks can be simplified, for example, it is very useful for any analytics and billing. - Stored procedures can be written in different languages: pure sql, pl / pgsql (it is a language convenient for working with SQL-base, but slow), in javascript (pl / v8), in pearl and God knows what else. You can even attach your favorite language to the sire if you know si and are rather assiduous. This was discussed in detail on pgday. In my opinion, in postgresql everything is not so smooth with languages in stored files, but it is 100 times better than in mysql.
- You can do indexes not only by fields, but also by functions from them.
- Replication (Hot Standby) is done wisely. It works quickly and consistently.
- Speed. According to my subjective feelings, and I worked for many years with both databases, Postgresql as a whole is much faster than MySQL. Factor of. Both insert and read. If configured correctly, of course.
This is especially true when executing complex queries that mysql simply cannot handle, and temporary tables need to be fenced. - severity in everything. In mysql, it seems that only strict default mode was made in 5.7 (I didn’t check, is this really so?). Prior to this, it was possible to insert in a field of type decimal (5,2) a number greater than the position, and as a result, silently get 999.99. Silent trimming of lines, etc. Such jokes there are darkness. And this is the default behavior. Postgresql will lay its bones and curse, but will not silently execute an ambiguous request.
- Transactional. CREATE TABLE, ALTER TABLE, etc., like simple queries, can be executed in a single transaction or rolled back a transaction in the middle if something is wrong. Over time, you just don’t understand how I used to get out on mysql without it.
- Full-text search out of the box. There, in my opinion, the syntax is a bit unusual for a normal person, but everything works and you do not need to connect third-party gadgets like sphinx on the side.
- Sequences In mysql, there is only AUTO_INCREMENT per table field, which is ticking one at a time. In postgresql, this mechanism lives separately from the table, which can be used for a variety of needs, in addition, you can create them looped
- DBA seems to consider postgresql to be its main transactional machine. Transactions there are built in deeply and well, so everything works quickly and reliably, both for insertion and for reading. In mysql, the system is different, there is a base, and there are separate engines (such as: innodb, myisam, etc.), and the engines are not all transactional. There are some problems with this separation of transactions. For example, myisam is not transactional at all, innodb is transactional, and both tables can be used in the same query. How the base works at the same time, I don’t presume to predict, it’s probably difficult and crutches.
- Subjectively, there are fewer bugs in postgresql. I don’t know how they achieve this, but for me this is a fact - a very stable and reliable system, even under heavy loads and data volumes.
This is my first post on the Habr (sandbox), so please criticize strongly, but constructively.
What else are the specific advantages and disadvantages of these bases? Write in the comments.