Following the meetup “PostgreSQL 11 New Features” (Part 2)

    In the first part, we talked about the main innovations and changes in PostgreSQL 11. This time, we will discuss in more detail some points in the question / answer format that were raised by meetup.

    What is the best way to transfer a large data array as a set of input parameters for a stored procedure in PL / pgSQL?

    The most convenient way is to create a temporary table, make copies of the data there, and then use it in the procedure.

    External engines (zheap) and the development of in-memory PostgreSQL

    Not for all workloads, a model with storing old versions of records in the table itself is suitable. In all other subd (versionionniki) they are stored in an undo-log. You can argue about the feasibility, but the bottom line is that you need to store old records somewhere. If they have a short lifespan and someone rarely addresses them, then storing in the table itself is harmful. The external zheap engine PostgreSQL is an attempt by EnterpriseDB to create a table engine for PostgreSQL with undo log. It works, although there is still something to improve.

    Who works with Ms. SQL in SNAPSHOT Isolation Level mode, knows that it has tempdb, where it puts old versions, and is equipped with quite an adult vacuum for cleaning tempdb. On the other hand, the community asks to create in-memory tables in PostgreSQL. This can be done quite easily: tmpfs, and that’s it. In PostgreSQL Pro even released the first pilot version, you can try.

    What PostgreSQL never had was plug-in engines. There were pluggable indexes that used a common WAL. PostgreSQL has a lot to plug in and little to replace on the fly. For example, executor is not disabled, but you can already use custom nodes in it that you yourself program. The optimizers in PostgreSQL are completely pluggable. You can write your own and use PostgreSQL as an interpreter of your queries. SQL parser cannot be connected.

    Engines want to make connected in three directions:

    • engine with undo log
    • in-memory
    • column storage for OLAP queries

    Postgres Pro is in talks with EnterpriseDB on how to make an API to connect all of this.

    About foreign key

    Foreign key inside PostgreSQL is implemented by triggers. You can write your trigger that will implement any kind of functionality. All possible restrictions must be done in the trigger. The logic in the triggers is not particularly necessary to keep, but check everything - it is necessary.

    Is Postgres Pro planning to do SaaS or PaaS?

    Postgres Pro plans to make PostgreSQL more optimized for the cloud, in particular, to implement dynamic changes to shares buffers, to reduce the number of parameters that require a restart of PostgreSQL. They are not going to build the cloud themselves.

    How do I set up a drive so that parallel indexing works faster? Which is better, multiple HDDs or one SSD?

    Better a few SSDs. The more parallelization options the hardware provides, the better. If you have one disk, not enough memory and one processor, then parallelization will not help you. But SSDs have a peculiarity: they begin to slow down if more than 80% of the volume is occupied. Therefore, do not forget to adjust the trim, otherwise the limit of 80% will come somewhere around 50%.

    Dictionary management and adding words in full-text search

    If you use spell or snowball, then just change the stop-word dictionary. The trouble is that if you added a stop word, then there is no point in indexing. This can be done slowly. A stop word will be thrown out of a request and never be searched. And if you removed the stop word, then nowhere in the collection it is not there and you need to reindex it. The problem is not in the dictionary, but in the fact that you have already used it and saved your knowledge.

    Also, in many cases, you can use the little-known function ts_rewrite, which allows you to replace a piece of the request with another request. For example, when the Kursk submarine drowned, everyone rushed to look for information about it. Fedor Sigaev at that time worked in rambler, and at the request of "Kursk" information about the city was given out. They promptly made a substitution: on this word, give out information about the submarine. But then users began to curse, who were interested in the village itself. I don’t know if they realized or not, but it was necessary to introduce the “city of Kursk”. Such substitutions allow ts_rewrite to be made. In addition, the function can be used for a smooth transition during the period of dictionary changes.

    Of course, changing the parser and dictionaries are complex tasks. Languages ​​with different alphabets, like Russian and English, get along well. Much worse now are, say, French-English texts. It is not clear what language a word refers to, which is written the same way, but in one language it is a stop word, and in another it is not. Postgres Pro is currently working on fine-tuning dictionaries to describe more complex configurations.

    Covering indices and hot update

    It’s completely friends. True, if at least one field is updated in the covering index, then the index will behave as usual, everything will be replaced.

    Inability to create temporary tables when executing queries on standby

    PostgreSQL does not store table knowledge in the system directory, but there is a patch that transfers knowledge to the system directory. Therefore, with this patch you can use temporary tables. But then another problem arises: there are no transactions on stand by. To work with a temporary table, you will have to use twice the virtual transaction id, which applies only to temporary tables, and not to the main ones that come from the wizard. And when you look at a 32-bit number, they will be two different numbers.

    Postgres Pro also has a pg_variables module, which also works on stand by. This is not a temporary table, but the necessary functionality can be depicted.

    Implement Cluster Index

    Postgres Pro had several attempts to implement it. Now you can enter cluster table index and the table will be in the same order. Suffered with how to maintain a table in a clustered state. We tried different approaches, but invariably inserting into such a table was very expensive. And this is not interesting to anyone. Therefore, so far it has been concluded that it is necessary to move to Index Organized Tables.

    Recommended autovacuum scale factor

    Usually recommend setting 1 - 5%. But this is completely optional. For small tables, in which, despite the changes, on average, the same distribution remains, a large value can be set. If the table is large and rarely replenished, but aptly, with a strong change in distribution, you will have to invent something else. It all depends on the distribution of your data.

    Hints in complex queries

    In Oracle, with complex queries, you have to periodically help with hints, because sudden full scans occur. There are hints in Postgres Pro, quite moody, but you can get them. However, there are no hints in regular PostgreSQL, and they are unlikely to appear. If you have built-in hints, then users, faced with an optimizer problem, insert hints, calm down and do not report a problem. Optimizer development stops.

    By the way, the PostgreSQL optimizer has a problem. When he estimates a sample from a table, even for a more or less reasonable amount, he guesses with some error. Then it starts to connect, the result is connected to something else, the error accumulates, and at the third or fourth level PostgreSQL misses a lot.

    There is such a setting - join collapse limit. PostgreSQL sorts JOINs for more efficient use, but the default sort limit is 8. If there are more than 8 JOINs in a row, the system will not sort them and there will be a dependence on the JOIN order in the query.

    There is also a genetic optimizer with various parameters. You can enable various settings in a session and more or less describe how the request should be executed. Using this order, with the help of brackets you can set the shutdown of some operations, the same sec scan. Another option is to insert certain parameters into functions. In a sense, these are also hints. Not very convenient, but at least something.

    Also popular now: