puyol_dev2 July 8, 2019 at 09:24

Tuning PostgreSQL settings to optimize performance

Transfer

By default, PostgreSQL is not configured for the workload. Default values are set to ensure PostgreSQL is functional everywhere with the least amount of resources. There are default settings for all database settings. The primary responsibility of a database administrator or developer is to configure PostgreSQL to fit their system load. In this blog, we will outline the basic recommendations for tuning PostgreSQL database settings to improve database performance according to the workload.

Keep in mind that while optimizing your PostgreSQL server configuration improves performance, the database designer should also be careful when writing queries. If queries perform a full table scan where an index can be used, or perform heavy joins or expensive aggregation operations, then the system may still work poorly, even if the database settings are configured correctly. When writing database queries, it is important to pay attention to performance.

However, database parameters are also very important, so let's look at the eight that have the greatest potential for improving performance.

Custom PostgreSQL Options

PostgreSQL uses its own buffer, and also uses a buffered kernel IO. This means that the data is stored in memory twice, first in the PostgreSQL buffer, and then in the kernel buffer. Unlike other databases, PostgreSQL does not provide direct I / O. This is called double buffering. The PostgreSQL buffer is called shared_buffer , which is the most efficient custom parameter for most operating systems. This parameter sets how much allocated memory PostgreSQL will use for caching.

The default value for shared_buffer is set very low and you will not get much benefit from it. This is because some machines and operating systems do not support higher values. But in most modern machines you need to increase this value for optimal performance.

The recommended value is 25% of the total RAM of the computer. You should try some lower and higher values, because in some cases you can get good performance with a setting of more than 25%. But the actual configuration depends on your machine and the working dataset. If your working dataset can easily fit in your RAM, you can increase the value of shared_buffer so that it contains your entire database and the entire working dataset can be in the cache. However, you obviously do not want to reserve all the RAM for PostgreSQL.

It is noticed that in production environments, good performance really gives great importance to shared_buffer, although tests should always be performed to achieve the right balance.

^{Checking the value of shared_buffer}

testdb=# SHOW shared_buffers;
shared_buffers
----------------
128MB
(1 row)

_{Note : Be careful, as some kernels do not support a larger value , especially on Windows.}

wal_buffers

PostgreSQL first writes the entries in the WAL (prerecord log) to the buffers, and then these buffers are flushed to disk. The default buffer size defined by wal_buffers is 16 MB. But if you have many concurrent connections, then a higher value can improve performance.

effective_cache_size

effective_cache_size provides an estimate of the memory available for disk caching. This is just a guideline, not the exact amount of allocated memory or cache. It does not allocate actual memory, but tells the optimizer the amount of cache available in the kernel. If this parameter is set too low, the query planner may decide not to use some indexes, even if they are useful. Therefore, setting a big value always makes sense.

work_mem

This setting is used for complex sorting. If you need to do complex sorting, increase the value of work_mem to get good results. Sorting in memory is much faster than sorting data on disk. Setting it to a very high value can cause a memory bottleneck for your environment, as this option relates to the user sort operation. Therefore, if you have many users trying to perform sorting operations, then the system will highlight:

work_mem * total sort operations

for all users. Setting this parameter globally can result in very high memory usage. Therefore, it is highly recommended that you change it at the session level.

^{work_mem = 2MB}

testdb=# SET work_mem TO "2MB";
testdb=# EXPLAIN SELECT * FROM bar ORDER BY bar.b;
                                    QUERY PLAN                                     
-----------------------------------------------------------------------------------
Gather Merge  (cost=509181.84..1706542.14 rows=10000116 width=24)
   Workers Planned: 4
   ->  Sort  (cost=508181.79..514431.86 rows=2500029 width=24)
         Sort Key: b
         ->  Parallel Seq Scan on bar  (cost=0.00..88695.29 rows=2500029 width=24)
(5 rows)

The initial request sorting node is evaluated at 514431.86. Cost is an arbitrary calculated unit. For the above request, we have work_mem only 2 MB. For testing purposes, let's increase this value to 256 MB and see if it affects the cost.

^{work_mem = 256MB}

testdb=# SET work_mem TO "256MB";
testdb=# EXPLAIN SELECT * FROM bar ORDER BY bar.b;
                                    QUERY PLAN                                     
-----------------------------------------------------------------------------------
Gather Merge  (cost=355367.34..1552727.64 rows=10000116 width=24)
   Workers Planned: 4
   ->  Sort  (cost=354367.29..360617.36 rows=2500029 width=24)
         Sort Key: b
         ->  Parallel Seq Scan on bar  (cost=0.00..88695.29 rows=2500029 width=24)

Request cost reduced from 514431.86 to 360617.36, i.e. decreased by 30%.

maintenance_work_mem

maintenance_work_mem is a memory parameter used for maintenance tasks. The default value is 64 MB. Setting a large value helps in tasks such as VACUUM, RESTORE, CREATE INDEX, ADD FOREIGN KEY and ALTER TABLE.

^{maintenance_work_mem = 10MB}

postgres=# CHECKPOINT;
postgres=# SET maintenance_work_mem to '10MB';
postgres=# CREATE INDEX foo_idx ON foo (c);
CREATE INDEX
Time: 170091.371 ms (02:50.091)

^{maintenance_work_mem = 256MB}

postgres=# CHECKPOINT;
postgres=# set maintenance_work_mem to '256MB';
postgres=# CREATE INDEX foo_idx ON foo (c);
CREATE INDEX
Time: 111274.903 ms (01:51.275)

The index creation time is 170091.371 ms if the maintenance_work_mem parameter is set to only 10 MB, but it decreases to 111274.903 ms when we increase the maintenance_work_mem parameter to 256 MB.

synchronous_commit

Used to ensure that a transaction commit waits for a WAL to write to disk before returning a successful completion status to the client. This is a tradeoff between performance and reliability. If your application is designed in such a way that performance is more important than reliability, disable synchronous_commit . In this case, the transaction is committed very quickly because it will not wait for the WAL file to be reset, but reliability will be compromised. In the event of a server failure, data may be lost even if the client received a message indicating that the transaction commit completed successfully.

checkpoint_timeout, checkpoint_completion_target

PostgreSQL writes changes to the WAL. The checkpoint process flushes data to files. This action is performed when a breakpoint (CHECKPOINT) occurs. This is an expensive operation and can cause a huge number of IO operations. This whole process involves expensive read / write operations to disk. Users can always start the checkpoint task (CHECKPOINT) when necessary, or automate the start using the checkpoint_timeout and checkpoint_completion_target parameters .

The checkpoint_timeout parameter is used to set the time between WAL breakpoints. Setting it too low reduces the recovery time after a failure because more data is being written to the disk, but it also reduces performance because each checkpoint ultimately consumes valuable system resources.

checkpoint_completion_target is the fraction of the time between checkpoints to complete a checkpoint. High frequency checkpoints can affect performance. To smoothly complete the checkpoint job, checkpoint_timeout must be low. Otherwise, the OS will accumulate all dirty pages until the ratio is observed, and then produce a large reset.

Conclusion

There are more options that you can tweak to get better performance, but they have less impact than the ones highlighted here. In the end, we must always remember that not all parameters are relevant for all types of applications. Some applications work better when setting options, and some do not. PostgreSQL database settings must be tailored to the specific needs of the application and the operating system in which it runs.

Tags: