Tuning Linux kernel parameters to optimize PostgreSQL

SHMMAX / SHMALL
SHMMAX is a kernel parameter that determines the maximum size of a single shared memory segment a Linux process can allocate. Prior to version 9.3, PostgreSQL used System V (SysV) shared memory, which requires SHMMAX configuration. After 9.3, PostgreSQL switched to POSIX shared memory, so far fewer bytes of System V shared memory are now required.
Prior to version 9.3, SHMMAX was the most important kernel parameter. The SHMMAX value is specified in bytes.
Similarly, SHMALL is another kernel parameter that determines the system-wide total amount of shared memory, measured in pages. Use the ipcs command to view the current SHMMAX, SHMALL, and SHMMIN values.
SHM* details - Linux
$ ipcs -lm
------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 1073741824
max total shared memory (kbytes) = 17179869184
min seg size (bytes) = 1
SHM* details - MacOS X
$ ipcs -M
IPC status from as of Thu Aug 16 22:20:35 PKT 2018
shminfo:
shmmax: 16777216 (max shared memory segment size)
shmmin: 1 (min shared memory segment size)
shmmni: 32 (max number of shared memory identifiers)
shmseg: 8 (max shared memory segments per process)
shmall: 1024 (max amount of shared memory in pages)
PostgreSQL uses System V IPC to allocate shared memory, and this is one of the most important kernel parameters. If you receive the following error messages, it means that you are running an older version of PostgreSQL and your SHMMAX value is too low. Users are expected to adjust and increase the value according to the amount of shared memory they are going to use.
Possible misconfiguration errors
If SHMMAX is not configured correctly, you may receive an error when trying to initialize a PostgreSQL cluster using the initdb command.
initdb failure
DETAIL: Failed system call was shmget(key=1, size=2072576, 03600).
HINT: This error usually means that PostgreSQL's request for a shared memory segment exceeded your kernel's SHMMAX parameter.
You can either reduce the request size or reconfigure the kernel with larger SHMMAX. To reduce the request size (currently 2072576 bytes),
reduce PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or max_connections.
If the request size is already small, it's possible that it is less than your kernel's SHMMIN parameter,
in which case raising the request size or reconfiguring SHMMIN is called for.
The PostgreSQL documentation contains more information about shared memory configuration.
child process exited with exit code 1
Similarly, you may get an error when starting the PostgreSQL server using the pg_ctl command.
pg_ctl failure
DETAIL: Failed system call was shmget(key=5432001, size=14385152, 03600).
HINT: This error usually means that PostgreSQL's request for a shared memory segment exceeded your kernel's SHMMAX parameter.
You can either reduce the request size or reconfigure the kernel with larger SHMMAX. To reduce the request size (currently 14385152 bytes), reduce PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or max_connections.
If the request size is already small, it's possible that it is less than your kernel's SHMMIN parameter,
in which case raising the request size or reconfiguring SHMMIN is called for.
The PostgreSQL documentation contains more information about shared memory configuration.
Understanding Differences in Definitions
The definition of SHMMAX / SHMALL parameters is slightly different on Linux and MacOS X:
- Linux: kernel.shmmax, kernel.shmall
- MacOS X: kern.sysv.shmmax, kern.sysv.shmall
The sysctl command can be used to change a value temporarily. To make values persistent, add an entry to /etc/sysctl.conf. Details are given below.
Change kernel settings on MacOS X
# Get the value of SHMMAX
sudo sysctl kern.sysv.shmmax
kern.sysv.shmmax: 4096
# Get the value of SHMALL
sudo sysctl kern.sysv.shmall
kern.sysv.shmall: 4096
# Set the value of SHMMAX
sudo sysctl -w kern.sysv.shmmax=16777216
kern.sysv.shmmax: 4096 -> 16777216
# Set the value of SHMALL
sudo sysctl -w kern.sysv.shmall=16777216
kern.sysv.shmall: 4096 -> 16777216
Changing Linux kernel parameters
# Get the value of SHMMAX
sudo sysctl kernel.shmmax
kernel.shmmax: 4096
# Get the value of SHMALL
sudo sysctl kernel.shmall
kernel.shmall: 4096
# Set the value of SHMMAX
sudo sysctl -w kernel.shmmax=16777216
kernel.shmmax: 4096 -> 16777216
# Set the value of SHMALL
sudo sysctl -w kernel.shmall=16777216
kernel.shmall: 4096 -> 16777216
Remember: to make these changes permanent, add these values to /etc/sysctl.conf
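For example, entries like the following in /etc/sysctl.conf survive a reboot (the values shown are purely illustrative; size them for your own machine):

```
# /etc/sysctl.conf -- example values only
kernel.shmmax = 17179869184
kernel.shmall = 4194304
```

Run sudo sysctl -p afterwards to apply the file without rebooting.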
Large Pages (Huge Pages)
Linux defaults to 4 kB pages, BSD has Super Pages, and Windows has Large Pages. A page is a chunk of RAM allocated to a process. A process may have several pages depending on its memory requirements: the more memory a process needs, the more pages are allocated to it. The OS maintains a page table for its processes. The smaller the page size, the larger the table and the longer it takes to look up a page in it. Huge pages therefore make it possible to use large amounts of memory with reduced overhead: fewer page lookups, fewer page faults, and faster read/write operations through larger buffers. The result is improved performance.
PostgreSQL supports huge pages on Linux only. By default, Linux uses 4 kB memory pages, so if you perform many memory operations, it is worth switching to larger pages. A performance gain is seen with huge pages from 2 MB up to 1 GB. The huge page size can be set at boot time. You can easily check the huge page parameters and their usage on your Linux machine with the command cat /proc/meminfo | grep -i huge.
Getting information about large pages (Linux only)
Note: this applies to Linux only; on other OSes this step is skipped.
$ cat /proc/meminfo | grep -i huge
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
In this example, although the huge page size is set to 2048 kB (2 MB), the total number of huge pages is 0. This means that huge pages are disabled.
Script for determining the number of large pages
This simple script returns the required number of huge pages. Run the script on your Linux server while PostgreSQL is running. Make sure the $PGDATA environment variable points to the PostgreSQL data directory.
Getting the number of required large pages
#!/bin/bash
# Read the postmaster PID from the first line of postmaster.pid
pid=`head -1 $PGDATA/postmaster.pid`
echo "Pid: $pid"
# Peak virtual memory used by the postmaster, in kB
peak=`grep ^VmPeak /proc/$pid/status | awk '{ print $2 }'`
echo "VmPeak: $peak kB"
# Huge page size configured in the kernel, in kB
hps=`grep ^Hugepagesize /proc/meminfo | awk '{ print $2 }'`
echo "Hugepagesize: $hps kB"
# Number of huge pages needed to cover the peak usage
hp=$((peak/hps))
echo Set Huge Pages: $hp
The script output is as follows:
Script output
Pid: 12737
VmPeak: 180932 kB
Hugepagesize: 2048 kB
Set Huge Pages: 88
Here the recommended number of huge pages is 88, so vm.nr_hugepages should be set to 88.
Setting the number of huge pages
sudo sysctl -w vm.nr_hugepages=88
Check huge pages again; you will see that no huge pages are in use yet (HugePages_Free = HugePages_Total).
Information about large pages again (Linux only)
$ cat /proc/meminfo | grep -i huge
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
HugePages_Total: 88
HugePages_Free: 88
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Now set huge_pages = on in $PGDATA/postgresql.conf and restart the server.
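The relevant postgresql.conf setting looks like this (huge_pages also accepts 'try', the default, which falls back to normal pages when huge page allocation fails):

```
# In $PGDATA/postgresql.conf
huge_pages = on    # 'try' falls back to normal pages if allocation fails
```

After editing the file, restart the server, for example with pg_ctl restart.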
And again, information about large pages (Linux only)
$ cat /proc/meminfo | grep -i huge
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
HugePages_Total: 88
HugePages_Free: 81
HugePages_Rsvd: 64
HugePages_Surp: 0
Hugepagesize: 2048 kB
Now you can see that very few large pages are used. Let's now try to add some data to the database.
Some database operations to make use of the huge pages
postgres=# CREATE TABLE foo(a INTEGER);
CREATE TABLE
postgres=# INSERT INTO foo VALUES(generate_series(1,10000000));
INSERT 0 10000000
Let's see if we use more large pages now than before.
Once again information on large pages (Linux only)
$ cat /proc/meminfo | grep -i huge
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
HugePages_Total: 88
HugePages_Free: 18
HugePages_Rsvd: 1
HugePages_Surp: 0
Hugepagesize: 2048 kB
Now you can see that most of the large pages are in use.
Note: the number of huge pages used here is very small, which is not typical for a machine in a production environment. Please estimate the required number of pages for your system and set it accordingly based on load and resources.
vm.swappiness
vm.swappiness is another kernel parameter that can affect database performance. It controls how aggressively the kernel swaps memory pages to and from disk on Linux. The value ranges from 0 to 100 and determines how readily memory is swapped out. Zero means swapping is disabled, and 100 means aggressive swapping.
You can get good performance by setting lower values.
Setting the value to 0 in newer kernels may cause the OOM killer (the Linux process that kills processes to reclaim memory) to kill the process. Thus, you can safely set the value to 1 if you want to minimize swapping. The default value on Linux is 60. A higher value makes the kernel use swap space more aggressively, while a lower value keeps more data and code in RAM.
A lower value is a good bet on improving performance in PostgreSQL.
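A minimal sketch for inspecting and changing this setting on Linux (the write requires root; the value 1 follows the recommendation above):

```shell
# Read the current swappiness (the Linux default is 60)
cat /proc/sys/vm/swappiness

# Lower it for a database server (requires root):
# sudo sysctl -w vm.swappiness=1
```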
vm.overcommit_memory / vm.overcommit_ratio
Applications acquire memory and free it when it is no longer needed. But in some cases an application acquires too much memory and does not release it, which can invoke the OOM killer. Here are the possible values for the vm.overcommit_memory parameter, with a description of each:
- 0: Heuristic overcommit (the default); a kernel-based heuristic
- 1: Always overcommit, never check
- 2: Don't overcommit; never exceed the overcommit ratio.
Link: https://www.kernel.org/doc/Documentation/vm/overcommit-accounting
vm.overcommit_ratio is the percentage of RAM available for overcommitment. A value of 50% on a system with 2 GB of RAM may commit up to 3 GB of RAM.
A value of 2 for vm.overcommit_memory gives better performance for PostgreSQL. It maximizes RAM usage by the server process without any significant risk of the process being killed by the OOM killer. The application will still be able to overcommit, but only within the overcommit ratio, which reduces the risk of the OOM killer killing the process. Hence a value of 2 performs better than the default of 0. Reliability can be improved further by ensuring that memory beyond the allowable range is never committed; this eliminates the risk of the process being killed by the OOM killer.
On systems without swap, setting vm.overcommit_memory to 2 can be problematic.
https://www.postgresql.org/docs/current/static/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT
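With vm.overcommit_memory = 2, the kernel enforces a commit limit of swap plus RAM × overcommit_ratio / 100. Both settings and the resulting limit can be inspected without root; a sketch, assuming a Linux system:

```shell
# Current overcommit policy (0 = heuristic, 1 = always, 2 = strict)
cat /proc/sys/vm/overcommit_memory
# Percentage of RAM counted toward the commit limit in strict mode
cat /proc/sys/vm/overcommit_ratio
# CommitLimit = swap + RAM * overcommit_ratio / 100;
# Committed_AS is the amount currently committed
grep -E 'CommitLimit|Committed_AS' /proc/meminfo
```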
vm.dirty_background_ratio / vm.dirty_background_bytes
vm.dirty_background_ratio is the percentage of memory filled with dirty pages at which they start being written to disk. The flush happens in the background. The value ranges from 0 to 100; however, values below 5 may be ineffective, and some kernels do not support them. 10 is the default on most Linux systems. You can improve performance for write-intensive workloads with a lower value, which means Linux starts flushing dirty pages in the background sooner.
You should set vm.dirty_background_bytes depending on the speed of your disk.
There are no universally "good" values for these two parameters, since both are hardware-dependent. However, setting vm.dirty_background_ratio to 5 and vm.dirty_background_bytes to 25% of your disk speed improves performance by up to ~25% in most cases.
vm.dirty_ratio / dirty_bytes
These are the same as vm.dirty_background_ratio / vm.dirty_background_bytes, except that the flush happens in the foreground, blocking the application. Therefore, vm.dirty_ratio should be higher than vm.dirty_background_ratio: this ensures that background flushing starts earlier, so the application is blocked as rarely as possible. You can adjust the gap between these two ratios depending on your disk I/O load.
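Both thresholds can be read and set via sysctl; the example values below merely illustrate the rule that vm.dirty_ratio should exceed vm.dirty_background_ratio (they are not tuned recommendations):

```shell
# Current writeback thresholds, as a percentage of reclaimable memory
cat /proc/sys/vm/dirty_background_ratio   # background flushing starts here
cat /proc/sys/vm/dirty_ratio              # blocking, foreground flushing starts here

# Example pairing (requires root); keep the background ratio well below dirty_ratio:
# sudo sysctl -w vm.dirty_background_ratio=5
# sudo sysctl -w vm.dirty_ratio=30
```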
Summary
Other parameters can be tuned as well, but the improvements will be marginal and you will not see much benefit. Remember that not all parameters apply to all application types: some applications work better with certain settings and some do not. You must find the right balance of these settings for the expected workload and application type, and keep OS behavior in mind when tuning. Tuning kernel parameters is not as easy as tuning database parameters: it is harder to give general recommendations here.