MVCC-7. Autovacuum

    Let me remind you that we started with issues related to isolation, made a digression about low-level data organization, then talked in detail about row versions and how snapshots are built from them.

    Then we looked at in-page vacuuming (and HOT updates) and regular vacuuming; today we turn to autovacuum.

    Autovacuum


    We have already said that regular vacuum, under normal conditions (when nothing holds back the transaction horizon for a long time), copes with its work. The question is how often to run it.

    If a frequently changing table is vacuumed too rarely, it will grow larger than desired. Moreover, the next vacuum may require several passes over the indexes if too many changes have accumulated.

    If the table is vacuumed too often, the server will constantly be busy with maintenance instead of useful work, which is not good either.

    Note that running vacuum on a fixed schedule does not solve the problem, because the load can change over time. If a table starts being updated more actively, it should be vacuumed more often.

    Autovacuum is exactly the mechanism that starts vacuum depending on how actively tables are being changed.

    When autovacuum is enabled (the autovacuum configuration parameter), the autovacuum launcher process is always present in the system. It schedules the work, while the actual vacuuming is done by autovacuum worker processes, several of which can run in parallel.

    The autovacuum launcher process compiles a list of databases that show any activity. Activity is determined from statistics, and for them to be collected the track_counts parameter must be on. Never turn off autovacuum or track_counts, otherwise autovacuum will not work.

    Once per autovacuum_naptime, the autovacuum launcher starts (via the postmaster process) a worker process for a database from the list. In other words, if there is any activity in a database, worker processes will visit it at autovacuum_naptime intervals. To achieve this, when there are several active databases (say, N of them), worker processes are started N times as often as autovacuum_naptime. The total number of simultaneously running worker processes, however, is limited by the autovacuum_max_workers parameter.
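
    The scheduling described above boils down to a bit of arithmetic. Here is an illustrative sketch of the behavior, not the actual launcher code:

```python
# Illustrative sketch of autovacuum launcher scheduling,
# not the actual PostgreSQL implementation.

def worker_start_interval(naptime_s: float, active_databases: int) -> float:
    """Each active database is visited once per autovacuum_naptime,
    so with N active databases workers are started N times as often."""
    return naptime_s / active_databases

# With the default autovacuum_naptime of 60 seconds and 3 active
# databases, a new worker is launched roughly every 20 seconds:
print(worker_start_interval(60, 3))
```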

    Once started, the worker process connects to the database assigned to it and begins by building a list of:

    • all tables, materialized views, and TOAST tables that need vacuuming,
    • all tables and materialized views that need analysis (TOAST tables are not analyzed, because they are always accessed by index).

    The worker process then vacuums and/or analyzes the selected objects one by one and terminates when the work is done.

    If the process has not finished all the planned work within autovacuum_naptime, the autovacuum launcher sends another worker process to the same database, and they will work together. “Together” simply means that the second process builds its own list of tables and works through it. Thus different tables are processed in parallel, but there is no parallelism within a single table: if one worker process is already handling a table, another one will skip it and move on.

    Parallel processing of a single table has been discussed for a long time, but the patch has not been accepted yet.

    Now let's take a closer look at what “needs vacuuming” and “needs analysis” mean.

    Which tables need vacuuming


    A table is considered to need vacuuming if the number of dead, that is, no longer relevant, row versions exceeds a threshold. The number of dead row versions is continuously collected by the statistics collector and is available in the pg_stat_all_tables view. The threshold is defined by two parameters:

    • autovacuum_vacuum_threshold specifies an absolute value (a number of row versions),
    • autovacuum_vacuum_scale_factor specifies a fraction of the rows in the table.

    The resulting formula is: vacuuming is required if pg_stat_all_tables.n_dead_tup >= autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * pg_class.reltuples.

    The default values are autovacuum_vacuum_threshold = 50 and autovacuum_vacuum_scale_factor = 0.2. The main parameter here is, of course, autovacuum_vacuum_scale_factor: it is the one that matters for large tables (and it is large tables that are associated with potential problems). The default of 20% looks greatly overestimated; most likely it will need to be reduced significantly.
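
    The trigger condition translates into a one-line check. A sketch, with defaults mirroring the configuration parameters above:

```python
def needs_vacuum(n_dead_tup: int, reltuples: float,
                 threshold: float = 50,       # autovacuum_vacuum_threshold
                 scale_factor: float = 0.2    # autovacuum_vacuum_scale_factor
                 ) -> bool:
    """Vacuum is required once dead row versions reach the threshold."""
    return n_dead_tup >= threshold + scale_factor * reltuples

# For a million-row table with default settings, vacuum is triggered
# only after 200,050 dead row versions accumulate:
print(needs_vacuum(200_049, 1_000_000))  # False
print(needs_vacuum(200_050, 1_000_000))  # True
```

    This also shows why the 20% default is too high for large tables: the bigger the table, the more garbage must pile up before vacuum kicks in.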

    The optimal values may differ from table to table, depending on their size and the nature of the changes. It makes sense to set reasonable values globally and, where necessary, tune the parameters for specific tables using storage parameters:

    • autovacuum_vacuum_threshold and toast.autovacuum_vacuum_threshold,
    • autovacuum_vacuum_scale_factor and toast.autovacuum_vacuum_scale_factor.

    To avoid confusion, this should be done only for a small number of tables that stand out from the rest by the volume or intensity of changes, and only when the globally set values do not suit them.

    In addition, autovacuum can be turned off at the table level (although it is hard to think of a reason why this would be necessary):

    • autovacuum_enabled and toast.autovacuum_enabled.

    For example, last time we created the vac table with autovacuum disabled so that we could control vacuuming manually for demonstration purposes. The storage parameter can be changed like this:

    => ALTER TABLE vac SET (autovacuum_enabled = off);
    

    To formalize all of the above, let's create a view showing which tables currently need vacuuming. It uses a function that returns the current value of a parameter, taking into account that the parameter can be overridden at the table level:

    => CREATE FUNCTION get_value(param text, reloptions text[], relkind "char")
    RETURNS float
    AS $$
      SELECT coalesce(
        -- if the storage parameter is set, use it
        (SELECT option_value
         FROM   pg_options_to_table(reloptions)
         WHERE  option_name = CASE
                  -- for toast tables the parameter name differs
                  WHEN relkind = 't' THEN 'toast.' ELSE ''
                END || param
        ),
        -- otherwise fall back to the configuration parameter
        current_setting(param)
      )::float;
    $$ LANGUAGE sql;
    

    And here is the view:

    => CREATE VIEW need_vacuum AS
      SELECT st.schemaname || '.' || st.relname tablename,
             st.n_dead_tup dead_tup,
             get_value('autovacuum_vacuum_threshold', c.reloptions, c.relkind) +
             get_value('autovacuum_vacuum_scale_factor', c.reloptions, c.relkind) * c.reltuples
             max_dead_tup,
             st.last_autovacuum
      FROM   pg_stat_all_tables st,
             pg_class c
      WHERE  c.oid = st.relid
      AND    c.relkind IN ('r','m','t');
    

    Which tables need analysis


    With autoanalyze the situation is much the same. A table is considered to need analysis if the number of row versions changed since the last analysis exceeds a threshold defined by two similar parameters: pg_stat_all_tables.n_mod_since_analyze >= autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * pg_class.reltuples.

    The autoanalyze defaults are slightly different: autovacuum_analyze_threshold = 50 and autovacuum_analyze_scale_factor = 0.1. They too can be set at the level of storage parameters for individual tables:

    • autovacuum_analyze_threshold
    • autovacuum_analyze_scale_factor

    Since TOAST tables are not analyzed, they have no corresponding parameters.
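
    The analysis condition is the same check with a different counter and different defaults. Again a sketch mirroring the parameters above:

```python
def needs_analyze(n_mod_since_analyze: int, reltuples: float,
                  threshold: float = 50,     # autovacuum_analyze_threshold
                  scale_factor: float = 0.1  # autovacuum_analyze_scale_factor
                  ) -> bool:
    """Analysis is required once enough rows changed since the last one."""
    return n_mod_since_analyze >= threshold + scale_factor * reltuples

# A 1000-row table needs analysis after 50 + 0.1 * 1000 = 150 changes:
print(needs_analyze(149, 1000))  # False
print(needs_analyze(150, 1000))  # True
```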

    Let's create a view for analysis:

    => CREATE VIEW need_analyze AS
      SELECT st.schemaname || '.' || st.relname tablename,
             st.n_mod_since_analyze mod_tup,
             get_value('autovacuum_analyze_threshold', c.reloptions, c.relkind) +
             get_value('autovacuum_analyze_scale_factor', c.reloptions, c.relkind) * c.reltuples
             max_mod_tup,
             st.last_autoanalyze
      FROM   pg_stat_all_tables st,
             pg_class c
      WHERE  c.oid = st.relid
      AND    c.relkind IN ('r','m');
    

    Example


    For the experiments, let's set the following parameter values:

    => ALTER SYSTEM SET autovacuum_naptime = '1s'; -- so we don't wait long
    => ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.03;  -- 3%
    => ALTER SYSTEM SET autovacuum_vacuum_threshold = 0;
    => ALTER SYSTEM SET autovacuum_analyze_scale_factor = 0.02; -- 2%
    => ALTER SYSTEM SET autovacuum_analyze_threshold = 0;
    
    => SELECT pg_reload_conf();
    
     pg_reload_conf
    ----------------
     t
    (1 row)
    

    Now let's create a table similar to the one we used last time and insert a thousand rows into it. Autovacuum is disabled at the table level; we will turn it on ourselves. Without this the examples would not be reproducible, since autovacuum could kick in at an inconvenient moment.

    => CREATE TABLE autovac(
      id serial,
      s char(100)
    ) WITH (autovacuum_enabled = off);
    => INSERT INTO autovac SELECT g.id,'A' FROM generate_series(1,1000) g(id);
    

    Here is what our vacuum view shows:

    => SELECT * FROM need_vacuum WHERE tablename = 'public.autovac';
    
       tablename    | dead_tup | max_dead_tup | last_autovacuum 
    ----------------+----------+--------------+-----------------
     public.autovac |        0 |            0 | 
    (1 row)
    

    There are two points worth attention here. First, max_dead_tup = 0, although 3% of 1000 rows is 30 rows. The reason is that we have no statistics on the table yet, since INSERT by itself does not update them. Until the table is analyzed, the zeros will remain, because pg_class.reltuples = 0. However, let's look at the second view, the one for analysis:

    => SELECT * FROM need_analyze WHERE tablename = 'public.autovac';
    
       tablename    | mod_tup | max_mod_tup | last_autoanalyze 
    ----------------+---------+-------------+------------------
     public.autovac |    1000 |           0 | 
    (1 row)
    

    Since 1000 rows were changed (inserted) in the table, which is more than zero, autoanalyze should trigger. Let's check:

    => ALTER TABLE autovac SET (autovacuum_enabled = on);
    

    After a short pause we see that the table has been analyzed and, instead of zeros, max_mod_tup shows the correct value of 20 rows:

    => SELECT * FROM need_analyze WHERE tablename = 'public.autovac';
    
       tablename    | mod_tup | max_mod_tup |       last_autoanalyze        
    ----------------+---------+-------------+-------------------------------
     public.autovac |       0 |          20 | 2019-05-21 11:59:48.465987+03
    (1 row)
    

    => SELECT reltuples, relpages FROM pg_class WHERE relname = 'autovac';
    
     reltuples | relpages 
    -----------+----------
          1000 |       17
    (1 row)
    

    Let's get back to autovacuum:

    => SELECT * FROM need_vacuum WHERE tablename = 'public.autovac';
    
       tablename    | dead_tup | max_dead_tup | last_autovacuum
    ----------------+----------+--------------+-----------------
     public.autovac |        0 |           30 |
    (1 row)
    

    As we can see, max_dead_tup is now filled in. The second point worth attention is dead_tup = 0. The statistics show that there are no dead row versions in the table, and that is true: there is nothing to vacuum in our table yet. So any table used in append-only mode will not be vacuumed and, consequently, its visibility map will not be updated. And that makes index-only scans impossible.

    (Next time we will see that vacuum does eventually reach an append-only table, but this happens very rarely.)

    Practical conclusion: if index-only scans matter to you, you may need to run vacuum manually.

    Now let's turn off autovacuum again and update 31 rows, one more than the threshold value.

    => ALTER TABLE autovac SET (autovacuum_enabled = off);
    => UPDATE autovac SET s = 'B' WHERE id <= 31;
    => SELECT * FROM need_vacuum WHERE tablename = 'public.autovac';
    
       tablename    | dead_tup | max_dead_tup | last_autovacuum 
    ----------------+----------+--------------+-----------------
     public.autovac |       31 |           30 | 
    (1 row)
    

    Now the condition for triggering autovacuum is satisfied. Let's enable autovacuum, and after a short pause we will see that the table has been processed:

    => ALTER TABLE autovac SET (autovacuum_enabled = on);
    => SELECT * FROM need_vacuum WHERE tablename = 'public.autovac';
    
       tablename    | dead_tup | max_dead_tup |        last_autovacuum        
    ----------------+----------+--------------+-------------------------------
     public.autovac |        0 |           30 | 2019-05-21 11:59:52.554571+03
    (1 row)
    

    Load control


    Vacuum does not block other processes, since it works page by page, but it still loads the system and can noticeably affect performance.

    Throttling regular vacuum


    To control the intensity of vacuuming, and thus its impact on the system, the process alternates between working and sleeping. Vacuum performs roughly vacuum_cost_limit conventional units of work and then sleeps for vacuum_cost_delay ms.

    The defaults are vacuum_cost_limit = 200 and vacuum_cost_delay = 0. The zero effectively means that (regular) vacuum never sleeps, so the exact value of vacuum_cost_limit does not matter. The reasoning is that if an administrator has to run VACUUM manually, they probably want it to finish as quickly as possible.

    If you do set the sleep time, however, the amount of work specified in vacuum_cost_limit will be the sum of the costs of working with pages in the buffer cache. Each page access is costed as follows:

    • if the page is found in the buffer cache, vacuum_cost_page_hit = 1;
    • if it is not found and has to be read from disk, vacuum_cost_page_miss = 10;
    • if vacuum dirties a page that was previously clean (so it will have to be written to disk), vacuum_cost_page_dirty = 20.

    That is, with the default vacuum_cost_limit, one work round can process 200 cached pages, or 20 pages read from disk, or 10 dirtied pages. These numbers are rather arbitrary, of course, but there is little point in choosing them more precisely.
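
    These numbers can be checked with a small calculation, an illustration using the default costs listed above:

```python
# Default page access costs (conventional units):
VACUUM_COST_PAGE_HIT = 1     # page found in the buffer cache
VACUUM_COST_PAGE_MISS = 10   # page read from disk
VACUUM_COST_PAGE_DIRTY = 20  # previously clean page dirtied by vacuum

def pages_per_round(cost_limit: int, page_cost: int) -> int:
    """How many pages fit into one round of work before
    the vacuum_cost_delay pause."""
    return cost_limit // page_cost

print(pages_per_round(200, VACUUM_COST_PAGE_HIT))    # 200
print(pages_per_round(200, VACUUM_COST_PAGE_MISS))   # 20
print(pages_per_round(200, VACUUM_COST_PAGE_DIRTY))  # 10
```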

    Throttling autovacuum


    Load control during autovacuum works the same way as for regular vacuum. But so that manual vacuum and autovacuum can run at different intensities, the autovacuum processes have their own parameters: autovacuum_vacuum_cost_limit and autovacuum_vacuum_cost_delay. If either of these is set to -1, the value of vacuum_cost_limit and/or vacuum_cost_delay is used instead.

    By default, autovacuum_vacuum_cost_limit = -1 (so vacuum_cost_limit = 200 is used) and autovacuum_vacuum_cost_delay = 20 ms. On modern hardware these numbers make autovacuum run very, very slowly.

    In version 12 the default autovacuum_vacuum_cost_delay is reduced to 2 ms, which can be considered a more reasonable first approximation.
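
    To see where "very slowly" comes from, here is a rough throughput estimate: a back-of-the-envelope sketch that ignores the time the work itself takes and assumes the worst case where every page is dirtied.

```python
PAGE_SIZE_MB = 8192 / 1024**2  # standard 8 KB page

def max_dirty_throughput_mb(cost_limit: int, cost_delay_ms: float,
                            page_dirty_cost: int = 20) -> float:
    """Upper bound: cost_limit units of work per sleep interval."""
    rounds_per_second = 1000 / cost_delay_ms
    pages_per_second = rounds_per_second * cost_limit / page_dirty_cost
    return pages_per_second * PAGE_SIZE_MB

print(max_dirty_throughput_mb(200, 20))  # about 3.9 MB/s with the old defaults
print(max_dirty_throughput_mb(200, 2))   # about 39 MB/s with the 2 ms delay
```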

    Note also that the limit set by these parameters is shared by all worker processes. In other words, as the number of concurrent workers changes, the total load stays the same. So if the goal is to make autovacuum faster, then when adding worker processes you should also increase autovacuum_vacuum_cost_limit.
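
    The shared budget means that each worker effectively gets a fraction of the limit. An illustrative calculation, not the exact balancing algorithm:

```python
def per_worker_cost_limit(total_cost_limit: int, active_workers: int) -> float:
    """autovacuum_vacuum_cost_limit is a shared budget: the more workers
    run at once, the smaller the share of each one."""
    return total_cost_limit / active_workers

# With the default limit of 200 and three active workers, each worker
# proceeds as if its limit were about 67 units; to keep per-worker
# speed constant, the total limit must grow with the worker count:
print(per_worker_cost_limit(200, 3))
print(per_worker_cost_limit(600, 3))  # 200.0 per worker again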

    Memory usage and monitoring


    Last time we saw that vacuum uses maintenance_work_mem of memory to store the identifiers of the row versions to be removed.

    Autovacuum does exactly the same. But there can be many concurrent processes if autovacuum_max_workers is set to a large value, and all the memory is allocated at once and in full, not on demand. Therefore, a separate limit for autovacuum worker processes can be set with the autovacuum_work_mem parameter. By default it is -1, that is, not used.

    As already mentioned, vacuum can work with a minimal amount of memory. But if the table has indexes, a small maintenance_work_mem value can lead to repeated index scans. The same applies to autovacuum. Ideally, autovacuum_work_mem should be set to the minimum value at which repeated scans do not occur.

    We saw that vacuuming can be monitored with the VERBOSE option (which cannot be used with autovacuum) or the pg_stat_progress_vacuum view (which shows only current activity). So the main way to monitor autovacuum is the log_autovacuum_min_duration parameter, which writes information to the server message log. It is off by default (set to -1). It makes sense to enable it (at a value of 0, every autovacuum run is logged) and watch the numbers.

    Here's what the output looks like:

    => ALTER SYSTEM SET log_autovacuum_min_duration = 0;
    => SELECT pg_reload_conf();
    
     pg_reload_conf 
    ----------------
     t
    (1 row)
    

    => UPDATE autovac SET s = 'C' WHERE id <= 31;
    

    student$ tail -n 7 /var/log/postgresql/postgresql-11-main.log
    
    2019-05-21 11:59:55.675 MSK [9737] LOG:  automatic vacuum of table "test.public.autovac": index scans: 0
    	pages: 0 removed, 18 remain, 0 skipped due to pins, 0 skipped frozen
    	tuples: 31 removed, 1000 remain, 0 are dead but not yet removable, oldest xmin: 4040
    	buffer usage: 78 hits, 0 misses, 0 dirtied
    	avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
    	system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
    2019-05-21 11:59:55.676 MSK [9737] LOG:  automatic analyze of table "test.public.autovac" system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
    

    All the necessary information is present here.

    Recall that often, instead of increasing the amount of memory, you should lower the vacuum thresholds so that less data is processed at a time.

    It may also make sense to monitor the length of the list of tables that need vacuuming, using the views shown above. A growing list indicates that autovacuum cannot keep up with its work and the settings need to be changed.

    To be continued.
