rdruzyagin April 6, 2016 at 12:03

Explaining the inexplicable. Part 4

Transfer

The PG Day'16 conference is getting closer every day, and we continue to publish a series of articles by Hubert Lubaczewski on explain analysis and its main operations.

In this, I hope, the penultimate post of the series, I will talk about the remaining most common operations that you can find in the explain output.

Unique

The name of the operation speaks for itself - it removes duplicate data.

This can happen, for example, when you do the following:

select distinct field from table

In more recent versions of Postgres, this request will be implemented using HashAggregate.

Unique's problem is that the data for it must be sorted. Not because this operation needs data in a specific order, but to ensure that all rows with the same values are “together”.

This makes Unique a really cool operation (in cases where it can be used), since it practically does not require memory. It simply compares the value in the previous line with the current and, if they are the same, discards it. That's all.

Thus, we can stimulate its use by pre-sorting the data:

$ explain select distinct relkind from (select relkind from pg_class order by relkind) as x;
                              QUERY PLAN
-----------------------------------------------------------------------
 Unique  (cost=22.88..27.26 rows=4 width=1)
   ->  Sort  (cost=22.88..23.61 rows=292 width=1)
         Sort Key: pg_class.relkind
         ->  Seq Scan on pg_class  (cost=0.00..10.92 rows=292 width=1)
(4 rows)

Append

This plan simply launches a lot of sub-operations and returns all the rows returned by them as a general result.

This is used by UNION / UNION ALL queries:

$ explain select oid from pg_class union all select oid from pg_proc union all select oid from pg_database;
                           QUERY PLAN                            
-----------------------------------------------------------------
 Append  (cost=0.00..104.43 rows=2943 width=4)
   ->  Seq Scan on pg_class  (cost=0.00..10.92 rows=292 width=4)
   ->  Seq Scan on pg_proc  (cost=0.00..92.49 rows=2649 width=4)
   ->  Seq Scan on pg_database  (cost=0.00..1.02 rows=2 width=4)
(4 rows)

Here you see how append ran three scans across three tables and returned all rows together.

Please note that I used UNION ALL. If I used UNION, we would get the following:

$ explain select oid from pg_class union select oid from pg_proc union select oid from pg_database;
                              QUERY PLAN                               
-----------------------------------------------------------------------
 HashAggregate  (cost=141.22..170.65 rows=2943 width=4)
   ->  Append  (cost=0.00..133.86 rows=2943 width=4)
         ->  Seq Scan on pg_class  (cost=0.00..10.92 rows=292 width=4)
         ->  Seq Scan on pg_proc  (cost=0.00..92.49 rows=2649 width=4)
         ->  Seq Scan on pg_database  (cost=0.00..1.02 rows=2 width=4)
(5 rows)

This is because UNION removes duplicate rows, which in this case was done by the HashAggregate operation.

Result

Result appears mainly in very simple test queries. This operation is used when your query selects some constant value (or values):

$ explain select 1, 2;
                QUERY PLAN                
------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=0)
(1 row)

In addition to test queries, it can be found in queries that do something like “insert, but only if this is not data duplication”:

$ explain insert into t (i) select 1 where not exists (select * from t where i = 1);
                             QUERY PLAN                              
---------------------------------------------------------------------
 Insert on t  (cost=3.33..3.35 rows=1 width=4)
   ->  Result  (cost=3.33..3.34 rows=1 width=0)
         One-Time Filter: (NOT $0)
         InitPlan 1 (returns $0)
           ->  Seq Scan on t t_1  (cost=0.00..40.00 rows=12 width=0)
                 Filter: (i = 1)
(6 rows)

Values scan

Just like Result, Values Scan is used to return simple data entered in the query, but in this case it can be a whole set of records based on the VALUES () functionality.

If suddenly you are not up to date, you can select many rows and many columns without any table, just using the VALUES syntax, as in this example:

$ select * from ( values (1, 'hubert'), (2, 'depesz'), (3, 'lubaczewski') ) as t (a,b);
 a |      b      
---+-------------
 1 | hubert
 2 | depesz
 3 | lubaczewski
(3 rows)

The plan for such a request is as follows:

                          QUERY PLAN                          
--------------------------------------------------------------
 Values Scan on "*VALUES*"  (cost=0.00..0.04 rows=3 width=36)
(1 row)

This operation is usually used in INSERTs, but it has other uses, for example, custom sorting .

GroupAggregate

This operation is similar to the HashAggregate that we talked about earlier .

The difference is that for GroupAggregate to work, the data must be sorted using the column or columns that you used in the GROUP BY clause.

Like Unique, GroupAggregate uses very little memory, but requires data streamlining.

Example:

$ explain select relkind, count(*) from (select relkind from pg_class order by relkind) x group by relkind;
                              QUERY PLAN                               
-----------------------------------------------------------------------
 GroupAggregate  (cost=22.88..28.03 rows=4 width=1)
   ->  Sort  (cost=22.88..23.61 rows=292 width=1)
         Sort Key: pg_class.relkind
         ->  Seq Scan on pg_class  (cost=0.00..10.92 rows=292 width=1)
(4 rows)

Hashsetop

This operation is used by the INTERSECT / EXCEPT operations (with the optional modifier "ALL").

It works as follows: it starts the Append suboperations for a pair of subqueries, and then, based on the result and the optional ALL modifier, decides which rows to return. I did not go deep into the source code, so I can’t say exactly how it works, but based on their name, the operation is similar to a simple solution based on a counter.

We see that, unlike UNION, these operations work with two data sources:

$ explain select * from (select oid from pg_Class order by oid) x intersect all select * from (select oid from pg_proc order by oid) y;
                                                 QUERY PLAN                                                  
-------------------------------------------------------------------------------------------------------------
 HashSetOp Intersect All  (cost=0.15..170.72 rows=292 width=4)
   ->  Append  (cost=0.15..163.36 rows=2941 width=4)
         ->  Subquery Scan on "*SELECT* 1"  (cost=0.15..18.37 rows=292 width=4)
               ->  Index Only Scan using pg_class_oid_index on pg_class  (cost=0.15..12.53 rows=292 width=4)
         ->  Subquery Scan on "*SELECT* 2"  (cost=0.28..145.00 rows=2649 width=4)
               ->  Index Only Scan using pg_proc_oid_index on pg_proc  (cost=0.28..92.02 rows=2649 width=4)
(6 rows)

And with three sources, we get a more complex tree:

$ explain select * from (select oid from pg_Class order by oid) x intersect all select * from (select oid from pg_proc order by oid) y intersect all select * from (Select oid from pg_database order by oid) as w;
                                                          QUERY PLAN                                                           
-------------------------------------------------------------------------------------------------------------------------------
 HashSetOp Intersect All  (cost=1.03..172.53 rows=2 width=4)
   ->  Append  (cost=1.03..171.79 rows=294 width=4)
         ->  Subquery Scan on "*SELECT* 3"  (cost=1.03..1.07 rows=2 width=4)
               ->  Sort  (cost=1.03..1.03 rows=2 width=4)
                     Sort Key: pg_database.oid
                     ->  Seq Scan on pg_database  (cost=0.00..1.02 rows=2 width=4)
         ->  Result  (cost=0.15..170.72 rows=292 width=4)
               ->  HashSetOp Intersect All  (cost=0.15..170.72 rows=292 width=4)
                     ->  Append  (cost=0.15..163.36 rows=2941 width=4)
                           ->  Subquery Scan on "*SELECT* 1"  (cost=0.15..18.37 rows=292 width=4)
                                 ->  Index Only Scan using pg_class_oid_index on pg_class  (cost=0.15..12.53 rows=292 width=4)
                           ->  Subquery Scan on "*SELECT* 2"  (cost=0.28..145.00 rows=2649 width=4)
                                 ->  Index Only Scan using pg_proc_oid_index on pg_proc  (cost=0.28..92.02 rows=2649 width=4)
(13 rows)

Cte scan

This operation is similar to the already mentioned Materialize operation. It launches part of the request and saves its output so that it can be used by the other part (or parts) of the request.

Example:

$ explain analyze with x as (select relname, relkind from pg_class) select relkind, count(*), (select count(*) from x) from x group by relkind;
                                                   QUERY PLAN                                                    
-----------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=24.80..26.80 rows=200 width=1) (actual time=0.466..0.468 rows=6 loops=1)
   CTE x
     ->  Seq Scan on pg_class  (cost=0.00..10.92 rows=292 width=65) (actual time=0.009..0.127 rows=295 loops=1)
   InitPlan 2 (returns $1)
     ->  Aggregate  (cost=6.57..6.58 rows=1 width=0) (actual time=0.085..0.085 rows=1 loops=1)
           ->  CTE Scan on x x_1  (cost=0.00..5.84 rows=292 width=0) (actual time=0.000..0.055 rows=295 loops=1)
   ->  CTE Scan on x  (cost=0.00..5.84 rows=292 width=1) (actual time=0.012..0.277 rows=295 loops=1)
 Total runtime: 0.524 ms
(8 rows)

Please note that pg_class is scanned only once - line # 6. But its results are stored in “x” and then scanned twice - inside the aggregate (line # 9) and the HashAggregate operation (10).

What is the difference from Materialize? To give a detailed answer to this question, you need to dive into the source code, but I I would say that the difference is based on the simple fact that CTEs are user-defined, while Materialize is an auxiliary operation that Postgres decides to use when he sees fit.

Important: CTEs always start exactly as indicated. use in order to get around not the most successful optimizations that the scheduler can implement.

Initplan

This plan happens every time there is a part of the request that can (or should) be calculated before everything else and does not depend on anything in the rest of your request.

Let's say you want this query:

$ explain select * from pg_class where relkind = (select relkind from pg_class order by random() limit 1);
                                        QUERY PLAN                                        
------------------------------------------------------------------------------------------
 Seq Scan on pg_class  (cost=13.11..24.76 rows=73 width=203)
   Filter: (relkind = $0)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=13.11..13.11 rows=1 width=1)
           ->  Sort  (cost=13.11..13.84 rows=292 width=1)
                 Sort Key: (random())
                 ->  Seq Scan on pg_class pg_class_1  (cost=0.00..11.65 rows=292 width=1)
(7 rows)

In this case, you need to run limit / sort / seq-scan before the usual sequential scan using pg_class, because Postgres will need to compare the relkind value with the value returned by the subquery.

On the other hand, I could write:

$ explain select *, (select length('depesz')) from pg_class;
                         QUERY PLAN                          
-------------------------------------------------------------
 Seq Scan on pg_class  (cost=0.01..10.93 rows=292 width=203)
   InitPlan 1 (returns $0)
     ->  Result  (cost=0.00..0.01 rows=1 width=0)
(3 rows)

Postgres correctly sees that the subselect column does not depend on any data from the pg_class table, so you can run it once and do not need to recalculate the length for each row.

Of course, you can have many single plans (init plans), as here:

$ explain select *, (select length('depesz')) from pg_class where relkind = (select relkind from pg_class order by random() limit 1);
                                        QUERY PLAN                                        
------------------------------------------------------------------------------------------
 Seq Scan on pg_class  (cost=13.12..24.77 rows=73 width=203)
   Filter: (relkind = $1)
   InitPlan 1 (returns $0)
     ->  Result  (cost=0.00..0.01 rows=1 width=0)
   InitPlan 2 (returns $1)
     ->  Limit  (cost=13.11..13.11 rows=1 width=1)
           ->  Sort  (cost=13.11..13.84 rows=292 width=1)
                 Sort Key: (random())
                 ->  Seq Scan on pg_class pg_class_1  (cost=0.00..11.65 rows=292 width=1)
(9 rows)

But it’s worth considering one detail - init plans within a single request are numbered “globally”, and not by operations.

Subplan

SubPlan's are somewhat similar to NestedLoop. In the sense that they can also be called many times.

SubPlan is called to calculate data from a subquery that really depends on the current row.

For instance:

$ explain analyze select c.relname, c.relkind, (Select count(*) from pg_Class x where c.relkind = x.relkind) from pg_Class c;
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Seq Scan on pg_class c  (cost=0.00..3468.93 rows=292 width=65) (actual time=0.135..26.717 rows=295 loops=1)
   SubPlan 1
     ->  Aggregate  (cost=11.83..11.84 rows=1 width=0) (actual time=0.090..0.090 rows=1 loops=295)
           ->  Seq Scan on pg_class x  (cost=0.00..11.65 rows=73 width=0) (actual time=0.010..0.081 rows=93 loops=295)
                 Filter: (c.relkind = relkind)
                 Rows Removed by Filter: 202
 Total runtime: 26.783 ms
(7 rows)

For each line returned by scanning with “pg_class as c”, Postgres should launch SubPlan, which checks how many lines in pg_class have the same (as the line just processed) value in the relkind column.

Pay attention to “loops = 295” in the line “Seq Scan on pg_class x” and the corresponding value “rows = 295” in the node “Seq Scan on pg_class c”.

Others?

Yes, there are other operations. Some of them are too rare to be worth our attention (especially when you consider that you have a great source of knowledge - source codes), and some are (as I suspect) old versions of new nodes.

If you have a plan with an operation that I didn’t tell you about and you don’t understand, write me a link in the comments to explain to explain.depesz.com , the name of the operation, and the version of Postgres in which you met. I will try to find all possible information on such cases and give you a detailed answer.

Tags: