Recursive (Hierarchical) Queries in PostgreSQL
Following Oracle with its 'connet by prior', all other DBMSs enter their hierarchical query (IZ) implementations. I would like to tell a wide audience how this is done in PostgreSQL.
It all started like this
then
Well And since version 8.4 Postgresql began to support a recursive query.
FM in PostgreSQL is implemented on the basis of standard SQL clause WITH. Let's go a little deeper: a non-recursive WITH allows you to reduce the cost of repeated subqueries, divide a complex query into several smaller ones, is a convenient shortcut to call the subquery, so to speak, and is very convenient in terms of saving time when writing code. as in the example below, we managed to avoid the use of a subquery in WHERE due to the use of the temporary table top_regions formed specifically for this request.
Adding an optional RECURSEVE statement allows a Postgre request to access its own output. The query algorithm must be composed of two parts. the first part is the base, usually returning one row with the starting point of the hierarchy or part of the hierarchy. That is, the place in the hierarchy from where the countdown will begin (for example, the root). and the second recursive part that will be associated with the time table that we declared after WITH. The first and second parts are combined with the UNION or UNION ALL operator. to put this chaotic set of words in order, let’s give an example.
Let's create a table in which the structure of one company will be described after entering the data there:
Now the recursive query itself:
The first part (lines 2-3) returns to the temporary table the first row in this case, the root record of our structure, from which the countdown in our hierarchy will begin. the second part (rows 4-5) adds to the same temporary table entries associated with the row already in temp1 via JOIN (ID = PARENT) and so on until all the leaves of our ROOTa end up in temp1.
Also in this example, the Orakolavska function sys_connect_by_path was simulated.
There is no built-in loop check in Postgre, so if we received data from the guys who were directly involved in creating the structure in Excel, we should check this structure for integrity. sometimes it’s enough to use UNION instead of UNION ALL but this is only sometimes. if in the first part you set a starting point in the hierarchy and even if somewhere in the hierarchy there are breaks, in principle, if you run the aforementioned query, there will be no error, just the “nippers” lines will be ignored. But we need to know where the error is, and we can implement this by implementing additional verification before executing UNION.
Here, as you see, the same Path field is created, but all the previous parents are organized in an array, which makes it possible for us to compare each new “ID” to a duplicate well, and if the array already has such an entry, then the row is entered in the temporary table with the flag and in the next pass, we no longer use this line to search for descendants, thanks to this we avoid looping (union all ... WHERE ... AND NOT CYCLE).
Advice from the official site: use the LIMIT operator as I did in the examples. it will help you save your nerves.
The theory is good, of course, but where all this can be put into practice, timbola in the light of the cost of such requests.
Besides that it would be nice to draw a hierarchy in one query, and find for example all the leaves of the current “ID” there are also other tasks.
Often you need to make changes, such as, for example, changing the phone code for all employees of a certain department in the telephone directory. Of course, you can use a column in the workers table or even make a prefix to “ID”. but all this makes the base not flexible and not scalable. It is much better to make a separate Workers table which will display a hierarchy with three columns “ID“, “PARENT“, “HIERARCHID”. the “HIERARCHID” field will allow you to make more than one hierarchical structure. For example, we will call the table ANCESTOR which will also consist of three columns “ID”, “ANCESTOR”, “HIERARCHID”. the “ANCESTOR" field will contain all the ancestors, remember the "Path" array from Example 3. so you can fill this table with just a recursive query.
get such a tablet
Now we will do all our selections and updates using this table. for example, with the same phone numbers, it will look something like this:
Yes, you will also need to provide a trigger to update the table when making a new or deleting an entry in the KPO table.
Continue,
Suppose that there is a table in which the logs are entered, imagine that you need to calculate how many entries for the previous month were made and grouped by day. for this you will need a reference calendar for the last month, for example so as not to miss the day when there were no entries. Here is a calendar (a list of days in one column) can be obtained by such a request.
Another not-so-good snack example. The original idea of Graeme Job (Graeme Job). the result is better to look in a text file.
Well, here's a post to start ;-)
History
It all started like this
From: Evgen Potemkin
Subject: Proposal of hierachical queries (a la Oracle)
Date: 2002-11-14 11:52:28 GMT
Hi there!
I want to propose the patch for adding the hierarchical queries
posibility. It allows to construct queries a la Oracle for ex:
SELECT a, b FROM t CONNECT BY a PRIOR b START WITH cond; B
I've seen this type of queries often made by adding a new type, which
stores position of row in the tree. But sorting such tree are very
tricky (i think).
Patch allows result tree to be sorted, i.e. subnodes of each node will
be sorted by ORDER BY clause.
with regards, evgen
then
From: Tatsuo Ishii
Subject: RFP: Recursive query in 8.4
Date: 2008-02-19 08:36:00 GMT (1 year, 12 weeks, 6 days ago)
Hi,
As I promised before we would like to propose implementing the
recursive query as defined in the standard for PostgreSQL 8.4.
The work is supported by Sumitomo Electric Information Systems Co.,
Ltd. (http://www.sei-info.co.jp/) and SRA OSS, Inc. Japan
(http://www.sraoss.co.jp).
Well And since version 8.4 Postgresql began to support a recursive query.
Theory
FM in PostgreSQL is implemented on the basis of standard SQL clause WITH. Let's go a little deeper: a non-recursive WITH allows you to reduce the cost of repeated subqueries, divide a complex query into several smaller ones, is a convenient shortcut to call the subquery, so to speak, and is very convenient in terms of saving time when writing code. as in the example below, we managed to avoid the use of a subquery in WHERE due to the use of the temporary table top_regions formed specifically for this request.
- WITH regional_sales AS (
- SELECT region, SUM(amount) AS total_sales
- FROM orders
- GROUP BY region
- ), top_regions AS (
- SELECT region
- FROM regional_sales
- WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales)
- )
- SELECT region,
- product,
- SUM(quantity) AS product_units,
- SUM(amount) AS product_sales
- FROM orders
- WHERE region IN (SELECT region FROM top_regions)
- GROUP BY region, product;
* This source code was highlighted with Source Code Highlighter.
Adding an optional RECURSEVE statement allows a Postgre request to access its own output. The query algorithm must be composed of two parts. the first part is the base, usually returning one row with the starting point of the hierarchy or part of the hierarchy. That is, the place in the hierarchy from where the countdown will begin (for example, the root). and the second recursive part that will be associated with the time table that we declared after WITH. The first and second parts are combined with the UNION or UNION ALL operator. to put this chaotic set of words in order, let’s give an example.
Let's create a table in which the structure of one company will be described after entering the data there:
CREATE TABLE KPO
(
"ID" character varying(55),
"DESCRIPTION" character varying(255),
"PARENT" character varying(55
) ;
Select * from kpo
ID | DESCRIPTION | PARENT |
---|---|---|
== ====== | ================================== | ======= |
Kpo | KARACHAGANAK PETROLEUM OPERATING | {null} |
Aksay | Aksay | Kpo |
Uralsk | Kpo | Kpo |
London | London | Kpo |
Kpc | Kpc | Aksay |
U2 | UNIT-2 | Aksay |
U3 | UNIT-3 | Aksay |
PROD | PRODACTION | Kpc |
MAINT | MAINTENANCE | Aksay |
CMMS | CMMS TEAM | MAINT |
Now the recursive query itself:
- WITH RECURSIVE temp1 ( "ID","PARENT","DESCRIPTION",PATH, LEVEL ) AS (
- SELECT T1."ID",T1."PARENT", T1."DESCRIPTION", CAST (T1."ID" AS VARCHAR (50)) as PATH, 1
- FROM KPO T1 WHERE T1."PARENT" IS NULL
- union
- select T2."ID", T2."PARENT", T2."DESCRIPTION", CAST ( temp1.PATH ||'->'|| T2."ID" AS VARCHAR(50)) ,LEVEL + 1
- FROM KPO T2 INNER JOIN temp1 ON( temp1."ID"= T2."PARENT") )
- select * from temp1 ORDER BY PATH LIMIT 100
* This source code was highlighted with Source Code Highlighter.
The first part (lines 2-3) returns to the temporary table the first row in this case, the root record of our structure, from which the countdown in our hierarchy will begin. the second part (rows 4-5) adds to the same temporary table entries associated with the row already in temp1 via JOIN (ID = PARENT) and so on until all the leaves of our ROOTa end up in temp1.
Also in this example, the Orakolavska function sys_connect_by_path was simulated.
"ID" | "PARENT" | DESCRIPTION | "Path" | "Level" |
---|---|---|---|---|
Kpo | KARACHAGANAK PETROLEUM OPERATING | Kpo | 1 | |
Aksay | Kpo | Aksay | KPO-> AKSAY | 2 |
Kpc | Aksay | Kpc | KPO-> AKSAY-> KPC | 3 |
PROD | Kpc | PRODAUCTION | KPO-> AKSAY-> KPC-> PROD | 4 |
MAINT | Aksay | MAINTENANCE | KPO-> AKSAY-> MAINT | 3 |
CMMS | MAINT | CMMS TEAM | KPO-> AKSAY-> MAINT-> CMMS | 4 |
U2 | Aksay | UNIT-2 | KPO-> AKSAY-> U2 | 3 |
U3 | Aksay | UNIT-3 | KPO-> AKSAY-> U3 | 3 |
London | Kpo | London | KPO-> LONDON | 2 |
Uralsk | Kpo | Uralsk | KPO-> URALSK | 2 |
There is no built-in loop check in Postgre, so if we received data from the guys who were directly involved in creating the structure in Excel, we should check this structure for integrity. sometimes it’s enough to use UNION instead of UNION ALL but this is only sometimes. if in the first part you set a starting point in the hierarchy and even if somewhere in the hierarchy there are breaks, in principle, if you run the aforementioned query, there will be no error, just the “nippers” lines will be ignored. But we need to know where the error is, and we can implement this by implementing additional verification before executing UNION.
- WITH RECURSIVE temp1 ( "ID","PARENT","DESCRIPTION",PATH, LEVEL, cycle ) AS (
- SELECT T1."ID",T1."PARENT", T1."DESCRIPTION", cast (array[T1."ID"] as varchar(100)[]) , 1 , FALSE
- FROM KPO T1
- union all
- select T2."ID", T2."PARENT", T2."DESCRIPTION", cast(temp1.PATH || T2."ID" as varchar(100) []) ,LEVEL + 1 ,
- T2."ID" = any (temp1.PATH)
- FROM KPO T2 INNER JOIN temp1 ON( temp1."ID"= T2."PARENT") AND NOT CYCLE )
-
- select * from temp1 WHERE CYCLE IS TRUE LIMIT 100;
* This source code was highlighted with Source Code Highlighter.
Here, as you see, the same Path field is created, but all the previous parents are organized in an array, which makes it possible for us to compare each new “ID” to a duplicate well, and if the array already has such an entry, then the row is entered in the temporary table with the flag and in the next pass, we no longer use this line to search for descendants, thanks to this we avoid looping (union all ... WHERE ... AND NOT CYCLE).
Advice from the official site: use the LIMIT operator as I did in the examples. it will help you save your nerves.
Practical examples
The theory is good, of course, but where all this can be put into practice, timbola in the light of the cost of such requests.
Besides that it would be nice to draw a hierarchy in one query, and find for example all the leaves of the current “ID” there are also other tasks.
Often you need to make changes, such as, for example, changing the phone code for all employees of a certain department in the telephone directory. Of course, you can use a column in the workers table or even make a prefix to “ID”. but all this makes the base not flexible and not scalable. It is much better to make a separate Workers table which will display a hierarchy with three columns “ID“, “PARENT“, “HIERARCHID”. the “HIERARCHID” field will allow you to make more than one hierarchical structure. For example, we will call the table ANCESTOR which will also consist of three columns “ID”, “ANCESTOR”, “HIERARCHID”. the “ANCESTOR" field will contain all the ancestors, remember the "Path" array from Example 3. so you can fill this table with just a recursive query.
- insert into ancestor ( “ID” "ANCESTOR", "HIERARCHID")
- WITH RECURSIVE temp1 ( "ID","PARENT","DESCRIPTION",PATH, LEVEL, cycle ) AS (
- SELECT T1."ID",T1."PARENT", T1."DESCRIPTION", cast (array[T1."ID"] as varchar(100)[]) , 1 , FALSE
- FROM KPO T1
- union all
- select T2."ID", T2."PARENT", T2."DESCRIPTION", cast(temp1.PATH || T2."ID" as varchar(100) []) ,LEVEL + 1 ,
- T2."ID" = any (temp1.PATH)
- FROM KPO T2 INNER JOIN temp1 ON( temp1."ID"= T2."PARENT") AND NOT CYCLE
- )
- select "ID" AS "LOCATION", PATH[1] AS "ANCESTOR" , 'DEPT' AS "DID" from temp1
* This source code was highlighted with Source Code Highlighter.
get such a tablet
LOCATION | ANCESTOR | HIERARCHID |
---|---|---|
========== | ========== | =========== |
Aksay | Aksay | DEPT |
Aksay | Kpo | DEPT |
CMMS | Kpo | DEPT |
CMMS | MAINT | DEPT |
CMMS | CMMS | DEPT |
CMMS | Aksay | DEPT |
Kpc | Aksay | DEPT |
Kpc | Kpc | DEPT |
Kpc | Kpo | DEPT |
Kpo | Kpo | DEPT |
London | London | DEPT |
London | Kpo | DEPT |
MAINT | Aksay | DEPT |
MAINT | MAINT | DEPT |
MAINT | Kpo | DEPT |
PROD | Kpo | DEPT |
PROD | Aksay | DEPT |
PROD | Kpc | DEPT |
PROD | PROD | DEPT |
U2 | Aksay | DEPT |
U2 | Kpo | DEPT |
U2 | U2 | DEPT |
U3 | U3 | DEPT |
U3 | Aksay | DEPT |
U3 | Kpo | DEPT |
Uralsk | Kpo | DEPT |
Uralsk | Uralsk | DEPT |
Now we will do all our selections and updates using this table. for example, with the same phone numbers, it will look something like this:
Update EMPLOYEE SET “TELNUM” = ‘545-454’ FROM ANCESTOR WHERE “ANCESTOR”.”ANCESTOR” = ‘AKSAY’ AND EMPLOYEE.”LOCATION” = ANCESTOR.”LOCATION” AND EMPLOYEE.ISDPRT = ‘N’ ;
Yes, you will also need to provide a trigger to update the table when making a new or deleting an entry in the KPO table.
Continue,
Suppose that there is a table in which the logs are entered, imagine that you need to calculate how many entries for the previous month were made and grouped by day. for this you will need a reference calendar for the last month, for example so as not to miss the day when there were no entries. Here is a calendar (a list of days in one column) can be obtained by such a request.
- WITH RECURSIVE t(n) AS (
- VALUES (1)
- UNION ALL
- SELECT n+1 FROM t WHERE n < (SELECT cast (extract ('day' from date_trunc('month',current_date) - interval '24 hour') as integer))
- )
- SELECT to_date ( n || '-'||extract ('month' from date_trunc('month',current_date) - interval '24 hour')
- || '-'||extract ('year' from date_trunc('month',current_date) - interval '24 hour'), 'dd-mm-yyyy')
- FROM t;
* This source code was highlighted with Source Code Highlighter.
Another not-so-good snack example. The original idea of Graeme Job (Graeme Job). the result is better to look in a text file.
- WITH RECURSIVE z(ix, iy, cx, cy, x, y, i) AS (
- SELECT ix, iy, x::float, y::float, x::float, y::float, 0
- FROM (select -1.88+0.016*i, i from generate_series(0,150) as i) as xgen(x,ix),
- (select -1.11+0.060*i, i from generate_series(0,36) as i) as ygen(y,iy)
- UNION ALL
- SELECT ix, iy, cx, cy, x*x - y*y + cx, y*x*2 + cy, i+1
- FROM z
- WHERE x * x + y * y < 16::float
- AND i < 27
- )
- SELECT array_to_string(array_agg(substring(' .,,,-----++++%%%%@@@@#### ',
- greatest(i,1), 1)),'')
- FROM (
- SELECT ix, iy, max(i) AS i
- FROM z
- GROUP BY iy, ix
- ORDER BY iy, ix
- ) AS zt
- GROUP BY iy
- ORDER BY iy
* This source code was highlighted with Source Code Highlighter.
Well, here's a post to start ;-)