airmile October 29, 2009 at 18:12

Recursive (Hierarchical) Queries in PostgreSQL

Following Oracle with its 'connet by prior', all other DBMSs enter their hierarchical query (IZ) implementations. I would like to tell a wide audience how this is done in PostgreSQL.

History

It all started like this

From: Evgen Potemkin
Subject: Proposal of hierachical queries (a la Oracle)
Date: 2002-11-14 11:52:28 GMT
Hi there!
I want to propose the patch for adding the hierarchical queries
posibility. It allows to construct queries a la Oracle for ex:
SELECT a, b FROM t CONNECT BY a PRIOR b START WITH cond; B
I've seen this type of queries often made by adding a new type, which
stores position of row in the tree. But sorting such tree are very
tricky (i think).
Patch allows result tree to be sorted, i.e. subnodes of each node will
be sorted by ORDER BY clause.
with regards, evgen

then

From: Tatsuo Ishii
Subject: RFP: Recursive query in 8.4
Date: 2008-02-19 08:36:00 GMT (1 year, 12 weeks, 6 days ago)
Hi,
As I promised before we would like to propose implementing the
recursive query as defined in the standard for PostgreSQL 8.4.
The work is supported by Sumitomo Electric Information Systems Co.,
Ltd. (http://www.sei-info.co.jp/) and SRA OSS, Inc. Japan
(http://www.sraoss.co.jp).

Well And since version 8.4 Postgresql began to support a recursive query.

Theory

FM in PostgreSQL is implemented on the basis of standard SQL clause WITH. Let's go a little deeper: a non-recursive WITH allows you to reduce the cost of repeated subqueries, divide a complex query into several smaller ones, is a convenient shortcut to call the subquery, so to speak, and is very convenient in terms of saving time when writing code. as in the example below, we managed to avoid the use of a subquery in WHERE due to the use of the temporary table top_regions formed specifically for this request.

WITH regional_sales AS ( SELECT region, SUM(amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM(quantity) AS product_units, SUM(amount) AS product_sales FROM orders WHERE region IN (SELECT region FROM top_regions) GROUP BY region, product; * This source code was highlighted with Source Code Highlighter.

Adding an optional RECURSEVE statement allows a Postgre request to access its own output. The query algorithm must be composed of two parts. the first part is the base, usually returning one row with the starting point of the hierarchy or part of the hierarchy. That is, the place in the hierarchy from where the countdown will begin (for example, the root). and the second recursive part that will be associated with the time table that we declared after WITH. The first and second parts are combined with the UNION or UNION ALL operator. to put this chaotic set of words in order, let’s give an example.
Let's create a table in which the structure of one company will be described after entering the data there:

CREATE TABLE KPO

(

 "ID" character varying(55),

 "DESCRIPTION" character varying(255),

 "PARENT" character varying(55

) ;

Select * from kpo

ID	DESCRIPTION	PARENT
== ======	==================================	=======
Kpo	KARACHAGANAK PETROLEUM OPERATING	{null}
Aksay	Aksay	Kpo
Uralsk	Kpo	Kpo
London	London	Kpo
Kpc	Kpc	Aksay
U2	UNIT-2	Aksay
U3	UNIT-3	Aksay
PROD	PRODACTION	Kpc
MAINT	MAINTENANCE	Aksay
CMMS	CMMS TEAM	MAINT

Now the recursive query itself:

WITH RECURSIVE temp1 ( "ID","PARENT","DESCRIPTION",PATH, LEVEL ) AS ( SELECT T1."ID",T1."PARENT", T1."DESCRIPTION", CAST (T1."ID" AS VARCHAR (50)) as PATH, 1 FROM KPO T1 WHERE T1."PARENT" IS NULL union select T2."ID", T2."PARENT", T2."DESCRIPTION", CAST ( temp1.PATH ||'->'|| T2."ID" AS VARCHAR(50)) ,LEVEL + 1 FROM KPO T2 INNER JOIN temp1 ON( temp1."ID"= T2."PARENT") ) select * from temp1 ORDER BY PATH LIMIT 100 * This source code was highlighted with Source Code Highlighter.

The first part (lines 2-3) returns to the temporary table the first row in this case, the root record of our structure, from which the countdown in our hierarchy will begin. the second part (rows 4-5) adds to the same temporary table entries associated with the row already in temp1 via JOIN (ID = PARENT) and so on until all the leaves of our ROOTa end up in temp1.
Also in this example, the Orakolavska function sys_connect_by_path was simulated.

"ID"	"PARENT"	DESCRIPTION	"Path"	"Level"
Kpo		KARACHAGANAK PETROLEUM OPERATING	Kpo	1
Aksay	Kpo	Aksay	KPO-> AKSAY	2
Kpc	Aksay	Kpc	KPO-> AKSAY-> KPC	3
PROD	Kpc	PRODAUCTION	KPO-> AKSAY-> KPC-> PROD	4
MAINT	Aksay	MAINTENANCE	KPO-> AKSAY-> MAINT	3
CMMS	MAINT	CMMS TEAM	KPO-> AKSAY-> MAINT-> CMMS	4
U2	Aksay	UNIT-2	KPO-> AKSAY-> U2	3
U3	Aksay	UNIT-3	KPO-> AKSAY-> U3	3
London	Kpo	London	KPO-> LONDON	2
Uralsk	Kpo	Uralsk	KPO-> URALSK	2

There is no built-in loop check in Postgre, so if we received data from the guys who were directly involved in creating the structure in Excel, we should check this structure for integrity. sometimes it’s enough to use UNION instead of UNION ALL but this is only sometimes. if in the first part you set a starting point in the hierarchy and even if somewhere in the hierarchy there are breaks, in principle, if you run the aforementioned query, there will be no error, just the “nippers” lines will be ignored. But we need to know where the error is, and we can implement this by implementing additional verification before executing UNION.

WITH RECURSIVE temp1 ( "ID","PARENT","DESCRIPTION",PATH, LEVEL, cycle ) AS ( SELECT T1."ID",T1."PARENT", T1."DESCRIPTION", cast (array[T1."ID"] as varchar(100)[]) , 1 , FALSE FROM KPO T1 union all select T2."ID", T2."PARENT", T2."DESCRIPTION", cast(temp1.PATH || T2."ID" as varchar(100) []) ,LEVEL + 1 , T2."ID" = any (temp1.PATH) FROM KPO T2 INNER JOIN temp1 ON( temp1."ID"= T2."PARENT") AND NOT CYCLE ) select * from temp1 WHERE CYCLE IS TRUE LIMIT 100; * This source code was highlighted with Source Code Highlighter.

Here, as you see, the same Path field is created, but all the previous parents are organized in an array, which makes it possible for us to compare each new “ID” to a duplicate well, and if the array already has such an entry, then the row is entered in the temporary table with the flag and in the next pass, we no longer use this line to search for descendants, thanks to this we avoid looping (union all ... WHERE ... AND NOT CYCLE).

Advice from the official site: use the LIMIT operator as I did in the examples. it will help you save your nerves.

Practical examples

The theory is good, of course, but where all this can be put into practice, timbola in the light of the cost of such requests.
Besides that it would be nice to draw a hierarchy in one query, and find for example all the leaves of the current “ID” there are also other tasks.
Often you need to make changes, such as, for example, changing the phone code for all employees of a certain department in the telephone directory. Of course, you can use a column in the workers table or even make a prefix to “ID”. but all this makes the base not flexible and not scalable. It is much better to make a separate Workers table which will display a hierarchy with three columns “ID“, “PARENT“, “HIERARCHID”. the “HIERARCHID” field will allow you to make more than one hierarchical structure. For example, we will call the table ANCESTOR which will also consist of three columns “ID”, “ANCESTOR”, “HIERARCHID”. the “ANCESTOR" field will contain all the ancestors, remember the "Path" array from Example 3. so you can fill this table with just a recursive query.

insert into ancestor ( “ID” "ANCESTOR", "HIERARCHID") WITH RECURSIVE temp1 ( "ID","PARENT","DESCRIPTION",PATH, LEVEL, cycle ) AS ( SELECT T1."ID",T1."PARENT", T1."DESCRIPTION", cast (array[T1."ID"] as varchar(100)[]) , 1 , FALSE FROM KPO T1 union all select T2."ID", T2."PARENT", T2."DESCRIPTION", cast(temp1.PATH || T2."ID" as varchar(100) []) ,LEVEL + 1 , T2."ID" = any (temp1.PATH) FROM KPO T2 INNER JOIN temp1 ON( temp1."ID"= T2."PARENT") AND NOT CYCLE ) select "ID" AS "LOCATION", PATH[1] AS "ANCESTOR" , 'DEPT' AS "DID" from temp1 * This source code was highlighted with Source Code Highlighter.

get such a tablet

LOCATION	ANCESTOR	HIERARCHID
==========	==========	===========
Aksay	Aksay	DEPT
Aksay	Kpo	DEPT
CMMS	Kpo	DEPT
CMMS	MAINT	DEPT
CMMS	CMMS	DEPT
CMMS	Aksay	DEPT
Kpc	Aksay	DEPT
Kpc	Kpc	DEPT
Kpc	Kpo	DEPT
Kpo	Kpo	DEPT
London	London	DEPT
London	Kpo	DEPT
MAINT	Aksay	DEPT
MAINT	MAINT	DEPT
MAINT	Kpo	DEPT
PROD	Kpo	DEPT
PROD	Aksay	DEPT
PROD	Kpc	DEPT
PROD	PROD	DEPT
U2	Aksay	DEPT
U2	Kpo	DEPT
U2	U2	DEPT
U3	U3	DEPT
U3	Aksay	DEPT
U3	Kpo	DEPT
Uralsk	Kpo	DEPT
Uralsk	Uralsk	DEPT

Now we will do all our selections and updates using this table. for example, with the same phone numbers, it will look something like this:

Update EMPLOYEE SET “TELNUM” = ‘545-454’ FROM ANCESTOR WHERE “ANCESTOR”.”ANCESTOR” = ‘AKSAY’ AND EMPLOYEE.”LOCATION” = ANCESTOR.”LOCATION” AND EMPLOYEE.ISDPRT = ‘N’ ;

Yes, you will also need to provide a trigger to update the table when making a new or deleting an entry in the KPO table.

Continue,

Suppose that there is a table in which the logs are entered, imagine that you need to calculate how many entries for the previous month were made and grouped by day. for this you will need a reference calendar for the last month, for example so as not to miss the day when there were no entries. Here is a calendar (a list of days in one column) can be obtained by such a request.

WITH RECURSIVE t(n) AS ( VALUES (1) UNION ALL SELECT n+1 FROM t WHERE n < (SELECT cast (extract ('day' from date_trunc('month',current_date) - interval '24 hour') as integer)) ) SELECT to_date ( n || '-'||extract ('month' from date_trunc('month',current_date) - interval '24 hour') || '-'||extract ('year' from date_trunc('month',current_date) - interval '24 hour'), 'dd-mm-yyyy') FROM t; * This source code was highlighted with Source Code Highlighter.

Another not-so-good snack example. The original idea of Graeme Job (Graeme Job). the result is better to look in a text file.

WITH RECURSIVE z(ix, iy, cx, cy, x, y, i) AS ( SELECT ix, iy, x::float, y::float, x::float, y::float, 0 FROM (select -1.88+0.016*i, i from generate_series(0,150) as i) as xgen(x,ix), (select -1.11+0.060*i, i from generate_series(0,36) as i) as ygen(y,iy) UNION ALL SELECT ix, iy, cx, cy, x*x - y*y + cx, y*x*2 + cy, i+1 FROM z WHERE x * x + y * y < 16::float AND i < 27 ) SELECT array_to_string(array_agg(substring(' .,,,-----++++%%%%@@@@#### ', greatest(i,1), 1)),'') FROM ( SELECT ix, iy, max(i) AS i FROM z GROUP BY iy, ix ORDER BY iy, ix ) AS zt GROUP BY iy ORDER BY iy * This source code was highlighted with Source Code Highlighter.

Well, here's a post to start ;-)

Tags:

Recursive (Hierarchical) Queries in PostgreSQL

History

Theory

Practical examples

Also popular now: