Recursive (Hierarchical) Queries in PostgreSQL

    Following Oracle with its 'connet by prior', all other DBMSs enter their hierarchical query (IZ) implementations. I would like to tell a wide audience how this is done in PostgreSQL.

    History


    It all started like this
    From: Evgen Potemkin
    Subject: Proposal of hierachical queries (a la Oracle)
    Date: 2002-11-14 11:52:28 GMT
    Hi there!
    I want to propose the patch for adding the hierarchical queries
    posibility. It allows to construct queries a la Oracle for ex:
    SELECT a, b FROM t CONNECT BY a PRIOR b START WITH cond; B
    I've seen this type of queries often made by adding a new type, which
    stores position of row in the tree. But sorting such tree are very
    tricky (i think).
    Patch allows result tree to be sorted, i.e. subnodes of each node will
    be sorted by ORDER BY clause.
    with regards, evgen


    then
    From: Tatsuo Ishii
    Subject: RFP: Recursive query in 8.4
    Date: 2008-02-19 08:36:00 GMT (1 year, 12 weeks, 6 days ago)
    Hi,
    As I promised before we would like to propose implementing the
    recursive query as defined in the standard for PostgreSQL 8.4.
    The work is supported by Sumitomo Electric Information Systems Co.,
    Ltd. (http://www.sei-info.co.jp/) and SRA OSS, Inc. Japan
    (http://www.sraoss.co.jp).


    Well And since version 8.4 Postgresql began to support a recursive query.

    Theory



    FM in PostgreSQL is implemented on the basis of standard SQL clause WITH. Let's go a little deeper: a non-recursive WITH allows you to reduce the cost of repeated subqueries, divide a complex query into several smaller ones, is a convenient shortcut to call the subquery, so to speak, and is very convenient in terms of saving time when writing code. as in the example below, we managed to avoid the use of a subquery in WHERE due to the use of the temporary table top_regions formed specifically for this request.

    1. WITH regional_sales AS (
    2. SELECT region, SUM(amount) AS total_sales
    3. FROM orders
    4. GROUP BY region
    5. ), top_regions AS (
    6. SELECT region
    7. FROM regional_sales
    8. WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales)
    9. )
    10. SELECT region,
    11. product,
    12. SUM(quantity) AS product_units,
    13. SUM(amount) AS product_sales
    14. FROM orders
    15. WHERE region IN (SELECT region FROM top_regions)
    16. GROUP BY region, product;
    * This source code was highlighted with Source Code Highlighter.


    Adding an optional RECURSEVE statement allows a Postgre request to access its own output. The query algorithm must be composed of two parts. the first part is the base, usually returning one row with the starting point of the hierarchy or part of the hierarchy. That is, the place in the hierarchy from where the countdown will begin (for example, the root). and the second recursive part that will be associated with the time table that we declared after WITH. The first and second parts are combined with the UNION or UNION ALL operator. to put this chaotic set of words in order, let’s give an example.
    Let's create a table in which the structure of one company will be described after entering the data there:
    CREATE TABLE KPO
    (
    "ID" character varying(55),
    "DESCRIPTION" character varying(255),
    "PARENT" character varying(55
    ) ;



    Select * from kpo



    ID DESCRIPTIONPARENT
    == ====== =========================================
    Kpo KARACHAGANAK PETROLEUM OPERATING {null}
    Aksay Aksay Kpo
    UralskKpoKpo
    LondonLondonKpo
    KpcKpcAksay
    U2UNIT-2Aksay
    U3 UNIT-3Aksay
    PRODPRODACTIONKpc
    MAINTMAINTENANCE Aksay
    CMMSCMMS TEAMMAINT


    Now the recursive query itself:

    1. WITH RECURSIVE temp1 ( "ID","PARENT","DESCRIPTION",PATH, LEVEL ) AS (
    2. SELECT T1."ID",T1."PARENT", T1."DESCRIPTION", CAST (T1."ID" AS VARCHAR (50)) as PATH, 1
    3.     FROM KPO T1 WHERE T1."PARENT" IS NULL
    4. union
    5. select T2."ID", T2."PARENT", T2."DESCRIPTION", CAST ( temp1.PATH ||'->'|| T2."ID" AS VARCHAR(50)) ,LEVEL + 1
    6.      FROM KPO T2 INNER JOIN temp1 ON( temp1."ID"= T2."PARENT")      )
    7. select * from temp1 ORDER BY PATH LIMIT 100
    * This source code was highlighted with Source Code Highlighter.


    The first part (lines 2-3) returns to the temporary table the first row in this case, the root record of our structure, from which the countdown in our hierarchy will begin. the second part (rows 4-5) adds to the same temporary table entries associated with the row already in temp1 via JOIN (ID = PARENT) and so on until all the leaves of our ROOTa end up in temp1.
    Also in this example, the Orakolavska function sys_connect_by_path was simulated.



    "ID""PARENT"DESCRIPTION"Path""Level"
    Kpo KARACHAGANAK PETROLEUM OPERATINGKpo1
    AksayKpoAksayKPO-> AKSAY2
    KpcAksayKpcKPO-> AKSAY-> KPC3
    PRODKpcPRODAUCTIONKPO-> AKSAY-> KPC-> PROD4
    MAINTAksayMAINTENANCEKPO-> AKSAY-> MAINT3
    CMMSMAINTCMMS TEAMKPO-> AKSAY-> MAINT-> CMMS4
    U2AksayUNIT-2KPO-> AKSAY-> U23
    U3AksayUNIT-3KPO-> AKSAY-> U33
    LondonKpoLondonKPO-> LONDON2
    UralskKpoUralskKPO-> URALSK2


    There is no built-in loop check in Postgre, so if we received data from the guys who were directly involved in creating the structure in Excel, we should check this structure for integrity. sometimes it’s enough to use UNION instead of UNION ALL but this is only sometimes. if in the first part you set a starting point in the hierarchy and even if somewhere in the hierarchy there are breaks, in principle, if you run the aforementioned query, there will be no error, just the “nippers” lines will be ignored. But we need to know where the error is, and we can implement this by implementing additional verification before executing UNION.
    1. WITH RECURSIVE temp1 ( "ID","PARENT","DESCRIPTION",PATH, LEVEL, cycle ) AS (
    2. SELECT T1."ID",T1."PARENT", T1."DESCRIPTION", cast (array[T1."ID"] as varchar(100)[]) , 1 , FALSE
    3.     FROM KPO T1
    4. union all
    5. select T2."ID", T2."PARENT", T2."DESCRIPTION", cast(temp1.PATH || T2."ID" as varchar(100) []) ,LEVEL + 1 ,
    6.     T2."ID" = any (temp1.PATH)
    7.      FROM KPO T2 INNER JOIN temp1 ON( temp1."ID"= T2."PARENT") AND NOT CYCLE     )
    8.  
    9. select * from temp1 WHERE CYCLE IS TRUE LIMIT 100;
    * This source code was highlighted with Source Code Highlighter.


    Here, as you see, the same Path field is created, but all the previous parents are organized in an array, which makes it possible for us to compare each new “ID” to a duplicate well, and if the array already has such an entry, then the row is entered in the temporary table with the flag and in the next pass, we no longer use this line to search for descendants, thanks to this we avoid looping (union all ... WHERE ... AND NOT CYCLE).

    Advice from the official site: use the LIMIT operator as I did in the examples. it will help you save your nerves.

    Practical examples



    The theory is good, of course, but where all this can be put into practice, timbola in the light of the cost of such requests.
    Besides that it would be nice to draw a hierarchy in one query, and find for example all the leaves of the current “ID” there are also other tasks.
    Often you need to make changes, such as, for example, changing the phone code for all employees of a certain department in the telephone directory. Of course, you can use a column in the workers table or even make a prefix to “ID”. but all this makes the base not flexible and not scalable. It is much better to make a separate Workers table which will display a hierarchy with three columns “ID“, “PARENT“, “HIERARCHID”. the “HIERARCHID” field will allow you to make more than one hierarchical structure. For example, we will call the table ANCESTOR which will also consist of three columns “ID”, “ANCESTOR”, “HIERARCHID”. the “ANCESTOR" field will contain all the ancestors, remember the "Path" array from Example 3. so you can fill this table with just a recursive query.
    1. insert into ancestor ( “ID” "ANCESTOR", "HIERARCHID")
    2. WITH RECURSIVE temp1 ( "ID","PARENT","DESCRIPTION",PATH, LEVEL, cycle ) AS (
    3. SELECT T1."ID",T1."PARENT", T1."DESCRIPTION", cast (array[T1."ID"] as varchar(100)[]) , 1 , FALSE
    4.     FROM KPO T1 
    5. union all
    6. select T2."ID", T2."PARENT", T2."DESCRIPTION", cast(temp1.PATH || T2."ID" as varchar(100) []) ,LEVEL + 1 ,
    7.     T2."ID" = any (temp1.PATH)
    8.      FROM KPO T2 INNER JOIN temp1 ON( temp1."ID"= T2."PARENT") AND NOT CYCLE  
    9.        )
    10. select "ID" AS "LOCATION", PATH[1] AS "ANCESTOR" , 'DEPT' AS "DID" from temp1
    * This source code was highlighted with Source Code Highlighter.


    get such a tablet



    LOCATIONANCESTORHIERARCHID
    ===============================
    AksayAksayDEPT
    AksayKpoDEPT
    CMMSKpoDEPT
    CMMSMAINTDEPT
    CMMSCMMSDEPT
    CMMSAksayDEPT
    KpcAksayDEPT
    KpcKpcDEPT
    KpcKpoDEPT
    KpoKpoDEPT
    LondonLondonDEPT
    LondonKpoDEPT
    MAINTAksayDEPT
    MAINTMAINTDEPT
    MAINTKpoDEPT
    PRODKpoDEPT
    PRODAksayDEPT
    PRODKpcDEPT
    PRODPRODDEPT
    U2AksayDEPT
    U2KpoDEPT
    U2U2DEPT
    U3U3DEPT
    U3AksayDEPT
    U3KpoDEPT
    UralskKpoDEPT
    UralskUralskDEPT


    Now we will do all our selections and updates using this table. for example, with the same phone numbers, it will look something like this:

    Update EMPLOYEE SET “TELNUM” = ‘545-454’ FROM ANCESTOR WHERE “ANCESTOR”.”ANCESTOR” = ‘AKSAY’ AND EMPLOYEE.”LOCATION” = ANCESTOR.”LOCATION” AND EMPLOYEE.ISDPRT = ‘N’ ;

    Yes, you will also need to provide a trigger to update the table when making a new or deleting an entry in the KPO table.

    Continue,

    Suppose that there is a table in which the logs are entered, imagine that you need to calculate how many entries for the previous month were made and grouped by day. for this you will need a reference calendar for the last month, for example so as not to miss the day when there were no entries. Here is a calendar (a list of days in one column) can be obtained by such a request.
    1. WITH RECURSIVE t(n) AS (
    2. VALUES (1)
    3. UNION ALL
    4. SELECT n+1 FROM t WHERE n < (SELECT cast (extract ('day' from date_trunc('month',current_date) - interval '24 hour') as integer))
    5. )
    6. SELECT to_date ( n || '-'||extract ('month' from date_trunc('month',current_date) - interval '24 hour')
    7.    || '-'||extract ('year' from date_trunc('month',current_date) - interval '24 hour'), 'dd-mm-yyyy')
    8. FROM t;
    * This source code was highlighted with Source Code Highlighter.


    Another not-so-good snack example. The original idea of ​​Graeme Job (Graeme Job). the result is better to look in a text file.
    1. WITH RECURSIVE z(ix, iy, cx, cy, x, y, i) AS (
    2. SELECT ix, iy, x::float, y::float, x::float, y::float, 0
    3. FROM (select -1.88+0.016*i, i from generate_series(0,150) as i) as xgen(x,ix),
    4. (select -1.11+0.060*i, i from generate_series(0,36) as i) as ygen(y,iy)
    5. UNION ALL
    6. SELECT ix, iy, cx, cy, x*x - y*y + cx, y*x*2 + cy, i+1
    7. FROM z
    8. WHERE x * x + y * y < 16::float
    9. AND i < 27
    10. )
    11. SELECT array_to_string(array_agg(substring(' .,,,-----++++%%%%@@@@#### ',
    12. greatest(i,1), 1)),'')
    13. FROM (
    14. SELECT ix, iy, max(i) AS i
    15. FROM z
    16. GROUP BY iy, ix
    17. ORDER BY iy, ix
    18. ) AS zt
    19. GROUP BY iy
    20. ORDER BY iy
    * This source code was highlighted with Source Code Highlighter.


    Well, here's a post to start ;-)

    Also popular now: