Dialect of Oracle SQL: Model in examples. Part 1

    imageAny developer who deals with the generation of reporting from databases regularly encounters the construction of bulky queries. Often this is due to database design errors, and, even more often, to the complexity of converting the extracted data. The latter include the use of iterative methods of calculation, calculation of subtotals for subgroups, calculations that use the values ​​of adjacent sample rows, complex formatting of strings, and similar tasks. Such transformations are often carried out from the database level to the application server or client level, which affects the performance and ease of maintenance of the code. To solve these SQL problems, the Oracle DBMS provides analytical functions and the MODEL operator, which will be discussed in this article.

    This extension to the SELECT construct has become available with DBMS version 10g. MODEL allows you to access the sample data as multidimensional arrays, change and add elements, conduct complex aggregation, and also solve a number of tasks that previously required the use of PL / SQL. At the same time, language constructions remain readable and declarative. In a word - Excel-like, plus the entire load falls on the shoulders of the database server.

    Syntax


    MODEL [IGNORE NAV] [RETURN UPDATED ROWS]
        [PARTITION BY (partition_column_1, ...)]
        DIMENSION BY (dimension_column_1, ...)
        MEASURES (measured_column_1, ...)
        RULES [AUTOMATIC ORDER | ITERATE (value) [UNTIL (expression)]] (
            rule_1, ...
        );

    The MODEL statement is processed among the latter, followed only by DISTINCT and ORDER BY. As a result of the application, the sample columns are mapped to the measured_column_ * arrays with dimension_column_ * dimensions . The optional PARTITION BY parameter defines partitions similar to those used in analytic functions (each of them is treated by rule_ * rules as an independent array). The rules are enumerated.

    The simplest examples


    To start, we will simulate a sample of numbers 1, 2, and 3:

    SELECT *
    FROM dual
        MODEL DIMENSION BY (0 dimension)
        MEASURES (dummy)
        RULES (
            dummy[5] = 1,
            dummy[6] = 2,
            dummy[7] = 3
        );

     DIMENSION R
    ---------- -
             0 X
             7 3
             6 2
             5 1

    In this case, according to three rules, the dummy array with dimension dimension is filled . Alias 0 dimension is defined in order to add a new column. Let's analyze the transformation in more detail. The first step is to determine and display the selection columns ( 0 dimension in DIMENSION, dummy in MEASURES), then select the columns (return dummy = X, dimension = 0 ) and only after that the rules are executed. First, a string with dimension = 5 is searched , but since it is not found, a new one is created and dummy = 1 is filled, similarly for the remaining two rules. If necessary, using the RETURN UPDATED ROWS directive, you can display only updated lines:

    SELECT result, dummy
    FROM dual
    MODEL RETURN UPDATED ROWS
        DIMENSION BY (dummy)
        MEASURES (0 result)
        RULES(
            result[5] = 1,
            result[6] = 2,
            result[7] = 3
        );

        RESULT DUMMY
    ---------- ------
             3 7
             2 6
             1 5

    You can also apply rule sets in loops. The following query calculates several elements of a Fibonacci sequence:

    SELECT sequence
    FROM dual
    MODEL DIMENSION BY (0 dimension)
        MEASURES (0 sequence)
        RULES ITERATE (100500) UNTIL (sequence[iteration_number] > 10) (
            sequence[iteration_number] =
                CASE iteration_number
                    WHEN 0 THEN 0
                    WHEN 1 THEN 1
                    ELSE sequence[iteration_number - 2] + sequence[iteration_number - 1]
                END
        );

      SEQUENCE
    ----------
             0
             1
             1
             2
             3
             5
             8
            13

    ITERATE sets the number of iterations of the loop (starting from 0), and the optional UNTIL directive sets the condition for exiting it (which worked, judging by the fact that you have not yet pressed Ctrl + End). Access to the counter is via the iteration_number variable .

    Ranges and Aggregation


    imageAbove, we saw examples with the calculation of the values ​​of individual elements. In addition, rules can be set for their groups, filtered by specified conditions. For this, the second type of links is used - symbolic (previously there were only positional). Indexes in symbolic links may contain validation conditions, for example:
    • cnt [day <6, type LIKE 'latt%']
    • cnt [day IN (3, 6), cv (type)]
    • cnt [day BETWEEN 1 AND 16, regexp_like (type, '^. + (sso | tte) $')]
    In positional links, index values ​​are uniquely defined:
    • cnt [2, 'black']
    • cnt [7, 'latte']
    Using symbolic links, you cannot create new elements - only update existing ones (if you really want to, then of course you can, but more on that in the next part). Positional ones allow you to update and create. It is permissible to use any aggregate functions with ranges specified using symbolic links. Analytical functions inside the rules are prohibited.

    Consider a table that stores information about coffee drunk during the week:

    SELECT * FROM coffee;

    TYPE                        CNT        DAY
    -------------------- ---------- ----------
    turkish                       1          1
    espresso                      1          1
    turkish                       2          2
    black                         1          2
    espresso                      1          2
    latte                         3          3
    black                         2          4
    ice                           1          4

    Suppose we want to get a report on how many cups of coffee were drunk on each day and, in general, at the same time, we will take into account that on Thursday the black coffee was double. So the request is:

    SELECT *
    FROM coffee
    MODEL DIMENSION BY (day, type)
        MEASURES (cnt)
        RULES (
            cnt[4, 'black'] = cnt[cv(day), 'black'] * 2,
            cnt[FOR day FROM 1 TO 4 INCREMENT 1, ' total for day'] = sum<(cnt)[cv(day), ANY],
            cnt[NULL, 'GRAND TOTAL'] = sum(cnt)[ANY, ' total for day']
        )
    ORDER BY day, type DESC;

          DAY TYPE                       CNT
    --------- -------------------- ---------
            1 turkish                      1
            1 espresso                     1
            1  total for day               2
            2 turkish                      2
            2 espresso                     1
            2 black                        1
            2  total for day               4
            3 latte                        3
            3  total for day               3
            4 ice                          1
            4 black                        4
            4  total for day               5
              GRAND TOTAL                 14

    We will analyze the rules in more detail. The first doubles the amount of coffee drunk on Thursday. The cv (dimension_name) function returns the current index value for the dimension_name dimension for the element being processed (i.e., in this case, instead of cv (day), you could specify day = 4 or, if desired, refer to the previous day as day = cv ( day) - 1). The second rule calculates subtotals from Monday to Thursday. In addition to the loop (which is already illustrative), references to the ranges of elements in the right-hand sides of the equalities are used here. Ranges can be specified in the same ways as checks in the WHERE clause; the additional keyword ANY is used to select any index values. Links to ranges on the right side of the equation must be aggregated, in this case, the sum function is used . And finally, the third rule considers the sum of the subtotal.

    Moving on. Consider a query that displays the totals of a drink by type of coffee, the amount of latte drunk on Tuesday, as well as a message about whether the goal of the week has been reached - to drink espresso on Wednesday:

    SELECT *
    FROM coffee
    MODEL DIMENSION BY (day, type)
        MEASURES (cnt)
        RULES (
            cnt[NULL, FOR type IN (SELECT DISTINCT type FROM coffee)] = sum(cnt)[ANY, cv(type)],
            cnt[NULL, 'GRAND TOTAL'] = sum(cnt)[NULL, ANY],
            cnt[NULL, ' drank ' || cnt[2, 'latte'] || ' cups of latte on 2 day'] = NULL,
            cnt[NULL, CASE
                          WHEN cnt[3, 'espresso'] IS PRESENT THEN ' ACHIEVED'
                          ELSE ' FAILED'
                      END || ': drank espresso on 3 day'] = NULL
        )
    ORDER BY day, type DESC;

           DAY TYPE                                            CNT
    ---------- ---------------------------------------- ----------
             1 turkish                                           1
             1 espresso                                          1
             2 turkish                                           2
             2 espresso                                          1
             2 black                                             1
             3 latte                                             3
             4 ice                                               1
             4 black                                             2
               turkish                                           3
               latte                                             3
               ice                                               1
               espresso                                          2
               black                                             3
               GRAND TOTAL                                      12
                 drank cups of latte on 2 day
                 FAILED: drank espresso on 3 day

    The first rule uses a loop with iterations over sample values. Nested queries used inside MODEL must be uncorrelated. The following rule calculates the total amount. Pay attention to the line "drank cups of latte on 2 day". Because element cnt [2, 'latte'] was not found, we got NULL by reference. This behavior can be changed with the IGNORE NAV directive (added after the word MODEL), then instead of the elements not found and NULL, the calculations will be substituted: 0 for numbers, January 1, 2001 for dates, an empty string for string types and NULL for everything else. And finally, the fourth rule demonstrates the use of the expression IS PRESENT, it returns true if the specified element exists, but, alas, the espresso was not drunk on Wednesday.

    On this introductory is completed, I hope you got a general idea and got used to the syntax. In the second part, we will talk about managing updating and creating elements, controlling MODEL output, issues of method applicability and its performance. And, of course, more complex examples will be discussed.

    Read more

    Literature

    1. Oracle® Database Data Warehousing Guide 11g Release 2 - Chapter 22 “SQL for Modeling”. HTML
    2. The SQL Model Clause of Oracle Database 10g. Pdf

    Also popular now: