Dialect of Oracle SQL: Model in examples. Part 1
Any developer who deals with the generation of reporting from databases regularly encounters the construction of bulky queries. Often this is due to database design errors, and, even more often, to the complexity of converting the extracted data. The latter include the use of iterative methods of calculation, calculation of subtotals for subgroups, calculations that use the values of adjacent sample rows, complex formatting of strings, and similar tasks. Such transformations are often carried out from the database level to the application server or client level, which affects the performance and ease of maintenance of the code. To solve these SQL problems, the Oracle DBMS provides analytical functions and the MODEL operator, which will be discussed in this article.
This extension to the SELECT construct has become available with DBMS version 10g. MODEL allows you to access the sample data as multidimensional arrays, change and add elements, conduct complex aggregation, and also solve a number of tasks that previously required the use of PL / SQL. At the same time, language constructions remain readable and declarative. In a word - Excel-like, plus the entire load falls on the shoulders of the database server.
The MODEL statement is processed among the latter, followed only by DISTINCT and ORDER BY. As a result of the application, the sample columns are mapped to the measured_column_ * arrays with dimension_column_ * dimensions . The optional PARTITION BY parameter defines partitions similar to those used in analytic functions (each of them is treated by rule_ * rules as an independent array). The rules are enumerated.
To start, we will simulate a sample of numbers 1, 2, and 3:
In this case, according to three rules, the dummy array with dimension dimension is filled . Alias 0 dimension is defined in order to add a new column. Let's analyze the transformation in more detail. The first step is to determine and display the selection columns ( 0 dimension in DIMENSION, dummy in MEASURES), then select the columns (return dummy = X, dimension = 0 ) and only after that the rules are executed. First, a string with dimension = 5 is searched , but since it is not found, a new one is created and dummy = 1 is filled, similarly for the remaining two rules. If necessary, using the RETURN UPDATED ROWS directive, you can display only updated lines:
You can also apply rule sets in loops. The following query calculates several elements of a Fibonacci sequence:
ITERATE sets the number of iterations of the loop (starting from 0), and the optional UNTIL directive sets the condition for exiting it (which worked, judging by the fact that you have not yet pressed Ctrl + End). Access to the counter is via the iteration_number variable .
Above, we saw examples with the calculation of the values of individual elements. In addition, rules can be set for their groups, filtered by specified conditions. For this, the second type of links is used - symbolic (previously there were only positional). Indexes in symbolic links may contain validation conditions, for example:
Consider a table that stores information about coffee drunk during the week:
Suppose we want to get a report on how many cups of coffee were drunk on each day and, in general, at the same time, we will take into account that on Thursday the black coffee was double. So the request is:
We will analyze the rules in more detail. The first doubles the amount of coffee drunk on Thursday. The cv (dimension_name) function returns the current index value for the dimension_name dimension for the element being processed (i.e., in this case, instead of cv (day), you could specify day = 4 or, if desired, refer to the previous day as day = cv ( day) - 1). The second rule calculates subtotals from Monday to Thursday. In addition to the loop (which is already illustrative), references to the ranges of elements in the right-hand sides of the equalities are used here. Ranges can be specified in the same ways as checks in the WHERE clause; the additional keyword ANY is used to select any index values. Links to ranges on the right side of the equation must be aggregated, in this case, the sum function is used . And finally, the third rule considers the sum of the subtotal.
Moving on. Consider a query that displays the totals of a drink by type of coffee, the amount of latte drunk on Tuesday, as well as a message about whether the goal of the week has been reached - to drink espresso on Wednesday:
The first rule uses a loop with iterations over sample values. Nested queries used inside MODEL must be uncorrelated. The following rule calculates the total amount. Pay attention to the line "drank cups of latte on 2 day". Because element cnt [2, 'latte'] was not found, we got NULL by reference. This behavior can be changed with the IGNORE NAV directive (added after the word MODEL), then instead of the elements not found and NULL, the calculations will be substituted: 0 for numbers, January 1, 2001 for dates, an empty string for string types and NULL for everything else. And finally, the fourth rule demonstrates the use of the expression IS PRESENT, it returns true if the specified element exists, but, alas, the espresso was not drunk on Wednesday.
On this introductory is completed, I hope you got a general idea and got used to the syntax. In the second part, we will talk about managing updating and creating elements, controlling MODEL output, issues of method applicability and its performance. And, of course, more complex examples will be discussed.
Read more
This extension to the SELECT construct has become available with DBMS version 10g. MODEL allows you to access the sample data as multidimensional arrays, change and add elements, conduct complex aggregation, and also solve a number of tasks that previously required the use of PL / SQL. At the same time, language constructions remain readable and declarative. In a word - Excel-like, plus the entire load falls on the shoulders of the database server.
Syntax
MODEL [IGNORE NAV] [RETURN UPDATED ROWS]
[PARTITION BY (partition_column_1, ...)]
DIMENSION BY (dimension_column_1, ...)
MEASURES (measured_column_1, ...)
RULES [AUTOMATIC ORDER | ITERATE (value) [UNTIL (expression)]] (
rule_1, ...
);
The MODEL statement is processed among the latter, followed only by DISTINCT and ORDER BY. As a result of the application, the sample columns are mapped to the measured_column_ * arrays with dimension_column_ * dimensions . The optional PARTITION BY parameter defines partitions similar to those used in analytic functions (each of them is treated by rule_ * rules as an independent array). The rules are enumerated.
The simplest examples
To start, we will simulate a sample of numbers 1, 2, and 3:
SELECT *
FROM dual
MODEL DIMENSION BY (0 dimension)
MEASURES (dummy)
RULES (
dummy[5] = 1,
dummy[6] = 2,
dummy[7] = 3
);
DIMENSION R
---------- -
0 X
7 3
6 2
5 1
In this case, according to three rules, the dummy array with dimension dimension is filled . Alias 0 dimension is defined in order to add a new column. Let's analyze the transformation in more detail. The first step is to determine and display the selection columns ( 0 dimension in DIMENSION, dummy in MEASURES), then select the columns (return dummy = X, dimension = 0 ) and only after that the rules are executed. First, a string with dimension = 5 is searched , but since it is not found, a new one is created and dummy = 1 is filled, similarly for the remaining two rules. If necessary, using the RETURN UPDATED ROWS directive, you can display only updated lines:
SELECT result, dummy
FROM dual
MODEL RETURN UPDATED ROWS
DIMENSION BY (dummy)
MEASURES (0 result)
RULES(
result[5] = 1,
result[6] = 2,
result[7] = 3
);
RESULT DUMMY
---------- ------
3 7
2 6
1 5
You can also apply rule sets in loops. The following query calculates several elements of a Fibonacci sequence:
SELECT sequence
FROM dual
MODEL DIMENSION BY (0 dimension)
MEASURES (0 sequence)
RULES ITERATE (100500) UNTIL (sequence[iteration_number] > 10) (
sequence[iteration_number] =
CASE iteration_number
WHEN 0 THEN 0
WHEN 1 THEN 1
ELSE sequence[iteration_number - 2] + sequence[iteration_number - 1]
END
);
SEQUENCE
----------
0
1
1
2
3
5
8
13
ITERATE sets the number of iterations of the loop (starting from 0), and the optional UNTIL directive sets the condition for exiting it (which worked, judging by the fact that you have not yet pressed Ctrl + End). Access to the counter is via the iteration_number variable .
Ranges and Aggregation
Above, we saw examples with the calculation of the values of individual elements. In addition, rules can be set for their groups, filtered by specified conditions. For this, the second type of links is used - symbolic (previously there were only positional). Indexes in symbolic links may contain validation conditions, for example:
- cnt [day <6, type LIKE 'latt%']
- cnt [day IN (3, 6), cv (type)]
- cnt [day BETWEEN 1 AND 16, regexp_like (type, '^. + (sso | tte) $')]
- cnt [2, 'black']
- cnt [7, 'latte']
Consider a table that stores information about coffee drunk during the week:
SELECT * FROM coffee;
TYPE CNT DAY
-------------------- ---------- ----------
turkish 1 1
espresso 1 1
turkish 2 2
black 1 2
espresso 1 2
latte 3 3
black 2 4
ice 1 4
Suppose we want to get a report on how many cups of coffee were drunk on each day and, in general, at the same time, we will take into account that on Thursday the black coffee was double. So the request is:
SELECT *
FROM coffee
MODEL DIMENSION BY (day, type)
MEASURES (cnt)
RULES (
cnt[4, 'black'] = cnt[cv(day), 'black'] * 2,
cnt[FOR day FROM 1 TO 4 INCREMENT 1, ' total for day'] = sum<(cnt)[cv(day), ANY],
cnt[NULL, 'GRAND TOTAL'] = sum(cnt)[ANY, ' total for day']
)
ORDER BY day, type DESC;
DAY TYPE CNT
--------- -------------------- ---------
1 turkish 1
1 espresso 1
1 total for day 2
2 turkish 2
2 espresso 1
2 black 1
2 total for day 4
3 latte 3
3 total for day 3
4 ice 1
4 black 4
4 total for day 5
GRAND TOTAL 14
We will analyze the rules in more detail. The first doubles the amount of coffee drunk on Thursday. The cv (dimension_name) function returns the current index value for the dimension_name dimension for the element being processed (i.e., in this case, instead of cv (day), you could specify day = 4 or, if desired, refer to the previous day as day = cv ( day) - 1). The second rule calculates subtotals from Monday to Thursday. In addition to the loop (which is already illustrative), references to the ranges of elements in the right-hand sides of the equalities are used here. Ranges can be specified in the same ways as checks in the WHERE clause; the additional keyword ANY is used to select any index values. Links to ranges on the right side of the equation must be aggregated, in this case, the sum function is used . And finally, the third rule considers the sum of the subtotal.
Moving on. Consider a query that displays the totals of a drink by type of coffee, the amount of latte drunk on Tuesday, as well as a message about whether the goal of the week has been reached - to drink espresso on Wednesday:
SELECT *
FROM coffee
MODEL DIMENSION BY (day, type)
MEASURES (cnt)
RULES (
cnt[NULL, FOR type IN (SELECT DISTINCT type FROM coffee)] = sum(cnt)[ANY, cv(type)],
cnt[NULL, 'GRAND TOTAL'] = sum(cnt)[NULL, ANY],
cnt[NULL, ' drank ' || cnt[2, 'latte'] || ' cups of latte on 2 day'] = NULL,
cnt[NULL, CASE
WHEN cnt[3, 'espresso'] IS PRESENT THEN ' ACHIEVED'
ELSE ' FAILED'
END || ': drank espresso on 3 day'] = NULL
)
ORDER BY day, type DESC;
DAY TYPE CNT
---------- ---------------------------------------- ----------
1 turkish 1
1 espresso 1
2 turkish 2
2 espresso 1
2 black 1
3 latte 3
4 ice 1
4 black 2
turkish 3
latte 3
ice 1
espresso 2
black 3
GRAND TOTAL 12
drank cups of latte on 2 day
FAILED: drank espresso on 3 day
The first rule uses a loop with iterations over sample values. Nested queries used inside MODEL must be uncorrelated. The following rule calculates the total amount. Pay attention to the line "drank cups of latte on 2 day". Because element cnt [2, 'latte'] was not found, we got NULL by reference. This behavior can be changed with the IGNORE NAV directive (added after the word MODEL), then instead of the elements not found and NULL, the calculations will be substituted: 0 for numbers, January 1, 2001 for dates, an empty string for string types and NULL for everything else. And finally, the fourth rule demonstrates the use of the expression IS PRESENT, it returns true if the specified element exists, but, alas, the espresso was not drunk on Wednesday.
On this introductory is completed, I hope you got a general idea and got used to the syntax. In the second part, we will talk about managing updating and creating elements, controlling MODEL output, issues of method applicability and its performance. And, of course, more complex examples will be discussed.
Read more