Dialect of Oracle SQL: Model in examples. Part 1

This extension to the SELECT construct has become available with DBMS version 10g. MODEL allows you to access the sample data as multidimensional arrays, change and add elements, conduct complex aggregation, and also solve a number of tasks that previously required the use of PL / SQL. At the same time, language constructions remain readable and declarative. In a word - Excel-like, plus the entire load falls on the shoulders of the database server.
Syntax
MODEL [IGNORE NAV] [RETURN UPDATED ROWS]
[PARTITION BY (partition_column_1, ...)]
DIMENSION BY (dimension_column_1, ...)
MEASURES (measured_column_1, ...)
RULES [AUTOMATIC ORDER | ITERATE (value) [UNTIL (expression)]] (
rule_1, ...
);
The MODEL statement is processed among the latter, followed only by DISTINCT and ORDER BY. As a result of the application, the sample columns are mapped to the measured_column_ * arrays with dimension_column_ * dimensions . The optional PARTITION BY parameter defines partitions similar to those used in analytic functions (each of them is treated by rule_ * rules as an independent array). The rules are enumerated.
The simplest examples
To start, we will simulate a sample of numbers 1, 2, and 3:
SELECT *
FROM dual
MODEL DIMENSION BY (0 dimension)
MEASURES (dummy)
RULES (
dummy[5] = 1,
dummy[6] = 2,
dummy[7] = 3
);
DIMENSION R
---------- -
0 X
7 3
6 2
5 1
In this case, according to three rules, the dummy array with dimension dimension is filled . Alias 0 dimension is defined in order to add a new column. Let's analyze the transformation in more detail. The first step is to determine and display the selection columns ( 0 dimension in DIMENSION, dummy in MEASURES), then select the columns (return dummy = X, dimension = 0 ) and only after that the rules are executed. First, a string with dimension = 5 is searched , but since it is not found, a new one is created and dummy = 1 is filled, similarly for the remaining two rules. If necessary, using the RETURN UPDATED ROWS directive, you can display only updated lines:
SELECT result, dummy
FROM dual
MODEL RETURN UPDATED ROWS
DIMENSION BY (dummy)
MEASURES (0 result)
RULES(
result[5] = 1,
result[6] = 2,
result[7] = 3
);
RESULT DUMMY
---------- ------
3 7
2 6
1 5
You can also apply rule sets in loops. The following query calculates several elements of a Fibonacci sequence:
SELECT sequence
FROM dual
MODEL DIMENSION BY (0 dimension)
MEASURES (0 sequence)
RULES ITERATE (100500) UNTIL (sequence[iteration_number] > 10) (
sequence[iteration_number] =
CASE iteration_number
WHEN 0 THEN 0
WHEN 1 THEN 1
ELSE sequence[iteration_number - 2] + sequence[iteration_number - 1]
END
);
SEQUENCE
----------
0
1
1
2
3
5
8
13
ITERATE sets the number of iterations of the loop (starting from 0), and the optional UNTIL directive sets the condition for exiting it (which worked, judging by the fact that you have not yet pressed Ctrl + End). Access to the counter is via the iteration_number variable .
Ranges and Aggregation

- cnt [day <6, type LIKE 'latt%']
- cnt [day IN (3, 6), cv (type)]
- cnt [day BETWEEN 1 AND 16, regexp_like (type, '^. + (sso | tte) $')]
- cnt [2, 'black']
- cnt [7, 'latte']
Consider a table that stores information about coffee drunk during the week:
SELECT * FROM coffee;
TYPE CNT DAY
-------------------- ---------- ----------
turkish 1 1
espresso 1 1
turkish 2 2
black 1 2
espresso 1 2
latte 3 3
black 2 4
ice 1 4
Suppose we want to get a report on how many cups of coffee were drunk on each day and, in general, at the same time, we will take into account that on Thursday the black coffee was double. So the request is:
SELECT *
FROM coffee
MODEL DIMENSION BY (day, type)
MEASURES (cnt)
RULES (
cnt[4, 'black'] = cnt[cv(day), 'black'] * 2,
cnt[FOR day FROM 1 TO 4 INCREMENT 1, ' total for day'] = sum<(cnt)[cv(day), ANY],
cnt[NULL, 'GRAND TOTAL'] = sum(cnt)[ANY, ' total for day']
)
ORDER BY day, type DESC;
DAY TYPE CNT
--------- -------------------- ---------
1 turkish 1
1 espresso 1
1 total for day 2
2 turkish 2
2 espresso 1
2 black 1
2 total for day 4
3 latte 3
3 total for day 3
4 ice 1
4 black 4
4 total for day 5
GRAND TOTAL 14
We will analyze the rules in more detail. The first doubles the amount of coffee drunk on Thursday. The cv (dimension_name) function returns the current index value for the dimension_name dimension for the element being processed (i.e., in this case, instead of cv (day), you could specify day = 4 or, if desired, refer to the previous day as day = cv ( day) - 1). The second rule calculates subtotals from Monday to Thursday. In addition to the loop (which is already illustrative), references to the ranges of elements in the right-hand sides of the equalities are used here. Ranges can be specified in the same ways as checks in the WHERE clause; the additional keyword ANY is used to select any index values. Links to ranges on the right side of the equation must be aggregated, in this case, the sum function is used . And finally, the third rule considers the sum of the subtotal.
Moving on. Consider a query that displays the totals of a drink by type of coffee, the amount of latte drunk on Tuesday, as well as a message about whether the goal of the week has been reached - to drink espresso on Wednesday:
SELECT *
FROM coffee
MODEL DIMENSION BY (day, type)
MEASURES (cnt)
RULES (
cnt[NULL, FOR type IN (SELECT DISTINCT type FROM coffee)] = sum(cnt)[ANY, cv(type)],
cnt[NULL, 'GRAND TOTAL'] = sum(cnt)[NULL, ANY],
cnt[NULL, ' drank ' || cnt[2, 'latte'] || ' cups of latte on 2 day'] = NULL,
cnt[NULL, CASE
WHEN cnt[3, 'espresso'] IS PRESENT THEN ' ACHIEVED'
ELSE ' FAILED'
END || ': drank espresso on 3 day'] = NULL
)
ORDER BY day, type DESC;
DAY TYPE CNT
---------- ---------------------------------------- ----------
1 turkish 1
1 espresso 1
2 turkish 2
2 espresso 1
2 black 1
3 latte 3
4 ice 1
4 black 2
turkish 3
latte 3
ice 1
espresso 2
black 3
GRAND TOTAL 12
drank cups of latte on 2 day
FAILED: drank espresso on 3 day
The first rule uses a loop with iterations over sample values. Nested queries used inside MODEL must be uncorrelated. The following rule calculates the total amount. Pay attention to the line "drank cups of latte on 2 day". Because element cnt [2, 'latte'] was not found, we got NULL by reference. This behavior can be changed with the IGNORE NAV directive (added after the word MODEL), then instead of the elements not found and NULL, the calculations will be substituted: 0 for numbers, January 1, 2001 for dates, an empty string for string types and NULL for everything else. And finally, the fourth rule demonstrates the use of the expression IS PRESENT, it returns true if the specified element exists, but, alas, the espresso was not drunk on Wednesday.
On this introductory is completed, I hope you got a general idea and got used to the syntax. In the second part, we will talk about managing updating and creating elements, controlling MODEL output, issues of method applicability and its performance. And, of course, more complex examples will be discussed.
Read more