Counting brackets in Oracle SQL
It all started with the fact that on the site codeforces.ru in the next Codeforces Round I saw an interesting puzzle “Bracket sequence” and I did not want to solve it in an “uninteresting way”.
Briefly, the conditions of the problem are reduced to finding in the line consisting only of the characters “(”, “)”, “[” and “]” the correct staple sequence containing as many brackets as “[”.
To begin with, we turn our flat line into a more convenient hierarchical representation:
with brackets as
(select '[[)]([][][][[()[()[()()]]]])]' b
from dual)
select substr(b,level,1) q,
level curr_pos,
level - 1 prev_pos
from brackets
connect by substr(b,level,1) is not null
Each line will have another bracket, the position of the current bracket and the position of the previous bracket. You can already do something about this.
It remains to find all the correct bracket sequences. Here the fun begins.
Since to determine the correctness of the expression, we do not have a stack where brackets can be stored, we had to go to a “trick”.
Find all possible sequences starting with “(” or “[” (the rest are wrong by default). For each such sequence, we determine the position of the first and last brackets and the “nesting level” of the last bracket, that is:
"(" "1st - level" "(" "2nd - level" .. ... .. .. "]" "2nd - level" .. ... .. ...
- The level increases if: the bracket opens, and in front of it there was nothing, or there was any opening;
- The level does not change if the bracket is opened, and before it was closing, or if the bracket is closed, and before it was opening;
- The level decreases if the bracket closes, and before it was closing;
with brackets as
(select '[[)]([][][][[()[()[()()]]]])]' b
from dual) ,
all_brackets as
(select substr(b,level,1) q,
level curr_pos,
level-1 prev_pos
from brackets
connect by substr(b,level,1) is not null)
select replace(sys_connect_by_path(q,'-'), '-') str,
q,
connect_by_root(curr_pos) start_pos,
curr_pos end_pos,
sum( case
when (q = '(' or q = '[')
and (prior q is null)
then 1
when (q = '(' or q = '[')
and (prior q = '(' or prior q = '[')
then 1
when (q = '(' or q = '[')
and (prior q = ']' or prior q = ')')
then 0
when (q = ')' or q = ']')
and (prior q = '(' or prior q = '[')
then 0
when (q = ')' or q = ']')
and (prior q = ')' or prior q = ']')
then -1
end)
over (partition by connect_by_root(curr_pos)
order by connect_by_root(curr_pos), curr_pos) bracket_level
from all_brackets
connect by prior curr_pos = prev_pos
start with q = '(' or q = '['
Using this data, we can determine the “pest” bracket that makes the correct sequence out of order. To do this, we will look through all the sequences obtained, and when the “parenthesis” appears, we will find the previous parenthesis that is at the same level as the current one (Thanks to Oracle for the analytic function lag). An incorrect sequence is obtained if one of the following situations occurs at one of the levels:
- closed parenthesis closes
- the parenthesis closes the square
- square bracket closes round
Any opening parenthesis at this iteration cannot “break” the sequence.
with brackets as
(select '[[)]([][][][[()[()[()()]]]])]' b
from dual) ,
all_brackets as
(select substr(b,level,1) q,
level curr_pos,
level - 1 prev_pos
from brackets
connect by substr(b,level,1) is not null),
brackets_comb as
(select replace(sys_connect_by_path(q,'-'), '-') str,
q,
connect_by_root(curr_pos) start_pos,
curr_pos end_pos,
sum( case
when (q = '(' or q = '[')
and (prior q is null)
then 1
when (q = '(' or q = '[')
and (prior q = '(' or prior q = '[')
then 1
when (q = '(' or q = '[')
and (prior q = ']' or prior q = ')')
then 0
when (q = ')' or q = ']')
and (prior q = '(' or prior q = '[')
then 0
when (q = ')' or q = ']')
and (prior q = ')' or prior q = ']')
then -1
end)
over (partition by connect_by_root(curr_pos)
order by connect_by_root(curr_pos), curr_pos) bracket_level
from all_brackets
connect by prior curr_pos = prev_pos
start with q = '(' or q = '[')
select start_pos,
end_pos,
str,
case
when q = ')'
and lag(q) over (partition by start_pos, bracket_level
order by start_pos, bracket_level, end_pos) = '('
then 'ok'
when q = '('
and lag(q) over (partition by start_pos, bracket_level
order by start_pos, bracket_level, end_pos) = ')'
then 'ok'
when q = '('
and lag(q) over (partition by start_pos, bracket_level
order by start_pos, bracket_level, end_pos) = ']'
then 'ok'
when q = ']'
and lag(q) over (partition by start_pos, bracket_level
order by start_pos, bracket_level, end_pos) = '['
then 'ok'
when q = '['
and lag(q) over (partition by start_pos, bracket_level
order by start_pos, bracket_level, end_pos) = ']'
then 'ok'
when q = '['
and lag(q) over (partition by start_pos, bracket_level
order by start_pos, bracket_level, end_pos) = ')'
then 'ok'
when lag(q) over (partition by start_pos, bracket_level
order by start_pos, bracket_level, end_pos) is null
and bracket_level > 0
then 'ok'
else 'not ok!'
end status
from brackets_comb bc
Thus, for each set of sequences starting at the same position (start_pos) and ordered by the number of brackets, we found the first incorrect sequence.
In each such set of sequences, “throw out” the sequence after the “wrong” one, and for all the remaining ones, we verify that the brackets that have opened are closed (for this it is enough to count the number of brackets that have opened and closed of each kind).
As a result, it remains to find that one, and maybe not the only one, it (no, not the girl of your dreams, as you might think), the bracket sequence with the maximum number of “[”.
All request:
with brackets as
(select '[[]])[([[]][[(]])]' b
from dual) ,
all_brackets as
(select substr(b,level,1) q,
level curr_pos,
level - 1 prev_pos
from brackets
connect by substr(b,level,1) is not null),
brackets_comb as
(select replace(sys_connect_by_path(q,'-'), '-') str,
q,
connect_by_root(curr_pos) start_pos,
curr_pos end_pos,
sum( case
when (q = '(' or q = '[')
and (prior q is null)
then 1
when (q = '(' or q = '[')
and (prior q = '(' or prior q = '[')
then 1
when (q = '(' or q = '[')
and (prior q = ']' or prior q = ')')
then 0
when (q = ')' or q = ']')
and (prior q = '(' or prior q = '[')
then 0
when (q = ')' or q = ']')
and (prior q = ')' or prior q = ']')
then -1
end)
over (partition by connect_by_root(curr_pos)
order by connect_by_root(curr_pos), curr_pos) bracket_level
from all_brackets
connect by prior curr_pos = prev_pos
start with q = '(' or q = '['),
brackets_comb_status as
(select start_pos,
end_pos,
str,
case
when q = ')'
and lag(q) over (partition by start_pos, bracket_level
order by start_pos, bracket_level, end_pos) = '('
then 'ok'
when q = '('
and lag(q) over (partition by start_pos, bracket_level
order by start_pos, bracket_level, end_pos) = ')'
then 'ok'
when q = '('
and lag(q) over (partition by start_pos, bracket_level
order by start_pos, bracket_level, end_pos ) = ']'
then 'ok'
when q = ']'
and lag(q) over (partition by start_pos, bracket_level
order by start_pos, bracket_level, end_pos) = '['
then 'ok'
when q = '['
and lag(q) over (partition by start_pos, bracket_level
order by start_pos, bracket_level, end_pos) = ']'
then 'ok'
when q = '['
and lag(q) over (partition by start_pos, bracket_level
order by start_pos, bracket_level, end_pos) = ')'
then 'ok'
when lag(q) over (partition by start_pos, bracket_level
order by start_pos, bracket_level, end_pos) is null
and bracket_level > 0
then 'ok'
else 'not ok!'
end status
from brackets_comb bc)
select str "последовательность",
cnt "колличество [",
start_pos "позиция с",
end_pos "позиция по"
from (
select str,
start_pos,
end_pos,
length(regexp_replace(str,'[\)\(]',''))/2 cnt,
max(length(regexp_replace(str,'[\)\(]',''))/2) over (order by null) best_cnt
from (
select str,
start_pos,
end_pos,
nvl(
lag(
case
when status = 'ok' then null
else status
end
ignore nulls)
over (partition by start_pos order by start_pos, end_pos),
status
) status
from brackets_comb_status
)
where status = 'ok'
and length(replace(str,'[','')) = length(replace(str,']',''))
and length(replace(str,'(','')) = length(replace(str,')',''))
) result
where best_cnt = cnt