Counting brackets in Oracle SQL


It all started with the fact that on the site codeforces.ru in the next Codeforces Round I saw an interesting puzzle “Bracket sequence” and I did not want to solve it in an “uninteresting way”.

Briefly, the conditions of the problem are reduced to finding in the line consisting only of the characters “(”, “)”, “[” and “]” the correct staple sequence containing as many brackets as “[”.


To begin with, we turn our flat line into a more convenient hierarchical representation:

with brackets as
(select '[[)]([][][][[()[()[()()]]]])]' b 
   from dual) 
select substr(b,level,1) q,
        level     curr_pos,
        level - 1 prev_pos 
   from brackets
connect by substr(b,level,1) is not null

Each line will have another bracket, the position of the current bracket and the position of the previous bracket. You can already do something about this.

It remains to find all the correct bracket sequences. Here the fun begins.
Since to determine the correctness of the expression, we do not have a stack where brackets can be stored, we had to go to a “trick”.

Find all possible sequences starting with “(” or “[” (the rest are wrong by default). For each such sequence, we determine the position of the first and last brackets and the “nesting level” of the last bracket, that is:
"(" "1st - level"
     "(" "2nd - level"
      .. ... .. .. 
      "]" "2nd - level"
.. ... .. ...

  • The level increases if: the bracket opens, and in front of it there was nothing, or there was any opening;
  • The level does not change if the bracket is opened, and before it was closing, or if the bracket is closed, and before it was opening;
  • The level decreases if the bracket closes, and before it was closing;


with brackets as
(select '[[)]([][][][[()[()[()()]]]])]' b 
   from dual) , 
all_brackets as 
(select substr(b,level,1) q,
        level   curr_pos,
        level-1 prev_pos
   from brackets
connect by substr(b,level,1) is not null)
select replace(sys_connect_by_path(q,'-'), '-') str,      
              q,              
              connect_by_root(curr_pos) start_pos,
              curr_pos                  end_pos,
              sum( case
                   when (q = '(' or q = '[')
                    and (prior q is null)
                   then 1                        
                   when (q = '(' or q = '[')
                    and (prior q = '(' or prior q = '[')
                   then 1                          
                   when (q = '(' or q = '[')
                    and (prior q = ']' or prior q = ')')
                   then 0                           
                   when (q = ')' or q = ']') 
                    and (prior q = '(' or prior q = '[')
                   then 0                     
                   when (q = ')' or q = ']') 
                    and (prior q = ')' or prior q = ']')
                   then -1
                    end)
             over (partition by connect_by_root(curr_pos) 
                       order by connect_by_root(curr_pos), curr_pos) bracket_level              
         from all_brackets        
        connect by prior curr_pos = prev_pos
         start with q = '(' or q = '['

Using this data, we can determine the “pest” bracket that makes the correct sequence out of order. To do this, we will look through all the sequences obtained, and when the “parenthesis” appears, we will find the previous parenthesis that is at the same level as the current one (Thanks to Oracle for the analytic function lag). An incorrect sequence is obtained if one of the following situations occurs at one of the levels:
  • closed parenthesis closes
  • the parenthesis closes the square
  • square bracket closes round

Any opening parenthesis at this iteration cannot “break” the sequence.

with brackets as
(select '[[)]([][][][[()[()[()()]]]])]' b 
   from dual) , 
all_brackets as 
(select substr(b,level,1) q,
        level     curr_pos,
        level - 1 prev_pos 
   from brackets
connect by substr(b,level,1) is not null),
brackets_comb as 
(select replace(sys_connect_by_path(q,'-'), '-') str,      
        q,              
        connect_by_root(curr_pos) start_pos,
        curr_pos                  end_pos,
        sum( case
             when (q = '(' or q = '[')
              and (prior q is null)
             then 1                        
             when (q = '(' or q = '[')
              and (prior q = '(' or prior q = '[')
             then 1                          
             when (q = '(' or q = '[')
              and (prior q = ']' or prior q = ')')
             then 0                           
             when (q = ')' or q = ']') 
              and (prior q = '(' or prior q = '[')
             then 0                     
             when (q = ')' or q = ']') 
              and (prior q = ')' or prior q = ']')
             then -1
              end)
       over (partition by connect_by_root(curr_pos) 
                 order by connect_by_root(curr_pos), curr_pos)  bracket_level                            
   from all_brackets
  connect by prior curr_pos = prev_pos
   start with q = '(' or q = '[') 
select start_pos,
       end_pos, 
       str,
       case 
        when q = ')' 
         and lag(q) over (partition by start_pos, bracket_level 
                              order by start_pos, bracket_level, end_pos) = '('
        then 'ok'
        when q = '(' 
         and lag(q) over (partition by start_pos, bracket_level 
                              order by start_pos, bracket_level, end_pos) = ')'
        then 'ok'
        when q = '(' 
         and lag(q) over (partition by start_pos, bracket_level 
                              order by start_pos, bracket_level, end_pos) = ']'
        then 'ok'          
        when q = ']' 
         and lag(q) over (partition by start_pos, bracket_level 
                              order by start_pos, bracket_level, end_pos) = '['
        then 'ok'
        when q = '[' 
         and lag(q) over (partition by start_pos, bracket_level 
                              order by start_pos, bracket_level, end_pos) = ']'
        then 'ok'
        when q = '[' 
         and lag(q) over (partition by start_pos, bracket_level 
                              order by start_pos, bracket_level, end_pos) = ')'
        then 'ok'
        when lag(q) over (partition by start_pos, bracket_level 
                              order by start_pos, bracket_level, end_pos) is null
         and bracket_level > 0
        then 'ok'
        else 'not ok!'
       end status    
  from brackets_comb bc  

Thus, for each set of sequences starting at the same position (start_pos) and ordered by the number of brackets, we found the first incorrect sequence.
In each such set of sequences, “throw out” the sequence after the “wrong” one, and for all the remaining ones, we verify that the brackets that have opened are closed (for this it is enough to count the number of brackets that have opened and closed of each kind).
As a result, it remains to find that one, and maybe not the only one, it (no, not the girl of your dreams, as you might think), the bracket sequence with the maximum number of “[”.

All request:

with brackets as
(select '[[]])[([[]][[(]])]' b 
   from dual) , 
all_brackets as 
(select substr(b,level,1) q,
        level     curr_pos,
        level - 1 prev_pos 
   from brackets
connect by substr(b,level,1) is not null),
brackets_comb as 
(select replace(sys_connect_by_path(q,'-'), '-') str,      
        q,              
        connect_by_root(curr_pos) start_pos,
        curr_pos                  end_pos,
        sum( case
             when (q = '(' or q = '[')
              and (prior q is null)
             then 1                        
             when (q = '(' or q = '[')
              and (prior q = '(' or prior q = '[')
             then 1                          
             when (q = '(' or q = '[')
              and (prior q = ']' or prior q = ')')
             then 0                           
             when (q = ')' or q = ']') 
              and (prior q = '(' or prior q = '[')
             then 0                     
             when (q = ')' or q = ']') 
              and (prior q = ')' or prior q = ']')
             then -1
              end)
       over (partition by connect_by_root(curr_pos) 
                 order by connect_by_root(curr_pos), curr_pos) bracket_level              
   from all_brackets
connect by prior curr_pos = prev_pos
  start with q = '(' or q = '['),  
brackets_comb_status as        
(select start_pos,
        end_pos, 
        str,
        case 
         when q = ')' 
          and lag(q) over (partition by start_pos, bracket_level 
                               order by start_pos, bracket_level, end_pos) = '('
         then 'ok'
         when q = '(' 
          and lag(q) over (partition by start_pos, bracket_level 
                               order by start_pos, bracket_level, end_pos) = ')'
         then 'ok'
         when q = '(' 
          and lag(q) over (partition by start_pos, bracket_level 
                               order by start_pos, bracket_level, end_pos ) = ']'
         then 'ok'          
         when q = ']' 
          and lag(q) over (partition by start_pos, bracket_level 
                               order by start_pos, bracket_level, end_pos) = '['
         then 'ok'
         when q = '[' 
          and lag(q) over (partition by start_pos, bracket_level 
                               order by start_pos, bracket_level, end_pos) = ']'
         then 'ok'
         when q = '[' 
          and lag(q) over (partition by start_pos, bracket_level 
                               order by start_pos, bracket_level, end_pos) = ')'
         then 'ok'
         when lag(q) over (partition by start_pos, bracket_level 
                               order by start_pos, bracket_level, end_pos) is null
          and bracket_level > 0
         then 'ok'
         else 'not ok!'
        end status    
  from brackets_comb bc)
select str       "последовательность", 
       cnt       "колличество [", 
       start_pos "позиция с", 
       end_pos   "позиция по"
  from ( 
     select str,
            start_pos,
            end_pos,
            length(regexp_replace(str,'[\)\(]',''))/2 cnt,
            max(length(regexp_replace(str,'[\)\(]',''))/2) over (order by null) best_cnt            
       from ( 
        select str,
               start_pos,
               end_pos,         
               nvl(
                 lag(
                   case 
                     when status = 'ok' then null
                     else status
                   end
                   ignore nulls)
                  over (partition by start_pos order by start_pos, end_pos),
                 status
                ) status         
          from brackets_comb_status        
          )
      where status = 'ok' 
        and length(replace(str,'[','')) = length(replace(str,']',''))
        and length(replace(str,'(','')) = length(replace(str,')',''))     
    ) result
 where best_cnt = cnt

Also popular now: