Dependencies between SQL objects: using regular expressions and a small algorithmic focus

Original author: ASkaeff
  • Transfer

Introduction


Databases came to the world in the year 70. Since then, they have become larger in size, with more complex logic and continue to grow. Since then, many tools have been developed for parsing SQL and building dependencies. They all break SQL into tokens using one or another parser and grammar.

But ordinary parsers (ANTLR, for example) may encounter script parsing errors, each SQL dialect has its own peculiarities. Analysis time can also be significant on large databases.

I want to show a much simpler way: RegEx + a little trick,
so ...

The main "chip"


  1. Get single-file SQL script for all db objects
  2. Split an SQL script into text blocks using a regular expression that matches DDL statements
  3. Search for some text string among these text blocks
  4. ...
  5. PROFIT !!! 1

Implementation or how it works


1. Create a script for all SQL objects. Single file


Each RDBMS has the ability to generate SQL scripts drop / create. So, first we get a single SQL script for all database objects. Or we can take northwind.sql as an example.

2. Find all drop / create / alter DDL in the script using regular expressions


I use this expression

\b(create|alter)\s+(unique)*\s*(nonclustered)*\s*(procedure|proc|table|trigger|view|function|index)\b\s([^=<>\s\']+).([^=<>\s\'\(]+)[*\s\b\t]*

FastColoredTextBox provides great regex tools, so we use

var result = range.GetRangesByLines(regexStr, RegexOptions.IgnoreCase);

Learn More About Regular Expressions Used
I wrote some useful regular expressions and put them together in one static class

public static class RegexValues
{
    public static string SqlCmdObjects = @"\[\$\(([^=<>\[\]\s\']+)\)\].\[[^=<>\s\']+\]";
    public static string SqlCmdObjectsShort = @"\$\(([^=<>\[\]\s\']+)\)";
    public static string DdlObjects = @"\b(create)\s+(procedure|proc|table|trigger|view|function)\b\s\[\$\(([^=<>\[\]\s\']+)\)\].\[[^=<>\s\']+\]";
    public static string DdlObjectsPrepared = @"\b(create)\s+(procedure|proc|table|trigger|view|function)\b\s\[([^=<>\[\]\s\']+)\].(\[[^=<>\s\']+\])";
    public static string DdlObjects_ = @"\b(create)\s+(procedure|proc|table|trigger|view|function)\b\s\$\(([^=<>\[\]\s\']+)\).[^=<>\s\']+";
    public static string DdlObjectsPrepared_ = @"\b(create)\s+(procedure|proc|table|trigger|view|function)\b\s([^=<>\s\']+).([^=<>\s\'\(]+)[*\s\b\t]*";
    public static string DdlObjectsPreparedProcViewTrigger = @"\b(create)\s+(procedure|proc|trigger|view|function)\b\s([^=<>\s\']+).([^=<>\s\'\(]+)[*\s\b\t]*";
    public static string DdlObjectsPreparedWithIndex = @"\b(create)\s+(unique)*\s*(nonclustered)*\s*(procedure|proc|table|trigger|view|function|index)\b\s([^=<>\s\']+).([^=<>\s\'\(]+)[*\s\b\t]*";
    public static string DdlIndexAll = @"\b(create|alter)\s+(procedure|proc|table|trigger|view|function)\b\s\[([^=<>\[\]\s\']+)\].\[[^=<>\s\']+\]";
    public static string Variables = @"\@([^=<>\s\'\)\(\,]+)";
    public static string SqlCmdVariables = @"\:SETVAR\s+([a-zA-Z_]+)\s+([a-zA-Z_]+)";
}

  • SqlCmdObjects - finds strings of this kind [$ (some_var)]. [Obj_name] or [$ (some_var)]. [Obj_schema]. [Obj_name]
  • SqlCmdObjectsShort - matches $ (some_var)
  • DdlObjects - finds the same as SqlCmdObjects, but in conjunction with the CREATE PROC / TABLE / VIEW / FUNC / TRIGGER instructions
  • DdlObjectsPrepared is the same as DdlObjects, but the $ (sqlcmd) constraint has been removed
  • DdlObjects_ is the same as DdlObjects, but the bracket restriction has been removed []
  • DdlObjectsPrepared_ is the same as DdlObjectsPrepared, but the bracket constraint has been removed []
  • DdlObjectsPreparedProcViewTrigger - DDL statements only for proc, views, triggers and functions, i.e. only for “procedural” objects
  • DdlObjectsPreparedWithIndex - same as DdlObjectsPrepared_, but expanded by index matching
  • DdlIndexAll - same as DdlObjectsPrepared_, but ALTER statement is also taken into account
  • Variables - a regular expression that finds all SQL variables in a script
  • SqlCmdVariables - Finds CMD SQL variables (for example: SETVAR var1 val1)


The core of this software is actually the following lines of code:

var range = Syncronized(() => fastColoredTextBox1.Selection.Clone());
range.Normalize();
range.Start = new Place(0, 0);
range.End = Syncronized(() => new Place(tb.GetLineLength(tb.LinesCount - 1), tb.LinesCount - 1));

i.e., breaking the SQL file into ranges (Range).

We just load the SQL script into FastColoredTextBox (thanks to the authors!) And apply some regular expression to all its contents. As a result, we got the RESULT variable, which contains a list of the ranges found.

Range is a powerful class (thanks again to the author of FastColoredTextBox). The range contains the row number and column number of the found fragment in the SQL script. We just save the list of ranges in a virtual list (general method), and on SelectedIndexChanged we do the following

fastColoredTextBox1.Selection = range;
fastColoredTextBox1.DoSelectionVisible();

This provides us with convenient and quick navigation through the SQL file.

3. Find all occurrences of the SQL object (or arbitrary string) in the file.


We create another list of ranges (Range class from the FastColoredTextBox library), but instead of the regular expression string we put the name of some SQL object

4. The intersection of two lists of ranges - the same algorithmic focus.


The last step is to find occurrences from the second list of ranges in the first. This is done in the method.

private void FindUsage(string regex)

see class Form1.cs

That's all, the rest of the work is just plain .NET coding to put all the components together.

As a TOTAL + small HOWTO
1. find all create / alter statements with regex. And remember these line numbers.

2. Find all occurrences of the text (an object, for example, the name of a table) using a regular expression. And also remember the line numbers.

3. connect these two data sets, finding where the text line is located between the create / alter lines.

As a result, we have a simple GUI designer where you can:
  • Open SQL script or the entire folder (scripts will be combined into a temporary file and opened as a single script)
  • find all DDL statements (CREATE / ALTER) - a list will be created in the left panel with full navigation support.
  • select parts of text with the mouse (for example, some SQL object name)
  • right-click and select "Find Usage" - the list will be created in the right panel - these are SQL objects containing the selected text

Also, you can search not only by the names of SQL objects, but also virtually any part of the code, text, comment, etc.

Important Notice


Since this method of analyzing SQL dependencies is not based on metadata, you should always remember that you are combining two sets of data from strings. Thus, there are some restrictions or, better to say, “features” :)

Let's say we have a stored procedure

create proc test
as
declare @somevar int
create table #tmp(
colum_we_search nvarchar(255),
somevar int)

If we use the regex instruction that matches “CREATE TABLE”, we will get the #TMP table in the results of the match along with CREATE PROC.

Then, if we try to find "COLUMN_WE_SEARCH" - it will be found inside this temporary table #TMP, and not in the TEST procedure.

This can be beaten using a more precise initial regular expression. For such cases, I wrote the regex operator DdlObjectsPreparedProcViewTrigger (see above) - matching DDL statements only for proc, views, triggers and functions, i.e. only for SQL objects containing code.

Miscellaneous and Tasty


In the future, I would like to add MSAGL support for visualizing dependencies and for greater visibility!

FastColoredTextBox is a hi-end, full breakthrough! I did not expect that it has so many interesting features!

Sources
Compiled version (archive with exe file)

Also popular now: