Exploring Databases Using T-SQL
- Transfer
As a dba and SQL Server performance optimization consultant at Ambient Consulting, I often encounter the need to analyze performance bottlenecks on SQL Server instances that I see for the first time in my life. This can be a daunting task. As a rule, most companies do not have documentation on their databases. And if there is, then it is outdated, or its search takes several days.
In this article, I will share a basic set of scripts that dig up information about metadata using system functions, stored procedures, tables, dmv. Together, they reveal all the secrets of databases on the desired instance - their size, file location, their design, including columns, data types, default values, keys and indexes.
If you ever tried to get some of this information using the GUI, I think you will be pleasantly surprised by the amount of information that, with the help of these scripts, is obtained instantly.
As with any scripts, first test them in a test environment before running them in production. I would recommend you run them on MS testbases such as AdventureWorks or pubs .
Well, enough words, let me show you the scripts!
Firstly, some simple @@ Functions that will provide us with basic information.
How long has your SQL Server been running since the last restart? Remember that the tempdb system database is recreated every time you restart SQL Server. Here is one method to determine when the server was last restarted.
Linked servers are connections that allow SQL Server to access other data servers. Distributed requests can be run on different linked servers. It is useful to know if your database server is isolated from others, or if it is connected to other servers.
First, get a list of all the databases on the server. Remember that on any server there are four or five system databases (master, model, msdb, tempdb and distribution, if you use replication). You probably want to exclude these databases in the following queries. It is very simple to see a list of databases in SSMS, but these queries will be our “building blocks” for more complex queries.
There are several ways to get a list of all databases in T-SQL and below you will see some of them. Each method returns a similar result, but with some differences.
Stop! Before moving on, every good dba should find out if it has a fresh backup.
It will be better if you immediately find out the path to the file with the last backup.
It would be nice to understand which databases are currently used, especially if you are going to deal with performance problems.
Translator's note : this will only work in SQL Server 2012 and higher, in previous editions, dmv sys.dm_exec_sessions did not have a database_id column. To find out which databases the users are currently working on, you can use sp_who.
The sys.objects system table is one of the key to collecting information about the objects that make up your data model.
Below is a list of types of objects that we can obtain information about (see the documentation on sys.objects on MSDN )
Other directory views, such as sys.tables and sys.views, access sys.objects and provide information about a particular type of object. With these views, plus the OBJECTPROPERTY function, we can get a huge amount of information on each of the objects that make up our database schema.
The physical location of the selected database, including the main data file (mdf) and the transaction log file (ldf), can be obtained using these queries.
Of course, Object Explorer in SSMS shows a complete list of tables in the selected database, but some of the information using the GUI is more difficult to obtain than using scripts. The ANSI standard calls for INFORMATION_SCHEMA views, but they will not provide information about objects that are not part of the standard (such as triggers, extended procedures, etc.), so it’s better to use SQL Server catalog views.
If you don't know anything about a table, then all tables are equally important. The more you learn about tables, the more you divide them into conditionally more important and conditionally less important. In general, tables with a huge number of records often have a serious impact on performance.
In SSMS, we can right-click on any table, open the properties on the Storage tab and see the number of records in the table.
It is rather difficult to manually collect this information about all the tables. Again, if we write SELECT COUNT (*) FROM TABLENAME for each table, we will have to print a lot.
It is much more convenient to use T-SQL to generate a script. The script below will generate a set of T-SQL statements to get the number of rows in each table in the current database. Just execute it, copy the result to a new window and run.
Translator's note : the request did not work for me, I added a schema to the table name.
Sp_msforeachtable is an undocumented function that “goes through” all the tables in the database and executes the query, substituting '?' name of the current table. There is also a similar function sp_msforeachdb, working at the database level.
There are several problems with this undocumented function, for example, the use of special characters in object names. Those. if the name of the table or database contains a '-' sign, the stored procedure listed below will fail.
All previous methods used COUNT (*), which slowly works out if the table has more than 500K records.
The fastest way to get the number of records in a table is to get the number of records in a cluster index or heap. Remember that even though this method is the fastest, MS says that the information about the number of index entries and the actual number of rows in the table may not coincide, due to the fact that it takes a little time to update the information. In most cases, these values are either the same or very, very close and will soon become the same.
Working with heaps is like working with a flat file, instead of a database. If you want to be guaranteed to get a full table scan when executing any query, use heaps. I usually recommend adding a primary key to all heap tables.
When working on performance optimization, it is very important to know which tables are actively read and which are actively written. Earlier we found out how many records are in our tables, now we will see how often they write and read in them.
Remember that this information from dmv is cleared every time you restart SQL Server. The longer the server runs, the more reliable the statistics. I feel much more confident with statistics collected in 30 days than with statistics collected in a week.
A much more advanced version of this query is represented by a cursor that collects information on all tables of all databases on the server. In general, I'm not a fan of cursors because of their low performance, but moving around different databases is a great application for them.
Translator's note : the cursor will not work if you have databases in the list with a state other than ONLINE.
Views are, relatively speaking, queries stored in a database. You can think of them as virtual tables. Data is not stored in views, but in our queries we refer to them in the same way as tables.
In SQL Server, in some cases, we can update data using a view. To get a read-only view, you can use SELECT DISTINCT when you create it. Data “through” the view can only be changed if each row of the view corresponds to only one row in the “base” table. Any representation that does not meet this criterion, i.e. built on multiple tables, or using groupings, aggregate functions, and calculations, will be read-only.
Several times in my career I came across a situation where I could not understand which table the query was accessing. Submit a simple SELECT * FROM Client query. I am looking for a table called Client, but I cannot find it. Well, I think it must be this view, I’m looking for a view called Client and still can’t find it. Maybe I was wrong with the database? As a result, it turns out that Client is a synonym for customers and the table is actually called Customer. The marketing department wanted to refer to this table as Client and because of this a synonym was created. Fortunately, using synonyms is rare, but litigation can be difficult if you're not ready for it.
Stored procedures are a group of scripts that are compiled into a single execution plan. We can use catalog views to determine which KPs are created, what actions they perform and on which tables.
By adding a simple condition to WHERE, we can only get information about those stored procedures that, for example, perform INSERT operations.
By modifying the condition a bit in WHERE, we can collect information about the processors updating, deleting, or accessing certain tables.
Functions are stored in SQL Server, take some parameters and perform certain actions or calculations, and then return the result.
A trigger is a kind of stored procedure that is executed in response to certain actions with the table to which this trigger belongs. For example, we can create INSERT, UPDATE, and DELETE triggers.
CHECK constraints are a good way to implement business logic in a database. For example, some fields must be positive or negative, or the date in one column must be greater than the date in the other.
The queries presented in this section provide a means of almost reversing the engineering of an existing data model.
The following script describes tables and columns from the entire database. The result of this query can be copied to Excel, where you can configure filters and sorting and understand the types of data used in the database well. Also, pay attention to columns with the same name but different data types.
The default value is the value that will be saved if no value is specified for the column during insertion. Often, get_date () is set for columns storing the date. Also, the default values are used for auditing - system_user is inserted to determine the user account that performed a certain action.
Computed columns are columns whose values are calculated based on, as a rule, the values in other columns of the table.
IDENTITY columns are automatically populated with unique values by the system. Commonly used to store the record sequence number in a table.
As I wrote earlier, having a primary key and corresponding index on a table is one of the best practice. Another best practice is that foreign keys should also have an index built on the columns included in the foreign key. Foreign key indexes are great for joining tables. These indexes also have a good effect on performance when deleting records.
A script to search for all indexes in all tables of the current database.
Based on previously executed queries, SQL Server provides information about the missing indexes in the database, the creation of which can increase productivity.
Do not blindly add these indexes. I would think of each of the proposed indices. Using included columns, for example, can come in handy with a serious increase in volumes.
Foreign keys define the relationship between tables and are used to control referential integrity. In an entity-relationship diagram, lines between tables indicate foreign keys.
As I said, it is desirable to have an index built on the columns included in the foreign key. This greatly speeds up the joining of tables, which, usually, are still joined by a foreign key. These indexes also significantly speed up delete operations. If there is no such index, SQL Server will perform a table scan of the linked table, each time a record is deleted from the "first" table.
It depends ... I'm sure you have heard this expression before. I will consider three different methods for “reverse engineering” dependencies in the database. The first method is to use the sp_msdependecies stored procedure. The second is system tables associated with foreign keys. The third method is to use CTE.
Sp_msdependencies is an undocumented stored procedure that can be very useful for parsing complex table dependencies.
If we derive all the dependencies using sp_msdependencies, we get four columns: Type, ObjName, Owner (Schema), Sequence.
Pay attention to the sequence number (Sequence) - it starts from 1 and sequentially increases. Sequence is the "serial number" of a dependency.
I used this method several times when I needed to perform archiving or deletion on a very large database. If you know the dependencies of the table, then you have a “road map” - in what order do you need to archive or delete data. Start with the table with the highest value in the Sequence column and move from it in the reverse order - from larger to smaller. Tables with the same Sequence value can be deleted at the same time. This method does not violate any of the restrictions of foreign keys and allows you to transfer / delete records without temporarily deleting and rebuilding constraints.
In SSMS, if you right-click on the table name, you can select “View Dependencies” and “Objects that depend on TABLENAME”:
We can also get this information in the following way:
If in SSMS, in the window for viewing dependencies, select "Objects that depend on TABLENAME" and then expand all levels, we will see the following:
sp_msdependencies will return the same information.
Also, in SSMS, we can see which objects the selected table depends on.
The following query, using msdependencies, will return the same information.
If you want a list of table dependencies, you can use a temporary table to filter the dependencies by type.
The second method of "reverse engineering" dependencies in your database is to query the directory system representations associated with foreign keys.
The third method to get the dependency hierarchy is to use a recursive CTE.
Translator's note : all queries in the text (with the exception of one, it is marked in the text) will work on SQL Server 2005 SP3 and in later editions. The text is quite voluminous, I tried as much as I could to subtract it and find my mistakes (stylistic, syntactic, semantic and others), but I probably didn’t notice something, write me a PM, please, if something hurts the eye.
In this article, I will share a basic set of scripts that dig up information about metadata using system functions, stored procedures, tables, dmv. Together, they reveal all the secrets of databases on the desired instance - their size, file location, their design, including columns, data types, default values, keys and indexes.
If you ever tried to get some of this information using the GUI, I think you will be pleasantly surprised by the amount of information that, with the help of these scripts, is obtained instantly.
As with any scripts, first test them in a test environment before running them in production. I would recommend you run them on MS testbases such as AdventureWorks or pubs .
Well, enough words, let me show you the scripts!
We study the server
Let's start with queries that provide information about your servers.Basic information
Firstly, some simple @@ Functions that will provide us with basic information.
-- Имена сервера и экземпляра
Select @@SERVERNAME as [Server\Instance];
-- версия SQL Server
Select @@VERSION as SQLServerVersion;
-- экземпляр SQL Server
Select @@ServiceName AS ServiceInstance;
-- Текущая БД (БД, в контексте которой выполняется запрос)
Select DB_NAME() AS CurrentDB_Name;
How long has your SQL Server been running since the last restart? Remember that the tempdb system database is recreated every time you restart SQL Server. Here is one method to determine when the server was last restarted.
SELECT @@Servername AS ServerName ,
create_date AS ServerStarted ,
DATEDIFF(s, create_date, GETDATE()) / 86400.0 AS DaysRunning ,
DATEDIFF(s, create_date, GETDATE()) AS SecondsRunnig
FROM sys.databases
WHERE name = 'tempdb';
GO
Linked Servers
Linked servers are connections that allow SQL Server to access other data servers. Distributed requests can be run on different linked servers. It is useful to know if your database server is isolated from others, or if it is connected to other servers.
EXEC sp_helpserver;
--OR
EXEC sp_linkedservers;
--OR
SELECT @@SERVERNAME AS Server ,
Server_Id AS LinkedServerID ,
name AS LinkedServer ,
Product ,
Provider ,
Data_Source ,
Modify_Date
FROM sys.servers
ORDER BY name;
GO
List of all databases
First, get a list of all the databases on the server. Remember that on any server there are four or five system databases (master, model, msdb, tempdb and distribution, if you use replication). You probably want to exclude these databases in the following queries. It is very simple to see a list of databases in SSMS, but these queries will be our “building blocks” for more complex queries.
There are several ways to get a list of all databases in T-SQL and below you will see some of them. Each method returns a similar result, but with some differences.
EXEC sp_helpdb;
--OR
EXEC sp_Databases;
--OR
SELECT @@SERVERNAME AS Server ,
name AS DBName ,
recovery_model_Desc AS RecoveryModel ,
Compatibility_level AS CompatiblityLevel ,
create_date ,
state_desc
FROM sys.databases
ORDER BY Name;
--OR
SELECT @@SERVERNAME AS Server ,
d.name AS DBName ,
create_date ,
compatibility_level ,
m.physical_name AS FileName
FROM sys.databases d
JOIN sys.master_files m ON d.database_id = m.database_id
WHERE m.[type] = 0 -- data files only
ORDER BY d.name;
GO
Last backup?
Stop! Before moving on, every good dba should find out if it has a fresh backup.
SELECT @@Servername AS ServerName ,
d.Name AS DBName ,
MAX(b.backup_finish_date) AS LastBackupCompleted
FROM sys.databases d
LEFT OUTER JOIN msdb..backupset b
ON b.database_name = d.name
AND b.[type] = 'D'
GROUP BY d.Name
ORDER BY d.Name;
It will be better if you immediately find out the path to the file with the last backup.
SELECT @@Servername AS ServerName ,
d.Name AS DBName ,
b.Backup_finish_date ,
bmf.Physical_Device_name
FROM sys.databases d
INNER JOIN msdb..backupset b ON b.database_name = d.name
AND b.[type] = 'D'
INNER JOIN msdb.dbo.backupmediafamily bmf ON b.media_set_id = bmf.media_set_id
ORDER BY d.NAME ,
b.Backup_finish_date DESC;
GO
Active user connections
It would be nice to understand which databases are currently used, especially if you are going to deal with performance problems.
Translator's note : this will only work in SQL Server 2012 and higher, in previous editions, dmv sys.dm_exec_sessions did not have a database_id column. To find out which databases the users are currently working on, you can use sp_who.
-- Похожая информация, может быть получена с помощью sp_who
SELECT @@Servername AS Server ,
DB_NAME(database_id) AS DatabaseName ,
COUNT(database_id) AS Connections ,
Login_name AS LoginName ,
MIN(Login_Time) AS Login_Time ,
MIN(COALESCE(last_request_end_time, last_request_start_time))
AS Last_Batch
FROM sys.dm_exec_sessions
WHERE database_id > 0
AND DB_NAME(database_id) NOT IN ( 'master', 'msdb' )
GROUP BY database_id ,
login_name
ORDER BY DatabaseName;
We study databases
Let's take a deeper look and see how we can collect information about objects in all of your databases using various representations of the directory and dmv. Most of the queries presented in this section look “inside” only one database, so do not forget to select the desired database in SSMS or using the use database command. Also remember that you can always look in the context of which database the query will be executed using select db_name ().The sys.objects system table is one of the key to collecting information about the objects that make up your data model.
-- В примере U - таблицы
-- Попробуйте подставить другие значения type в WHERE
USE MyDatabase;
GO
SELECT *
FROM sys.objects
WHERE type = 'U';
Below is a list of types of objects that we can obtain information about (see the documentation on sys.objects on MSDN )
sys.objects.type
AF = statistical function (CLR environment);
C = CHECK restriction;
D = DEFAULT (restriction or isolated);
F = FOREIGN KEY constraint;
PK = PRIMARY KEY constraint;
P = SQL stored procedure;
PC = stored build procedure (CLR);
FN = scalar SQL function;
FS = scalar build function (CLR environment);
FT = table-return build function (CLR);
R = rule (old style, isolated);
RF = replication filter procedure;
S = system base table;
SN = synonym;
SQ = service queue;
TA = DML assembly trigger (CLR environment);
TR = DML SQL trigger;
IF = built-in table-valued SQL function;
TF = table-valued SQL function;
U = table (custom);
UQ = UNIQUE constraint;
V = representation;
X = extended stored procedure;
IT = internal table.
C = CHECK restriction;
D = DEFAULT (restriction or isolated);
F = FOREIGN KEY constraint;
PK = PRIMARY KEY constraint;
P = SQL stored procedure;
PC = stored build procedure (CLR);
FN = scalar SQL function;
FS = scalar build function (CLR environment);
FT = table-return build function (CLR);
R = rule (old style, isolated);
RF = replication filter procedure;
S = system base table;
SN = synonym;
SQ = service queue;
TA = DML assembly trigger (CLR environment);
TR = DML SQL trigger;
IF = built-in table-valued SQL function;
TF = table-valued SQL function;
U = table (custom);
UQ = UNIQUE constraint;
V = representation;
X = extended stored procedure;
IT = internal table.
Other directory views, such as sys.tables and sys.views, access sys.objects and provide information about a particular type of object. With these views, plus the OBJECTPROPERTY function, we can get a huge amount of information on each of the objects that make up our database schema.
Location of database files
The physical location of the selected database, including the main data file (mdf) and the transaction log file (ldf), can be obtained using these queries.
EXEC sp_Helpfile;
--OR
SELECT @@Servername AS Server ,
DB_NAME() AS DB_Name ,
File_id ,
Type_desc ,
Name ,
LEFT(Physical_Name, 1) AS Drive ,
Physical_Name ,
RIGHT(physical_name, 3) AS Ext ,
Size ,
Growth
FROM sys.database_files
ORDER BY File_id;
GO
Tables
Of course, Object Explorer in SSMS shows a complete list of tables in the selected database, but some of the information using the GUI is more difficult to obtain than using scripts. The ANSI standard calls for INFORMATION_SCHEMA views, but they will not provide information about objects that are not part of the standard (such as triggers, extended procedures, etc.), so it’s better to use SQL Server catalog views.
EXEC sp_tables; -- Помните, что этот метод вернёт и таблицы, и представления
--OR
SELECT @@Servername AS ServerName ,
TABLE_CATALOG ,
TABLE_SCHEMA ,
TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE'
ORDER BY TABLE_NAME ;
--OR
SELECT @@Servername AS ServerName ,
DB_NAME() AS DBName ,
o.name AS 'TableName' ,
o.[Type] ,
o.create_date
FROM sys.objects o
WHERE o.Type = 'U' -- User table
ORDER BY o.name;
--OR
SELECT @@Servername AS ServerName ,
DB_NAME() AS DBName ,
t.Name AS TableName,
t.[Type],
t.create_date
FROM sys.tables t
ORDER BY t.Name;
GO
The number of entries in the table
If you don't know anything about a table, then all tables are equally important. The more you learn about tables, the more you divide them into conditionally more important and conditionally less important. In general, tables with a huge number of records often have a serious impact on performance.
In SSMS, we can right-click on any table, open the properties on the Storage tab and see the number of records in the table.
It is rather difficult to manually collect this information about all the tables. Again, if we write SELECT COUNT (*) FROM TABLENAME for each table, we will have to print a lot.
It is much more convenient to use T-SQL to generate a script. The script below will generate a set of T-SQL statements to get the number of rows in each table in the current database. Just execute it, copy the result to a new window and run.
SELECT 'Select ''' + DB_NAME() + '.' + SCHEMA_NAME(SCHEMA_ID) + '.'
+ LEFT(o.name, 128) + ''' as DBName, count(*) as Count From ' + o.name
+ ';' AS ' Script generator to get counts for all tables'
FROM sys.objects o
WHERE o.[type] = 'U'
ORDER BY o.name;
GO
Translator's note : the request did not work for me, I added a schema to the table name.
SELECT 'Select ''' + DB_NAME() + '.' + SCHEMA_NAME(SCHEMA_ID) + '.'
+ LEFT(o.name, 128) + ''' as DBName, count(*) as Count From ' + SCHEMA_NAME(SCHEMA_ID) + '.' + o.name
+ ';' AS ' Script generator to get counts for all tables'
FROM sys.objects o
WHERE o.[type] = 'U'
ORDER BY o.name;
sp_msForEachTable
Sp_msforeachtable is an undocumented function that “goes through” all the tables in the database and executes the query, substituting '?' name of the current table. There is also a similar function sp_msforeachdb, working at the database level.
There are several problems with this undocumented function, for example, the use of special characters in object names. Those. if the name of the table or database contains a '-' sign, the stored procedure listed below will fail.
CREATE TABLE #rowcount
( Tablename VARCHAR(128) ,
Rowcnt INT );
EXEC sp_MSforeachtable 'insert into #rowcount select ''?'', count(*) from ?'
SELECT *
FROM #rowcount
ORDER BY Tablename ,
Rowcnt;
DROP TABLE #rowcount;
The fastest way to get the number of records is a clustered index
All previous methods used COUNT (*), which slowly works out if the table has more than 500K records.
The fastest way to get the number of records in a table is to get the number of records in a cluster index or heap. Remember that even though this method is the fastest, MS says that the information about the number of index entries and the actual number of rows in the table may not coincide, due to the fact that it takes a little time to update the information. In most cases, these values are either the same or very, very close and will soon become the same.
-- Самый быстрый путь получения количества записей
-- Hint: получайте из индекса, а не таблицы
SELECT @@ServerName AS Server ,
DB_NAME() AS DBName ,
OBJECT_SCHEMA_NAME(p.object_id) AS SchemaName ,
OBJECT_NAME(p.object_id) AS TableName ,
i.Type_Desc ,
i.Name AS IndexUsedForCounts ,
SUM(p.Rows) AS Rows
FROM sys.partitions p
JOIN sys.indexes i ON i.object_id = p.object_id
AND i.index_id = p.index_id
WHERE i.type_desc IN ( 'CLUSTERED', 'HEAP' )
-- This is key (1 index per table)
AND OBJECT_SCHEMA_NAME(p.object_id) <> 'sys'
GROUP BY p.object_id ,
i.type_desc ,
i.Name
ORDER BY SchemaName ,
TableName;
-- OR
-- Похожий метод получения количества записей, но с использованием DMV dm_db_partition_stats
SELECT @@ServerName AS ServerName ,
DB_NAME() AS DBName ,
OBJECT_SCHEMA_NAME(ddps.object_id) AS SchemaName ,
OBJECT_NAME(ddps.object_id) AS TableName ,
i.Type_Desc ,
i.Name AS IndexUsedForCounts ,
SUM(ddps.row_count) AS Rows
FROM sys.dm_db_partition_stats ddps
JOIN sys.indexes i ON i.object_id = ddps.object_id
AND i.index_id = ddps.index_id
WHERE i.type_desc IN ( 'CLUSTERED', 'HEAP' )
-- This is key (1 index per table)
AND OBJECT_SCHEMA_NAME(ddps.object_id) <> 'sys'
GROUP BY ddps.object_id ,
i.type_desc ,
i.Name
ORDER BY SchemaName ,
TableName;
GO
Search for heaps (tables without cluster indexes)
Working with heaps is like working with a flat file, instead of a database. If you want to be guaranteed to get a full table scan when executing any query, use heaps. I usually recommend adding a primary key to all heap tables.
-- Кучи (метод 1)
SELECT @@Servername AS ServerName ,
DB_NAME() AS DBName ,
t.Name AS HeapTable ,
t.Create_Date
FROM sys.tables t
INNER JOIN sys.indexes i ON t.object_id = i.object_id
AND i.type_desc = 'HEAP'
ORDER BY t.Name
--OR
-- Кучи (Метод 2)
SELECT @@Servername AS ServerName ,
DB_NAME() AS DBName ,
t.Name AS HeapTable ,
t.Create_Date
FROM sys.tables t
WHERE OBJECTPROPERTY(OBJECT_ID, 'TableHasClustIndex') = 0
ORDER BY t.Name;
--OR
-- Кучи (Метод 3) + количество записей
SELECT @@ServerName AS Server ,
DB_NAME() AS DBName ,
OBJECT_SCHEMA_NAME(ddps.object_id) AS SchemaName ,
OBJECT_NAME(ddps.object_id) AS TableName ,
i.Type_Desc ,
SUM(ddps.row_count) AS Rows
FROM sys.dm_db_partition_stats AS ddps
JOIN sys.indexes i ON i.object_id = ddps.object_id
AND i.index_id = ddps.index_id
WHERE i.type_desc = 'HEAP'
AND OBJECT_SCHEMA_NAME(ddps.object_id) <> 'sys'
GROUP BY ddps.object_id ,
i.type_desc
ORDER BY TableName;
We deal with the activity in the table
When working on performance optimization, it is very important to know which tables are actively read and which are actively written. Earlier we found out how many records are in our tables, now we will see how often they write and read in them.
Remember that this information from dmv is cleared every time you restart SQL Server. The longer the server runs, the more reliable the statistics. I feel much more confident with statistics collected in 30 days than with statistics collected in a week.
-- Чтение/запись таблицы
-- Кучи не рассматриваются, у них нет индексов
-- Только те таблицы, к которым обращались после запуска SQL Server
SELECT @@ServerName AS ServerName ,
DB_NAME() AS DBName ,
OBJECT_NAME(ddius.object_id) AS TableName ,
SUM(ddius.user_seeks + ddius.user_scans + ddius.user_lookups)
AS Reads ,
SUM(ddius.user_updates) AS Writes ,
SUM(ddius.user_seeks + ddius.user_scans + ddius.user_lookups
+ ddius.user_updates) AS [Reads&Writes] ,
( SELECT DATEDIFF(s, create_date, GETDATE()) / 86400.0
FROM master.sys.databases
WHERE name = 'tempdb'
) AS SampleDays ,
( SELECT DATEDIFF(s, create_date, GETDATE()) AS SecoundsRunnig
FROM master.sys.databases
WHERE name = 'tempdb'
) AS SampleSeconds
FROM sys.dm_db_index_usage_stats ddius
INNER JOIN sys.indexes i ON ddius.object_id = i.object_id
AND i.index_id = ddius.index_id
WHERE OBJECTPROPERTY(ddius.object_id, 'IsUserTable') = 1
AND ddius.database_id = DB_ID()
GROUP BY OBJECT_NAME(ddius.object_id)
ORDER BY [Reads&Writes] DESC;
GO
A much more advanced version of this query is represented by a cursor that collects information on all tables of all databases on the server. In general, I'm not a fan of cursors because of their low performance, but moving around different databases is a great application for them.
-- Операции чтения и записи
-- Кучи пропущены, у них нет индексов
-- Только таблицы, использовавшиеся после перезапуска SQL Server
-- В запросе используется курсор для получения информации во всех БД
-- Единый отчёт, хранится в tempdb
DECLARE DBNameCursor CURSOR
FOR
SELECT Name
FROM sys.databases
WHERE Name NOT IN ( 'master', 'model', 'msdb', 'tempdb',
'distribution' )
ORDER BY Name;
DECLARE @DBName NVARCHAR(128)
DECLARE @cmd VARCHAR(4000)
IF OBJECT_ID(N'tempdb..TempResults') IS NOT NULL
BEGIN
DROP TABLE tempdb..TempResults
END
CREATE TABLE tempdb..TempResults
(
ServerName NVARCHAR(128) ,
DBName NVARCHAR(128) ,
TableName NVARCHAR(128) ,
Reads INT ,
Writes INT ,
ReadsWrites INT ,
SampleDays DECIMAL(18, 8) ,
SampleSeconds INT
)
OPEN DBNameCursor
FETCH NEXT FROM DBNameCursor INTO @DBName
WHILE @@fetch_status = 0
BEGIN
----------------------------------------------------
-- Print @DBName
SELECT @cmd = 'Use ' + @DBName + '; '
SELECT @cmd = @cmd + ' Insert Into tempdb..TempResults
SELECT @@ServerName AS ServerName,
DB_NAME() AS DBName,
object_name(ddius.object_id) AS TableName ,
SUM(ddius.user_seeks
+ ddius.user_scans
+ ddius.user_lookups) AS Reads,
SUM(ddius.user_updates) as Writes,
SUM(ddius.user_seeks
+ ddius.user_scans
+ ddius.user_lookups
+ ddius.user_updates) as ReadsWrites,
(SELECT datediff(s,create_date, GETDATE()) / 86400.0
FROM sys.databases WHERE name = ''tempdb'') AS SampleDays,
(SELECT datediff(s,create_date, GETDATE())
FROM sys.databases WHERE name = ''tempdb'') as SampleSeconds
FROM sys.dm_db_index_usage_stats ddius
INNER JOIN sys.indexes i
ON ddius.object_id = i.object_id
AND i.index_id = ddius.index_id
WHERE objectproperty(ddius.object_id,''IsUserTable'') = 1 --True
AND ddius.database_id = db_id()
GROUP BY object_name(ddius.object_id)
ORDER BY ReadsWrites DESC;'
--PRINT @cmd
EXECUTE (@cmd)
-----------------------------------------------------
FETCH NEXT FROM DBNameCursor INTO @DBName
END
CLOSE DBNameCursor
DEALLOCATE DBNameCursor
SELECT *
FROM tempdb..TempResults
ORDER BY DBName ,
TableName;
--DROP TABLE tempdb..TempResults;
Translator's note : the cursor will not work if you have databases in the list with a state other than ONLINE.
Representation
Views are, relatively speaking, queries stored in a database. You can think of them as virtual tables. Data is not stored in views, but in our queries we refer to them in the same way as tables.
In SQL Server, in some cases, we can update data using a view. To get a read-only view, you can use SELECT DISTINCT when you create it. Data “through” the view can only be changed if each row of the view corresponds to only one row in the “base” table. Any representation that does not meet this criterion, i.e. built on multiple tables, or using groupings, aggregate functions, and calculations, will be read-only.
SELECT @@Servername AS ServerName ,
DB_NAME() AS DBName ,
o.name AS ViewName ,
o.[Type] ,
o.create_date
FROM sys.objects o
WHERE o.[Type] = 'V' -- View
ORDER BY o.NAME
--OR
SELECT @@Servername AS ServerName ,
DB_NAME() AS DBName ,
Name AS ViewName ,
create_date
FROM sys.Views
ORDER BY Name
--OR
SELECT @@Servername AS ServerName ,
TABLE_CATALOG ,
TABLE_SCHEMA ,
TABLE_NAME ,
TABLE_TYPE
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'VIEW'
ORDER BY TABLE_NAME
--OR
-- CREATE VIEW Code
SELECT @@Servername AS ServerName ,
DB_NAME() AS DB_Name ,
o.name AS 'ViewName' ,
o.Type ,
o.create_date ,
sm.[DEFINITION] AS 'View script'
FROM sys.objects o
INNER JOIN sys.sql_modules sm ON o.object_id = sm.OBJECT_ID
WHERE o.Type = 'V' -- View
ORDER BY o.NAME;
GO
Synonyms
Several times in my career I came across a situation where I could not understand which table the query was accessing. Submit a simple SELECT * FROM Client query. I am looking for a table called Client, but I cannot find it. Well, I think it must be this view, I’m looking for a view called Client and still can’t find it. Maybe I was wrong with the database? As a result, it turns out that Client is a synonym for customers and the table is actually called Customer. The marketing department wanted to refer to this table as Client and because of this a synonym was created. Fortunately, using synonyms is rare, but litigation can be difficult if you're not ready for it.
SELECT @@Servername AS ServerName ,
DB_NAME() AS DBName ,
o.name AS ViewName ,
o.Type ,
o.create_date
FROM sys.objects o
WHERE o.[Type] = 'SN' -- Synonym
ORDER BY o.NAME;
--OR
-- дополнительная информация о синонимах
SELECT @@Servername AS ServerName ,
DB_NAME() AS DBName ,
s.name AS synonyms ,
s.create_date ,
s.base_object_name
FROM sys.synonyms s
ORDER BY s.name;
GO
Stored procedures
Stored procedures are a group of scripts that are compiled into a single execution plan. We can use catalog views to determine which KPs are created, what actions they perform and on which tables.
-- Хранимые процедуры
SELECT @@Servername AS ServerName ,
DB_NAME() AS DBName ,
o.name AS StoredProcedureName ,
o.[Type] ,
o.create_date
FROM sys.objects o
WHERE o.[Type] = 'P' -- Stored Procedures
ORDER BY o.name
--OR
-- Дополнительная информация о ХП
SELECT @@Servername AS ServerName ,
DB_NAME() AS DB_Name ,
o.name AS 'ViewName' ,
o.[type] ,
o.Create_date ,
sm.[definition] AS 'Stored Procedure script'
FROM sys.objects o
INNER JOIN sys.sql_modules sm ON o.object_id = sm.object_id
WHERE o.[type] = 'P' -- Stored Procedures
-- AND sm.[definition] LIKE '%insert%'
-- AND sm.[definition] LIKE '%update%'
-- AND sm.[definition] LIKE '%delete%'
-- AND sm.[definition] LIKE '%tablename%'
ORDER BY o.name;
GO
By adding a simple condition to WHERE, we can only get information about those stored procedures that, for example, perform INSERT operations.
WHERE o.[type] = 'P' -- Stored Procedures
AND sm.definition LIKE '%insert%'
ORDER BY o.name
…
By modifying the condition a bit in WHERE, we can collect information about the processors updating, deleting, or accessing certain tables.
Functions
Functions are stored in SQL Server, take some parameters and perform certain actions or calculations, and then return the result.
-- Функции
SELECT @@Servername AS ServerName ,
DB_NAME() AS DB_Name ,
o.name AS 'Functions' ,
o.[Type] ,
o.create_date
FROM sys.objects o
WHERE o.Type = 'FN' -- Function
ORDER BY o.NAME;
--OR
-- Дополнительная информация о функциях
SELECT @@Servername AS ServerName ,
DB_NAME() AS DB_Name ,
o.name AS 'FunctionName' ,
o.[type] ,
o.create_date ,
sm.[DEFINITION] AS 'Function script'
FROM sys.objects o
INNER JOIN sys.sql_modules sm ON o.object_id = sm.OBJECT_ID
WHERE o.[Type] = 'FN' -- Function
ORDER BY o.NAME;
GO
Triggers
A trigger is a kind of stored procedure that is executed in response to certain actions with the table to which this trigger belongs. For example, we can create INSERT, UPDATE, and DELETE triggers.
-- Триггеры
SELECT @@Servername AS ServerName ,
DB_NAME() AS DBName ,
parent.name AS TableName ,
o.name AS TriggerName ,
o.[Type] ,
o.create_date
FROM sys.objects o
INNER JOIN sys.objects parent ON o.parent_object_id = parent.object_id
WHERE o.Type = 'TR' -- Triggers
ORDER BY parent.name ,
o.NAME
--OR
SELECT @@Servername AS ServerName ,
DB_NAME() AS DB_Name ,
Parent_id ,
name AS TriggerName ,
create_date
FROM sys.triggers
WHERE parent_class = 1
ORDER BY name;
--OR
-- Дополнительная информация о триггерах
SELECT @@Servername AS ServerName ,
DB_NAME() AS DB_Name ,
OBJECT_NAME(Parent_object_id) AS TableName ,
o.name AS 'TriggerName' ,
o.Type ,
o.create_date ,
sm.[DEFINITION] AS 'Trigger script'
FROM sys.objects o
INNER JOIN sys.sql_modules sm ON o.object_id = sm.OBJECT_ID
WHERE o.Type = 'TR' -- Triggers
ORDER BY o.NAME;
GO
CHECK restrictions
CHECK constraints are a good way to implement business logic in a database. For example, some fields must be positive or negative, or the date in one column must be greater than the date in the other.
-- Check Constraints
SELECT @@Servername AS ServerName ,
DB_NAME() AS DBName ,
parent.name AS 'TableName' ,
o.name AS 'Constraints' ,
o.[Type] ,
o.create_date
FROM sys.objects o
INNER JOIN sys.objects parent
ON o.parent_object_id = parent.object_id
WHERE o.Type = 'C' -- Check Constraints
ORDER BY parent.name ,
o.name
--OR
--CHECK constriant definitions
SELECT @@Servername AS ServerName ,
DB_NAME() AS DBName ,
OBJECT_SCHEMA_NAME(parent_object_id) AS SchemaName ,
OBJECT_NAME(parent_object_id) AS TableName ,
parent_column_id AS Column_NBR ,
Name AS CheckConstraintName ,
type ,
type_desc ,
create_date ,
OBJECT_DEFINITION(object_id) AS CheckConstraintDefinition
FROM sys.Check_constraints
ORDER BY TableName ,
SchemaName ,
Column_NBR
GO
Going deeper into the data model
Earlier, we used scripts that gave us an idea of the “top level” of the objects that make up our database. Sometimes we need to get more data about a table, including columns, their data types, what default values are given, which keys and indexes exist (or should exist), etc.The queries presented in this section provide a means of almost reversing the engineering of an existing data model.
Columns
The following script describes tables and columns from the entire database. The result of this query can be copied to Excel, where you can configure filters and sorting and understand the types of data used in the database well. Also, pay attention to columns with the same name but different data types.
SELECT @@Servername AS Server ,
DB_NAME() AS DBName ,
isc.Table_Name AS TableName ,
isc.Table_Schema AS SchemaName ,
Ordinal_Position AS Ord ,
Column_Name ,
Data_Type ,
Numeric_Precision AS Prec ,
Numeric_Scale AS Scale ,
Character_Maximum_Length AS LEN , -- -1 means MAX like Varchar(MAX)
Is_Nullable ,
Column_Default ,
Table_Type
FROM INFORMATION_SCHEMA.COLUMNS isc
INNER JOIN information_schema.tables ist
ON isc.table_name = ist.table_name
-- WHERE Table_Type = 'BASE TABLE' -- 'Base Table' or 'View'
ORDER BY DBName ,
TableName ,
SchemaName ,
Ordinal_position;
-- Имена столбцов и количество повторов
-- Используется для поиска одноимённых столбцов с разными типами данных/длиной
SELECT @@Servername AS Server ,
DB_NAME() AS DBName ,
Column_Name ,
Data_Type ,
Numeric_Precision AS Prec ,
Numeric_Scale AS Scale ,
Character_Maximum_Length ,
COUNT(*) AS Count
FROM information_schema.columns isc
INNER JOIN information_schema.tables ist
ON isc.table_name = ist.table_name
WHERE Table_type = 'BASE TABLE'
GROUP BY Column_Name ,
Data_Type ,
Numeric_Precision ,
Numeric_Scale ,
Character_Maximum_Length;
-- Информация по используемым типам данных
SELECT @@Servername AS ServerName ,
DB_NAME() AS DBName ,
Data_Type ,
Numeric_Precision AS Prec ,
Numeric_Scale AS Scale ,
Character_Maximum_Length AS [Length] ,
COUNT(*) AS COUNT
FROM information_schema.columns isc
INNER JOIN information_schema.tables ist
ON isc.table_name = ist.table_name
WHERE Table_type = 'BASE TABLE'
GROUP BY Data_Type ,
Numeric_Precision ,
Numeric_Scale ,
Character_Maximum_Length
ORDER BY Data_Type ,
Numeric_Precision ,
Numeric_Scale ,
Character_Maximum_Length
-- Large object data types or Binary Large Objects(BLOBs)
-- Помните, что индексы по этим таблицам не могут быть перестроены в режиме "online"
SELECT @@Servername AS ServerName ,
DB_NAME() AS DBName ,
isc.Table_Name ,
Ordinal_Position AS Ord ,
Column_Name ,
Data_Type AS BLOB_Data_Type ,
Numeric_Precision AS Prec ,
Numeric_Scale AS Scale ,
Character_Maximum_Length AS [Length]
FROM information_schema.columns isc
INNER JOIN information_schema.tables ist
ON isc.table_name = ist.table_name
WHERE Table_type = 'BASE TABLE'
AND ( Data_Type IN ( 'text', 'ntext', 'image', 'XML' )
OR ( Data_Type IN ( 'varchar', 'nvarchar', 'varbinary' )
AND Character_Maximum_Length = -1
)
) -- varchar(max), nvarchar(max), varbinary(max)
ORDER BY isc.Table_Name ,
Ordinal_position;
Default values
The default value is the value that will be saved if no value is specified for the column during insertion. Often, get_date () is set for columns storing the date. Also, the default values are used for auditing - system_user is inserted to determine the user account that performed a certain action.
-- Table Defaults
SELECT @@Servername AS ServerName ,
DB_NAME() AS DBName ,
parent.name AS TableName ,
o.name AS Defaults ,
o.[Type] ,
o.Create_date
FROM sys.objects o
INNER JOIN sys.objects parent
ON o.parent_object_id = parent.object_id
WHERE o.[Type] = 'D' -- Defaults
ORDER BY parent.name ,
o.NAME
--OR
-- Column Defaults
SELECT @@Servername AS ServerName ,
DB_NAME() AS DB_Name ,
OBJECT_SCHEMA_NAME(parent_object_id) AS SchemaName ,
OBJECT_NAME(parent_object_id) AS TableName ,
parent_column_id AS Column_NBR ,
Name AS DefaultName ,
[type] ,
type_desc ,
create_date ,
OBJECT_DEFINITION(object_id) AS Defaults
FROM sys.default_constraints
ORDER BY TableName ,
Column_NBR
--OR
-- Column Defaults
SELECT @@Servername AS ServerName ,
DB_NAME() AS DB_Name ,
OBJECT_SCHEMA_NAME(t.object_id) AS SchemaName ,
t.Name AS TableName ,
c.Column_ID AS Ord ,
c.Name AS Column_Name ,
OBJECT_NAME(default_object_id) AS DefaultName ,
OBJECT_DEFINITION(default_object_id) AS Defaults
FROM sys.Tables t
INNER JOIN sys.columns c ON t.object_id = c.object_id
WHERE default_object_id <> 0
ORDER BY TableName ,
SchemaName ,
c.Column_ID
GO
Computed Columns
Computed columns are columns whose values are calculated based on, as a rule, the values in other columns of the table.
-- Вычисляемые столбцы
SELECT @@Servername AS ServerName ,
DB_NAME() AS DBName ,
OBJECT_SCHEMA_NAME(object_id) AS SchemaName ,
OBJECT_NAME(object_id) AS Tablename ,
Column_id ,
Name AS Computed_Column ,
[Definition] ,
is_persisted
FROM sys.computed_columns
ORDER BY SchemaName ,
Tablename ,
[Definition];
--Or
-- Computed Columns
SELECT @@Servername AS ServerName ,
DB_NAME() AS DBName ,
OBJECT_SCHEMA_NAME(t.object_id) AS SchemaName,
t.Name AS TableName ,
c.Column_ID AS Ord ,
c.Name AS Computed_Column
FROM sys.Tables t
INNER JOIN sys.Columns c ON t.object_id = c.object_id
WHERE is_computed = 1
ORDER BY t.Name ,
SchemaName ,
c.Column_ID
GO
Identity columns
IDENTITY columns are automatically populated with unique values by the system. Commonly used to store the record sequence number in a table.
SELECT @@Servername AS ServerName ,
DB_NAME() AS DBName ,
OBJECT_SCHEMA_NAME(object_id) AS SchemaName ,
OBJECT_NAME(object_id) AS TableName ,
Column_id ,
Name AS IdentityColumn ,
Seed_Value ,
Last_Value
FROM sys.identity_columns
ORDER BY SchemaName ,
TableName ,
Column_id;
GO
Keys and Indexes
As I wrote earlier, having a primary key and corresponding index on a table is one of the best practice. Another best practice is that foreign keys should also have an index built on the columns included in the foreign key. Foreign key indexes are great for joining tables. These indexes also have a good effect on performance when deleting records.
What indexes do we have?
A script to search for all indexes in all tables of the current database.
SELECT @@Servername AS ServerName ,
DB_NAME() AS DB_Name ,
o.Name AS TableName ,
i.Name AS IndexName
FROM sys.objects o
INNER JOIN sys.indexes i ON o.object_id = i.object_id
WHERE o.Type = 'U' -- User table
AND LEFT(i.Name, 1) <> '_' -- Remove hypothetical indexes
ORDER BY o.NAME ,
i.name;
GO
What indexes are missing?
Based on previously executed queries, SQL Server provides information about the missing indexes in the database, the creation of which can increase productivity.
Do not blindly add these indexes. I would think of each of the proposed indices. Using included columns, for example, can come in handy with a serious increase in volumes.
-- Отсутствующие индексы из DMV
SELECT @@ServerName AS ServerName ,
DB_NAME() AS DBName ,
t.name AS 'Affected_table' ,
( LEN(ISNULL(ddmid.equality_columns, N'')
+ CASE WHEN ddmid.equality_columns IS NOT NULL
AND ddmid.inequality_columns IS NOT NULL THEN ','
ELSE ''
END) - LEN(REPLACE(ISNULL(ddmid.equality_columns, N'')
+ CASE WHEN ddmid.equality_columns
IS NOT NULL
AND ddmid.inequality_columns
IS NOT NULL
THEN ','
ELSE ''
END, ',', '')) ) + 1 AS K ,
COALESCE(ddmid.equality_columns, '')
+ CASE WHEN ddmid.equality_columns IS NOT NULL
AND ddmid.inequality_columns IS NOT NULL THEN ','
ELSE ''
END + COALESCE(ddmid.inequality_columns, '') AS Keys ,
COALESCE(ddmid.included_columns, '') AS [include] ,
'Create NonClustered Index IX_' + t.name + '_missing_'
+ CAST(ddmid.index_handle AS VARCHAR(20))
+ ' On ' + ddmid.[statement] COLLATE database_default
+ ' (' + ISNULL(ddmid.equality_columns, '')
+ CASE WHEN ddmid.equality_columns IS NOT NULL
AND ddmid.inequality_columns IS NOT NULL THEN ','
ELSE ''
END + ISNULL(ddmid.inequality_columns, '') + ')'
+ ISNULL(' Include (' + ddmid.included_columns + ');', ';')
AS sql_statement ,
ddmigs.user_seeks ,
ddmigs.user_scans ,
CAST(( ddmigs.user_seeks + ddmigs.user_scans )
* ddmigs.avg_user_impact AS BIGINT) AS 'est_impact' ,
avg_user_impact ,
ddmigs.last_user_seek ,
( SELECT DATEDIFF(Second, create_date, GETDATE()) Seconds
FROM sys.databases
WHERE name = 'tempdb'
) SecondsUptime
FROM sys.dm_db_missing_index_groups ddmig
INNER JOIN sys.dm_db_missing_index_group_stats ddmigs
ON ddmigs.group_handle = ddmig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details ddmid
ON ddmig.index_handle = ddmid.index_handle
INNER JOIN sys.tables t ON ddmid.OBJECT_ID = t.OBJECT_ID
WHERE ddmid.database_id = DB_ID()
ORDER BY est_impact DESC;
GO
Foreign keys
Foreign keys define the relationship between tables and are used to control referential integrity. In an entity-relationship diagram, lines between tables indicate foreign keys.
-- Foreign Keys
SELECT @@Servername AS ServerName ,
DB_NAME() AS DB_Name ,
parent.name AS 'TableName' ,
o.name AS 'ForeignKey' ,
o.[Type] ,
o.Create_date
FROM sys.objects o
INNER JOIN sys.objects parent ON o.parent_object_id = parent.object_id
WHERE o.[Type] = 'F' -- Foreign Keys
ORDER BY parent.name ,
o.name
--OR
SELECT f.name AS ForeignKey ,
SCHEMA_NAME(f.SCHEMA_ID) AS SchemaName ,
OBJECT_NAME(f.parent_object_id) AS TableName ,
COL_NAME(fc.parent_object_id, fc.parent_column_id) AS ColumnName ,
SCHEMA_NAME(o.SCHEMA_ID) ReferenceSchemaName ,
OBJECT_NAME(f.referenced_object_id) AS ReferenceTableName ,
COL_NAME(fc.referenced_object_id, fc.referenced_column_id)
AS ReferenceColumnName
FROM sys.foreign_keys AS f
INNER JOIN sys.foreign_key_columns AS fc
ON f.OBJECT_ID = fc.constraint_object_id
INNER JOIN sys.objects AS o ON o.OBJECT_ID = fc.referenced_object_id
ORDER BY TableName ,
ReferenceTableName;
GO
Missing Foreign Key Indexes
As I said, it is desirable to have an index built on the columns included in the foreign key. This greatly speeds up the joining of tables, which, usually, are still joined by a foreign key. These indexes also significantly speed up delete operations. If there is no such index, SQL Server will perform a table scan of the linked table, each time a record is deleted from the "first" table.
-- Foreign Keys missing indexes
-- Помните, что этот скрипт работает только для создания индексов по одному столбцу
-- Внешние ключи, состоящие более чем из одного столбца, не отслеживаются
SELECT DB_NAME() AS DBName ,
rc.Constraint_Name AS FK_Constraint ,
-- rc.Constraint_Catalog AS FK_Database,
-- rc.Constraint_Schema AS FKSch,
ccu.Table_Name AS FK_Table ,
ccu.Column_Name AS FK_Column ,
ccu2.Table_Name AS ParentTable ,
ccu2.Column_Name AS ParentColumn ,
I.Name AS IndexName ,
CASE WHEN I.Name IS NULL
THEN 'IF NOT EXISTS (SELECT * FROM sys.indexes
WHERE object_id = OBJECT_ID(N'''
+ RC.Constraint_Schema + '.' + ccu.Table_Name
+ ''') AND name = N''IX_' + ccu.Table_Name + '_'
+ ccu.Column_Name + ''') '
+ 'CREATE NONCLUSTERED INDEX IX_' + ccu.Table_Name + '_'
+ ccu.Column_Name + ' ON ' + rc.Constraint_Schema + '.'
+ ccu.Table_Name + '( ' + ccu.Column_Name
+ ' ASC ) WITH (PAD_INDEX = OFF,
STATISTICS_NORECOMPUTE = OFF,
SORT_IN_TEMPDB = ON, IGNORE_DUP_KEY = OFF,
DROP_EXISTING = OFF, ONLINE = ON);'
ELSE ''
END AS SQL
FROM information_schema.referential_constraints RC
JOIN INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE ccu
ON rc.CONSTRAINT_NAME = ccu.CONSTRAINT_NAME
JOIN INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE ccu2
ON rc.UNIQUE_CONSTRAINT_NAME = ccu2.CONSTRAINT_NAME
LEFT JOIN sys.columns c ON ccu.Column_Name = C.name
AND ccu.Table_Name = OBJECT_NAME(C.OBJECT_ID)
LEFT JOIN sys.index_columns ic ON C.OBJECT_ID = IC.OBJECT_ID
AND c.column_id = ic.column_id
AND index_column_id = 1
-- index found has the foreign key
-- as the first column
LEFT JOIN sys.indexes i ON IC.OBJECT_ID = i.OBJECT_ID
AND ic.index_Id = i.index_Id
WHERE I.name IS NULL
ORDER BY FK_table ,
ParentTable ,
ParentColumn;
GO
Dependencies
It depends ... I'm sure you have heard this expression before. I will consider three different methods for “reverse engineering” dependencies in the database. The first method is to use the sp_msdependecies stored procedure. The second is system tables associated with foreign keys. The third method is to use CTE.
sp_msdependencies
Sp_msdependencies is an undocumented stored procedure that can be very useful for parsing complex table dependencies.
EXEC sp_msdependencies '?' -- Displays Help
sp_MSobject_dependencies name = NULL, type = NULL, flags = 0x01fd
name: name or null (all objects of type)
type: type number (see below) or null
if both null, get all objects in database
flags is a bitmask of the following values:
0x10000 = return multiple parent/child rows per object
0x20000 = descending return order
0x40000 = return children instead of parents
0x80000 = Include input object in output result set
0x100000 = return only firstlevel (immediate) parents/children
0x200000 = return only DRI dependencies
power(2, object type number(s)) to return in results set:
0 (1 - 0x0001) - UDF
1 (2 - 0x0002) - system tables or MS-internal objects
2 (4 - 0x0004) - view
3 (8 - 0x0008) - user table
4 (16 - 0x0010) - procedure
5 (32 - 0x0020) - log
6 (64 - 0x0040) - default
7 (128 - 0x0080) - rule
8 (256 - 0x0100) - trigger
12 (1024 - 0x0400) - uddt
shortcuts:
29 (0x011c) - trig, view, user table, procedure
448 (0x00c1) - rule, default, datatype
4606 (0x11fd) - all but systables/objects
4607 (0x11ff) – all
If we derive all the dependencies using sp_msdependencies, we get four columns: Type, ObjName, Owner (Schema), Sequence.
Pay attention to the sequence number (Sequence) - it starts from 1 and sequentially increases. Sequence is the "serial number" of a dependency.
I used this method several times when I needed to perform archiving or deletion on a very large database. If you know the dependencies of the table, then you have a “road map” - in what order do you need to archive or delete data. Start with the table with the highest value in the Sequence column and move from it in the reverse order - from larger to smaller. Tables with the same Sequence value can be deleted at the same time. This method does not violate any of the restrictions of foreign keys and allows you to transfer / delete records without temporarily deleting and rebuilding constraints.
EXEC sp_msdependencies NULL -- Все зависимости в БД
EXEC sp_msdependencies NULL, 3 -- Зависимости определённой таблицы
In SSMS, if you right-click on the table name, you can select “View Dependencies” and “Objects that depend on TABLENAME”:
We can also get this information in the following way:
-- sp_MSdependencies — Только верхний уровень
-- Объекты, которые зависят от указанного объекта
EXEC sp_msdependencies N'Sales.Customer',null, 1315327 -- Change Table Name
If in SSMS, in the window for viewing dependencies, select "Objects that depend on TABLENAME" and then expand all levels, we will see the following:
sp_msdependencies will return the same information.
-- sp_MSdependencies - Все уровни
-- Объекты, которые зависят от указанного объекта
EXEC sp_MSdependencies N'Sales.Customer', NULL, 266751 -- Change Table Name
Also, in SSMS, we can see which objects the selected table depends on.
The following query, using msdependencies, will return the same information.
-- Объекты, от которых зависит указанный объект
EXEC sp_MSdependencies N'Sales.Customer', null, 1053183 -- Change Table
If you want a list of table dependencies, you can use a temporary table to filter the dependencies by type.
CREATE TABLE #TempTable1
(
Type INT ,
ObjName VARCHAR(256) ,
Owner VARCHAR(25) ,
Sequence INT
);
INSERT INTO #TempTable1
EXEC sp_MSdependencies NULL
SELECT *
FROM #TempTable1
WHERE Type = 8 --Tables
ORDER BY Sequence ,
ObjName
DROP TABLE #TempTable1;
Requests to system catalog views
The second method of "reverse engineering" dependencies in your database is to query the directory system representations associated with foreign keys.
--Independent tables
SELECT Name AS InDependentTables
FROM sys.tables
WHERE object_id NOT IN ( SELECT referenced_object_id
FROM sys.foreign_key_columns )
-- Check for parents
AND object_id NOT IN ( SELECT parent_object_id
FROM sys.foreign_key_columns )
-- Check for Dependents
ORDER BY Name
-- Tables with dependencies.
SELECT DISTINCT
OBJECT_NAME(referenced_object_id) AS ParentTable ,
OBJECT_NAME(parent_object_id) AS DependentTable ,
OBJECT_NAME(constraint_object_id) AS ForeignKeyName
FROM sys.foreign_key_columns
ORDER BY ParentTable ,
DependentTable
-- Top level of the pyramid tables. Tables with no parents.
SELECT DISTINCT
OBJECT_NAME(referenced_object_id) AS TablesWithNoParent
FROM sys.foreign_key_columns
WHERE referenced_object_id NOT IN ( SELECT parent_object_id
FROM sys.foreign_key_columns )
ORDER BY 1
-- Bottom level of the pyramid tables.
-- Tables with no dependents. (These are the leaves on a tree.)
SELECT DISTINCT
OBJECT_NAME(parent_object_id) AS TablesWithNoDependents
FROM sys.foreign_key_columns
WHERE parent_object_id NOT IN ( SELECT referenced_object_id
FROM sys.foreign_key_columns )
ORDER BY 1
-- Tables with both parents and dependents.
-- Tables in the middle of the hierarchy
SELECT DISTINCT
OBJECT_NAME(referenced_object_id) AS MiddleTables
FROM sys.foreign_key_columns
WHERE referenced_object_id IN ( SELECT parent_object_id
FROM sys.foreign_key_columns )
AND parent_object_id NOT IN ( SELECT referenced_object_id
FROM sys.foreign_key_columns )
ORDER BY 1;
-- in rare cases, you might find a self-referencing dependent table.
-- Recursive (self) referencing table dependencies.
SELECT DISTINCT
OBJECT_NAME(referenced_object_id) AS ParentTable ,
OBJECT_NAME(parent_object_id) AS ChildTable ,
OBJECT_NAME(constraint_object_id) AS ForeignKeyName
FROM sys.foreign_key_columns
WHERE referenced_object_id = parent_object_id
ORDER BY 1 ,
2;
Using CTE
The third method to get the dependency hierarchy is to use a recursive CTE.
-- How to find the hierarchical dependencies
-- Solve recursive queries using Common Table Expressions (CTE)
WITH TableHierarchy ( ParentTable, DependentTable, Level )
AS (
-- Anchor member definition (First level group to start the process)
SELECT DISTINCT
CAST(NULL AS INT) AS ParentTable ,
e.referenced_object_id AS DependentTable ,
0 AS Level
FROM sys.foreign_key_columns AS e
WHERE e.referenced_object_id NOT IN (
SELECT parent_object_id
FROM sys.foreign_key_columns )
-- Add filter dependents of only one parent table
-- AND Object_Name(e.referenced_object_id) = 'User'
UNION ALL
-- Recursive member definition (Find all the layers of dependents)
SELECT --Distinct
e.referenced_object_id AS ParentTable ,
e.parent_object_id AS DependentTable ,
Level + 1
FROM sys.foreign_key_columns AS e
INNER JOIN TableHierarchy AS d
ON ( e.referenced_object_id ) =
d.DependentTable
)
-- Statement that executes the CTE
SELECT DISTINCT
OBJECT_NAME(ParentTable) AS ParentTable ,
OBJECT_NAME(DependentTable) AS DependentTable ,
Level
FROM TableHierarchy
ORDER BY Level ,
ParentTable ,
DependentTable;
Conclusion
Thus, in an hour or two, you can get a good idea of the insides of any database using the "reverse engineering" methods described above.Translator's note : all queries in the text (with the exception of one, it is marked in the text) will work on SQL Server 2005 SP3 and in later editions. The text is quite voluminous, I tried as much as I could to subtract it and find my mistakes (stylistic, syntactic, semantic and others), but I probably didn’t notice something, write me a PM, please, if something hurts the eye.