WINGUNT January 4, 2013 at 16:09

Optimizing SQL Server Performance Using Indexes

From the sandbox

Introduction

As you know, indexes increase productivity similar to a table of contents or index in a cngie. After reading several articles on the Internet and a couple of chapters from books, I would like to know how indexes help increase the speed of data retrieval from SQL Server. Let's look at an example.
To do this, we need two tables connected by a foreign key. The main table will contain 10,000 rows, the second one will contain 1,000,000. The first table will contain a primary key and 2 columns with text, the second table will contain a primary key, 2 text fields, 2 numeric fields, a date, a calculated column, and a foreign key.

Database structure

DB structure

CREATE TABLE "maintable"(
"ID" INT NOT NULL IDENTITY (1, 1) PRIMARY KEY,
"name" NVARCHAR (50) NULL,
"description" NVARCHAR (100) NULL
)
CREATE TABLE "secondtable" (
"ID" INT NOT NULL IDENTITY (1, 1) PRIMARY KEY,
"sname" NVARCHAR (50) NULL,
"sdescr" NVARCHAR (100) NULL,
"somefirstnumber" INT DEFAULT 0,
"somesecondnumber" INT DEFAULT 1,
"somedatetime" DATETIME NOT NULL,
"howmanyhours" INT NULL DEFAULT 0,
"newdate" AS DATEADD(HOUR, "howmanyhours", "somedatetime"),
"mainID" INT NOT NULL
CONSTRAINT FK_SecondToMain FOREIGN KEY ("mainID")
REFERENCES "maintable" ("ID")
);
--Хранимые процедуры
CREATE PROCEDURE "insertintomain"
@Name NVARCHAR (50), @Descr NVARCHAR (100)
AS
	INSERT INTO "SimpleIndex"."dbo"."maintable" ("name", "description")
	VALUES (@Name, @Descr);
GO
CREATE PROCEDURE "insertintosecond"
@Sname NVARCHAR(50), @Sdescr NVARCHAR(100),
@Firstnumber INT, @Secondnumber INT,
@SomeDT DATETIME, @Hours INT, @MainID INT
AS
	INSERT INTO "SimpleIndex"."dbo"."secondtable" (
	"sname", "sdescr", "somefirstnumber", 
	"somesecondnumber", "somedatetime", 
	"howmanyhours", "mainID")
	VALUES (
	@Sname, @Sdescr,@Firstnumber, @Secondnumber,
	@SomeDT, @Hours, @MainID)
GO

Data insertion

It’s clear that you just don’t insert a million lines, we’ll use C #.

Let's create a simple console application

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Data;
using System.Data.SqlClient;
namespace SimpleAddingRandomRecordsToDatabase
{
    class Program
    {
        static string alphabet()
        {
            //Количество пробелов означает частоту их появляния
            string alphabet = "abcdefgh ijklmnopqrst uvwxyzабвгд еёжзийкл мнопросту фхцчшщыэюя ";
            //преобразуем алфавит в юникод
            Encoding enc = Encoding.UTF8;
            var bytes = enc.GetBytes(alphabet);
            alphabet = enc.GetString(bytes);
            return alphabet;
        }
        static int alphlen = alphabet().Length;
        static string getRandomString(int lenth, int rndparam, bool allowspacebars)
        {
            string str = "";
            Random rand = new Random(rndparam);
            for (int i = 0; i < lenth; i++)
            {
                int num = rand.Next(alphlen);
                str = str + alphabet()[num];
            }
            if (!allowspacebars)
            {
                str = str.Replace(" ", "");
            }
            return str;
        }
        static void Main(string[] args)
        {
            string result;
            Console.WriteLine("Нажмите 1 для начала операции");
            result = Console.ReadLine();
            if (result.Equals("1"))
            {
                //Подключаемся к БД
                SqlConnection cn = new SqlConnection();
                SqlConnectionStringBuilder cs = new SqlConnectionStringBuilder();
                cs.IntegratedSecurity = true;
                cs.DataSource = @"KONSTANTIN\KONSTANTINSQL";
                cs.InitialCatalog = "SimpleIndex";
                cn.ConnectionString = cs.ToString();
                cn.Open();
                //генерируем первые 10 000 записей для главной таблицы
                int i2 = 0;
                for (int i = 0; i < 10000; i++)
                {
                    SqlCommand cm = cn.CreateCommand();
                    cm.CommandType = CommandType.StoredProcedure;
                    cm.CommandText = "insertintomain";
                    i2++;
                    cm.Parameters.Add(new SqlParameter("Name", getRandomString(50, i2, false)));
                    i2++;
                    cm.Parameters.Add(new SqlParameter("Descr", getRandomString(100, i2, true)));
                    cm.ExecuteNonQuery();
                }
                //Генерируем 1 000 000 записей, соедниённых с первой таблицей
                i2 = 0;
                for (int i = 0; i < 1000000; i++)
                {
                    Random rand = new Random();
                    DateTime dt = DateTime.Now;
                    dt = dt.AddHours((-1) * rand.Next(2000));
                    SqlCommand cm = cn.CreateCommand();
                    cm.CommandType = CommandType.StoredProcedure;
                    cm.CommandText = "insertintosecond";
                    i2++;
                    cm.Parameters.Add(new SqlParameter("Sname", getRandomString(50, i2, false)));
                    i2++;
                    cm.Parameters.Add(new SqlParameter("Sdescr", getRandomString(100, i2, true)));
                    i2++;
                    Random rand2 = new Random(i2);
                    cm.Parameters.Add(new SqlParameter("Firstnumber", rand2.Next(10000)));
                    i2++;
                    rand2 = new Random(i2);
                    cm.Parameters.Add(new SqlParameter("Secondnumber", rand2.Next(100)));
                    cm.Parameters.Add(new SqlParameter("SomeDT", dt));
                    i2++;
                    rand = new Random(i2);
                    cm.Parameters.Add(new SqlParameter("Hours", rand.Next(30)));
                    i2++;
                    rand = new Random(i2);
                    cm.Parameters.Add(new SqlParameter("MainID", rand.Next(9999)+1));
                    cm.ExecuteNonQuery();
                }
                cn.Close();
                Console.WriteLine("Данные вставлены.");
            }
        }
    }
}

We launch the console application, press "1" and wait. It took me about 15-20 minutes to insert.

Description of the results

In this section, we will examine query queries in indexed and non-indexed columns.
One clustered index is created implicitly - this is the primary key.

Primary Key Search

Perform a row search by the primary key, indicating a specific row number of ten thousand:

SELECT "name", "description" FROM "maintable" WHERE "ID" = 3823;

Search time: 1.523 seconds .
The second request took 0.9 seconds , and the third took 0.1 seconds .

We execute the second query with great selectivity from the same table:

SELECT "name", "description" FROM "maintable" WHERE "ID" BETWEEN 5000 AND 6000;

The first request took 0.3 seconds , the second 0.26 seconds , the third 0.25 seconds .

Now we execute similar queries to a table containing a million records. For a specific line number:

SELECT * FROM "secondtable" WHERE "ID" = 728472;

Search time: 1.4 seconds , the second query showed 1.36 seconds. . Next, the average query time will be shown.

The primary key columns already have an index, so it is most likely impossible to improve performance. Based on the results, it can be said that repeated queries with the same parameters increase the search speed.

Pattern Search

Consider text search by template with indexed and non-indexed columns:

SELECT * FROM "maintable" WHERE "name" LIKE 'мл%';

Without an index, such a query is executed in 0.172 seconds out of 10,000 rows. With index

CREATE NONCLUSTERED INDEX iName
ON "maintable" ("name")

this is done in 0.112 seconds , and with a composite index containing the "name" and "description" fields, the query took 0.09 seconds .

Inquiry

SELECT * FROM "maintable" WHERE "name" LIKE '%dc%';

runs in 0.172 seconds without an index and 0.112 seconds with an index .

Perform a template search for a table containing a million records.

SELECT * FROM "secondtable" WHERE "sname" LIKE 'юуж%';

This query without indexing would take an average of 1.7 seconds , while 0.15 seconds would be spent with the column index .

Date Search

First, we will perform a low selectivity request

SELECT * FROM "secondtable" 
WHERE "somedatetime" BETWEEN '2012-26-11 11:30:00' AND '2012-26-11 11:32:00'

Without an index, it will take an average of 0.7 seconds , indexed - 0.22 seconds .

High selectivity query (420,600 rows)

SELECT * FROM "secondtable" 
WHERE "somedatetime" BETWEEN '2012-26-11 11:30:00' AND '2012-31-12 11:32:00'

Without an index - 19.219 seconds , with an index only date fields - 9.9 seconds .

Query by date for calculated column

SELECT * FROM "secondtable" 
WHERE "newdate" BETWEEN '2012-26-11 11:30:00' AND '2012-31-12 11:32:00'

This query without an index will take 12.2 seconds , while with an index - 9.75 . It is important to consider that the original column containing the date has already been indexed.

Table join

When joining tables, it is always advised to index a column, which is a foreign key. In the following example, we will see that this is actually the case.

CREATE NONCLUSTERED INDEX iFK ON "secondtable" ("mainID");
SELECT "name", "sname", "somefirstnumber" 
FROM "maintable" M
INNER JOIN "secondtable" S
	ON M."ID" = S."mainID" 
WHERE
	M."ID" = 8271

Such a query will return 101 lines. Without an index - 0.422 sec , and with an index - 0.122 sec . As we can see, the query speed has increased more than three times. Consider the remaining queries related to the union.

But the index does not always increase performance. For example, a query (420,000 rows)

SELECT "name", "sname", "somefirstnumber" 
FROM "maintable" M
INNER JOIN "secondtable" S
	ON M."ID" = S."mainID" 
WHERE
	S."somedatetime" BETWEEN '2012-26-11 11:30:00' AND '2012-31-12 11:32:00';

showed the same results without an index, with the index of only the foreign key, foreign key and date, with a composite index that simultaneously includes these two columns - about 6.8 seconds .

Combine with pattern matching

Let's perform a combination with a template search among a million records:

SELECT "name", "sname", "somefirstnumber" 
FROM "maintable" M
INNER JOIN "secondtable" S
	ON M."ID" = S."mainID" 
WHERE
	"sname" LIKE 'юуж%';

Without indexes, the query takes 0.766 seconds and returns 5 rows.
With an index of only a foreign key - 0,4 sec. ;
With an index of only a text field - 0.125 sec. ;
With foreign key index and text field index - 0.109 sec. ;
With a composite index containing a foreign key and a text field - 0.35 sec. .

Comparison of Results

Consider the results in the form of graphs. The vertical axis shows the elapsed time ( not speed! ).
1. Sampling by the primary key. Since there is already a clustered index, we consider a large and a small selective sample from one table:

From this graph it can be seen that a large amount of data is processed faster.

2. Search by primary key with table joining. The following graph shows that indexing a foreign key speeds up the search operation by more than three times:

3. Search by text template. The graph shows that productivity grows many times if you use the index with a large amount of data:

4. Search by date. Indexes significantly increase performance both for a small sample and a large one within a single table:

Combining tables and searching by date showed the same results with and without indexes.

5. Search by calculated column. The index also reduces search time.

6. Search by template with union. The text field index and both indexes significantly increase performance:

To summarize

After analyzing the results, we can say that:

It is desirable to index foreign keys; this threatens a significant increase in performance;
Separately indexed columns of text, date, numbers increase the speed of queries;
Indexes work better with tables containing a large number of rows;
Indexes work better when fetching a large number of rows;
Indexes do not work well with computed columns;
In combining queries, it is necessary to index the foreign key and the column by which the search will occur;
An index will not help if the WHERE clause contains search terms for two tables;
Indexes do not affect the speed of data insertion.

So, we see that indexing significantly improves SELECT queries, however indexes take up additional memory and must be rebuilt periodically. The optimal solution is difficult to find, so developers and database administrators spend a lot of time looking for a middle ground, and indexes need to be reorganized. But still, if you have a lot of data, then the answer, whether to use indexes or not, is obvious.

Tags: