
7 things a developer should know about SQL Server
- Transfer
Hey. I am a former developer who became a database administrator, and below I wrote about what, in due time, I would like to hear myself.
Good developers like to reuse code by putting it in functions and calling these functions from different places. This works great at the application level, but at the database level it can lead to huge performance issues.
Check out this post on the forced use of concurrency — specifically, a list of what leads to the generation of a “single-threaded” query execution plan. Most likely, the use of scalar UDFs ( translator's note: for servers younger than 2008 R2 and not only scalar ones ) will result in your request being executed in one thread ( * sad sighs * ).
If you want your code to be reused, think about stored procedures and views. (In fact, they can introduce their performance problems, but I just want to get you on the right track as quickly as possible, and UDF, alas, is not.)
At one stage in your development career, you can start using the WITH (NOLOCK) hint everywhere, because with it your queries are faster. This is not always bad, but may be accompanied by unexpected side effects that Kendra Little talked about in this video . I will focus only on one of them.
When your query accesses a table, even with the NOLOCK hint, you impose a schema stability lock (Sch-S). No one can change this table or its indexes until your query completes. This does not seem to be a serious problem until you need to delete the index, but you cannot do it, because people constantly work with this table, being completely sure that they do not create any problems, since they use the WITH (NOLOCK) hint .
There is no silver bullet here, but start reading about SQL Server isolation levels - I believe that the READ COMMITTED SNAPSHOT isolation level would be the best choice for your application. You will receive complete data with fewer blocking problems.
I know that you only have one SQL Server right now, but believe me, it's worth it. Create three connection lines that will now refer to only one server, but then, when you think about scaling, you will have the opportunity to use different servers “to serve” each of these lines.
It’s quite difficult to “scale” the first connection string; SQL Server doesn’t have many options for “scaling write operations” (there are such options, but it’s very difficult to apply and manage them). Scaling the second and third lines of the connection is much easier and cheaper. To get more information about using different connection strings, you can read this post of mine .
You are probably using a database to perform some secondary tasks - computing, sorting, loading, etc. If this data suddenly disappears, you are unlikely to be very upset, but the table structure is, of course, a different matter. Now you are doing everything in the "main database" of your application.
Create a separate database, name it MyAppTemp, and do everything in it! Give her a simple recovery model and just back up once a day. Do not bother with high availability or disaster recovery of this database.
Using this technique has a bunch of advantages. It minimizes the number of changes in the main database, which means that backups of the transaction log and differential backups will be done faster. If you use log shipping, really important data will be copied faster. You can even store this database separately from other databases, for example, on an inexpensive but nimble SSD-drive, leaving the main storage system for data critical in production.
SQL Server came out more than ten years ago and over the years there have been many changes. Unfortunately, old materials are not always updated to describe the “current” changes. Even fresh material from verified sources may be wrong - for example, criticism of Microsoft's methodology for improving SQL Server performance . Microsoft Certified Master Jonathan Kehayias found a lot of really bad tips in a Microsoft document.
When you hear something that sounds like good advice, I suggest you use a strategy opposite to Dr. Phil's. Dr. Phil says you should be “inspired” by any idea for 15 minutes. Instead, try to hate it - try to refute what you read before you apply it in production. Even if the advice is damn good, it might not be very useful on your system. (Yes, this also applies to my advice).
SQL Server spends CPU time sorting the results of your query. SQL Server Enterprise Edition costs about $ 7,000 for one core - not for the processor, but for the core itself. A two-socket, six-core server will cost about $ 84,000 - and this is only the price of licenses, not counting the hardware. You can buy a hell of a lot of application servers (even with 256 GB of RAM on each) for $ 84k.
Give your results to your application as quickly as possible and sort them there. Your application server is probably designed in such a way that it can distribute the processor load across different nodes, while your database server cannot.
UPDI received many comments that the application needs, for example, only ten lines, instead of ten million lines returned by the request. Yes, of course, if you write TOP 10, you need sorting, but what about rewriting the request so that it does not return a bunch of unnecessary data? If there is so much data that the application server has to spend too many resources sorting, then SQL Server does the same job. We will talk about how to find such requests at the webinar, a link to which is at the end of the post. Also, remember that I said, “Avoid using ORDER BY,” and not “Never use ORDER BY.” I use this instruction in the same way - but if I can avoid it at a very expensive database level, I try to do it. This is what “avoid” means.
(And this is the part in which fans of MySQL and PostgreSQL talk about how to reduce the cost of licenses using open source DBMS). (And in this part, you expect me to answer witty to them, but I won’t do it. If you are developing a new application and thinking about choosing a database, read my answer to StackOverflow about which database will withstand the greatest load.)
SQL Server Dynamic Administrative Views (DMV) can show you all the places that are detrimental to performance, i.e.:
All you need to know is where to look at all this - and on Tuesday, March 5th, we will show you this. We are doing a 30-minute webcast to train developers and you can register for it here ( upd the webinar was successful, the recording can be viewed here ).
Translator's Note: Any suggestions and comments on translation and styling are welcome, as usual.
7. The performance of scalar UDF is poor
Good developers like to reuse code by putting it in functions and calling these functions from different places. This works great at the application level, but at the database level it can lead to huge performance issues.
Check out this post on the forced use of concurrency — specifically, a list of what leads to the generation of a “single-threaded” query execution plan. Most likely, the use of scalar UDFs ( translator's note: for servers younger than 2008 R2 and not only scalar ones ) will result in your request being executed in one thread ( * sad sighs * ).
If you want your code to be reused, think about stored procedures and views. (In fact, they can introduce their performance problems, but I just want to get you on the right track as quickly as possible, and UDF, alas, is not.)
6. “WITH (NOLOCK)” does not mean that there will be no locks at all
At one stage in your development career, you can start using the WITH (NOLOCK) hint everywhere, because with it your queries are faster. This is not always bad, but may be accompanied by unexpected side effects that Kendra Little talked about in this video . I will focus only on one of them.
When your query accesses a table, even with the NOLOCK hint, you impose a schema stability lock (Sch-S). No one can change this table or its indexes until your query completes. This does not seem to be a serious problem until you need to delete the index, but you cannot do it, because people constantly work with this table, being completely sure that they do not create any problems, since they use the WITH (NOLOCK) hint .
There is no silver bullet here, but start reading about SQL Server isolation levels - I believe that the READ COMMITTED SNAPSHOT isolation level would be the best choice for your application. You will receive complete data with fewer blocking problems.
5. Use three connection strings in your application
I know that you only have one SQL Server right now, but believe me, it's worth it. Create three connection lines that will now refer to only one server, but then, when you think about scaling, you will have the opportunity to use different servers “to serve” each of these lines.
- The connection string for writing and reading “in real time” is the connection string that you are using now and think that all the data should come from here. You can leave all your code as it is now, but when you add something, or change the current one, think about changing the connection string in the queries to one of the following.
- A connection string for receiving "relatively fresh" data, aged 5-15 minutes - for data that may be slightly outdated, but still current.
- The connection string for yesterday's data is for reporting and trending. For example, in the online store, with this connection string, you can pull user reviews for products, and warn users themselves that their reviews will be published the next day.
It’s quite difficult to “scale” the first connection string; SQL Server doesn’t have many options for “scaling write operations” (there are such options, but it’s very difficult to apply and manage them). Scaling the second and third lines of the connection is much easier and cheaper. To get more information about using different connection strings, you can read this post of mine .
4. Use an intermediate database
You are probably using a database to perform some secondary tasks - computing, sorting, loading, etc. If this data suddenly disappears, you are unlikely to be very upset, but the table structure is, of course, a different matter. Now you are doing everything in the "main database" of your application.
Create a separate database, name it MyAppTemp, and do everything in it! Give her a simple recovery model and just back up once a day. Do not bother with high availability or disaster recovery of this database.
Using this technique has a bunch of advantages. It minimizes the number of changes in the main database, which means that backups of the transaction log and differential backups will be done faster. If you use log shipping, really important data will be copied faster. You can even store this database separately from other databases, for example, on an inexpensive but nimble SSD-drive, leaving the main storage system for data critical in production.
3. “Yesterday's” articles and books may no longer be relevant today.
SQL Server came out more than ten years ago and over the years there have been many changes. Unfortunately, old materials are not always updated to describe the “current” changes. Even fresh material from verified sources may be wrong - for example, criticism of Microsoft's methodology for improving SQL Server performance . Microsoft Certified Master Jonathan Kehayias found a lot of really bad tips in a Microsoft document.
When you hear something that sounds like good advice, I suggest you use a strategy opposite to Dr. Phil's. Dr. Phil says you should be “inspired” by any idea for 15 minutes. Instead, try to hate it - try to refute what you read before you apply it in production. Even if the advice is damn good, it might not be very useful on your system. (Yes, this also applies to my advice).
2. Avoid using ORDER BY; sort data in application
SQL Server spends CPU time sorting the results of your query. SQL Server Enterprise Edition costs about $ 7,000 for one core - not for the processor, but for the core itself. A two-socket, six-core server will cost about $ 84,000 - and this is only the price of licenses, not counting the hardware. You can buy a hell of a lot of application servers (even with 256 GB of RAM on each) for $ 84k.
Give your results to your application as quickly as possible and sort them there. Your application server is probably designed in such a way that it can distribute the processor load across different nodes, while your database server cannot.
UPDI received many comments that the application needs, for example, only ten lines, instead of ten million lines returned by the request. Yes, of course, if you write TOP 10, you need sorting, but what about rewriting the request so that it does not return a bunch of unnecessary data? If there is so much data that the application server has to spend too many resources sorting, then SQL Server does the same job. We will talk about how to find such requests at the webinar, a link to which is at the end of the post. Also, remember that I said, “Avoid using ORDER BY,” and not “Never use ORDER BY.” I use this instruction in the same way - but if I can avoid it at a very expensive database level, I try to do it. This is what “avoid” means.
(And this is the part in which fans of MySQL and PostgreSQL talk about how to reduce the cost of licenses using open source DBMS). (And in this part, you expect me to answer witty to them, but I won’t do it. If you are developing a new application and thinking about choosing a database, read my answer to StackOverflow about which database will withstand the greatest load.)
1. SQL Server has built-in tools for finding bottlenecks that do not affect performance
SQL Server Dynamic Administrative Views (DMV) can show you all the places that are detrimental to performance, i.e.:
- which requests generate the most load on your server
- which indexes just take up space and slow down insert / delete / update operations
- What bottlenecks are there on your server (CPU, disk, network, locks, etc.)?
All you need to know is where to look at all this - and on Tuesday, March 5th, we will show you this. We are doing a 30-minute webcast to train developers and you can register for it here ( upd the webinar was successful, the recording can be viewed here ).
Translator's Note: Any suggestions and comments on translation and styling are welcome, as usual.