How I Improved SSRS Reporting Performance
Background
Being a “not-quite-middle” developer in a Kharkov office, I unexpectedly received an invitation on LinkedIn to move to Poland and work for one big bank. After much deliberation I agreed, telling myself it would be a very useful experience. Working for a large corporation is not only programming experience, but also two, or even three, years of useful material for the resume, plus +100 to suit-wearing. (Be that as it may, more about that another time.) It turned out that the level of crap code in Poland is much higher than in Ukraine, and the average level of a Senior developer, at least among those I have met here, cannot compare with that of Ukrainian colleagues.
So, having arrived at the new job, after three months of organizational routine and searching for a project for me, I was handed a tool that generates SSRS reports. To say that every principle and every piece of advice from the world's leading engineers had been ignored here from a great height would be to say nothing. But something had to be done, and there was no one else to blame.
Given:
.Net Console Application, which
- runs on the standard Windows Task Scheduler;
- downloads two txt files from FTP;
- parses their structure;
- merges the data;
- crams it into the database;
- resulting in one table with about 100 (mostly string) columns, into which roughly 3000 rows are added every day, and which doesn't even have an IDENTITY id column (a sketch of the table shape follows this list).
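To make the later attempts easier to follow, here is a minimal sketch of what such a table might look like. Only the TimeStamp column (used by the report filters below) and the overall scale come from the description; every other name is hypothetical:

CREATE TABLE dbo.TransactionData (
    -- no IDENTITY id column: rows can only be told apart by their contents
    [TimeStamp] DATETIME,       -- the date the reports filter on
    Col1        NVARCHAR(255),  -- hypothetical
    Col2        NVARCHAR(255),
    -- ... and so on, up to roughly 100 mostly-string columns ...
    Col100      NVARCHAR(255)
)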
SQL Server, which stores the database, in which
- there is one damn huge table, with a pile of columns and rows;
- normalization is dead;
- there are a couple of views;
- there are accompanying tables: user information, logs, and so on;
- and a very important detail: the large table stores two types of records that share the same columns but need to be matched to each other. For example, information about a transaction from the buyer's side and from the house's side: street, price, and everything else are the same, and only a few fields form the key by which a house and a buyer can be related. This matching was the main task of the report (see the sketch below).
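A minimal sketch of that matching, reusing the hypothetical dbo.TransactionData table sketched above; the RecordType discriminator and the key fields are assumptions, since the real column names aren't shown here:

SELECT b.Street, b.Price -- , ...
FROM dbo.TransactionData b          -- the buyer's side of the transaction
JOIN dbo.TransactionData h          -- the house's side
    ON  b.KeyField1 = h.KeyField1   -- the few fields that form the matching key
    AND b.KeyField2 = h.KeyField2
WHERE b.RecordType = 'Buyer'
  AND h.RecordType = 'House'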
SSRS reports, which
- query the database directly, selecting data from a view for a given date range; you can generate either a daily or a monthly report. For instance:
SELECT a.col1, a.col2, ... a.col100 FROM vw_Data_With_User_Info a WHERE a.TimeStamp >= @StartDate AND a.TimeStamp <= @EndDate
Additional constraints:
- there is no access to UAT;
- the DEV database contains only a month of test data, with no access to real data;
- no rights to run SQL Server Profiler.
Task
- The daily report takes about 3 minutes to load; it should take about 40 seconds;
- the monthly report loads for about an hour, or doesn't load at all; fix it, if at all possible.
Solutions:
It should be noted that I had never even read about SSRS before, so perhaps some really good programmers can already see why it performed so badly; it took me about a month to figure it out.
Attempt 1
The first, most logical solution was to reduce the number of columns in the report. After all, no one can sensibly read 100 columns across 3000 records, even grouped by company and user. We talked with the customer, identified the most essential columns, and it turned out that their number could be quite painlessly cut to 16. And so the Lite report was born.
The result of attempt 1: Report_Lite for one day takes 40 seconds to 1 minute 20 seconds, depending on the server's mood and the position of the stars; for a month, half an hour to an hour. With that, the customers calmed down for a while, content with the result.
Attempt 2
Over the next month, I gradually tried to reduce the report's load time by putting the views and queries in order. For example, I moved all the calculations, including the logic of matching the client and the house to each other, into the console application, since so far nobody had imposed any requirements on its speed. After that, a couple of columns were added to the table, among them MatchingKey, and the report generation speed on UAT dropped sharply. I should also note that over this month the generation time had slowly crept up from 1 minute (on average) to 1:30. Then the newly added column caught my attention: I saw that for records inserted before my release, MatchingKey was NULL, which naturally forced the query to scan almost a million records and find no matches. I also noticed that the view was first executed without any date restriction, and only afterwards was the date range selected from it. Fixing that should, by my logic, have significantly improved report generation, and indeed it more or less worked on the DEV server, fitting into 40 seconds, but it had no effect on UAT at all.
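One way to push the date filter ahead of the view's joins (sketched here with the hypothetical names from earlier, not the real schema) is to replace the view with an inline table-valued function, so the range lands on the base table before anything else runs:

-- Before: the view joins everything first, the dates are filtered on top
-- SELECT * FROM vw_Data_With_User_Info a
-- WHERE a.TimeStamp >= @StartDate AND a.TimeStamp <= @EndDate

CREATE FUNCTION dbo.fn_DataWithUserInfo (@StartDate DATETIME, @EndDate DATETIME)
RETURNS TABLE
AS RETURN
(
    SELECT t.*, u.UserName                -- hypothetical user-info join
    FROM dbo.TransactionData t
    JOIN dbo.UserInfo u ON u.UserId = t.UserId
    WHERE t.[TimeStamp] >= @StartDate
      AND t.[TimeStamp] <= @EndDate       -- the range is applied inside
)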
The result of attempt 2: almost invisible.
Attempt 3 (success and universal recognition)
After I disappointed my superiors with attempt 2, I received a specific instruction: drop everything and work only on performance. The first idea was A COMPLETE REWORK OF THE APPLICATION. I wanted to split the large table into two, matching the two source files, and join them into a third by id, move all calculations entirely to the .NET app, remove as many groupings from the reports as possible, and in general do everything as the Flying Spaghetti Monster commands. I spent a week on this, but when I got to saving data to the database, I started experimenting with tables and queries, and it turned out there was practically no difference in query execution speed between selecting from the view and joining two separate tables. At least, nothing that would explain a report loading for a minute: for 3000 records, the SQL query took 3-5 seconds. So the problem must be the grouping of data on the client side? We remove all the groupings, and a clean report, even without sorting, is only a couple of seconds faster to generate. So grouping takes a couple of seconds, the SQL query 3-5. Where does almost a minute go?
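A minimal sketch of that timing experiment, assuming SSMS access to the DEV database, with literal dates standing in for the report parameters and hypothetical names for the split tables:

SET STATISTICS TIME ON;

-- variant 1: through the view
SELECT * FROM vw_Data_With_User_Info a
WHERE a.TimeStamp >= '2014-06-11' AND a.TimeStamp <= '2014-06-11';

-- variant 2: joining the two split tables directly (hypothetical names)
SELECT b.*, h.Price
FROM dbo.BuyerData b
JOIN dbo.HouseData h ON h.MatchingKey = b.MatchingKey
WHERE b.[TimeStamp] >= '2014-06-11' AND b.[TimeStamp] <= '2014-06-11';

SET STATISTICS TIME OFF;  -- elapsed times appear in the Messages tab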
The question is not rhetorical, and by rights such experiments should have been carried out before attempt 1, but better late than never. We continue the experiment and start playing with the report's queries. By that time I had already gotten the admins to grant me access to Profiler. I opened Profiler, but since I had no particular (read: any) experience with it, I didn't look very carefully. And at some point I got tired of entering @StartDate and @EndDate every time; it is annoying to click through two calendars every half a minute. So, in the report without grouping, I went into the query designer and wrote '2014-06-11' in place of @StartDate and @EndDate. Bang: 3 seconds to generate! How so? Probably caching; I cleared every cache I could find and ran it again. 3 seconds. Can't be. Now with grouping: 5 seconds. You have got to be kidding.
After half an hour of head-scratching and some more digging in Profiler, I finally found the thing that had tormented me for a month, and this poor project for half a year.
Result
It turned out that when a report contains a query such as:
SELECT * FROM Table WHERE time >= @StartDate AND time <= @EndDate
On the SQL side, it will be executed as follows:
EXEC sp_executesql N'SELECT * FROM Table WHERE time >= @StartDate AND time <= @EndDate', N'@StartDate datetime, @EndDate datetime', '2014-06-11', '2014-06-11'
And in this form the query runs for about a minute and a half. Whereas when I substituted the values directly into the query text, what got executed was not dynamic SQL but a regular query.
As a result, I managed to convince the client to switch to stored procedures, and now the daily report loads in 6 seconds and the monthly one in 1 minute.
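For reference, a minimal sketch of the stored-procedure version of the dataset query; the procedure name is hypothetical, the view and columns follow the earlier examples:

CREATE PROCEDURE dbo.usp_DailyReport
    @StartDate DATETIME,
    @EndDate DATETIME
AS
BEGIN
    SET NOCOUNT ON;  -- keep row-count messages out of the result stream
    SELECT a.col1, a.col2 -- , ... the 16 Lite columns
    FROM vw_Data_With_User_Info a
    WHERE a.TimeStamp >= @StartDate
      AND a.TimeStamp <= @EndDate;
END

In the SSRS dataset, the query type is then switched from text to stored procedure.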
Conclusions
- 1. Hurry, and you make people laugh;
- 2. hurry, and you lose a month;
- 3. hurry, and you anger your bosses;
- 4. it is better to spend time understanding the problem than to guess at possible solutions, especially in an area you are new to.
PS I understand that this article is not so much about how to solve the problem as about how not to solve it. But I googled a lot on the subject and never found advice like this, or any hint that reports could play such a trick.
PS2 I have a couple of guesses as to why such a query takes so long, but I will be very grateful if someone who knows the exact reason explains it to me.
UPD1: I took into account the complaints about this article and tried to fix all the wording that drew criticism.
UPD2: Today, in a new report that I had already converted to a stored procedure, I ran into the problem again. The call
exec rOtex_Mis_DailyHighLevelReport @StartDate='2014-06-01 00:00:00',@EndDate='2014-06-30 00:00:00'
took 6 minutes when generated from the report, and 1 second when run by hand. So I went to Google with a specific question: “SSRS stored procedure slow”. And I got an intelligible answer. Shame on my googling skills: it was lying right on the surface of the correct search.
stackoverflow.com/questions/2283943/fast-query-runs-slow-in-ssrs
The thing is, this is a well-known SQL Server problem related to Parameter Sniffing, as comrades BelAnt and microuser rightly pointed out to me, and the solution is rather shitty and, as is often the case with crap code, simple. Here is an example from my stored procedure:
CREATE PROCEDURE rOtex_Mis_DailyHighLevelReport
    @StartDate DATE,
    @EndDate DATE
AS
BEGIN
    -- copy the parameters into local variables; the optimizer cannot sniff
    -- local variables, so it builds a generic plan instead of one tailored
    -- to whatever values the procedure happened to be called with first
    DECLARE @Start DATETIME
    DECLARE @End DATETIME
    SET @Start = @StartDate
    SET @End = @EndDate
    -- ... the report query then filters on @Start and @End
    -- instead of @StartDate and @EndDate ...
END
And it works: with the parameter values hidden behind local variables, the plan is built for average statistics instead of being tailored to an unrepresentative first call.
Conclusion 2.0
- Google is your friend.
UPD3: After several tests, it turned out that the stored procedure had also been invented "out of great wisdom". It was enough to add OPTION (RECOMPILE) to the end of the query, as comrade BelAnt advised. I won't write conclusions here; if there was no brain to begin with, conclusions won't help :(
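A minimal sketch of that variant, again with the view and columns from the earlier examples:

SELECT a.col1, a.col2 -- , ...
FROM vw_Data_With_User_Info a
WHERE a.TimeStamp >= @StartDate
  AND a.TimeStamp <= @EndDate
OPTION (RECOMPILE)  -- compile a fresh plan for the actual parameter values
                    -- on every execution, instead of reusing a sniffed one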