A method for finding the causes of poor 1C server performance

    Recently I ran into an unusual case: the customer's 1C server was performing terribly. To give you a sense of the scale, starting a thick client could take ten minutes. A measurement with the Gilev test gave a result below the worst reference value. Looking at the measurements submitted by other users, I realized that this is not an isolated case.
    This is not about optimization, where you need to squeeze out another 10-20% of performance; it is about finding the causes of low performance and eliminating them. You will agree, these are rather different things. The Internet is full of articles about improving performance, but they are usually limited to tuning the 1C server and/or the database server. I have not come across articles dealing with cases of genuinely low performance, especially when there are several causes and those causes sit at different levels.
    Typically, administrators rush straight to the monitoring results. In the case I encountered, monitoring showed almost zero CPU load, plenty of free RAM and no queue on the network interface; only the disk queue indicated that not everything was in order. I had to run a check of the full program. This, of course, takes a lot of time and requires taking the server out of production, but it delivers a result. Perhaps for some people this approach is unacceptable, and some even consider it unprofessional, but I cannot help them there.

    Hardware level


    It sounds banal, but it is worth starting with a check of the hardware itself, because at the operating system level you can only guess that something is wrong with the equipment. In my case, one of the disks in the disk array was not working. Oddly enough, the drive itself turned out to be healthy: after reseating it, it worked, although I had to wait a while for all the data to be synchronized (apparently it had dropped out of the array a long time ago). If the story had ended there, this article would not exist. Just in case, the server was also put through hardware testing (stress tests, a memory test, physical verification of the disks and controllers), which did not reveal any problems.

    Operating system level


    The second point of the program was to check and configure the operating system; the essence of this step is as follows:
    1. tidy up the file system;
    2. disable unnecessary services, remove unneeded and, most importantly, malicious programs;
    3. check that the operating system settings are optimal.

    By tidying up the file system I mean the most obvious operations, which, oddly enough, many administrators consider inapplicable to server operating systems. This is about:
    1. checking the logical structure of the disk;
    2. deleting temporary and unnecessary files;
    3. defragmenting the file system.

    In fairness, it should be noted that for SSDs defragmentation really gives nothing and only increases the number of write cycles. In my case, after putting the file system in order, the server came to life a little, but this was still not enough. A rough sketch of these housekeeping operations is given below.
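    As an illustration, here is a minimal Python sketch of the three operations, assuming a Windows server, drive C: and an elevated (administrator) prompt; chkdsk and defrag are the standard Windows utilities, and the temporary-files step only touches the current user's temp directory.

        # Rough sketch of the file-system "tidy up" on a Windows server.
        # Assumptions: drive C:, an elevated prompt, and a spinning disk
        # (the defragmentation step should be skipped for SSDs).
        import os
        import shutil
        import subprocess
        import tempfile

        DRIVE = "C:"

        # 1. Check the logical structure of the disk (read-only scan; add /F to fix errors).
        subprocess.run(["chkdsk", DRIVE], check=False)

        # 2. Delete temporary files (here only the current user's temp directory).
        temp_dir = tempfile.gettempdir()
        for name in os.listdir(temp_dir):
            path = os.path.join(temp_dir, name)
            try:
                if os.path.isdir(path):
                    shutil.rmtree(path)
                else:
                    os.remove(path)
            except OSError:
                pass  # files locked by running processes are simply skipped

        # 3. Defragment the volume (/U shows progress, /V gives verbose output).
        subprocess.run(["defrag", DRIVE, "/U", "/V"], check=False)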
    I think there is no need to explain why anti-virus scanning and disabling unused services are necessary, but do not neglect them. Also check whether programs were once installed that are no longer needed on this server, and update the operating system and applications. A small helper for reviewing services is sketched below.
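    For example, a quick way to get a list of services worth reviewing is to print everything that starts automatically and is currently running. This sketch assumes the third-party psutil package is installed; the decision about what to actually disable is, of course, made by hand.

        # List Windows services that start automatically and are currently running,
        # as a starting point for deciding which ones are really needed.
        import psutil  # third-party package; the API below is Windows-only

        for svc in psutil.win_service_iter():
            info = svc.as_dict()
            if info["start_type"] == "automatic" and info["status"] == "running":
                print(f"{info['name']:<40} {info['display_name']}")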
    As for the optimality of the operating system settings, in my case a power-saving power plan was enabled. After switching to the maximum performance plan, the Gilev test showed satisfactory results, although this particular server should have been capable of more.
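    Switching the plan can be done from the GUI or from the command line; a sketch using the standard powercfg utility (the GUID below is the built-in "High performance" scheme) might look like this:

        # Switch the Windows power plan to the built-in "High performance" scheme.
        # Run from an elevated prompt; powercfg /list shows the available plans.
        import subprocess

        subprocess.run(["powercfg", "/list"], check=True)
        subprocess.run(["powercfg", "/setactive",
                        "8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c"], check=True)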
    To find the causes, resource usage monitoring was carried out, although it was clear from the start that I had to look for the processes loading the disk subsystem. In my case, the most telling counter was the disk queue length. Let me remind you that the other indicators were normal; of course, they had changed slightly compared to the initial measurements, but the decisive indicator remained the disk queue length. The monitoring results were unambiguous: the "resource thieves" turned out to be the processes of the 1C server and the database server.
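    If you want to watch the same counter without opening Performance Monitor, something along these lines can be used; typeperf is a standard Windows utility, and the counter path below is the English name, so on localized systems it will differ.

        # Sample the disk queue counter that gave the problem away:
        # 60 samples, one every 5 seconds, printed to the console.
        import subprocess

        subprocess.run([
            "typeperf",
            r"\PhysicalDisk(_Total)\Avg. Disk Queue Length",
            "-si", "5",    # sample interval, seconds
            "-sc", "60",   # number of samples to collect
        ], check=True)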

    Service level


    In my case, the 1C server was located on the same machine as the MS SQL database server; the hardware configuration was quite sufficient for them to run side by side, but the settings of these two services were not at all optimal. A lot of articles are devoted to these settings, for example, this one; here we will focus only on those that do not require additional investment, such as buying another hard disk.
    For the MS SQL server, the auto-growth increment of every database was increased to 500 MB, since 1C databases grow quickly. A daily maintenance plan was also set up in which, in addition to creating a database backup, statistics were updated, the procedure cache was cleared and index defragmentation was performed. In my case, this markedly reduced the number of write operations. As an additional measure, it is recommended to defragment the databases and reorganize the indexes on a weekly basis. A sketch of the corresponding T-SQL is given below.
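    For reference, here is a rough sketch of the same actions expressed as T-SQL and run from Python via pyodbc; the connection string, the database name MyBase, its logical file names and the 1C table name are placeholders, and in reality these steps were configured as a regular SQL Server maintenance plan.

        # Sketch of the maintenance actions described above (placeholder names).
        import pyodbc

        conn = pyodbc.connect(
            "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
            "DATABASE=master;Trusted_Connection=yes",
            autocommit=True)
        cur = conn.cursor()

        # Auto-growth of 500 MB instead of the default percentage growth.
        cur.execute("ALTER DATABASE [MyBase] MODIFY FILE (NAME = N'MyBase_Data', FILEGROWTH = 500MB)")
        cur.execute("ALTER DATABASE [MyBase] MODIFY FILE (NAME = N'MyBase_Log', FILEGROWTH = 500MB)")

        # Daily maintenance: refresh statistics and clear the procedure cache.
        cur.execute("USE [MyBase]; EXEC sp_updatestats;")
        cur.execute("DBCC FREEPROCCACHE;")

        # Index defragmentation for one of the 1C tables (repeat for each table;
        # a full ALTER INDEX ... REBUILD can be scheduled weekly instead).
        cur.execute("USE [MyBase]; ALTER INDEX ALL ON dbo._Document123 REORGANIZE;")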
    For the 1C server, the working server parameters "Number of infobases per process" and "Number of connections per process" were changed: the first was set to 1 and the second to 25. Tips like these look more like "dancing with a tambourine", but they do give results. In the case under consideration, changing these parameters led to a significant drop in read/write operations on the server, and it began to work in the expected mode. The Gilev test also confirmed the increase in performance.

    Database level


    Having taken measurements both under load and after the users had left, I came across a strange result: under load the Gilev test showed better results than when the server was idle! I also noticed a huge number of background jobs running on test databases, which the system administrators used for various test tasks. I asked for them to be removed, and everything fell into place. Whether to keep test databases on a production server is up to you, but it is better to find another solution for them, for example, to use the file version.
    One of the databases could not shrink its transaction log, and another would not rebuild its indexes. For both cases there is one simple and effective solution. Before describing it, I should clarify that the same name is used for different objects: 1C infobases and MS SQL databases. The former are not necessarily MS SQL databases; they may, for example, be PostgreSQL databases. In turn, the latter are not necessarily databases for 1C. It follows that a backup copy of a 1C infobase (a dt file) can be restored into a different DBMS, although nobody forbids restoring the same MS SQL database from its own backup either. So: we make backup copies of the 1C infobases by means of 1C itself, then delete those infobases from the 1C server, create them anew and load the contents of the dt files into them; a sketch of this cycle follows.
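    As a rough illustration, assuming a Windows machine with the 1C platform installed, the dump and reload can be scripted through the designer's batch mode (the /DumpIB and /RestoreIB switches). The path to 1cv8.exe, the server and infobase names and the credentials below are placeholders, and deleting the old infobase and creating the new one in between is done in the cluster console.

        # Sketch of the dump / reload cycle via the 1C:Enterprise designer batch mode.
        # All paths, names and credentials are placeholders.
        import subprocess

        ONEC = r"C:\Program Files\1cv8\8.3.x\bin\1cv8.exe"  # placeholder platform path
        DUMP = r"D:\backup\accounting.dt"

        # 1. Dump the infobase to a .dt file.
        subprocess.run([ONEC, "DESIGNER", "/S", r"srv1c\accounting",
                        "/N", "Admin", "/P", "password",
                        "/DumpIB", DUMP], check=True)

        # 2. Delete the infobase on the cluster and create it again (cluster console),
        #    then load the dump into the freshly created infobase.
        subprocess.run([ONEC, "DESIGNER", "/S", r"srv1c\accounting",
                        "/N", "Admin", "/P", "password",
                        "/RestoreIB", DUMP], check=True)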
    Having put all the databases in order, I had nothing left to complain about: the server ran smoothly, the disk subsystem worked normally, users were happy with how fast 1C had become, and the administrators were amazed at how quickly updates now went through.

    Conclusion


    If you search for the causes of low performance at only one level, you can overlook the causes that lie at other levels, and the result will not be achieved. The example given here clearly shows that there may be several causes, and each of them may sit at its own level. I hope this material helps someone overcome the problem of a slow 1C server.
