It pulls and pulls, but cannot pull it out ...

As someone responsible for developing a new system, I am often asked the same question: "How many users can the system pull?" A very uncomfortable question, isn't it? My first impulse is always to show off my wit and switch on the "nasty admin" behavior model: ask a few counter-questions that will spare me, for a while, from having to think about this difficult but interesting topic:
• What is the hardware configuration?
• For how long does it have to "pull"?
• What is the initial amount of data?
And the finishing shot: what does "pull" actually mean?
But like it or not, the question has to be answered. This post is about one difficult search for that answer.

What are we pulling?


Our test subject is the document management system EVFRAT E1: a completely new system, written almost from scratch over the past 2-3 years in the depths of Cognitive Technologies LLC.
The system has a three-tier architecture: a database (MS SQL), an IIS-based application server, and thick and mobile clients. The main development platform is .NET 3.5 and the main language is C#. As often happens, some attention was paid to load tests during the release process, but not enough to understand the capabilities of the architecture we had built. Still, by the time life made me answer the main question of this post, some load-testing infrastructure already existed. It was built on the tools bundled with MS Visual Studio.
We prepared a test stand consisting of a medium-power server (a Windows Server 2008 R2 virtual machine with 6 CPUs and 10 GB of RAM) hosting the database and the application server, plus a computer running the tests that simulate client applications. The overall test scenario simulates a working day in a typical large organization. Each individual test scenario plays a specific user role in the document workflow: the registrar, the document and route controller, the assignment executor, the approver, and the user who simply reads mail and browses the available documents.
The share of each user role in the overall scenario is shown in Fig. 1. This distribution was adopted after a small marketing survey of customers using the previous version of the system.
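
A role mix like this can be sketched as a weighted random choice of the scenario for each simulated user. The sketch below is in Python for illustration only (the real tests were built with the Visual Studio tools), and the weights are hypothetical placeholders: the actual shares are the ones in Fig. 1 and are not reproduced here. The role keys and function names are also assumptions.

```python
import random

# Hypothetical role weights in percent. These are placeholders, NOT the
# values from Fig. 1.
ROLE_WEIGHTS = {
    "registrar": 10,
    "controller": 10,
    "executor": 30,
    "approver": 20,
    "reader": 30,
}


def pick_role(rng):
    """Pick the role of the next simulated user according to ROLE_WEIGHTS."""
    roles = list(ROLE_WEIGHTS)
    weights = [ROLE_WEIGHTS[role] for role in roles]
    return rng.choices(roles, weights=weights, k=1)[0]


def build_mix(n_users, seed=0):
    """Assign a role to each of n_users and count how many users got each role."""
    rng = random.Random(seed)  # fixed seed keeps the mix reproducible
    counts = {role: 0 for role in ROLE_WEIGHTS}
    for _ in range(n_users):
        counts[pick_role(rng)] += 1
    return counts
```

Calling `build_mix(1000)` returns a dictionary of per-role user counts whose proportions approach the configured weights as the number of users grows.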


Figure 1: Scenario Distribution

After much discussion, we settled on what "pulling" means for us: uninterrupted operation of the server under the scenario model described above for 8 working hours with no server errors. In addition, no more than 0.5% of client tests may fail (errors in the tests' business logic, unexpected timeouts, etc. are possible). The value being measured is the maximum number of simultaneously connected users (tests) for which all of these conditions hold.
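
This definition can be written down as a simple pass/fail check. The following is a minimal sketch in Python (used for illustration only; the function and parameter names are assumptions, not part of the actual test infrastructure). The thresholds are the ones stated above.

```python
REQUIRED_HOURS = 8.0      # the run must last a full working day
MAX_FAILED_SHARE = 0.005  # at most 0.5% of client tests may fail


def run_pulls(duration_hours, server_errors, total_tests, failed_tests):
    """Return True if a run satisfies the definition of "pulling"."""
    if total_tests <= 0:
        return False  # a run without tests proves nothing
    failed_share = failed_tests / total_tests
    return (
        duration_hours >= REQUIRED_HOURS
        and server_errors == 0
        and failed_share <= MAX_FAILED_SHARE
    )
```

For example, `run_pulls(8, 0, 10000, 50)` passes (exactly 0.5% failed tests), while a single server error or a 51st failed test makes the run fail.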

How to pull?


The next problem to solve: how exactly do we find the optimal number of users? Making many 8-hour runs is a long and thankless task. So we proposed changing the number of simultaneous tests dynamically, in order to find the optimal value within a single 8-hour run. But depending on what? There are many ways to gauge server load: the request-processing queues on the server, the speed of query execution on the client, the number of failed tests, and so on.
A small digression is needed here to make the rest of the description clearer. The application server processes client requests as follows: all requests arrive in a single queue and wait to be processed. There is a configurable pool of worker threads. In our case there were 12 of them: 2 per processor core. The next request in the queue is picked up by the first worker thread that becomes free.
After a series of experiments, we concluded that the best way to find the optimal number of users in our case is to measure the workload of these worker threads. It was calculated as the share of time a thread spends busy, as a percentage of the total (busy plus idle) time. Thus, 50% means that a worker thread is idle half the time, and 90% means that it is loaded almost all the time. The control function is quite simple: a workload within 70-90% is considered normal. If the thread workload dropped below 70%, a new test was added; if it rose above 90%, one of the tests was not started again.
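
The control function can be sketched as follows. This is a minimal illustration in Python, not the actual implementation: the function names are assumptions, the workload is read the way the examples above suggest (50% means a thread is idle half of the time, i.e. busy time divided by busy plus idle time), and "not starting a test again" is simplified to decreasing the target number of tests by one.

```python
LOW, HIGH = 70.0, 90.0  # "normal" workload band from the text, in percent


def workload_percent(busy_seconds, idle_seconds):
    """Share of time the worker threads were busy, in percent."""
    total = busy_seconds + idle_seconds
    if total <= 0:
        return 0.0  # nothing measured yet, treat as fully idle
    return 100.0 * busy_seconds / total


def adjust_tests(current_tests, workload):
    """Return the new target number of simultaneous tests."""
    if workload < LOW:
        return current_tests + 1          # server is underloaded: add a test
    if workload > HIGH:
        return max(current_tests - 1, 0)  # overloaded: let one test drop out
    return current_tests                  # within the normal band: keep as is
```

Called periodically during the run, for example `adjust_tests(100, workload_percent(54, 6))`, this keeps the number of tests unchanged at 90% workload and nudges it up or down by one test per step outside the 70-90% band.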
What did we get as a result? Ideally, we would like the curve to level off into a straight line by the end of the test, showing the optimal number of users for this configuration. Unfortunately, it never quite "settled down" (see the graph in Fig. 2).


Fig. 2. The result of automatic determination of the optimal number of users.

After 8 hours of the test:
• Database size - 9719.50 MB;
• Documents in the database - 4885;
• Instructions in the database - 14478.
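
These totals allow a rough back-of-the-envelope estimate of data growth. The sketch below (Python, for illustration only) assumes the database was essentially empty when the run started, which the figures above do not confirm, so the per-document size is an upper-bound guess rather than a measured value.

```python
# Totals after the 8-hour run, taken from the list above.
DB_SIZE_MB = 9719.50
DOCUMENTS = 4885
INSTRUCTIONS = 14478
TEST_HOURS = 8

# ASSUMPTION: the database started out (nearly) empty, so the whole size
# is attributed to the documents created during the run.
mb_per_document = DB_SIZE_MB / DOCUMENTS
instructions_per_document = INSTRUCTIONS / DOCUMENTS
documents_per_hour = DOCUMENTS / TEST_HOURS

print(f"{mb_per_document:.2f} MB per document")
print(f"{instructions_per_document:.2f} instructions per document")
print(f"{documents_per_hour:.1f} documents per hour")
```

Under that assumption the run produced roughly 2 MB of data and about 3 instructions per document.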

The graph clearly shows that the deviation from the average value of 270 users is small. This value is therefore the optimal number of users in the "intensive" server load mode for this hardware configuration. Naturally, real users do not work with the system that intensively, so serving 1000 or more concurrent users on more powerful configurations looks quite realistic.
I should note right away that the process took quite a while: during testing we spent a long time tuning the parameters that control the server load, fixing errors on the server and the client, and so on.

What did we pull out?


If you need to answer how much the system "pulls", or a similarly tricky question, follow a few simple rules:
1. State the task clearly: what do we want to learn about our architecture in a given environment?
2. Formally define what exactly we want to measure and what value that indicator should reach.
3. Fix every other test parameter that can be fixed: hardware configuration, test scenarios, their distribution, etc.
4. While closing in on the answer to the question, do not get distracted by other interesting and unpredictable research tasks.
You can treat this set of rules as a starting point if you plan to take up measuring and improving performance but are not sure where to begin.
And to top it off, a small hit parade of other "uncomfortable" questions related to load testing:
1. "What server is needed for XXX users?"
2. "How many objects can be entered into the system? Will we run out of space?"
3. "How fast does the network have to be for everything to work quickly?"
4. And my favorite: "If we buy a powerful server, will it stop crashing?"
As you can see, there is plenty of future work ahead of us. I hope the Visual Studio toolkit will be enough for it ...
