The book "High-performance code on the .NET platform. 2nd edition
This book will teach you how to get the most performance out of managed code, ideally without sacrificing any of the benefits of the .NET environment, or, at worst, sacrificing as few of them as possible. You will learn sensible programming practices, what to avoid, and, probably most importantly, how to use freely available tools to measure performance without much difficulty. The material contains a minimum of filler: the book gives you exactly what you need to know, stays relevant and concise, and carries nothing superfluous. Most chapters begin with general background, move on to specific, recipe-style advice, and end with a step-by-step section on measurement and debugging for a wide variety of scenarios.
Along the way, Ben Watson dives into specific components of the .NET environment, in particular the Common Language Runtime (CLR) that underlies it, showing how your machine's memory is managed, how code is generated, how multithreaded execution is organized, and much more. You will see how the .NET architecture both constrains your software and offers it additional capabilities, and how the choice of programming approach can significantly affect an application's overall performance. As a bonus, the author shares stories from nine years of experience building very large, complex, high-performance .NET systems at Microsoft.
Excerpt: Choose the appropriate thread pool size
The thread pool tunes itself over time, but at startup it has no history and begins in its initial state. If your software product is highly asynchronous and makes heavy use of the CPU, it can suffer from prohibitively high startup costs while it waits for more threads to be created and become available. Tuning the startup parameters helps you reach a steady state faster, so that from the moment the application starts you have a certain number of ready threads at your disposal:
// Keep at least 25 worker threads and 25 I/O threads warm from startup.
const int MinWorkerThreads = 25;
const int MinIoThreads = 25;
ThreadPool.SetMinThreads(MinWorkerThreads, MinIoThreads);
Be careful here. When Task objects are used, their scheduling is based on the number of available threads. If there are too many threads, Tasks can be overscheduled, which at the very least reduces CPU efficiency through more frequent context switching. If your workload is not heavy enough, the thread pool may switch to an algorithm that reduces the number of threads, taking it below the value you specified.
You can also set the maximum number of threads with the SetMaxThreads method, but that technique is subject to similar risks.
To find out how many threads you actually need, leave these settings alone and analyze your application in a steady state, using the ThreadPool.GetMaxThreads and ThreadPool.GetMinThreads methods or the performance counters that show how many threads the process is using.
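For example, a minimal sketch of such a check might look like this (GetAvailableThreads is not mentioned in the excerpt, but it belongs to the same ThreadPool API and lets you compute how many threads are actually busy):
ThreadPool.GetMinThreads(out int minWorker, out int minIo);
ThreadPool.GetMaxThreads(out int maxWorker, out int maxIo);
ThreadPool.GetAvailableThreads(out int freeWorker, out int freeIo);
// Threads in use = the maximum minus whatever is still available.
Console.WriteLine($"Min: {minWorker} worker / {minIo} I/O");
Console.WriteLine($"In use: {maxWorker - freeWorker} worker / {maxIo - freeIo} I/O");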
Do not abort threads
Terminating a thread without coordinating with the work of other threads is a rather dangerous procedure. Threads must clean up after themselves, and calling Abort on a thread does not let it shut down without negative consequences. When a thread is destroyed, parts of the application are left in an undefined state. At that point it would be better for the program to crash outright, though what you ideally want is a clean restart.
To shut down a thread safely, use some kind of shared state, and have the thread function itself check that state to determine when to exit. Safety must be achieved through cooperation.
In general, you should be using Task objects anyway, and no API is provided for aborting a task. To terminate a thread's work in a coordinated way, you must, as noted earlier, use a CancellationToken.
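A minimal sketch of such a cooperative shutdown (my own illustration; the Sleep merely stands in for a unit of real work):
var cts = new CancellationTokenSource();
Task worker = Task.Run(() =>
{
    while (!cts.Token.IsCancellationRequested)
    {
        Thread.Sleep(10); // one unit of real work would go here
    }
    // Any cleanup runs here; the task then completes normally.
});
cts.Cancel();  // request shutdown; nothing is forcibly killed
worker.Wait(); // returns cleanly because the task exits on its own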
Do not change thread priority
In general, changing thread priorities is an extremely bad idea. On Windows, threads are scheduled according to their priority level. If high-priority threads are always ready to run, low-priority threads are starved and get a chance to run only rarely. By raising a thread's priority, you are declaring that its work must take precedence over all other work, including other processes. That is not safe for a stable system.
Lowering a thread's priority makes sense if it is running something that can wait until normal-priority tasks are done. One good reason to lower a thread's priority is that you have discovered a runaway thread stuck in an infinite loop. There is no safe way to interrupt a thread, so the only way to reclaim that thread and its processor resources is to restart the process. Until you can shut the thread down cleanly, lowering the runaway thread's priority is a reasonable way to minimize the damage. Note that even lowered-priority threads are still guaranteed to run eventually: the longer they are starved of execution, the higher the dynamic priority Windows will give them.
There can also be well-justified reasons to raise a thread's priority, for example the need to react quickly to rare situations. But such techniques should be used very sparingly. Windows schedules threads without regard to the processes they belong to, so a high-priority thread from your process will run at the expense not only of your other threads, but of all the threads of the other applications running on the system.
If a thread pool is used, any priority changes are discarded each time a thread returns to the pool. If you do manage the underlying threads while using the Task Parallel Library, keep in mind that several tasks may run on the same thread before it is returned to the pool.
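For a dedicated thread that you own, lowering the priority is a one-line affair. A small sketch (the workload is illustrative, not from the book):
var background = new Thread(() =>
{
    // Long-running work that can wait for normal-priority tasks.
});
// BelowNormal still runs eventually: Windows raises the thread's
// dynamic priority the longer it goes without being scheduled.
background.Priority = ThreadPriority.BelowNormal;
background.Start();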
Thread synchronization and locks
As soon as the conversation turns to multiple threads, the need to synchronize them arises. Synchronization means allowing only one thread at a time to access shared state, such as a class field. Threads are typically synchronized with synchronization objects such as Monitor, Semaphore, ManualResetEvent, and so on. These are sometimes informally called locks, and the act of synchronizing on one in a particular thread is called locking.
One of the fundamental truths about locks is this: they never improve performance. In the best case, with a well-implemented synchronization primitive and no contention, a lock is performance-neutral. A lock stops other threads from doing useful work, wastes CPU time, increases context-switching overhead, and has other negative consequences. You put up with all this because correctness is far more important than mere speed. Whether an incorrect result is computed quickly does not matter!
Before attacking the question of how to use locks, let's consider the most fundamental principles.
Do I need to care about performance at all?
First, justify the need for better performance. This brings us back to the principles discussed in Chapter 1. Performance is not equally important across all of your application's code, and not all code has to be optimized to the nth degree. As a rule, you start with the “inner loop” (the code executed most often, or most critical for performance) and spread outward in all directions until the cost exceeds the benefit. There are many areas of the code that matter far less for performance. In such a place, if you need a lock, take it with a clear conscience.
Even so, be careful. If a non-critical piece of your code runs on a thread from the thread pool and you block it for a long time, the pool may start injecting more threads to handle other requests. If one or two threads do this occasionally, it's no problem. But if many threads behave this way, it can become a problem, because resources that should be doing real work are being wasted. Under a significant steady load, carelessness here can harm the whole system, even from the parts where high performance is unimportant, through unnecessary context switching or unwarranted thread-pool growth. As in all other cases, take measurements to assess the situation.
Do you really need a lock?
The most efficient locking mechanism is the one that isn't there. If you can eliminate the need for thread synchronization entirely, that is the best way to get high performance. This is an ideal, and it is not easy to achieve. It usually means ensuring there is no mutable shared state: every request passing through your application can be processed independently of any other request and of any centralized mutable (read/write) data. That is the best-case scenario for achieving high performance.
Even then, be careful. When restructuring, it is easy to overdo it and turn the code into a tangled mess that no one, yourself included, can figure out. Don't go too far unless high performance is genuinely critical and cannot be achieved otherwise. Make the code asynchronous and independent, but keep it understandable.
If multiple threads only read a variable (and no thread ever writes to it), no synchronization is needed: all threads can access it freely. This holds automatically for immutable objects such as strings or immutable value types, but it can hold for objects of any type as long as you guarantee that their values do not change while several threads are reading them.
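A small illustration of this point (my own sketch, assuming the collection is fully built before the readers start and is never modified afterwards):
// Built once, never mutated again: safe to read from any thread without locks.
var lookup = new List<string> { "red", "green", "blue" };
Task[] readers = new Task[4];
for (int i = 0; i < readers.Length; i++)
{
    readers[i] = Task.Run(() =>
    {
        long totalLength = 0;
        foreach (string color in lookup)
        {
            totalLength += color.Length; // read-only access needs no synchronization
        }
    });
}
Task.WaitAll(readers);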
If multiple threads write to a shared variable, see whether the synchronized access can be eliminated by switching to a local variable. If you can create a temporary copy to work on, the need for synchronization disappears. This is especially important for repeated synchronized access: convert repeated access to the shared variable into repeated access to a local variable followed by a single access to the shared one, as in the following simple example, where several threads add items to a shared collection.
object syncObj = new object();
var masterList = new List<long>();
const int NumTasks = 8;
Task[] tasks = new Task[NumTasks];
for (int i = 0; i < NumTasks; i++)
{
    tasks[i] = Task.Run(() =>
    {
        for (int j = 0; j < 5000000; j++)
        {
            // Every single Add acquires and releases the lock.
            lock (syncObj)
            {
                masterList.Add(j);
            }
        }
    });
}
Task.WaitAll(tasks);
This code can be converted as follows:
object syncObj = new object();
var masterList = new List<long>();
const int NumTasks = 8;
Task[] tasks = new Task[NumTasks];
for (int i = 0; i < NumTasks; i++)
{
    tasks[i] = Task.Run(() =>
    {
        // Accumulate privately, with no locking at all...
        var localList = new List<long>();
        for (int j = 0; j < 5000000; j++)
        {
            localList.Add(j);
        }
        // ...then take the lock exactly once to merge the results.
        lock (syncObj)
        {
            masterList.AddRange(localList);
        }
    });
}
Task.WaitAll(tasks);
On my machine, the second version of the code runs more than twice as fast as the first.
Ultimately, mutable shared state is the fundamental enemy of performance. It requires synchronization to keep the data safe, and that degrades performance. If your design has even the slightest opportunity to avoid locking, you are close to an ideal multithreaded system.
Preferred order of synchronization mechanisms
When deciding whether some kind of synchronization is necessary, understand that synchronization mechanisms do not all have the same performance or behavioral characteristics. In most situations you should just use a lock, and a lock should usually be the starting option. Using anything other than a lock demands intensive measurement to justify the added complexity. In general, consider the synchronization mechanisms in the following order.
1. lock / the Monitor class: keeps the code simple and understandable and offers a good balance of performance.
2. No synchronization at all. Eliminate shared mutable state, restructure, and optimize. This is harder, but when it succeeds it will generally perform better than locking (except when you make mistakes or degrade the architecture).
3. Simple Interlocked methods: in some scenarios they can be more appropriate, but as soon as the situation gets any more complicated, go back to lock (a sketch follows this list).
And finally, if you can genuinely prove the benefit, use fancier, more complex locks (keep in mind: they are rarely as useful as you expect):
- asynchronous locks (discussed later in this chapter);
- others.
Specific circumstances may dictate, or rule out, the use of some of these techniques. For example, a combination of several Interlocked methods is unlikely to outperform a single lock statement.
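As an illustration of item 3 (my sketch, not the book's code): when the shared state boils down to one simple operation on a single variable, a single Interlocked call can replace a lock outright:
class RequestCounter
{
    private long count;

    // Atomic increment: equivalent in effect to lock (...) { count++; },
    // but performed as a single hardware-level operation.
    public void Increment() => Interlocked.Increment(ref count);

    // Interlocked.Read guarantees an atomic 64-bit read even on 32-bit CPUs.
    public long Current => Interlocked.Read(ref count);
}
The moment two such operations need to stay consistent with each other, this approach breaks down, and a lock becomes the right tool again.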
» More information about the book can be found on the publisher’s website
» Contents
» Excerpt
For Habr residents: a 25% discount with the coupon code .NET
Once payment for the paper version of the book has been received, the e-book is sent by e-mail.