.NET Performance: Real Jedi Tricks
Does Sasha Goldstein, a true .NET Jedi, performance guru, multiple-time Microsoft MVP, and regular speaker at the DotNext conference, really need an introduction? Probably not. And if you somehow haven't come across him yet, look here.
In our conversation, Sasha shares professional tips for .NET and .NET Core developers and explains what to look for when profiling and debugging applications, and which tools to use.
- Sasha, there are many articles and tips about .NET performance, ranging from advice like "don't throw exceptions everywhere" and "use StringBuilder instead of concatenation" all the way down to low-level optimizations. But .NET keeps evolving, bringing new features and new problems. For modern .NET 4.7, can you give any practical advice on optimizing code performance?
Sasha Goldstein: Many people are already familiar with the "obvious" tips about string concatenation, exceptions, or boxing/unboxing, but misconceptions about performance keep appearing as new or higher-level APIs arrive. What worries me is the excessive use of LINQ in many codebases. Even though many years have passed since LINQ appeared, and there is plenty of data showing that most LINQ queries can be made 10 times faster with ordinary loops, people still often reach for LINQ in performance-sensitive code. To clarify: I have nothing against LINQ as such, but it will not serve you well if your loop runs a million times.
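To make the point concrete, here is a minimal sketch of the kind of rewrite Sasha is describing; the method names and the even-numbers/sum-of-squares logic are illustrative, not taken from the interview. The LINQ version allocates iterator and delegate objects and pays for interface dispatch on every element, while the loop does the same work with no allocations:

```csharp
using System;
using System.Linq;

static class LinqVersusLoopSketch
{
    // LINQ pipeline: concise, but it allocates iterators and delegates
    // and dispatches through IEnumerable<T> for every element.
    public static long SumOfEvenSquaresLinq(int[] values)
    {
        return values.Where(v => v % 2 == 0)
                     .Select(v => (long)v * v)
                     .Sum();
    }

    // The same logic as a plain loop: no allocations, no delegate calls.
    public static long SumOfEvenSquaresLoop(int[] values)
    {
        long sum = 0;
        for (int i = 0; i < values.Length; i++)
        {
            int v = values[i];
            if (v % 2 == 0)
                sum += (long)v * v;
        }
        return sum;
    }
}
```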
Another thing we have not managed to drill into everyone's heads is the danger of over-allocating memory. The garbage collector in .NET has seen some improvements, but it still will not save you if you allocate too much, especially objects that end up dying in the older generations. It is hard to teach everyone to pay attention to allocations, although some tools make it easier — for example, the Roslyn-based Heap Allocation Analyzer or profilers like dotMemory.
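Sasha does not prescribe a specific fix here, so treat the following as one hedged illustration of "allocating too much" on a hot path: a fresh buffer on every call versus renting one from ArrayPool<T> (in-box in .NET Core 2.0's System.Buffers, a NuGet package on .NET Framework). The type and method names are mine:

```csharp
using System.Buffers;
using System.IO;

static class BufferAllocationSketch
{
    // Allocation-heavy: a new 64 KB array on every call. Under load these
    // buffers are exactly the kind of garbage that keeps the GC busy and
    // can get promoted if the call happens to overlap a collection.
    public static int ReadChunkAllocating(Stream source)
    {
        var buffer = new byte[64 * 1024];
        return source.Read(buffer, 0, buffer.Length);
    }

    // Same work, but the buffer is rented and returned, so the steady-state
    // allocation rate on this path is close to zero.
    public static int ReadChunkPooled(Stream source)
    {
        byte[] buffer = ArrayPool<byte>.Shared.Rent(64 * 1024);
        try
        {
            return source.Read(buffer, 0, buffer.Length);
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
```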
The last point I would like to make is that, thanks to container technologies, we are returning to times when memory footprint and startup time are critical for server applications, not just workstations. If you plan to pack 300 instances of a microservice running in Docker onto one physical host, you have to be very careful with memory, avoid unnecessary dependencies, and get rid of unnecessary work. Once you are used to 16-core, 32 GB servers as the standard runtime environment, trying to squeeze your service into "--memory 256m --cpus 0.25" is a sobering experience. By the way, we face the same problems with "serverless" technologies, but there the deployment unit is so small that it is usually easier to live within the resource constraints.
- Today .NET is no longer Windows-only: Mono and .NET Core have appeared. Tell me, how good are these runtimes in terms of performance, especially compared to native Linux applications?
Sasha Goldstein: Mono has been around for quite some time, but to be honest, I have not used it in production projects. I know it has a mature and solid runtime, but Mono never got the recognition I was counting on. In any case, I cannot make recommendations about its performance.
I can tell you more about .NET Core. On Linux, .NET Core uses more or less the same stack and code base as on Windows. The compiler performs the same optimizations as on Windows, there is a way to avoid JIT compilation by compiling ahead of time (called Crossgen in .NET Core), there is a garbage collector with a few peculiarities, and so on. I would say the only part of .NET Core that may raise questions is the PAL (Platform Adaptation Layer), because it is really the only place with a code base completely different from the Windows version. And indeed, there have been performance issues in the PAL, mostly around misusing a Linux API or using a Linux API whose performance behaves unexpectedly compared to a similar API on Windows.
An interesting development that still needs time to mature is CoreRT (an analogue of .NET Native on Windows), which ultimately aims to produce a managed application that has no external dependencies and does no JIT compilation at all. Besides simplifying deployment (nothing shared, no installation, no dependency management), CoreRT will shorten startup time and reduce memory usage by shaking off unneeded code. This can move .NET even closer to native Linux applications.
- One of the main tips mentioned in almost every article about improving .NET performance is to use a profiler. At DotNext 2017 Moscow you'll talk about debugging and profiling .NET Core applications on Linux. How are things with Visual Studio debugging and profiling for .NET Core?
Sasha Goldstein: Visual Studio currently offers nothing for profiling .NET Core applications on Linux. You cannot use the Windows toolkit to profile them — the profiling has to happen on Linux, and the only way to actually analyze the results on Windows (if you are so inclined) is to use PerfView, a somewhat more complex tool than the Visual Studio profiler.
Similarly, Visual Studio cannot open core dumps of .NET Core applications from Linux. The rich debugger experience you get when opening a Windows dump file in Visual Studio does not exist for core dumps; the memory analysis features introduced in Visual Studio 2013 do not work for them either. In fact, there is no tool on Windows at all that can open application dumps from Linux. For that you have to use Linux tools, which at the moment actually offer far more capabilities here.
As you have probably noticed, I am not thrilled about this, and I cannot say I am very surprised either. Despite all the effort Microsoft puts into the platform, and the fact that we are already at version 2.0, we still do not have good tools for profiling and debugging. Three years after .NET Core was announced for Linux, there is no decent profiler or debugger that an average developer could use. This is a serious limitation, and it personally annoys me at times. I honestly do not recommend that my clients move to .NET Core on Linux, because I know they will hit a wall when they try to debug or profile their production applications.
- In the Linux world, perf and ftrace are popular tools that provide good capabilities for analyzing application performance. Will they help with .NET Core? Are there any differences between using perf or ftrace for native Linux applications and for .NET Core applications?
Sasha Goldstein: Yes. Officially, the workflow for profiling .NET Core applications on Linux is based on perf. Microsoft provides a bash script called "perfcollect" that runs perf, collects performance data, combines it into a single file, and suggests that you open it on Windows using PerfView. Let's ignore the ridiculousness of that story for a moment and talk about how the process works.
Perf is a multi-purpose tool with many different modes of operation. In particular, it can be used as a profiler: it attaches to various system events, collects stack traces when those events occur, and then maps the addresses in those stacks to method names. It also has some visualization capabilities, but they are often replaced by, for example, flame graphs. And perf is not just a CPU profiler: you can attach it to processes, to cache-miss events, to disk read and write events, to scheduler context switches, and to thousands of other static and dynamic events. The closest thing we have on Windows is ETW and its tooling (PerfView, Windows Performance Recorder, and so on).
When using perf with .NET Core applications, there is one problem you need to overcome, and it is related to address translation: when perf captures a stack trace, it needs to be able to map the return addresses on the stack to method names.
Since .NET Core uses JIT compilation, there is no static location where this mapping lives (debuginfo, symbols, whatever you want to call it). What perf does in this case is expect the target application to write a simple text file called /tmp/perf-$PID.map containing the mapping from method addresses to method names.
And indeed, .NET Core supports this convention: if you set the environment variable COMPlus_PerfMapEnabled to 1, the JIT will append to this text file every time it compiles a method, and perf will be able to use that information to resolve the addresses. It is a bit of a pity that you have to do this in advance — if you did not set the environment variable and want to profile an already running process, you are out of luck — but Node.js works the same way, for example, so I think this is more or less acceptable.
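As a small illustration of that "in advance" constraint (just a sketch, not the official workflow; "MyService.dll" is a placeholder): the variable has to be present in the target's environment before the runtime starts, for example when a wrapper launches the process you intend to profile.

```csharp
using System;
using System.Diagnostics;

class PerfMapLauncherSketch
{
    static void Main()
    {
        var psi = new ProcessStartInfo("dotnet", "MyService.dll")
        {
            UseShellExecute = false // required for the environment override below to apply
        };
        // Must be set before the target runtime starts; the JIT will then keep
        // /tmp/perf-<pid>.map up to date as it compiles methods.
        psi.EnvironmentVariables["COMPlus_PerfMapEnabled"] = "1";

        using (var process = Process.Start(psi))
        {
            // From another shell you can now attach perf to this PID.
            Console.WriteLine("Profiling target PID: " + process.Id);
            process.WaitForExit();
        }
    }
}
```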
The story has another twist, this time with AOT-compiled assemblies. If you use Crossgen for AOT compilation (.NET Core uses it for some of the assemblies in the .NET Core package you get from your distribution's package manager), you need another way to get debugging information for those assemblies.
Crossgen itself can generate this information if you ask it to for a Crossgen-compiled assembly. So far so good, right? Not really. First, Crossgen is not installed with .NET Core, so you either need to build CoreCLR from source to get it, or resort to an ugly NuGet restore trick. Second, Crossgen emits the debugging information in a format that is not compatible with what perf expects, so you have to reformat its output and merge it into the main perf map file. And third, perf currently does not support perf map files for memory mappings backed by files on disk, which is what Crossgen produces. So even if you managed to build a good perf map for these assemblies, perf would ignore it. Fortunately, there are other tools that still work in this case.
By the way, there are many other tools that can see .NET Core processes and work with them. In my DotNext talk I will use perf along with the BPF-based tools from BCC. We discussed BPF and BCC a few months ago at the JPoint conference, in the context of profiling the JVM with BPF.
- I would like to ask a few questions about LTTng. The official documentation says that LTTng puts performance first. Does that hold when tracing .NET Core applications? Are there any LTTng limitations when using tracepoints or kprobes?
Sasha Goldstein: Let's dot the i's. LTTng is a powerful Linux tracing framework with two modes of operation. It has a kernel module that can hook into tracepoints — statically defined trace locations scattered throughout the kernel: scheduler events, disk accesses, process execution, and so on. In addition, LTTng has a userspace library that can be used to emit events from a user application, and that is what .NET Core on Linux uses. In both cases, LTTng is optimized for high event rates thanks to shared-memory buffers, a compact binary format, and fast writes to disk.
As you may know, .NET on Windows exposes many ETW events that can be used to profile performance and understand the system's behavior. These include GC events, assembly loading, JIT compilation, object allocation, and many others. On Linux, ETW (Event Tracing for Windows) is obviously unavailable, so Microsoft chose to use LTTng. You get the same events, but they are emitted through LTTng rather than ETW — with a few caveats.
First, you must set the environment variable COMPlus_EnableEventLog=1; if you don't, LTTng will see no events at all. Second, LTTng does not support stack traces for userspace events. This means you can capture GC events, but you do not have the call stack that triggered them; you can catch assembly load events, but you do not know what code loaded the assembly. These are very painful limitations that reduce the usefulness of these events in real troubleshooting scenarios.
- The LLDB debugger on Linux is quite similar to WinDbg on Windows. It even has the SOS extension available, which lets you debug managed code. How usable is it today for debugging .NET Core applications?
Sasha Goldstein: LLDB is a very powerful debugger, and Microsoft provides the libsosplugin.so library, which is the Linux version of SOS.dll. It offers almost the same set of commands with the same semantics, which is great if you are already familiar with SOS (although you still have to learn LLDB's own commands, which differ significantly from WinDbg's). But that is not really what we are talking about here, is it? With LLDB and libsosplugin, you will run into the following obstacles:
- LLDB plugins are tightly coupled to a specific LLDB version. Since libsosplugin.so ships with the CLR, it is built against the LLDB version Microsoft uses in its build process, which — at the time of writing — was LLDB 3.6. That is a fairly old LLDB release with many known bugs, and it is practically impossible to install on many modern distributions without compiling it from source.
- LLDB prior to 4.0 does not understand core dumps generated on demand, so you can only open core dumps produced by a crash, or attach to a running process.
- When you open a core dump or attach to a running process, you have to teach the SOS plugin how to map OS thread IDs to managed .NET thread IDs. With 400 threads this gets really annoying (I have a script for that).
- Developing a high-quality multi-threaded application is hard, but finding bottlenecks and bugs in such an application is even harder. What can developers use to debug and profile multi-threaded .NET applications?
Sasha Goldstein: You could talk about this forever, so I'll be brief. There are several important techniques that modern tools can automate:
- Understanding which code is blocking frequently (and not only blocking). This can be done by analyzing context-switch events; see the sketch after this list for the kind of code this reveals.
- Understanding how the workload is distributed across threads. This is often done visually, using tools such as the Visual Studio Concurrency Visualizer or the timeline view in dotTrace.
- Understanding the different kinds of oversubscription: CPU throttling (for example, because of containers), priority issues, or simply not having enough cores for all your threads. This can be done by analyzing scheduler context-switch events and building histograms of how long a thread runs on the CPU versus how long it waits in the run queue even though nothing internal is stopping it from running.
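As a contrived example of the first point above — the names and numbers are mine, not from the interview — here is the kind of blocking that context-switch analysis surfaces: every worker funnels through a single lock, so most threads spend their time switched out, waiting on the same call stack.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class LockContentionSketch
{
    static readonly object Gate = new object();
    static long _counter;

    public static void Run(int workers, int iterations)
    {
        var tasks = new Task[workers];
        for (int i = 0; i < workers; i++)
        {
            tasks[i] = Task.Run(() =>
            {
                for (int j = 0; j < iterations; j++)
                {
                    lock (Gate)
                    {
                        _counter++;           // trivial work...
                        Thread.SpinWait(500); // ...but the lock is held long enough to serialize everyone
                    }
                }
            });
        }
        Task.WaitAll(tasks);
        Console.WriteLine(_counter);
    }
}
```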
- What talks have you prepared for DotNext 2017?
Sasha Goldstein: In Moscow I will talk about profiling and debugging .NET Core applications on Linux. It is the result of many months of research. I'll cover some of the tools and techniques I mentioned above, along with live demonstrations of typical performance problems in .NET Core applications. As a special bonus, I'll show how to use some of these tools to profile a .NET Core application running in a Docker container. All my demos are available on GitHub, so you can experiment with them after the conference.
And for those who want complete immersion, on November 11 there will be an 8-hour hands-on training, "Production Performance and Troubleshooting of .NET Applications", dedicated to tools and approaches for monitoring and solving performance problems in production.
If you love .NET internals as much as we do, you may also be interested in the talks of other experts at the upcoming DotNext 2017 Moscow conference, where more than 30 speakers will present on the present and future of the .NET platform, on performance optimization and multithreading, on the internals of .NET and the CLR, and on profiling and debugging .NET code.