AlexS June 20, 2008 at 17:26

XmlSerializer: Assembly Leak on Demand :)

The main point, in brief

Some parts of the .NET Framework, such as XmlSerializer, generate code dynamically under the hood. XmlSerializer creates temporary C# files, compiles them into a temporary assembly, and then loads that assembly into the process. Generating this code is relatively expensive, so XmlSerializer caches the temporary assemblies, one per type. This means that the next time you create an XmlSerializer for class X, no new assembly is created; the one from the cache is used instead. However, it is not quite that simple.

When one of the other constructors is called, XmlSerializer does not cache the dynamically generated assembly, but creates a new temporary assembly every time a new XmlSerializer instance is created!
The application then leaks memory in the form of temporary assemblies that are never unloaded.

Problem localization

First, I’ll talk about the system that was built by our team.

The application consists of three parts: a website, a data warehouse and a business center.
The whole system is built on .NET 3.5.
The website lets you start data validation in a business service that runs on Windows Workflow Foundation. Each workflow needs to receive some data, and to get it the workflow communicates with the persistence service.

The system is built on the latest technologies, so it is not surprising that some combinations of them behave in unexpected ways.

For example, the application that starts the workflows and generally works with them (a WCF service) began to eat up to 2.5 gigabytes of memory under load.

We solved that memory leak. I will write about it a little later, because I don't have the necessary data at hand right now.

After that fix, the application's process used up to 500 MB, and sometimes up to 800 MB. We knew very well that this was not the limit, since earlier the performance only collapsed at around 2 gigabytes. Yet even at this size the application began to run noticeably slower after a while. After some observation we noticed that the C# compiler, csc.exe, was being started from time to time. On our system it should, in principle, run only when a workflow is requested for the first time; subsequent requests should pick up the already compiled assembly.

After thinking a little more, we decided to look at the number of assemblies in the process. :)

And here we were surprised: right after the application started, about 100 assemblies were loaded into the main AppDomain, but over time their number reached 3000, and later up to 5000. At 4-5 thousand assemblies the slowdown was already noticeable.
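A check like this can be done from inside the process with a few lines. The following is only a minimal sketch, not the utility we wrote: the class name AssemblyWatcher and its methods are made up for illustration, and only AppDomain.GetAssemblies and the AppDomain.AssemblyLoad event come from the framework.

```csharp
using System;

public static class AssemblyWatcher
{
    // Number of assemblies currently loaded into the current AppDomain.
    public static int LoadedCount()
    {
        return AppDomain.CurrentDomain.GetAssemblies().Length;
    }

    // Reports every assembly loaded from now on, together with the running total.
    // The log callback is supplied by the caller (hypothetical helper API).
    public static void LogNewAssemblies(Action<string> log)
    {
        AppDomain.CurrentDomain.AssemblyLoad += (sender, args) =>
            log(string.Format("Loaded {0} (total: {1})",
                args.LoadedAssembly.FullName, LoadedCount()));
    }
}
```

Calling AssemblyWatcher.LogNewAssemblies(Console.WriteLine) once at startup, or logging LoadedCount() on a timer, should be enough to see whether the total keeps growing under load.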

We wrote a utility that lets you inspect, on the fly, the AppDomains of any .NET application and the assemblies loaded into them. With it we saw that the 100 initial assemblies stay in place, and that only some anonymous assemblies keep being added. Unfortunately, we could not get more detailed information (such as which types an assembly declares) from inside someone else's process.

In our test environment we did not see that many "anonymous" assemblies, although they were present. To get the details, we decided to put the diagnostic code directly into the application, so that we could later pull the most complete data from the running system.

In short, it turned out that the "anonymous" assemblies are the ones XmlSerializer creates for serialization. And they are all identical :)
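In-process diagnostics of this kind might look roughly like the sketch below. It is not our actual code: the class name AnonymousAssemblyReport is hypothetical, and treating "no file location" as the sign of an anonymous assembly is an assumption that you should verify on your own runtime.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class AnonymousAssemblyReport
{
    // Returns one line per assembly that has no file location,
    // listing the types declared in it.
    public static List<string> Describe()
    {
        var lines = new List<string>();
        foreach (Assembly assembly in AppDomain.CurrentDomain.GetAssemblies())
        {
            if (HasLocation(assembly))
                continue;

            string[] typeNames;
            try
            {
                typeNames = assembly.GetTypes().Select(t => t.FullName).ToArray();
            }
            catch (ReflectionTypeLoadException ex)
            {
                // Keep whatever types could be loaded.
                typeNames = ex.Types.Where(t => t != null)
                                    .Select(t => t.FullName).ToArray();
            }
            lines.Add(assembly.GetName().Name + ": " + string.Join(", ", typeNames));
        }
        return lines;
    }

    // Assumption: in-memory assemblies report an empty Location,
    // or throw NotSupportedException when asked for one.
    private static bool HasLocation(Assembly assembly)
    {
        try { return !string.IsNullOrEmpty(assembly.Location); }
        catch (NotSupportedException) { return false; }
    }
}
```

Exposing the result of Describe() through a diagnostic page or service method makes it possible to see, on a live system, which types the extra assemblies contain.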

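Imagine that you serialize the same class 1000 times, and your application becomes terribly slow and, on top of that, leaks memory...

But wait, this is .NET. There is a GC; isn't memory its job? It is, but the GC does not unload assemblies: once an assembly is loaded into an AppDomain, it stays there until the whole domain is unloaded.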

The actual problem

Now for the details. XmlSerializer in .NET can cause an assembly leak, and an assembly leak turns into a memory leak. Not always, of course: the class has several constructors.

If you use the ordinary constructor that takes only a Type, there is no memory leak:

using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

namespace XmlSerializerMemoryLeak
{
    class Program
    {
        private static XmlSerializer serial = null;

        static void Main(string[] args)
        {
            for (int index = 0; index < 10000; index++)
            {
                TestClass test = new TestClass();
                test.Id = index;
                test.Date = DateTime.Now;
                StringBuilder builder = new StringBuilder();
                StringWriter writer = new StringWriter(builder);
                // XmlSerializer(Type) reuses the cached temporary assembly.
                serial = new XmlSerializer(typeof(TestClass));
                serial.Serialize(writer, test);
                string xml = builder.ToString();
            }
            Console.WriteLine("Done");
        }
    }

    public class TestClass
    {
        public DateTime Date { get; set; }
        public int Id { get; set; }
    }
}

However, if you use a slightly different constructor, then a memory leak is guaranteed:

using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

namespace XmlSerializerMemoryLeak
{
    class Program
    {
        private static XmlSerializer serial = null;

        static void Main(string[] args)
        {
            // Wait for Enter before starting the loop,
            // e.g. to attach a monitoring tool first.
            Console.ReadLine();
            for (int index = 0; index < 100000; index++)
            {
                TestClass test = new TestClass();
                test.Id = index;
                test.Date = DateTime.Now;
                StringBuilder builder = new StringBuilder();
                StringWriter writer = new StringWriter(builder);
                // XmlSerializer(Type, XmlRootAttribute) bypasses the cache:
                // every iteration compiles and loads a new assembly.
                serial = new XmlSerializer(typeof(TestClass), new XmlRootAttribute("MemoryLeak"));
                serial.Serialize(writer, test);
                string xml = builder.ToString();
            }
            Console.WriteLine("Done");
        }
    }

    public class TestClass
    {
        public DateTime Date { get; set; }
        public int Id { get; set; }
    }
}

The whole difference between them is visible in Reflector. The first constructor (as well as one more, XmlSerializer(Type, String)) works fine: it looks into the serializer cache to see whether something is already there, and only if nothing is found does it compile an assembly and add it to the cache.
The second one is just awful. It has no use for any cache, so it compiles a new assembly every time and causes the assembly leak.
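The same difference can be observed without Reflector by counting how many assemblies each constructor adds to the AppDomain. This is a rough sketch under stated assumptions: the names Sample and ConstructorComparison are invented for the example, and the exact numbers depend on the runtime version, so no particular output is claimed here.

```csharp
using System;
using System.Xml.Serialization;

public class Sample
{
    public int Id { get; set; }
}

public static class ConstructorComparison
{
    // Creates serializers in a loop and returns how many assemblies
    // were added to the current AppDomain while doing so.
    public static int CountNewAssemblies(Func<XmlSerializer> create, int iterations)
    {
        create(); // warm-up, so one-time loads are not counted
        int before = AppDomain.CurrentDomain.GetAssemblies().Length;
        for (int i = 0; i < iterations; i++)
        {
            create();
        }
        return AppDomain.CurrentDomain.GetAssemblies().Length - before;
    }

    // Constructor that goes through the serializer cache.
    public static int Cached(int iterations)
    {
        return CountNewAssemblies(
            () => new XmlSerializer(typeof(Sample)), iterations);
    }

    // Constructor that bypasses the cache.
    public static int Uncached(int iterations)
    {
        return CountNewAssemblies(
            () => new XmlSerializer(typeof(Sample), new XmlRootAttribute("MemoryLeak")),
            iterations);
    }
}
```

If the behaviour described above holds on your runtime, Cached(100) should stay at or near zero while Uncached(100) grows roughly with the number of iterations.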

Solutions

There are several ways out:

Use the "correct" constructors
Implement an XmlSerializerCache that always checks a cache first. In principle you don't have to implement it yourself, but can take a ready-made one here
Don't serialize in this process at all. For example, if you already have another application that can serialize (or already does), hand the object over to it and get back just the XML.

Which one to use depends entirely on you and on the situation. If you have some shared project with utility classes, so to speak, I would advise implementing the serializer cache there and then using whichever constructor you need without worrying about problems in the future. In a year or two you may well forget which constructor is the right one.
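A cache of this kind can be very small. The sketch below is only an illustration, not the ready-made implementation mentioned in the list above: the class and method names are hypothetical, and it covers only the combination of a Type and a root element name.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Serialization;

public static class XmlSerializerCache
{
    private static readonly object Sync = new object();
    private static readonly Dictionary<string, XmlSerializer> Cache =
        new Dictionary<string, XmlSerializer>();

    // Returns one shared serializer per (type, root element name) pair,
    // so the leaking constructor runs at most once for each pair.
    public static XmlSerializer Get(Type type, string rootName)
    {
        if (type == null) throw new ArgumentNullException("type");
        if (rootName == null) throw new ArgumentNullException("rootName");

        string key = type.AssemblyQualifiedName + "|" + rootName;
        lock (Sync)
        {
            XmlSerializer serializer;
            if (!Cache.TryGetValue(key, out serializer))
            {
                serializer = new XmlSerializer(type, new XmlRootAttribute(rootName));
                Cache.Add(key, serializer);
            }
            return serializer;
        }
    }
}
```

In the leaking example above, the line that creates the serializer would then become serial = XmlSerializerCache.Get(typeof(TestClass), "MemoryLeak"). If you use other constructor arguments (override attributes, extra types, a default namespace), they would have to become part of the key as well.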

Conclusions

This is not news: the problem is described in MSDN Magazine. It is not clear why it still has not been fixed.

The conclusions are simple. Use serialization very carefully and keep a close eye on the state of the application. It also helps to have some diagnostic methods or services in place, so that you can get reliable information about the running application.

PS:

On the MSDN forum I was told that they know about this problem and that it is described in the MSDN Magazine article I linked to. I'll try to find out more.

Cross-post from a personal blog
