.NET binary serialization without a reference to the assembly that declares the source type, or how to negotiate with BinaryFormatter

In this article I share my experience with binary serialization of types between assemblies that do not reference each other. As it turned out, there are real and “legitimate” cases where you need to deserialize data without a reference to the assembly in which its type is declared. I will describe the scenario in which this was required, the solution, and the mistakes made along the way.

Introduction. Problem statement


We cooperate with a large corporation in the field of geology. Historically, the corporation has written a wide variety of software for handling data from different types of equipment, for data analysis, and for forecasting. Alas, these programs are far from always “friendly” with each other; more often they are not friendly at all. To consolidate the information somehow, a web portal is now being built, to which the different programs upload their data as XML. The portal then tries to assemble a more or less complete picture. An important nuance: since the portal developers are not experts in the subject area of each application, each team provides a parser/converter module that turns its XML into the portal's data structures.

I work in the team developing one of these applications, and we quite easily wrote an export mechanism for our part of the data. But then a business analyst decided that the central portal needed one of the reports that our program builds. This is where the first problem appeared: the report is built anew each time and its results are not stored anywhere.
“So save it!” the reader will probably think. I thought so too, but was seriously disappointed by the requirement that the report be built from the data already uploaded to the portal. Nothing to be done: the logic has to be moved.

Stage 0. Refactoring. No signs of trouble


It was decided to extract the report-building logic (in fact the report is a 4-column table, but there is a whole cartload of logic behind it) into a separate class and to include the file with this class as a link in the parser assembly. This way we:

  1. Avoid direct copying of code
  2. Protect against version discrepancies

Extracting logic into a separate class is not a difficult task. But after that things were less rosy: the algorithm was based on business objects, and transferring those did not fit our concept. I had to rewrite the methods so that they accept and operate on simple types only. It was not always easy, and in places it required solutions of questionable beauty, but overall the result was reliable and free of obvious crutches.

There was one detail of the kind that, as you know, often serves as a cozy refuge for the devil: we inherited a strange approach from previous generations of developers, in which some of the data required to build the report is stored in the database as binary-serialized .NET objects (the questions “why?”, “hoooow?” and so on will, alas, remain unanswered for lack of anyone to address them to). And at the input of the calculations we, of course, have to deserialize them.

These types, which we could not get rid of, were also included as links, especially since they were fairly simple.

Stage 1. Deserialization. Remember the full type name


After these manipulations, a test run unexpectedly produced a runtime error:
[A]Namespace.TypeA cannot be cast to [B]Namespace.TypeA. Type A originates from 'Assembley.Application, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null' in the context 'Default' at location '...'. Type B originates from 'Assembley.Portal, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null' in the context 'Default' at location ''.
The very first Google results explained that BinaryFormatter writes not only the data but also type information to the output stream, which is logical. Since the full name of a type includes the assembly in which it is declared, the picture became clear: I was trying to deserialize one type into what is, from the point of view of .NET, a completely different type.

Having scratched the back of my head, I made, as often happens, the obvious but, alas, flawed decision to replace the specific type TypeA with dynamic during deserialization. Everything worked. The report results matched exactly, and the tests on the build server passed. With a sense of accomplishment, we sent the task to the testers.
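To see why two types with the same namespace and name are still different for the runtime, it helps to print the parts that make up a type's identity. The following is only a minimal sketch: the Demo namespace and the empty TypeA class are placeholders for illustration, not the real types from the project.

```csharp
using System;

namespace Demo
{
    // Placeholder type; imagine the same source file compiled into two assemblies.
    public class TypeA { }

    public static class TypeIdentity
    {
        // Returns the name a formatter would need to find the type again.
        public static string Describe(Type type)
        {
            return type.AssemblyQualifiedName;
        }

        public static void Main()
        {
            Type t = typeof(TypeA);
            Console.WriteLine(t.FullName);          // namespace and type name only
            Console.WriteLine(t.Assembly.FullName); // display name of the declaring assembly
            Console.WriteLine(Describe(t));         // both parts joined by ", "
        }
    }
}
```

If the same file is compiled into a second assembly, the first line stays the same while the other two change, and that is exactly the difference the cast error reports.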

Stage 2. The main. Serialization between assemblies


Retribution came quickly in the form of bugs filed by the testers: the parser on the portal side crashed with an exception saying that it could not load the assembly Assembley.Application (the assembly from our application). First thought: I had not cleaned up the references. But no, everything is fine, nothing references it. I run it again in the sandbox: everything works. I start to suspect a build error, but then an unpleasant idea comes to mind: I change the output path of the parser to a separate folder instead of the application's common bin directory. And voilà, I get the described exception. Stack trace analysis confirms the vague suspicion: it is deserialization that fails.

The realization was quick and painful: replacing the specific type with dynamic changed nothing. BinaryFormatter still created the type from the external assembly; when the assembly with that type was nearby, the runtime simply loaded it, and when the assembly was absent, we got an error.
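The point about dynamic can be checked in isolation. The sketch below is not taken from the project; the class and method names are made up for illustration. It shows that assigning an object to a dynamic variable does not change the object's runtime type, so the type still has to be loadable.

```csharp
using System;
using System.Collections.Generic;

public static class DynamicDemo
{
    // dynamic only postpones member lookup to run time;
    // the object behind the variable keeps its real type.
    public static Type RuntimeTypeOf(object value)
    {
        dynamic d = value;
        return d.GetType();
    }

    public static void Main()
    {
        Console.WriteLine(RuntimeTypeOf(new List<int>()));
    }
}
```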

There was reason to be sad. But googling gave hope in the form of the SerializationBinder class. As it turned out, it lets you decide which type the data is deserialized into. To do this, you create a subclass and override the following method:

public abstract Type BindToType(String assemblyName, String typeName);

in which you can return any type for the given arguments.
The BinaryFormatter class has a Binder property into which you can inject your implementation.
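Before writing a real binder, it can be useful to see which requests the formatter actually sends to it. The sketch below assumes .NET Framework, where BinaryFormatter is still available (newer .NET versions mark it obsolete and may refuse to run it). LoggingBinder is a hypothetical name, and the sketch relies on the assumption that returning null from BindToType makes the formatter fall back to its own type lookup.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical binder that only reports what it is asked to resolve.
public sealed class LoggingBinder : SerializationBinder
{
    public override Type BindToType(string assemblyName, string typeName)
    {
        Console.WriteLine("assembly: {0}; type: {1}", assemblyName, typeName);
        return null; // assumption: null means "use the default resolution"
    }
}

public static class BinderProbe
{
    public static void Main()
    {
        var formatter = new BinaryFormatter();
        using (var stream = new MemoryStream())
        {
            var source = new Dictionary<string, byte[]> { { "key", new byte[] { 1, 2 } } };
            formatter.Serialize(stream, source);

            stream.Position = 0;
            formatter.Binder = new LoggingBinder(); // injected only for reading
            var copy = (Dictionary<string, byte[]>)formatter.Deserialize(stream);
            Console.WriteLine(copy.Count);
        }
    }
}
```

Running such a probe against real data gives the list of assembly and type names that a custom binder will have to handle.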

It would seem that the problem is solved. But again, the devil is in the details (see above).

First, you have to handle requests for all types, standard ones included.
An interesting implementation was found online, but it tries to use the default binder from BinaryFormatter, in the form of the construction

var defaultBinder = new BinaryFormatter().Binder;

But in fact the Binder property is null by default. An analysis of the source code showed that BinaryFormatter checks whether Binder is set: if so, its methods are called; if not, internal logic is used, which ultimately boils down to

    var assembly = Assembly.Load(assemblyName);
    return FormatterServices.GetTypeFromAssembly(assembly, typeName);

Without further ado, I reproduced the same logic in my own binder.

Here is what the first implementation looked like:

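using System;
using System.Reflection;
using System.Runtime.Serialization;

public class MyBinder : SerializationBinder
{
    public override Type BindToType(string assemblyName, string typeName)
    {
        // NOTE: the project-specific name fragment is missing from the text of the article.
        // With an empty string, Contains("") is always true, so the real fragment must go here.
        if (assemblyName.Contains(""))
        {
            // Our own type: resolve it among the types available locally
            return Type.GetType(typeName);
        }

        // Any other type: load it from the assembly named in the stream
        return LoadTypeFromAssembly(assemblyName, typeName);
    }

    private Type LoadTypeFromAssembly(string assemblyName, string typeName)
    {
        if (string.IsNullOrEmpty(assemblyName) || string.IsNullOrEmpty(typeName))
            return null;

        var assembly = Assembly.Load(assemblyName);
        return FormatterServices.GetTypeFromAssembly(assembly, typeName);
    }
}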

That is, we check whether the name belongs to the project: if it does, we return the type from the current domain; if it is a system type, we load it from the corresponding assembly.

It looks logical. We start testing: our type arrives, we substitute it, and the object is created. Hurrah! A String arrives, and we take the branch that loads from the assembly. It works! We open the virtual champagne...

But then... a Dictionary arrives with elements of user-defined types. Since it is a system type, we obviously try to load it from its assembly; but its type arguments are our types, again fully qualified (assembly, version, key), so we crash again. (A sad smiley should be here.)
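The reason is easy to observe without any serialization: the full name of a closed generic type embeds the assembly-qualified names of its type arguments. A minimal sketch (the class name GenericNameDemo is made up for illustration, and string and byte[] stand in for the project's own types):

```csharp
using System;
using System.Collections.Generic;

public static class GenericNameDemo
{
    // Full name of a closed generic type, including its type arguments.
    public static string DictionaryFullName()
    {
        return typeof(Dictionary<string, byte[]>).FullName;
    }

    public static void Main()
    {
        // The outer name carries no assembly, but every argument inside [[...]]
        // is written with its own assembly display name (version, culture, key token).
        Console.WriteLine(DictionaryFullName());
    }
}
```

So even when the outer type is a system one, the string still names the assembly of each argument, and that assembly has to be resolvable.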

Clearly, the incoming type name has to be changed so that it refers to the desired assembly. I really hoped that there was an analog of the AssemblyName class for type names, but I did not find anything like it. Writing a universal parser with replacement is not an easy task. After a series of experiments I arrived at the following solution: in the static constructor I collect the types to be replaced; then I look for their names in the string with the name of the type being created, and when I find one, I replace the assembly name.

        /// <summary>
        /// The types that may be changed to local
        /// </summary>
        protected static IEnumerable<Type> _changedTypes;

        static MyBinder()
        {
            // GetExecutingAssembly (the original called GetCallingAssembly): we need the
            // assembly that contains this binder and the types included as links.
            var executingAssembly = Assembly.GetExecutingAssembly();
            var name = executingAssembly.GetName().Name;
            // !t.Namespace.Contains(name): the type is declared in this assembly,
            // but its namespace does not mention this assembly.
            // Names starting with "<" are compiler-generated types; we are not interested in them.
            _changedTypes = executingAssembly.GetTypes()
                .Where(t => t.Namespace != null && !t.Namespace.Contains(name) && !t.Name.StartsWith("<"))
                .ToList();
        }

        private static string CorrectTypeName(string name)
        {
            foreach (var changedType in _changedTypes)
            {
                var ind = name.IndexOf(changedType.FullName);
                if (ind == -1)
                    continue;

                var endIndex = name.IndexOf("PublicKeyToken", ind);
                if (endIndex == -1)
                    continue;

                // Skip "PublicKeyToken=" and then the token value itself
                endIndex += "PublicKeyToken".Length + 1;
                while (endIndex < name.Length && char.IsLetterOrDigit(name[endIndex]))
                    endIndex++;

                // Replace "FullName, old assembly display name" with the local qualified name
                name = name.Substring(0, ind)
                       + changedType.AssemblyQualifiedName
                       + name.Substring(endIndex);
            }
            return name;
        }

        /// <summary>
        /// Look up the type locally if the assembly name is "NA"
        /// </summary>
        /// <param name="assemblyName">Assembly name read from the stream</param>
        /// <param name="typeName">Type name read from the stream</param>
        /// <returns>The type to instantiate</returns>
        public override Type BindToType(string assemblyName, string typeName)
        {
            typeName = CorrectTypeName(typeName);
            // NOTE: the project-specific name fragment is missing from the text of the article;
            // Contains("") is always true, so the real fragment must go here.
            if (assemblyName.Contains("") || assemblyName.Equals("NA"))
            {
                return Type.GetType(typeName);
            }

            return LoadTypeFromAssembly(assemblyName, typeName);
        }

As you can see, I relied on PublicKeyToken being the last component of the assembly name in the type description. Perhaps this is not 100% reliable, but in my tests I did not find a case where it is not so.
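For comparison, the same replacement can be sketched without relying on the position of PublicKeyToken. This is a different technique from the one used in the article, shown only as an illustration: it treats everything after the type's full name up to the next square bracket as the assembly display name. The names TypeNameRewriter and ReplaceAssembly are hypothetical, and the sketch does not cover array types of the replaced type.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TypeNameRewriter
{
    // Replaces "typeFullName, <old assembly display name>" with
    // "typeFullName, newAssemblyName" inside a possibly generic type name.
    public static string ReplaceAssembly(string typeName, string typeFullName, string newAssemblyName)
    {
        // The lookbehind prevents matching the tail of a longer namespace;
        // the assembly display name runs until the next '[' or ']'.
        var pattern = @"(?<![\w.])" + Regex.Escape(typeFullName) + @",\s*[^\[\]]+";
        return Regex.Replace(typeName, pattern, m => typeFullName + ", " + newAssemblyName);
    }

    public static void Main()
    {
        var input = "System.Collections.Generic.Dictionary`2[[SomeNamespace.CustomType, Assembley.Application, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null],[System.Byte[], mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]";
        Console.WriteLine(ReplaceAssembly(
            input,
            "SomeNamespace.CustomType",
            "Assembley.Portal, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null"));
    }
}
```

Whether this is more robust than the index-based version depends on the names that actually occur in the data, so it should be checked against real streams before use.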

Thus, a string of the form
"System.Collections.Generic.Dictionary`2[[SomeNamespace.CustomType, Assembley.Application, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null],[System.Byte[], mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]"

turns into
"System.Collections.Generic.Dictionary`2[[SomeNamespace.CustomType, Assembley.Portal, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null],[System.Byte[], mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]"

Now everything finally worked like clockwork. Some minor technical subtleties remained: if you remember, the files were included in the parser as links from the main application, and the main application does not need all these dances. Therefore, conditional compilation of the following form was introduced:


                BinaryFormatter binForm = new BinaryFormatter();
#if EXTERNAL_LIB
                binForm.Binder = new MyBinder();
#endif

Accordingly, we define the EXTERNAL_LIB symbol in the portal assembly, but not in the main application.

"Non-lyrical digression"


In fact, while coding, in order to check the solution quickly, I made one miscalculation that probably cost me a certain number of nerve cells: to begin with, I simply hardcoded the type substitution for Dictionary. As a result, deserialization produced an empty Dictionary, which also crashed when I tried to perform some operations on it. I was already starting to think that BinaryFormatter could not be fooled, and I began desperate experiments with writing a class derived from Dictionary. Fortunately, I stopped almost in time and went back to writing a universal substitution mechanism. While implementing it, I realized that to create a Dictionary it is not enough to redefine its own type: you also have to take care of the types for KeyValuePair and the comparer, which are requested from the binder as well.
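The names of these companion types can be inspected directly. The sketch below uses string and byte[] as stand-ins for the project's types, and DictionaryPartsDemo is a made-up name. It assumes, as is the case on .NET Framework, that a serialized Dictionary stores its entries as an array of KeyValuePair and also stores its comparer object.

```csharp
using System;
using System.Collections.Generic;

public static class DictionaryPartsDemo
{
    // Type names that typically accompany a serialized Dictionary<string, byte[]>.
    public static string[] RelatedTypeNames()
    {
        return new[]
        {
            // entries are stored as an array of key/value pairs
            typeof(KeyValuePair<string, byte[]>[]).FullName,
            // the concrete type of the default comparer depends on the runtime
            EqualityComparer<string>.Default.GetType().FullName
        };
    }

    public static void Main()
    {
        foreach (var name in RelatedTypeNames())
            Console.WriteLine(name);
    }
}
```

When the key or value type comes from another assembly, the pair type name embeds that assembly name, and the comparer type name usually does too, which is why substituting only the Dictionary type is not enough.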


Such were my binary serialization adventures. I would be grateful for feedback.
