List transfer problems, or why abstractions "leak"

    All non-trivial abstractions, to some degree, are leaky.

    Joel Spolsky, "The Law of Leaky Abstractions"


    And sometimes quite simple abstractions leak too.

    The author of this article



    Most modern developers are familiar with the "law of leaky abstractions" from Joel Spolsky's famous essay of the same name. The law says that no matter how good an interaction protocol, a framework, or a set of classes modeling a domain is, sooner or later we have to go down a level and figure out how the abstraction works. The internal structure of an abstraction should be the abstraction's own problem, but that holds only in the most common cases and only as long as everything goes well (*).

    Once upon a time, a certain "small and soft" company decided to abstract away the location of an object and to make the question of whether the object is local or remote a mere "implementation detail". That is how DCOM and its successor, .NET Remoting, appeared, both of which hid from the developer whether an object was remote. Along came all those "transparent proxies", which let you work with a remote object without even knowing it. Over time, however, it became clear that this information is crucial to the developer: a remote object can throw a completely different set of exceptions, and working with it costs incomparably more than working with a local one.



    Of course, such "information hiding" has its uses, but in general it complicates the developer's life more than it simplifies it. That is why WCF, the newer technology for building distributed applications, moved away from this practice: the line between a local and a remote object has become very thin, but it is still there.

    There are many similar cases where we need to know not only the visible behavior (the abstraction) but also the internal structure (the implementation). In most programming languages, different types of collections are used in very similar ways. Collections can "hide" behind base classes or interfaces (as in .NET) or rely on some other generalization mechanism (as in C++). But even though we can work with different collections in almost the same way, we cannot completely decouple our classes from specific collection types. Despite the apparent similarity, we need to understand which one fits the task at hand: a vector or a doubly linked list, a hash set or a sorted set. The complexity of basic operations, such as finding an element, differs considerably from one collection type to another.

    Let's look at a specific example. We all know that types such as List<T> (or std::vector in C++) are implemented on top of a plain array. If the collection is already full, adding a new element creates a new internal array, and that array grows not by one element but by a larger amount (**). Many people know about this behavior, but in most cases we can ignore it: it is the "personal problem" of the List<T> class and none of our concern.
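
    The growth of the internal array can be observed through the public Capacity property. The following is a minimal sketch rather than a description of how List<T> is actually written; the class and method names (CapacityGrowth, ObserveCapacities) are made up for illustration, and the exact growth steps are an implementation detail that may differ between runtimes.

```csharp
using System;
using System.Collections.Generic;

public static class CapacityGrowth
{
    // Adds `count` items one by one and records each distinct Capacity value seen.
    public static List<int> ObserveCapacities(int count)
    {
        var list = new List<int>();
        var seen = new List<int> { list.Capacity }; // capacity of a new, empty list
        for (int i = 0; i < count; i++)
        {
            list.Add(i);
            if (list.Capacity != seen[seen.Count - 1])
                seen.Add(list.Capacity); // the internal array was reallocated
        }
        return seen;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", ObserveCapacities(20)));
    }
}
```

    With the doubling strategy described in footnote (**), adding 20 items should print 0, 4, 8, 16, 32: the list reallocates only four times, and after the last reallocation it holds 12 slots that are not yet in use.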

    But suppose we need to pass a list of enumeration values through WCF, or simply serialize such a list using the DataContractSerializer or NetDataContractSerializer classes (***). The enumeration is declared as follows:

    public enum Color
    {
      Green = 1,
      Red,
      Blue
    }



    Never mind that this enumeration is not marked with any attributes; that is no obstacle for NetDataContractSerializer. The main feature of this enumeration is that it has no zero value: its values start at 1.

    The peculiarity of serializing enumerations in WCF is that you cannot serialize a value that is not a member of the enumeration.

    using System.Runtime.Serialization;
    using System.Text;
    using System.Xml;

    public static string Serialize<T>(T obj)
    {
      // We deliberately use NetDataContractSerializer here, although in this
      // case DataContractSerializer behaves the same way
      var serializer = new NetDataContractSerializer();
      var sb = new StringBuilder();
      using (var writer = XmlWriter.Create(sb))
      {
        serializer.WriteObject(writer, obj);
        writer.Flush();
        return sb.ToString();
      }
    }

    // Cast a number that is not a member of Color and try to serialize it
    Color color = (Color) 55;
    Serialize(color);



    When we try to run this code, we get the following error message: Enum value '55' is invalid for type 'Color' and cannot be serialized. This behavior is quite logical: it protects us from passing unknown values between applications.
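
    If we want to detect such values ourselves before they reach the serializer, a check along the following lines is one option. This is only a sketch: Enum.IsDefined is used here as a stand-in and is not claimed to be the check the serializer performs internally, and the names EnumCheck and IsDeclared are hypothetical.

```csharp
using System;

public enum Color
{
    Green = 1,
    Red,
    Blue
}

public static class EnumCheck
{
    // True only when `value` matches one of the declared members of Color.
    public static bool IsDeclared(Color value)
    {
        return Enum.IsDefined(typeof(Color), value);
    }

    public static void Main()
    {
        Console.WriteLine(IsDeclared(Color.Green)); // True
        Console.WriteLine(IsDeclared((Color)55));   // False
    }
}
```

    A cast such as (Color)55 compiles and runs without complaint, so nothing stops an undeclared value from existing in memory; the problem only surfaces once someone validates it.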

    Now let's try passing a collection of one item:

    var colors = new List<Color> { Color.Green };
    Serialize(colors);



    However, this seemingly innocuous code also fails at run time with the same error; the only difference is that this time the serializer chokes on the enumeration value 0. What the ... Where could 0 come from? After all, we are passing a simple collection with a single element, and that element's value is perfectly valid. The catch is that DataContractSerializer / NetDataContractSerializer, like good old binary serialization, uses reflection to access all fields. As a result, the entire internal representation of the object, held in both public and private fields, is written to the output stream.

    Since the List<T> class is built on an array, the whole array is serialized, regardless of how many elements the list contains. For example, when serializing a collection of two elements:

    var list = new List<int> { 1, 2 };
    string s = Serialize(list);



    In the output stream we get not two elements, as we might expect, but four (that is, the number of elements given by the Capacity property, not Count):


      <_items z:Id="2" z:Size="4">
        1
        2
        0
        0
      </_items>
      <_size>2</_size>
      <_version>2</_version>




    Now the reason for the error when serializing the list of enumerations becomes clear. Our Color enumeration has no value equal to 0, yet the unused elements of the list's internal array are filled with exactly this value.
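
    The contents of the internal array can also be inspected directly with reflection. The sketch below assumes the backing field is called _items, as it appears in the serialized output; this is a non-public implementation detail, so the code returns null instead of failing if the field is missing. The names BackingArrayPeek and GetBackingArray are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public enum Color
{
    Green = 1,
    Red,
    Blue
}

public static class BackingArrayPeek
{
    // Reads the list's non-public backing array via reflection.
    // "_items" is an implementation detail; returns null if it is not found.
    public static Color[] GetBackingArray(List<Color> list)
    {
        FieldInfo field = typeof(List<Color>).GetField(
            "_items", BindingFlags.Instance | BindingFlags.NonPublic);
        return field == null ? null : (Color[])field.GetValue(list);
    }

    public static void Main()
    {
        var colors = new List<Color> { Color.Green };
        Color[] items = GetBackingArray(colors);
        if (items == null)
        {
            Console.WriteLine("No _items field on this runtime.");
            return;
        }

        foreach (Color slot in items)
        {
            // Unused slots hold default(Color), which is 0 and not a declared member.
            Console.WriteLine("{0} (declared: {1})",
                (int)slot, Enum.IsDefined(typeof(Color), slot));
        }
    }
}
```

    For a list with one element and a capacity of four, we would expect one line for the value 1 and three lines for the value 0, none of which corresponds to a declared member of Color.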

    This is another example of a "leaky" abstraction, where the internal implementation of even such a simple class as List<T> can prevent us from serializing it properly.

    Solution



    There are several solutions to this problem, and each has its own drawbacks.

    1. Adding a default value


    The simplest solution to this problem is to add a value of 0 to the enumeration or change the value of one of the existing elements:

    public enum Color
    {
      None = 0,
      Green = 1, // or Green = 0
      Red,
      Blue
    }



    This option is the simplest, but it is not always possible: enumeration values may correspond to values in a database, and adding a dummy value may contradict the application's business logic.

    2. Transferring a collection without "empty" elements


    Instead of changing the enumeration, you can make sure the collection contains no such empty elements. You can do this, for example, as follows:

    var li1 = new List<Color> { Color.Green };
    var li2 = new List<Color>(li1);



    In this case the internal array of li1 will contain three extra empty elements (Count will be 1 and Capacity 4), while that of li2 will not (the internal array of the second list will contain exactly one element).
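
    The difference between the two lists is easy to verify through Count and Capacity. The sketch below also calls TrimExcess(), a standard List<T> method that is not mentioned in the text above and is shown only as another possible way to drop unused slots. The names SlackDemo and UnusedSlots are hypothetical, and the capacity values assume the growth behavior described earlier.

```csharp
using System;
using System.Collections.Generic;

public enum Color { Green = 1, Red, Blue }

public static class SlackDemo
{
    // Number of allocated but unused slots in the list's internal array.
    public static int UnusedSlots<T>(List<T> list)
    {
        return list.Capacity - list.Count;
    }

    public static void Main()
    {
        var li1 = new List<Color> { Color.Green }; // grown by Add
        var li2 = new List<Color>(li1);            // sized from the source collection

        Console.WriteLine("li1: Count={0}, Capacity={1}", li1.Count, li1.Capacity);
        Console.WriteLine("li2: Count={0}, Capacity={1}", li2.Count, li2.Capacity);

        // TrimExcess() shrinks the internal array of an existing list in place.
        li1.TrimExcess();
        Console.WriteLine("li1 after TrimExcess: unused slots = {0}", UnusedSlots(li1));
    }
}
```

    Either approach has to be applied right before serialization, because any later Add may grow the internal array again.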

    This option works, but it is very "fragile": breaking the working code takes no effort. One harmless change by a colleague, such as removing the "unnecessary" intermediate collection, and you are sunk.

    3. Using other collection types in the service interface


    Using other data structures, such as an array, or using XML serialization instead of DataContractSerializer (XML serialization uses only public members) will also solve this problem. Whether that is convenient is up to you.
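
    For the array variant, a conversion at the service boundary might look like the sketch below. The names ArrayBoundary and ToWireFormat are hypothetical, and the service contract itself is omitted; the point is only that ToArray() copies exactly Count elements.

```csharp
using System;
using System.Collections.Generic;

public enum Color { Green = 1, Red, Blue }

public static class ArrayBoundary
{
    // Converts the list to an array right before it crosses the service boundary.
    // ToArray() copies exactly Count elements, so there are no unused slots.
    public static Color[] ToWireFormat(List<Color> colors)
    {
        return colors.ToArray();
    }

    public static void Main()
    {
        var colors = new List<Color> { Color.Green };
        Color[] wire = ToWireFormat(colors);
        Console.WriteLine("Capacity={0}, array length={1}", colors.Capacity, wire.Length);
    }
}
```

    Because an array has no notion of spare capacity, its serialized form cannot contain elements that the caller never added.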

    Abstractions leak, period. That is why digging through the internal implementation of libraries is so useful. Even if a library hides all its details perfectly, sooner or later you will run into a situation that you cannot resolve without knowing how it works inside. Dig in, study the internals, and do not worry that they may change in the future; you may never need this knowledge, but at the very least it is interesting!

    P.S. By the way, think twice before passing value types through WCF in a List<T>. If you have a collection of 524 elements, another 500 extra value-type objects will be transferred along with them!
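
    The figure of 500 follows from the doubling sequence in footnote (**): 524 elements do not fit into 512 slots, so the internal array grows to 1024, and 1024 - 524 = 500. A small sketch to check this, assuming the list is filled one element at a time; the Point struct and the names ValueTypeSlack and UnusedSlotsAfterAdding are purely illustrative.

```csharp
using System;
using System.Collections.Generic;

// A small value type; the name Point is only illustrative.
public struct Point
{
    public int X;
    public int Y;
}

public static class ValueTypeSlack
{
    // Fills a list one element at a time and returns Capacity - Count.
    public static int UnusedSlotsAfterAdding(int count)
    {
        var points = new List<Point>();
        for (int i = 0; i < count; i++)
            points.Add(new Point { X = i, Y = i });
        return points.Capacity - points.Count;
    }

    public static void Main()
    {
        Console.WriteLine(UnusedSlotsAfterAdding(524));
    }
}
```

    For value types each unused slot is a full default-initialized instance rather than a null reference, which is why the overhead grows with the size of the struct.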

    -

    (*) Joel is neither the first nor the last author to come up with a good metaphor for this idea. Lee Campbell, for example, once put the same thing in slightly different words: “You must understand at least one level of abstraction below the level at which you code.” Details are in the short note "On understanding the desired level of abstraction".

    (**) Typically, such data structures double their internal array. For example, as items are added to a List<T>, Capacity changes as follows: 0, 4, 8, 16, 32, 64, 128, 256, 512, 1024 ...

    (***) The difference between the two main WCF serializers is quite important. NetDataContractSerializer, unlike DataContractSerializer, adds CLR type information to the output stream, which violates the principles of SOA and the cross-platform, service-oriented paradigm. You can read more about this in the notes "What is WCF" and "Declarative use of NetDataContractSerializer".
