Silverlight and encodings

    Silverlight is convenient in that it provides an almost "full" .NET in client applications. If it weren't for that "almost", everything would be wonderful. Recently I needed to use an existing .NET library. I started by switching the project settings to Silverlight and adding it to the main project. The application compiled, and I was already glad that reusing the existing code was so easy, but it was too early to rejoice: the application began to crash in the most unexpected places.

    Debugging showed that the library could not find the latin1 encoding it needs. I assumed the encoding was simply registered under a slightly different name and started googling. It turned out to be much worse: as Microsoft itself states, the Silverlight core supports only three encodings (utf-8, utf-16LE, utf-16BE), while the library I needed required latin1 (and in practice, in some cases, windows-1251).

    Upd: switching the library over to Unicode was not an option, because its job was to read files from the client machine in whatever encoding they were stored in there.

    I did not find any ready-made solutions, only similar complaints on forums, so I decided to reinvent the wheel myself.

    The source of the encoding data for this homemade wheel is the "full" desktop framework. Since only single-byte encodings were needed, all their characters can easily be obtained by passing Encoding.GetChars an array of bytes filled with the values 0 through 255.
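
    As an illustration, here is a minimal sketch of how such a table could be built on the desktop framework (the helper name is mine; only Encoding.GetEncoding/GetChars and the 0-255 trick come from the text above):

    // Runs on the full desktop .NET Framework, not inside Silverlight:
    // builds a 256-entry char table for a single-byte encoding.
    using System.Text;

    static class CharMapBuilder
    {
      public static char[] Build(string encodingName)
      {
        var bytes = new byte[256];
        for (var i = 0; i < bytes.Length; i++)
          bytes[i] = (byte)i;                        // every possible byte value, 0..255

        // In a single-byte encoding each byte decodes to exactly one char.
        return Encoding.GetEncoding(encodingName).GetChars(bytes);  // e.g. "latin1", "windows-1251"
      }
    }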

    The first version of the GetString(byte[] bytes, int start, int count) method was quickly thrown together:
    // chars is the 256-entry lookup table built from the desktop encoding
    var sb = new System.Text.StringBuilder(count) { Length = count };
    count += start;
    for (var i = start; i < count; i++)
      sb[i - start] = chars[bytes[i]];
    return sb.ToString();


    Next, to squeeze out a bit more performance, I replaced the array lookup with a switch:
    var sb = new System.Text.StringBuilder(count) { Length = count };
    count += start;
    for (var i = start; i < count; i++) {
      char tmp;
      switch (bytes[i]) {
        case 0: tmp = '\u0000'; break;
        case 1: tmp = '\u0001'; break;
    ...
        default: tmp = '\u02D9'; break;
      }
      sb[i - start] = tmp;
    }
    return sb.ToString();


    At the same time, I decided to settle my doubts about which way of working with the StringBuilder class is faster.
    Option No. 3 (using .Append() instead of the indexer):
    var sb = new System.Text.StringBuilder(count);
    count += start;
    for (var i = start; i < count; i++) {
      switch (bytes[i]) {
        case 0: sb.Append('\u0000'); break;
        case 1: sb.Append('\u0001'); break;
    ...
        default: sb.Append('\u02D9'); break;
      }
    }
    return sb.ToString();


    All of these methods performed poorly, almost an order of magnitude slower than the built-in utf-8 implementation on files with English text (i.e. when utf-8 also uses one byte per character).
    So I decided to use a plain char[] array instead:

    var result = new char[count];
    for (var i = 0; i < result.Length; i++) {
      switch (bytes[i + start]) {
        case 0: result[i] = '\u0000'; break;
        case 1: result[i] = '\u0001'; break;
    ...
        default: result[i] = '\u02D9'; break;
      }
    }
    return new string(result);


    Update: in the comments it was suggested to combine the first and the last methods, which resulted in this code:
    // charMap is the same 256-entry lookup table as in option No. 1
    var result = new char[count];
    for (var i = 0; i < result.Length; i++)
      result[i] = charMap[bytes[i + index]];
    return new string(result);

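    Putting everything together, a complete helper for one single-byte encoding might look roughly like this (the class and field names are my own sketch around the code above, not the author's original class):

    using System;

    // charMap is the 256-entry char[] produced from the desktop encoding.
    public sealed class SingleByteDecoder
    {
      private readonly char[] charMap;

      public SingleByteDecoder(char[] charMap)
      {
        if (charMap == null || charMap.Length != 256)
          throw new ArgumentException("Expected a 256-entry lookup table", "charMap");
        this.charMap = charMap;
      }

      // Option No. 5: plain char[] buffer plus array lookup.
      public string GetString(byte[] bytes, int index, int count)
      {
        var result = new char[count];
        for (var i = 0; i < result.Length; i++)
          result[i] = charMap[bytes[i + index]];
        return new string(result);
      }
    }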

    A similar version was also tested where a direct cast to (char) was used for the first 128 characters, but it turned out to be slower (a lookup in a small array is faster than a comparison plus a cast).
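
    For reference, that variant might have looked something like this (my reconstruction, not the original code):

    // Cast bytes below 128 directly, fall back to the table for the rest.
    // In practice the plain table lookup above turned out to be faster.
    var result = new char[count];
    for (var i = 0; i < result.Length; i++) {
      var b = bytes[i + index];
      result[i] = b < 128 ? (char)b : charMap[b];
    }
    return new string(result);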

    Performance measurements:

    Option                          Time, ms
    utf-8 (built-in)                140-156
    No. 1 (array lookup)            1340-1352
    No. 2 (StringBuilder[])         1562-1578
    No. 3 (StringBuilder.Append)    1344-1375
    No. 4 (char[])                  451-468
    No. 5 (char[] + array lookup)   306-319
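
    The original benchmark code is not shown here; a rough sketch of how such numbers can be collected (assumed names, the same buffer decoded many times) is:

    using System;
    using System.Diagnostics;
    using System.Text;

    static void Compare(byte[] data, SingleByteDecoder decoder, int iterations) {
      var sw = Stopwatch.StartNew();
      for (var i = 0; i < iterations; i++)
        Encoding.UTF8.GetString(data, 0, data.Length);   // built-in utf-8 baseline
      Console.WriteLine("utf-8 (built-in): {0} ms", sw.ElapsedMilliseconds);

      sw = Stopwatch.StartNew();
      for (var i = 0; i < iterations; i++)
        decoder.GetString(data, 0, data.Length);         // custom single-byte decoder
      Console.WriteLine("char[] + lookup:  {0} ms", sw.ElapsedMilliseconds);
    }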

    The result, I think, speaks for itself: I initially chose method No. 4 and, after the update, switched to method No. 5.

    Thanks to everyone for the useful comments! They helped me save a few more milliseconds and tens of kilobytes of generated code ;)

    Unfortunately, this encoding data cannot be passed directly to the StreamReader/StreamWriter classes, but the solution was enough for my needs.
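
    In practice that means reading the raw bytes yourself and decoding them afterwards; a minimal usage sketch (the stream handling and names are my assumptions):

    using System.IO;

    // Read the whole stream into a byte[] and decode it with the custom
    // decoder instead of wrapping the stream in a StreamReader.
    static string ReadAllText(Stream stream, SingleByteDecoder decoder) {
      using (var ms = new MemoryStream()) {
        var buffer = new byte[4096];
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
          ms.Write(buffer, 0, read);
        var bytes = ms.ToArray();
        return decoder.GetString(bytes, 0, bytes.Length);
      }
    }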

    For convenience, I also put together a small generator for these encoding tables. Maybe someone will find it useful.
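
    The idea of such a generator can be as simple as dumping the desktop table into C# source; a rough sketch (an illustration only, not the author's actual tool):

    using System;
    using System.Text;

    // On desktop .NET: emit a char[] literal for the given encoding so it
    // can be pasted into (or generated for) the Silverlight project.
    static string EmitCharMap(string encodingName) {
      var bytes = new byte[256];
      for (var i = 0; i < bytes.Length; i++)
        bytes[i] = (byte)i;
      var chars = Encoding.GetEncoding(encodingName).GetChars(bytes);

      var sb = new StringBuilder();
      sb.AppendLine("private static readonly char[] charMap = {");
      for (var i = 0; i < chars.Length; i++)
        sb.AppendFormat("  '\\u{0:X4}',{1}", (int)chars[i], Environment.NewLine);
      sb.AppendLine("};");
      return sb.ToString();
    }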
