Co-authorship mechanism in MS Office 2010 + SharePoint 2010: protocols and packets

    The co-authorship mechanism appeared with the release of MS Office 2010 and SharePoint 2010. Many call this functionality “long-awaited”, and there is no point in downplaying its usefulness. This article, however, discusses how the mechanism works and what can be gained from understanding it. It is, after all, a new field of activity!

    Of interest to the user


    First of all, it is worth looking at how the co-authorship mechanism appears to the user. A search of the Internet shows that this has already been described in sufficient detail, so I will not repeat it and suggest that you familiarize yourself with the existing descriptions.

    Of interest to the developer


    I wanted to understand how user actions relate to the packets they generate, and to learn to identify actions from the packets that are sent. This would make it possible to collect information about the work of users (in my case, employees) on a corporate portal by writing an HTTP Module. Such information is useful, for example, for increasing the number of conversions or for tracking how intensively documentation is being prepared. Moreover, practice has shown that it is more efficient and safer to listen to packets arriving at the server than to outgoing ones.

    Because documents are cached, the task is complicated by the fact that in some cases the information at the MS-FSSHTTP level does not unambiguously identify the user's action. For example, when a document is opened for the first time it is downloaded, which is easy to recognize from the packet description. When it is opened a second time, it is not downloaded; instead the cached copy is compared with the document on the server. Semantically, though, both events mean the start of work with the document.
    According to the specifications, the co-authorship mechanism works in Word 2010, Excel 2010, OneNote 2010 and SharePoint Workspace 2010 with SharePoint Foundation 2010 or SharePoint Server 2010.

    Protocols and packets

    All communication between the SharePoint server and the MS Office 2010 user occurs using the MS-FSSHTTP (File Synchronization via SOAP over HTTP Protocol Specification) and MS-FSSHTTPB (Binary Requests for File Synchronization via SOAP Protocol Specification) protocols.
    For greater convenience, I will immediately provide links to the description of the protocols:
    - MSDN about MS-FSSHTTP or download the MS-FSSHTTP specification
    - MSDN about MS-FSSHTTPB or download the MS-FSSHTTPB specification

    When packets are exchanged, MS-FSSHTTP carries information about the entity being worked on (its Url), the author who starts working with it, and the mode of operation (Read Only and Edit require different privileges). The MS-FSSHTTP part of the packet is XML, which does not make the protocol any more elegant. The XML subrequests in a packet are numbered by the SubRequestToken parameter, and the DependsOn parameter makes it possible to check the integrity of the packet. Within the MS-FSSHTTP and MS-FSSHTTPB protocols the entity is called a cell.
    The structure of the MS-FSSHTTP request fits into the following pattern:

     
       // Version information for the protocol in use
       // Specifies the CorrelationID, a GUID used
                 // for server logs and packet synchronization
        // Contains the URL of the document
         // Packet definitions
        

          …
       

      

     




    Some clarifications on the structure of the MS-FSSHTTP protocol:

    - The request element is the mandatory structure of the packet; it indicates the document to which the request is addressed.

    - The co-authorship subrequest announces the co-authorship mode; the nested tag indicates which type of co-authorship is used.

    - The lock subrequest indicates the presence of a document lock and, in a nested tag, its type. In its response the server indicates whether you are the first to start working with the document or are joining as a co-author.

    - The cell subrequest carries the actual information about changes in the document. It contains the MS-FSSHTTPB part of the packet.

    - The user subrequest carries user information.
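
    The attributes mentioned above can be pulled out of the XML part of a packet with a few lines of code. The following Python sketch is only an illustration: the sample request is hypothetical, and the element names (Request, SubRequest, SubRequestData) and the Type attribute are assumed from the published MS-FSSHTTP specification rather than taken from a captured packet. Tags are matched by local name so that XML namespaces do not matter.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified request used only for illustration.
# Element names and the Type attribute are assumptions based on the
# published MS-FSSHTTP specification; the payload is elided.
SAMPLE_REQUEST = """<Envelope>
  <Body>
    <RequestVersion Version="2" MinorVersion="0"/>
    <RequestCollection CorrelationId="{00000000-0000-0000-0000-000000000000}">
      <Request Url="http://moss14/docs/2MB.docx" RequestToken="1">
        <SubRequest Type="Coauth" SubRequestToken="1"/>
        <SubRequest Type="Cell" SubRequestToken="2" DependsOn="1">
          <SubRequestData BinaryDataSize="88">...</SubRequestData>
        </SubRequest>
      </Request>
    </RequestCollection>
  </Body>
</Envelope>"""


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def summarize_request(xml_text):
    """Return the document Url and subrequest attributes of each Request."""
    root = ET.fromstring(xml_text)
    summary = []
    for request in root.iter():
        if local_name(request.tag) != "Request":
            continue
        subrequests = []
        for sub in request:
            if local_name(sub.tag) != "SubRequest":
                continue
            size = None
            for data in sub:
                if local_name(data.tag) == "SubRequestData":
                    raw = data.get("BinaryDataSize")
                    size = int(raw) if raw is not None else None
            subrequests.append({
                "type": sub.get("Type"),
                "token": sub.get("SubRequestToken"),
                "depends_on": sub.get("DependsOn"),
                "binary_data_size": size,
            })
        summary.append({"url": request.get("Url"), "subrequests": subrequests})
    return summary


for item in summarize_request(SAMPLE_REQUEST):
    print(item["url"])
    for sub in item["subrequests"]:
        print("  ", sub)
```

    An integrity check based on DependsOn could then verify that every referenced token is present among the SubRequestToken values of the same request.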

    An example of the MS-FSSHTTP part of a packet sent from the user to the server (a request):
    The packet reports that the document 2MB.docx at the specified address is being accessed and that the user is ready to receive it. BinaryDataSize gives the length of the value inside the tag, i.e. the size of the MS-FSSHTTPB part in bytes after Base64 decoding.

     
      
      
       
        
         
    DAALAJzPKfM5lAabBgIAAO4CAACqAiAAfrgx50XdqkSrgAx1+9FTDnoCC
    ACUKaEPdwEWAgYAAwUAigICAADaAgYAAwAAygIIAAgAgAOEAEELAawCAFUDAQ==
         

        

        
        
         
    DAALAJzPKfM5lAabBgIAAO4CAACqAiAAfrgx50XdqkSrgAx1+9FTDnoCC
    ACUKaEPdwEWAgYAAwUAigICAADaAgYAAwAAygIIAAgAgAOEAEELAawCAFUDAQ==
         

        

        
        
         
    DAALAJzPKfM5lAabBgIAAO4CAACqAiAAfrgx50XdqkSrgAx1+9FTDnoCC
    ACUKaEPdwEWAgYAAwUAigICAADaAgYAAwAAygIIAAgAgAOEAEELAawCAFUDAQ==
         

        

       

      

     




    Packet research

    Studying the specifications gave no answer on how to determine the start of work with a document. The indicators found while analyzing packets therefore have to be checked against documents of various types and sizes, a kind of crash test; otherwise it is hard to call any indicator reliable or the work justified. For this reason all commonly used formats were analyzed, with document sizes up to 30 MB.
    The results of the analysis:
    • The maximum packet size does not exceed 3 MB, so a 30 MB file is transferred in 10 packets.
    • Analyzing MS-FSSHTTP alone is not enough to detect the opening of a document (because of the Upload Center and its caching function); the remaining actions can be determined.
    • Packets with an MS-FSSHTTP part are addressed to moss14/_vti_bin/cellstorage.svc/CellStorageService; the rest are of no interest for this task. This address therefore serves as the filter when analyzing HTTP requests.
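
    The filter from the last item and the packet arithmetic from the first can be sketched in Python as follows. This is an assumption-laden illustration, not the HTTP Module itself: the function names are hypothetical, the check for the POST method is my assumption (SOAP requests are normally sent with POST), and the 3 MB limit is the value observed in the tests above.

```python
import math
from urllib.parse import urlsplit

# Path suffix observed in the analysis above, compared case-insensitively.
CELLSTORAGE_SUFFIX = "/_vti_bin/cellstorage.svc/cellstorageservice"

# Maximum packet size observed in the tests (an empirical value).
MAX_PACKET_BYTES = 3 * 1024 * 1024


def is_cellstorage_request(method, url):
    """Return True for requests that may carry an MS-FSSHTTP part."""
    path = urlsplit(url).path.rstrip("/").lower()
    # Assumption: the SOAP call arrives as an HTTP POST.
    return method.upper() == "POST" and path.endswith(CELLSTORAGE_SUFFIX)


def min_packets(file_size_bytes, max_packet_bytes=MAX_PACKET_BYTES):
    """Smallest number of packets needed to transfer a file of this size."""
    return math.ceil(file_size_bytes / max_packet_bytes)


print(is_cellstorage_request(
    "POST", "http://moss14/_vti_bin/cellstorage.svc/CellStorageService"))
print(min_packets(30 * 1024 * 1024))  # 10
```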

    It is time to get acquainted with the MS-FSSHTTPB protocol. The data is Base64-encoded, and to determine its structure it has to be converted to HEX and binary. A BinaryDataSize of 88 corresponds to an "empty" MS-FSSHTTPB part, a clean MS-FSSHTTPB request template: it contains all the sections of an MS-FSSHTTPB request, but the sections carry no information.
    As for the division of roles between the two protocols, MS-FSSHTTP describes the information that is visible from outside the document, i.e. everything we can find out without opening the file, while MS-FSSHTTPB reports what is happening inside the document: which changes are made and where. The information encoded in the MS-FSSHTTPB part of the packet thus makes it possible to synchronize the state of the document between co-authors by transmitting only the changed parts, which significantly reduces the load on the network. Admittedly, any other implementation would, in my view, be illogical.
    Consider the string transmitted via MS-FSSHTTPB when the BinaryDataSize value is 88.
    The original value from the XML tag of the MS-FSSHTTP protocol:
    DAALAJzPKfM5lAabBgIAAO4CAACqAiAAfrgx50XdqkSrgAx1+9FTDnoCCACUKaEPdwEWAgYAAwUAigICAADaAg
    YAAwAAygIIAAgAgAOEAEELAawCAFUDAQ==

    Decoded HEX code (see the MS-FSSHTTPB specification, pages 71-73):
    0c 00 0b 00 //Protocol Version + Minimum Version
    9c cf 29 f3 39 94 06 9b //Signature
    06 02 00 00 //CellRequest Start
    ee 02 00 00 //User Agent Start
    aa 02 20 00 //User Agent GUID
    7e b8 31 e7 45 dd aa 44 ab 80 0c 75 fb d1 53 0e //GUID
    7a 02 08 00 //User Agent Version
    94 29 a1 0f //Version
    77 01 16 02 06 00 //User Agent End + SubRequest Start
    03 05 //Request ID + Request Type
    00 8a 02 02 00 //Priority + Query Changes
    00 da 02 06 00 //Allow Fragments/Reserved + Query Changes Request Argument
    03 00 00 //Include Storage Manifest + Cell ID
    ca 02 08 00 //Query Changes Data Constraints
    08 00 80 03 //Maximum Data Element
    84 00 //Knowledge Start
    41 //Knowledge End
    0b 01 ac 02 //SubRequest End + Data Element Package Start
    00 55 03 01 //Reserved + Data Element Package End + Cell Request End
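
    The decoding step can be reproduced with the Python standard library. The sketch below decodes the Base64 string quoted above, confirms that it is 88 bytes long and prints it as HEX; the function names are my own, and the field boundaries in the comments follow the annotated dump above rather than a full parser of the specification.

```python
import base64

# The "empty" MS-FSSHTTPB template quoted above (line break removed).
EMPTY_TEMPLATE_B64 = (
    "DAALAJzPKfM5lAabBgIAAO4CAACqAiAAfrgx50XdqkSrgAx1+9FTDnoCCACUKaEPdwEWAgYAAwUAigICAADaAg"
    "YAAwAAygIIAAgAgAOEAEELAawCAFUDAQ=="
)

# Signature bytes as they appear in the annotated dump.
SIGNATURE = bytes.fromhex("9ccf29f33994069b")


def decode_payload(b64_text):
    """Decode a Base64 payload, ignoring any embedded whitespace."""
    return base64.b64decode("".join(b64_text.split()))


def hex_dump(data, width=16):
    """Format bytes as space-separated hex pairs, `width` bytes per line."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        rows.append(" ".join(f"{byte:02x}" for byte in chunk))
    return "\n".join(rows)


payload = decode_payload(EMPTY_TEMPLATE_B64)
print(len(payload))                 # 88
print(payload[4:12] == SIGNATURE)   # True
print(hex_dump(payload))
```

    The printed length matches the BinaryDataSize of 88, which shows that the attribute counts decoded bytes and not Base64 characters (the encoded string is 120 characters long).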


    Now a further iteration, the analysis of MS-FSSHTTPB, is added to the analysis conducted earlier. Sparing the reader the tedious examples, I give only the results:
    • 1. An empty packet is a signal of readiness to receive a document, which means it helps to detect the event of opening a document from the server. The key line for the request-filtering module is:


      however, if the document was cached in the Upload Center, BinaryDataSize is greater than 88, i.e. the packet is no longer an empty template, because the cached document has to be checked for validity. Another indicator was found to identify this case.
    • 2. The packet that checks a cached document in the Upload Center also contains the line given in item 1 of this list. However, the larger the document being checked, the larger the value of BinaryDataSize.
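
    The decision described in the two items above can be written down as a small function. This is a minimal sketch with hypothetical names and labels; it only encodes the observation that 88 marks the empty template, and it deliberately does not treat a larger value as proof of a cache check, since that case needs a look at the payload itself.

```python
# Size of the "empty" MS-FSSHTTPB template observed in the analysis.
EMPTY_TEMPLATE_SIZE = 88


def classify_open(binary_data_size):
    """Give a first guess about an open event from BinaryDataSize alone."""
    if binary_data_size == EMPTY_TEMPLATE_SIZE:
        # Empty template: the client is ready to download the document.
        return "first-open"
    if binary_data_size > EMPTY_TEMPLATE_SIZE:
        # Possibly a validity check of a cached copy; the size alone
        # does not prove it, so the payload has to be inspected.
        return "needs-payload-inspection"
    return "unknown"


for size in (88, 413, 10):
    print(size, classify_open(size))
```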

    For clarity, here is an example of parsing such a packet.
    DAALAJzPKfM5lAabBgIAAO4CAACqAiAAfrgx50XdqkSrgAx1+
      9FTDnoCCACUKaEPdwEWAgYAAwUAigICAADaAgYAAwAAygIIAAgAgAOEACYCIAD2NXoyYQc
      URJaGUekAZnpNpAB4KCn1koJCYUFHqgOvQEOf2v3CNZY9eCgp9ZKCQmFBR6oDr0BDn9r9r
      j3iPngoKfWSgkJhQUeqA69AQ5/a/fo+JkB4KCn1koJCYUFHqgOvQEOf2v0+
      QGpBeCgp9ZKCQmFBR6oDr0BDn9r9gkGeQngmRlDki07gDrGjv1Ojie167QDOAngmua8bdL
      Ef8U6jv1Ojie167QBmD1ETASYCIAATHwkQgsj7QJiGZTP5NMIdbAFw0Qz5C0E3b9GZRKbD
      JyMu3KcRrXsAOAAyADkAMgBGADUAMgA5AC0ANgAxADQAMgAtADQANwA0ADEALQBBAEEAMA
      AzAC0AQQBGADQAMAA0ADMAOQBGAEQAQQBGAEQAfQAsADMALAAzAAAAtRMBJgIgAA7pdjoy
      gAxNud3zxlApQz5MASAoDLmvG3SxH/FOo79To4nteu1mDwClEwFBCwGsAgBVAwE=



    Decoded HEX code:
    0c 00 0b 00 //Protocol Version + Minimum Version
    9c cf 29 f3 39 94 06 9b //Signature
    06 02 00 00 //CellRequest Start
    ee 02 00 00 //User Agent Start
    aa 02 20 00 //User Agent GUID
    7e b8 31 e7 45 dd aa 44 ab 80 0c 75 fb d1 53 0e //GUID
    7a 02 08 00 //User Agent Version
    94 29 a1 0f //Version
    77 01 16 02 06 00 //User Agent End + SubRequest Start
    03 05 //Request ID + Request Type
    00 8a 02 02 00 //Priority + Query Changes
    00 da 02 06 00 //Allow Fragments/Reserved + Query Changes Request Argument
    03 00 00 //Include Storage Manifest + Cell ID
    ca 02 08 00 //Query Changes Data Constraints
    08 00 80 03 //Maximum Data Element
    84 00 //Knowledge Start
    // Note that up to this point the packet is identical to the one described above

    26 02 20 00 //Cell knowledge range
    f6 35 7a 32 61 07 14 44 96 86 51 e9 00 66 7a 4d //GUID in cell knowledge range
    a4 00 78 28
    29 f5 92 82 42 61 41 47 aa 03 af 40 43 9f da fd
    c2 35 96 3d 78 28
    29 f5 92 82 42 61 41 47 aa 03 af 40 43 9f da fd
    ae 3d e2 3e 78 28
    29 f5 92 82 42 61 41 47 aa 03 af 40 43 9f da fd
    fa 3e 26 40 78 28
    29 f5 92 82 42 61 41 47 aa 03 af 40 43 9f da fd
    3e 40 6a 41 78 28
    29 f5 92 82 42 61 41 47 aa 03 af 40 43 9f da fd
    82 41 9e 42 78 26 46 50 e4 8b 4e e0 0e b1 a3 bf 53 a3 89 ed 7a ed 00
    ce 02 78 26 //Pre DATA
    b9 af 1b 74 b1 1f f1 4e a3 bf 53 a3 89 ed 7a ed //GUID before FROM number
    00 66 0f //FROM number
    51 13 01 26 02 20 00 13 1f 09 10 82 c8 fb 40 98 86 65 33 f9 34 c2 1d 6c 01 70 d1 0c f9 0b 41 37 6f d1 99 44 a6 c3 27 23 2e dc a7 11 ad
    7b 00 38 00 32 00 39 00 32 00 46 00 35 00 32 00 39 00 2d 00 36 00 31 00 34 00 32 00 2d 00 34 00 37 00 34 00 31 00 2d 00 41 00 41 00 30 00 33 00 2d 00 41 00 46 00 34 00 30 00 34 00 33 00 39 00 46 00 44 00 41 00 46 00 44 00 7d 00 2c 00 33 00 2c 00 33 00 00 00
    b5 13 01 26 02 20 00 0e e9 76 3a 32 80 0c 4d b9 dd f3 c6 50 29 43 3e 4c 01 20 28 0c //DATA changeset
    b9 af 1b 74 b1 1f f1 4e a3 bf 53 a3 89 ed 7a ed //GUID before TO number
    66 0f 00 //To number
    a5 //Cell range End
    13 01 //Cell End

    // From here on the packet matches the empty request template
    41 //Knowledge End
    0b 01 ac 02 //SubRequest End + Data Element Package Start
    00 55 03 01 //Reserved + Data Element Package End + Cell Request End

    Parsing the contents of the cell knowledge range is not so simple (see the MS-FSSHTTPB specification, page 37). I singled out a repeating Data + GUID pair in the part of the code that precedes the FROM value, but it is not necessarily present when the file being checked is small, so this construct cannot serve as an indicator. As for its semantics, a similar construct is described for response packets. One can try to explain why a request uses constructs from responses, but it is hard to explain why these structures are not described in the request part of the specification.

    Now look at the data enclosed between the FROM and TO values. This is the cornerstone: between the bytes “7b 00” and “b5 13” the contents fit the regular expression “\w{2}[00]”, that is, every byte is followed by a zero byte (keep in mind that the spaces are there only for readability). This indicator has held up in all tests with documents. Hurray, comrades! But there is no need to stop here; one can try to get at the semantics of this construct. An experienced developer may notice that this is what UTF-16 encoded data looks like in hex. Converting the string

    7b 00 38 00 32 00 39 00 32 00 46 00 35 00 32 00 39 00 2d 00 36 00 31 00 34 00 32 00
    2d 00 34 00 37 00 34 00 31 00 2d 00 41 00 41 00 30 00 33 00 2d 00 41 00 46 00 34 00 30 00 34 00
    33 00 39 00 46 00 44 00 41 00 46 00 44 00 7d 00 2c 00 33 00 2c 00 33 00 00 00

    from UTF-16 (hex), we get
    {8292F529-6142-4741-AA03-AF40439FDAFD},3,3


    As a result, it turns out that the construct carries parameters: a GUID and two numeric values. This makes the structure meaningful and even lets the developer try to understand its semantics. One can assume that the GUID is used to check the cached document for validity. This hypothesis has held up against every objection so far and seems quite plausible.
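
    The UTF-16 indicator is easy to test in code. The following Python sketch searches a payload for runs of printable ASCII characters stored as UTF-16 (little-endian) and decodes them; the function name and the minimum run length of 8 characters are my own assumptions, and a search over arbitrary binary data may well produce false positives. As a cross-check it also converts the 16 bytes that repeat in the cell knowledge range above into a GUID, using the little-endian byte order that Windows applies to the first three GUID fields.

```python
import re
import uuid

# Printable ASCII characters encoded as UTF-16LE: each byte is followed by 0x00.
# The minimum run length (8 characters) is an arbitrary threshold.
UTF16_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){8,}")


def find_utf16_strings(payload):
    """Return all UTF-16LE encoded ASCII runs found in the payload."""
    return [match.group().decode("utf-16-le") for match in UTF16_RUN.finditer(payload)]


# The fragment quoted above, with two bytes of context on each side.
FRAGMENT_HEX = """
11 ad
7b 00 38 00 32 00 39 00 32 00 46 00 35 00 32 00 39 00 2d 00 36 00 31 00 34 00 32 00
2d 00 34 00 37 00 34 00 31 00 2d 00 41 00 41 00 30 00 33 00 2d 00 41 00 46 00 34 00 30 00 34 00
33 00 39 00 46 00 44 00 41 00 46 00 44 00 7d 00 2c 00 33 00 2c 00 33 00 00 00
b5 13
"""
fragment = bytes.fromhex("".join(FRAGMENT_HEX.split()))

strings = find_utf16_strings(fragment)
print(strings[0])  # {8292F529-6142-4741-AA03-AF40439FDAFD},3,3

# The 16 bytes that repeat in the cell knowledge range of the dump above.
repeated = bytes.fromhex("29f59282426141 47aa03af40439fdafd".replace(" ", ""))
guid_from_bytes = str(uuid.UUID(bytes_le=repeated))
guid_from_text = strings[0].split(",")[0].strip("{}").lower()
print(guid_from_bytes == guid_from_text)  # True
```

    The match between the decoded string and the repeated bytes suggests that the same GUID appears in both places, which is at least consistent with the assumption that it identifies what is being validated.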

    Conclusion


    In this article I have tried to shed light on the co-authorship mechanism in MS Office 2010 + SharePoint 2010 at the protocol level. I hope I have managed to explain clearly some features that are not covered in the protocol specifications and that had not received attention before. It should be noted that since I finished my own investigation of the MS-FSSHTTP and MS-FSSHTTPB protocols, the documentation on MSDN has been substantially expanded.

    On the whole, co-authorship works much like the basic principles of document editing in Unix-based systems. The differences are not significant enough to call the co-authorship mechanism innovative. With this thought I would like to end the article.

    I wish you interesting studies!
