Getting a tree of unique elements with a chained XSLT transformation

    XSLT gets criticized for many things: resource consumption, ugliness, inflexibility, complexity, and probably plenty more. I wrote this post for those who criticize it on the last three counts.

    This post is meant to fill a gap in your knowledge and show XSLT in all its beauty.

    Recently I had to write a script that takes a source XML document and produces an XML document consisting only of its unique elements. Nothing is known in advance about the source file, absolutely nothing.
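
    As a minimal illustration of the goal: the element names club, session, login, start and end below are invented for this example, and I assume that only element names matter, not text or attributes. An input like this:

```xml
<club>
  <session>
    <login>alice</login>
    <start>10:00</start>
  </session>
  <session>
    <login>bob</login>
    <end>12:30</end>
  </session>
</club>
```

    would then be reduced to one element per distinct path:

```xml
<club>
  <session>
    <login/>
    <start/>
    <end/>
  </session>
</club>
```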


    Files for the experiments. Assume they record the history of sessions on a computer in an Internet club, by user login.

    Source:
    [XML source listing missing from this copy of the post]


    Result:
    [XML result listing missing from this copy of the post]


    Here is what I ended up with, apart from tears of joy over the finished work:
    [stylesheet listing; only the attributes of its opening tag survive in this copy]
        version="1.0"
        xmlns:exsl="http://exslt.org/common"
        exclude-result-prefixes="exsl">


    Now let me explain everything done above:

    1. First, we hook up the XSLT extensions from exslt.org and bind them to the exsl namespace prefix, so that we can use one of that library's functions later;

    2. exclude-result-prefixes="exsl" - this attribute keeps the exsl namespace declaration out of the transformation result, so that the output is not cluttered with it. When I forget it, I sometimes spend a long time racking my brains over why the output is not what I expected;

    3. The xsl:output declaration tells the XSLT processor that we want well-formed XML output, indented and encoded in UTF-8;

    4. The root template takes us to the root of our mysterious document;
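
    Points 1-4 together give roughly the following skeleton. This is a sketch rather than the original listing: the xsl:output attributes are inferred from the description in point 3, and the root-reached element is a hypothetical stand-in for the real processing.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl">

  <!-- Ask for indented XML output in UTF-8. -->
  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>

  <!-- Entry point: matches the root of whatever document is supplied. -->
  <xsl:template match="/">
    <!-- Hypothetical placeholder; the real steps would go here. -->
    <root-reached/>
  </xsl:template>

</xsl:stylesheet>
```

    Without exclude-result-prefixes, literal result elements such as root-reached would carry an xmlns:exsl declaration in the output.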

    5. Next, we take the source document and add a path attribute to every element, holding the full path to that element from the root down through the tree. A dedicated template does this, and its result is stored in the var_NodeWithPath variable. It works as follows:
    • start at the root;
    • emit an element with the name of the current element (do not forget the curly braces);
    • give it an attribute containing its full path;
    • then the template calls itself on the descendants of this node, appending more and more detail to the parents variable;
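
    A sketch of such a path-annotating template, assuming the source elements have no namespaces. The mode name add-path is hypothetical, and carrying parents down as a template parameter is my assumption about how the value is passed. To stay self-contained, this version writes the annotated tree straight to the output instead of storing it in a variable.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>

  <!-- Start at the document root and annotate the root element. -->
  <xsl:template match="/">
    <xsl:apply-templates select="*" mode="add-path"/>
  </xsl:template>

  <!-- Rebuild each element with a path attribute holding its full path. -->
  <xsl:template match="*" mode="add-path">
    <!-- Path of the parent element; empty for the root element. -->
    <xsl:param name="parents" select="''"/>
    <xsl:variable name="path" select="concat($parents, '/', name())"/>
    <!-- The curly braces make name() an expression, not a literal name. -->
    <xsl:element name="{name()}">
      <xsl:attribute name="path">
        <xsl:value-of select="$path"/>
      </xsl:attribute>
      <!-- Recurse into child elements, passing the path built so far. -->
      <xsl:apply-templates select="*" mode="add-path">
        <xsl:with-param name="parents" select="$path"/>
      </xsl:apply-templates>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
```

    Note that this sketch copies only the element structure; text and the original attributes are dropped.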

    6. Once we finally have the var_NodeWithPath variable, we immediately convert it to a node-set with the exsl:node-set function and work with that as if it were a file. By the way, here is what it contains (output courtesy of xsl:copy-of):

    [listing of the path-annotated XML missing from this copy of the post]
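
    Why the conversion in point 6 is needed can be shown with a small sketch. In XSLT 1.0 a variable with content holds a result tree fragment, which xsl:copy-of can dump but which cannot be navigated with path steps until it is turned into a node-set. The variable content and the report, dump and sessions elements below are invented for illustration, and the processor is assumed to support the EXSLT common module.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl">

  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>

  <xsl:template match="/">
    <!-- Hypothetical stand-in for the annotated tree built in point 5. -->
    <xsl:variable name="var_NodeWithPath">
      <club path="/club">
        <session path="/club/session"/>
        <session path="/club/session"/>
      </club>
    </xsl:variable>
    <report>
      <!-- A result tree fragment can be copied to the output as is. -->
      <dump>
        <xsl:copy-of select="$var_NodeWithPath"/>
      </dump>
      <!-- Path steps into it require exsl:node-set first. -->
      <sessions>
        <xsl:value-of
            select="count(exsl:node-set($var_NodeWithPath)/club/session)"/>
      </sessions>
    </report>
  </xsl:template>

</xsl:stylesheet>
```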


    7. Once the variable has been passed to the “INeedYourTree” template, we output the document root element by name and run its descendants through the “tree” template, where the most interesting part happens;

    8. In the “tree” template, we do the following:
    • remember the name of the current element;
    • remember its XPath;
    • if the current element is the first of its kind (the first among all elements with the same path attribute value), then ...
    • we output this element and collect all the descendants of the current node and of every node with the same path.
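
    The steps of point 8 can be sketched as follows. This is not the original listing: it assumes the input document is already annotated, so that every element carries a path attribute, and it applies the tree logic to the root element as well instead of using a separate “INeedYourTree” template.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>

  <!-- Assumption: the input is the annotated tree with @path everywhere. -->
  <xsl:template match="/">
    <xsl:apply-templates select="*" mode="tree"/>
  </xsl:template>

  <xsl:template match="*" mode="tree">
    <!-- Remember the name and the path of the current element. -->
    <xsl:variable name="name" select="name()"/>
    <xsl:variable name="path" select="string(@path)"/>
    <!-- Proceed only for the first element in document order with this path. -->
    <xsl:if test="not(preceding::*[@path = $path])">
      <xsl:element name="{$name}">
        <!-- Visit the children of every element that shares this path. -->
        <xsl:apply-templates select="//*[@path = $path]/*" mode="tree"/>
      </xsl:element>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

    The preceding axis is enough for the "first of its kind" test because two elements with the same full path can never be ancestor and descendant of each other. The repeated // scan is quadratic in the worst case, which is acceptable for a sketch but not for large files.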

    That's all. I hope someone finds my experience useful. Thanks for your attention.

    UPD:
    MikhailEdoshin offered a more elegant solution to this problem:
    [stylesheet listing; only the end of its opening tag survives in this copy]
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">