Generate OfficeOpenXML documents in 5 minutes

    An ASP.NET application often needs to generate a report on the server in the OpenXML format.

    There are several common ways to do this:
    1. "Found it, plugged it in, used it": go to Google, find a library that generates docx or xlsx, add it to the project, figure it out, generate. Familiar, but slow.
    2. "Yuck": use COM. This is not recommended: it requires Microsoft Office installed on the server, it is not really thread-safe, it does not get along with x64, and it is generally old-fashioned.
    3. "Hardcore": learn the format and assemble the file yourself from XML and zip. Brutal.
    4. "The Microsoft way": this is the method described below.


    A short introduction


    Office Open XML is the format Word and Excel save documents in by default: docx and xlsx. The file is a zip archive. You can rename it to .zip, open it with an archiver, and look at what is inside:
    OfficeOpenXML Folder View
    Reports in OOXML are easy for users to understand and to edit with familiar tools. I would not recommend limiting a serious application to this format alone, but I do advise supporting it.
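
    The archive can also be listed from code instead of renaming the file. This is a minimal sketch, assuming .NET 4.5 or later (System.IO.Compression.ZipFile; on .NET Framework it needs a reference to System.IO.Compression.FileSystem). The class name DocxInspector is made up for illustration.

```csharp
using System.Collections.Generic;
using System.IO.Compression;

// Hypothetical helper, not part of the Open XML SDK.
public static class DocxInspector
{
    // Returns the names of all entries inside an OOXML file
    // (a docx or xlsx file is an ordinary zip archive).
    public static List<string> ListEntries(string path)
    {
        var names = new List<string>();
        using (ZipArchive archive = ZipFile.OpenRead(path))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                names.Add(entry.FullName);
            }
        }
        return names;
    }
}
```

    For a docx saved by Word, the list would typically include entries such as [Content_Types].xml and word/document.xml.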

    Preparation


    We will need to download OpenXMLSDKTool from the Microsoft website and install it:

    Setup

    Let's go


    Launch the Open XML SDK 2.0 Productivity Tool:
    Productivity tool
    This tool is very simple and can do two small but important operations:
    • Generate code from a document
    • Compare documents at the XML level
    But first things first.

    Code generation


    We load our document into the program and click “Reflect Code”:
    Reflect Code

    On the left we see the structure of the document: the same files that are in the archive, plus a view of their contents.
    Nodes in the tree can be selected. On the right you then see the node's content as XML, along with the code that generates exactly this piece. My example shows one paragraph from the body of the document. It lives in word/document.xml.
    If you select the root of the tree (the document itself), you get the code for the entire document.

    Now let's use this code
    1. Create a project in Visual Studio. Let it be a simple C# console application
    2. Add a reference to the DocumentFormat.OpenXml assembly:
      Add reference
      I have it in the GAC. If you do not want to put it there, you can reference the file directly. It can be downloaded separately from the same page as OpenXMLSDKTool, via the OpenXMLSDKv2.msi link
    3. Add a reference to WindowsBase
    4. Add the file "GeneratedClass.cs"
    5. Copy the code from the tool's Reflected Code window into it
    6. Save and close the file, then go to Program.cs
    7. Write the Main method:
      new GeneratedCode.GeneratedClass().CreatePackage(@"D:\Temp\Output.docx");
    8. Run it
    That's it. The code for generating the document is ready. The document will look exactly as it did when you saved it in Word. Fast, right?

    What's inside?

    What is inside the generated class?
    First, there is a single public method:
    public void CreatePackage(string filePath) {
      using (WordprocessingDocument package = WordprocessingDocument.Create(filePath, WordprocessingDocumentType.Document)) {
        CreateParts(package);
      }
    }

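    This is where the text that will appear in the document is inserted:
    private void GenerateMainDocumentPart1Content(MainDocumentPart mainDocumentPart1) {
      // ... (excerpt: the rest of the generated method is omitted)
      Run run2 = new Run() { RsidRunProperties = "00184031" };
      Text text2 = new Text();
      // Sample Russian text, roughly: "Predicate calculus, by definition, philosophically derives structuralism, altering the familiar reality."
      text2.Text = "Исчисление предикатов, по определению, философски выводит структурализм, изменяя привычную реальность."; // o.O what was Yandex smoking?
      // ...
    }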

    As you can see from the names of the private methods in the code, an OpenXml document consists of parts. Each part gets its own generation method.
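
    To see how little is strictly required, the same structure can also be written by hand. This is a rough sketch rather than what the tool generates: the class name MinimalDocx and the text parameter are made up for illustration. It creates only the main document part (the one that ends up as word/document.xml) and leaves out styles, settings and the other parts that Word normally writes.

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical hand-written counterpart to the generated class.
public static class MinimalDocx
{
    // Writes a docx containing a single paragraph with the given text.
    public static void Create(string filePath, string text)
    {
        using (WordprocessingDocument package =
            WordprocessingDocument.Create(filePath, WordprocessingDocumentType.Document))
        {
            // The main document part is stored as word/document.xml in the archive.
            MainDocumentPart mainPart = package.AddMainDocumentPart();

            // Document > Body > Paragraph > Run > Text:
            // the same nesting the Productivity Tool shows in its tree.
            mainPart.Document = new Document(
                new Body(
                    new Paragraph(
                        new Run(
                            new Text(text)))));

            mainPart.Document.Save();
        }
    }
}
```

    Because the text arrives as a parameter, this is also the simplest form of dynamic content: MinimalDocx.Create(@"D:\Temp\Hello.docx", "Hello").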
    The most curious readers, of course, grinned slyly and inserted a picture into the document.
    Pictures are stored right in this file as base64, here:
    #region Binary Data
    //...
    #endregion

    Finishing touches

    Refactoring the pictures and replacing static content with dynamic content is left to the reader as an exercise.
    And here is a method that generates a byte array instead of a file, for returning the document to a client from ASP.NET without temporary files:
    public byte[] CreatePackageAsBytes() {
      using (var mstm = new MemoryStream()) {
        // Disposing the package writes the finished document into the stream.
        using (WordprocessingDocument package = WordprocessingDocument.Create(mstm, WordprocessingDocumentType.Document)) {
          CreateParts(package);
        }
        // No explicit Flush/Close needed: the outer using disposes the stream.
        return mstm.ToArray();
      }
    }

    That's it, the code for generating the report in docx format is ready.
    What remains is to replace the static content with dynamic content (we did not do all this just to serve the same document every time, right?) and to add a "Download in Word format" link to the page.
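
    The download link can point at something like the handler below. Treat it as a sketch under assumptions: the name ReportHandler and the file name Report.docx are invented, the handler still has to be registered in web.config (or written as an .ashx file), and CreatePackageAsBytes is assumed to have been added to the generated class.

```csharp
using System.Web;

// Hypothetical handler; depends on the GeneratedClass from the Productivity Tool.
public class ReportHandler : IHttpHandler
{
    // The handler keeps no per-request state, so one instance can be reused.
    public bool IsReusable { get { return true; } }

    public void ProcessRequest(HttpContext context)
    {
        // Build the document in memory, without temporary files.
        byte[] bytes = new GeneratedCode.GeneratedClass().CreatePackageAsBytes();

        HttpResponse response = context.Response;
        response.Clear();
        // MIME type for docx files.
        response.ContentType =
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document";
        // Ask the browser to save the response as a file.
        response.AddHeader("Content-Disposition", "attachment; filename=\"Report.docx\"");
        response.BinaryWrite(bytes);
    }
}
```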

    Document Comparison


    So, we generated the code from the document, added a lot of data to it, refactored it, and deployed it to production. Now we need to change the font and some text in the report. How? There is a lot of code, and searching through it takes a long time.
    It turns out to be very simple: the document comparison feature will help us:
    1. Put the old and new documents side by side
    2. Open the Open XML Productivity Tool, select "Compare files ...":
      Compare Dialog
    3. Open the files and click OK.
      Result

      Here is the result of the comparison. You can click the lines with the file names to see exactly what differs:
      Comparison details

      In More Options, choose what to ignore when comparing.
      View Part Code shows the code of the part whose XML you see.
      From there, matching the XML to the code is not hard.

    By the way, this feature is also very handy if you are just getting acquainted with the OpenXML format: add something to the document and see what has changed. It will help those who chose the brutal route of assembling the XML and zip by hand, mentioned at the beginning of the article.

    Facts

    • It works with xlsx too, just like with docx
    • If the docx contains a graph or chart, everything will still work
    • This is just a strongly typed wrapper over System.IO.Packaging
    • The server does not need anything except this library
    • No problem with x64
    • High performance
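
    To illustrate the xlsx point, here is a hand-written sketch of the smallest workbook I would expect Excel to open: one sheet with one numeric cell. The class name MinimalXlsx, the sheet name "Report" and the value parameter are made up for illustration. In practice you would reflect the code from a real xlsx with the Productivity Tool, exactly as with docx.

```csharp
using System.Globalization;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

// Hypothetical example class, not generated by the tool.
public static class MinimalXlsx
{
    // Writes a workbook with one sheet and a single numeric cell in A1.
    public static void Create(string filePath, int value)
    {
        using (SpreadsheetDocument package =
            SpreadsheetDocument.Create(filePath, SpreadsheetDocumentType.Workbook))
        {
            WorkbookPart workbookPart = package.AddWorkbookPart();
            workbookPart.Workbook = new Workbook();

            // One row with one cell; numbers are stored as invariant-culture text.
            var cell = new Cell
            {
                CellReference = "A1",
                DataType = CellValues.Number,
                CellValue = new CellValue(value.ToString(CultureInfo.InvariantCulture))
            };
            var sheetData = new SheetData(new Row(cell));

            // The worksheet lives in its own part, just like word/document.xml in a docx.
            WorksheetPart worksheetPart = workbookPart.AddNewPart<WorksheetPart>();
            worksheetPart.Worksheet = new Worksheet(sheetData);

            // The workbook lists its sheets and points to each one by relationship id.
            Sheets sheets = workbookPart.Workbook.AppendChild(new Sheets());
            sheets.Append(new Sheet
            {
                Id = workbookPart.GetIdOfPart(worksheetPart),
                SheetId = 1,
                Name = "Report"
            });

            workbookPart.Workbook.Save();
        }
    }
}
```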

    Conclusions


    I believe that using DocumentFormat.OpenXml to generate reports in web applications is the right choice. The tooling that ships with the SDK keeps you from wasting time.

    What to read


    About OpenXML SDK: msdn.microsoft.com/en-us/library/bb448854(office.14).aspx
    About OpenXML (if anyone is not familiar with it): en.wikipedia.org/wiki/Office_Open_XML

    Good luck! Thanks for your attention.
