Managing large volumes of documentation



    • How do you keep the API help up to date?
    • How can I organize and store localized versions?
    • Do you check the text for invalid characters and the validity of the markup?
    • How do you organize the review (proofreading) of topics?


    I often hear these and other questions from technical writers at conferences. For small amounts of documentation, it is enough to review the documents manually and update, replace, or correct whatever is needed. But what if the volume of documentation has grown?

    Our documentation has grown to more than 154,000 documents in the .NET product line alone, of which about 140,000 are API help. About 8-10 thousand topics are added with each major release (i.e. twice a year). In this article I will tell you how we deal with such volumes.

    I will not name any publicly available tools here, because everything we use is an in-house application or service that is deeply integrated into our infrastructure and hardly applicable outside it. Therefore, in this post I will share technical solutions, not tools.

    The secret to success is simple:



    Store it in a convenient way


    We store all documents in MS SQL Server and have built an interface (CMS) that gives easy access to all documents for editing, verification, and preview.
    What we got:
    1. Topics are records in the database, and we have attached a lot of useful service information to them:
      • the name of the topic's author and the name of the person who last edited it.
      • creation date, last edited date, revision history.
      • various statuses: whether it has been checked by a proofreader, whether it has been approved by a developer, whether it needs improvement, etc.
    2. The list of topics can be displayed in the form of a table with all its advantages:
      • sorting - you can sort documents in the desired order, for example, by creation date.
      • grouping - you can group documents, for example, by status, authorship, etc.

      • filtering - you can show only the topics that require attention by filtering out all the others.
    3. Flexible options for organizing documents in the database. Here are some of the nicest perks:
      • Localization. In the database, you can conveniently organize storage of and access to localized documentation. To control the localization process, you can assign various statuses to topics: translated, not translated, verified, etc. Admittedly, we do not localize our documentation.
      • API structure. In the database, you can easily organize a class diagram, inheritance hierarchy, etc. This information can be used to generate related documents.
    4. Single sourcing. If the same content (a picture, a code example, a piece of text) needs to be used in several places, it can be stored as a separate entity and referenced wherever it is needed. With a database, this is easy to do.
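
    The idea of topics as database records can be sketched in a few lines. This is only an illustration, not our actual schema: it uses an in-memory SQLite database as a stand-in for MS SQL Server, and the table names, column names, status values, and sample rows are all hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical topic schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE snippets (id INTEGER PRIMARY KEY, body TEXT NOT NULL);
        CREATE TABLE topics (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            author TEXT NOT NULL,
            status TEXT NOT NULL,
            snippet_id INTEGER REFERENCES snippets(id)
        );
        """
    )
    # One shared code example, stored once and referenced by several topics.
    conn.execute("INSERT INTO snippets (id, body) VALUES (1, '// shared example')")
    conn.executemany(
        "INSERT INTO topics (title, author, status, snippet_id) VALUES (?, ?, ?, ?)",
        [
            ("Topic A", "alice", "empty", None),
            ("Topic B", "bob", "proofread", 1),
            ("Topic C", "alice", "approved", 1),
        ],
    )
    return conn


def topics_with_status(conn, status):
    """Filtering: return the titles of topics that have the given status."""
    rows = conn.execute(
        "SELECT title FROM topics WHERE status = ? ORDER BY title", (status,)
    )
    return [title for (title,) in rows]


def topics_using_snippet(conn, snippet_id):
    """Single sourcing: return the titles of topics that reference a snippet."""
    rows = conn.execute(
        "SELECT title FROM topics WHERE snippet_id = ? ORDER BY title", (snippet_id,)
    )
    return [title for (title,) in rows]
```

    With a layout like this, "show me everything that still needs writing" becomes a single query on the status column, and a shared example is edited in one place no matter how many topics reference it.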



    Automate it!


    Auto-generating documents from built libraries.


    There are wonderful tools that convert documentation comments in code into ready-made topics: JSDoc, JavaDoc, Doxygen, Sandcastle, and many others.

    Our API is described by technical writers in the database, not by developers in the code. Therefore, we do not need to create ready-made topics from comments in the source code. We need to create empty topics in the database.

    This task is performed by a special tool - a synchronizer. It works like this:
    1. takes the built DLLs and uses reflection to extract the signatures of all namespaces, classes, etc.
    2. compares these signatures with those in the database.
    3. adds what is missing and removes what is obsolete: for example, if a class has a new method, the synchronizer adds an empty topic for this method to the database with the corresponding statuses.
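
    The comparison in steps 2 and 3 boils down to a set difference. Here is a minimal sketch of that step only, assuming the signatures have already been extracted as plain strings; the function name and the sample signatures are hypothetical, and the reflection over DLLs is not shown.

```python
def plan_sync(library_signatures, database_signatures):
    """Compare API signatures found in the libraries with those in the database.

    Returns which topics should be created (new API elements) and which
    should be removed (API elements that no longer exist).
    """
    library = set(library_signatures)
    database = set(database_signatures)
    return {
        "add": sorted(library - database),     # in the library, no topic yet
        "remove": sorted(database - library),  # topic exists, API element is gone
    }


# Hypothetical usage: a class gained one method and lost another.
plan = plan_sync(
    library_signatures=["Grid", "Grid.Refresh()", "Grid.Export()"],
    database_signatures=["Grid", "Grid.Refresh()", "Grid.Print()"],
)
```

    In this example, `plan["add"]` contains `Grid.Export()` and `plan["remove"]` contains `Grid.Print()`; an empty topic would then be created for each entry in the "add" list.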

    In the database interface, the technical writer then filters the list down to the empty topics and describes the newly added classes, methods, properties, etc.

    Automatically populate content where possible.


    The synchronizer creates an empty topic for the new API element and fills in all the related information. Take, for example, this document: ASPxGridView.StartRowEditing Event.

    I highlighted in yellow the information that the tech writer enters directly for this topic. The section with the code example is highlighted in orange: you only need to put a link to the example in the corresponding field, and the entire contents of the example are pulled into the document.



    The rest is automatically generated:
    1. The namespace of the current class and the library in which this class lies are set automatically.
    2. The declaration syntax in C# and VB.NET is generated automatically from the service information.
    3. Additional information about the event is also pulled in automatically.
    4. In addition, a table with the public properties of the class that contains the event data (event args) is inserted automatically.
    5. As I wrote above, for the example it is enough to provide a link; the entire contents of the example are pulled in automatically. By the way, the same example can be referenced from another topic.
    6. References to the corresponding class, class members, and namespace are automatically generated. The tech writer can add a few more links as he sees fit.

    Some topics, for example those that contain a list of class members, are generated entirely automatically. Here is a list of members of the ASPxGridView class. Imagine what it would take to maintain this list manually.
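
    To give a feel for how a declaration can be produced from stored service information, here is a small sketch. It is an assumption about the general approach rather than a description of our generator: the `EventInfo` record and the handler type name are hypothetical, and only the simplest form of an event declaration is covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventInfo:
    """Hypothetical service record stored for an event topic."""
    name: str
    handler_type: str


def csharp_declaration(event):
    """Render the event declaration in C# syntax."""
    return f"public event {event.handler_type} {event.name}"


def vb_declaration(event):
    """Render the same event declaration in VB.NET syntax."""
    return f"Public Event {event.name} As {event.handler_type}"


# Hypothetical usage; the handler type name is made up for illustration.
event = EventInfo(name="StartRowEditing", handler_type="RowEditingHandler")
declarations = [csharp_declaration(event), vb_declaration(event)]
```

    Because both declarations come from the same record, they cannot drift apart when the API changes.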


    Testing, continuous integration and code review


    We write documents in an XML-like format. In essence, documentation is also a kind of code, and you can make mistakes in it: leave a tag unclosed, enter invalid characters, etc.

    Users receive documentation in more human-readable formats (HTML on the site, CHM, PDF, MSH), which means the documentation must be built from source. Fixing the errors that have accumulated over the entire release cycle is slow and expensive, so the documentation should be built and tested continuously.

    So we did the logical thing.

    1. We wrote tests for the documentation. Why not? You can automatically check the syntax in topic headings, and you can check for broken links, unclosed tags, bad words in the text, and non-ASCII characters (a Cyrillic "С" instead of a Latin "C"). The tests run on the CI server.
    2. A build with the documentation installer is also produced daily on the CI server. If the build fails, we look at the build log, fix the problem, and restart the build.
    3. Content review (the documentation counterpart of code review), in other words, proofreading and verification. Verification is both grammatical and factual.
      • Grammatical. We write the documentation in English, and since we, the technical writers, are not native English speakers, our grammar is checked by proofreaders who are native English speakers. Proofreaders check documents in the same CMS in which technical writers create the documentation.
      • Factual. The CMS can preview a topic as an HTML page (exactly the same as on the site). A link to this page can be sent to the developer so that they can read the document and suggest improvements.
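
    Two of the checks from item 1 are easy to sketch. The example below is an illustration under assumptions, not our test suite: it treats the topic markup as strict XML (our real format is only XML-like), and the function name and the sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET


def check_topic(markup):
    """Return a list of problems found in a topic's markup (empty if none)."""
    problems = []

    # Check 1: the markup must be well-formed (all tags closed and nested).
    try:
        ET.fromstring(markup)
    except ET.ParseError as error:
        problems.append(f"malformed markup: {error}")

    # Check 2: no non-ASCII characters, such as a Cyrillic letter that
    # looks identical to a Latin one.
    for offset, char in enumerate(markup):
        if ord(char) > 127:
            problems.append(f"non-ASCII character {char!r} at offset {offset}")

    return problems
```

    A CI job could run such a function over every topic and fail the run if any topic returns a non-empty list.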

    Conclusion


    I will be happy to answer your questions in the comments to this post, and to discuss organizational and technical issues related to writing documentation and working with developers and users.
