Use Git when documenting

    Sometimes not only the documentation itself but also the process of working on it is critical. In project work, for example, the lion's share of the effort goes into preparing documentation, and a poorly organized process can lead to errors and even loss of information and, consequently, to lost time and profit. But even if documentation is not central to your work and sits on the periphery, a sound process can still improve the quality of the document and save you time.

    The approach outlined here, with an example of a specific implementation, has a low entry threshold. Technically, you can start working in the new way tomorrow.

    Problem statement

    You need to create a document or a set of documents. Perhaps it is project documentation or a record of your network, or something simpler, such as a description of the processes in your company or department. In general, we are talking about any document or set of documents containing text, pictures, tables, and so on. We complicate the task with two conditions:

    1. The work is collaborative and involves a group or several groups of employees.
    2. At the output, you want a document in a specific format, with corporate-style attributes, created from a specific template. For definiteness, we will assume that this is MS Word (.docx).

    Ten years ago, the approach would have been obvious: we would create one or more MS Word documents and somehow organize the work on changes.

    This approach is still valid, and large integrators still use it to create project documentation. But it is intuitively clear that it is not very convenient when you work on a document intensively and for a long time, with many edits and discussions.

    I felt this problem quite acutely while working at a large integrator. The process of changing the design documentation was as follows:

    1. An engineer downloads the latest version of the MS Word (.docx) document.
    2. The engineer renames the file.
    3. The engineer makes changes in Track Changes mode.
    4. The engineer sends the document with corrections to the architect.
    5. The engineer also sends a list of all corrections with comments.
    6. The architect analyzes the changes.
    7. If all is well, the architect copies the changes to the file with the latest version, updates the version number, and puts the file on a shared resource.
    8. If there are comments, a discussion is initiated (by email or in meetings).
    9. Consensus is reached.
    10. Steps 3-9 are repeated.

    While the work was not intense, this process worked, if only just. But at a certain point it became the bottleneck of the whole project and led to problems. Everything goes wrong as soon as changes are made often and simultaneously by several teams.

    When we reached the preliminary testing stage, various problems began to appear, and we had to change the documentation often, even if only in the details: four different teams, daily, almost simultaneously, with discussions. All these changes passed through one engineer, the architect. The file with the project design was huge, so the architect was overwhelmed with routine copying and editing and made many mistakes. Everything had to be double-checked and re-sent, and in general it was close to chaos.

    In this case, working on an MS Word document went ahead only with great difficulty and created problems.

    Git and Markdown

    Faced with the problem described above, I began to research this issue.
    I saw that using Markdown with Git to create documents is becoming more and more popular.

    Git is a development tool. But why not use it for the documentation process? That resolves the issue of multi-user work. To make full use of Git's features, however, the document must be in a text format, so we need a tool other than MS Word, and Markdown is great for this purpose.

    Markdown is a simple text markup language. It is designed for creating nicely formatted texts in plain text files. If we create our documents in Markdown, then the Markdown-Git pairing looks natural.

    Everything would be fine, and we could stop here, if not for our second condition: “at the output we need a document in a certain format, with corporate-style attributes, created according to a certain template” (and we agreed that, for definiteness, it will be MS Word). That is, if we decide to use Markdown, we need to somehow convert the file to a .docx of the required form.

    There are programs for converting between different formats, for example, Pandoc.
    You can convert a Markdown file to .docx format with this program.
    But you need to understand that, firstly, not everything in Markdown will be converted to MS Word and, secondly, MS Word is a whole country compared to the slender but still small town of Markdown. A huge amount of what exists in Word has no counterpart in Markdown. You cannot simply turn your Markdown file into the desired form of MS Word document by choosing the right Pandoc options. So after conversion you usually have to “refine” the resulting .docx document manually, which again can be time-consuming and lead to errors.
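
    As a minimal sketch of the basic conversion step, the Pandoc call can be wrapped in a few lines of Python. The file names (design.md, design.docx, corporate-template.docx) and the function names are hypothetical and used only for illustration. Pandoc's --reference-doc option takes styles from an existing .docx, which covers part of the corporate look but, as noted above, not everything.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_docx_command(source, output, reference_doc=None):
    """Build the Pandoc command line that converts Markdown to .docx."""
    command = [
        "pandoc", str(source),
        "--from", "markdown",
        "--to", "docx",
        "--output", str(output),
    ]
    if reference_doc is not None:
        # --reference-doc makes Pandoc take its styles from an existing .docx.
        command.append(f"--reference-doc={reference_doc}")
    return command


def convert(source, output, reference_doc=None):
    """Run the conversion; Pandoc must be installed and available on PATH."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH")
    subprocess.run(pandoc_docx_command(source, output, reference_doc), check=True)
    return Path(output)


if __name__ == "__main__":
    # Hypothetical file names; this only prints the command, it does not run it.
    print(" ".join(pandoc_docx_command("design.md", "design.docx",
                                       "corporate-template.docx")))
```

    Keeping the command in a function makes it easy to call the same conversion from a larger build script later.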

    If we could write a script that automatically “finished” what Pandoc could not do, it would be an ideal solution.

    Because MS Word and Markdown functionality is not identical, I think it is impossible to solve this problem in general. But can it be done for specific situations and specific requirements? My experience has shown that it can, and most likely for many, or maybe even most, situations.

    Solution of a particular problem

    So, in my case, after converting the file with Pandoc, I had to process the file further by hand, namely:

    • add fields in Word for automatic numbering of table and picture captions
    • change the style of the tables

    I did not find a way to do this with standard Pandoc features or other known means. Therefore, I used a Python script with the pywin32 package. As a result, I got full automation. Now I can convert my Markdown file to the desired form of MS Word document with one command.
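
    The following is not the script mentioned above, only a rough sketch of what such post-processing might look like with pywin32 driving MS Word over COM (Windows with Word installed is assumed). The function names, the "Table Grid" style name, and the choice to insert new captions with Word's Range.InsertCaption are all assumptions made for illustration; a real script would more likely adapt the captions Pandoc already produced and would use the style names from your corporate template.

```python
WD_CAPTION_POSITION_ABOVE = 0  # value of Word's wdCaptionPositionAbove constant


def restyle_tables(doc, style_name):
    """Apply one table style to every table in the document."""
    count = 0
    for table in doc.Tables:
        table.Style = style_name
        count += 1
    return count


def caption_tables(doc, label="Table"):
    """Insert an auto-numbered caption above every table."""
    count = 0
    for table in doc.Tables:
        # Word numbers captions with a field, so the numbers update themselves.
        table.Range.InsertCaption(Label=label, Position=WD_CAPTION_POSITION_ABOVE)
        count += 1
    return count


def postprocess(docx_path, style_name="Table Grid"):
    """Open the file in MS Word via COM, fix it up, and save it (Windows only)."""
    import win32com.client  # provided by the pywin32 package

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        doc = word.Documents.Open(docx_path)  # Word expects an absolute path
        restyle_tables(doc, style_name)
        caption_tables(doc)
        doc.Save()
        doc.Close()
    finally:
        word.Quit()
```

    The two helper functions only touch the document object they are given, so they can be tried out on a copy of a converted file before being wired into a one-command build.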

    See details here.


    In this example I, of course, transform an abstract Markdown file, but exactly the same approach was applied to a real production document, and at the output I received almost exactly the same MS Word document that we had previously produced by manual formatting.

    In general, pywin32 gives us almost complete control over the MS Word document, which allows us to change it and bring it to the look that your corporate standard requires. Of course, the same goals could be achieved with other tools, for example, VBA macros, but it was more convenient for me to use Python.

    A brief formula for this approach:

    Markdown + Git -- (something) --> MS Word

    It is not so important what “something” is. In my case, it was Pandoc and Python with pywin32. You may have different preferences, but the important thing is that this is possible. And this is the main message of this article.

    In summary, the idea is that with this approach you work only with the Markdown file and use Git to organize collaboration and version control, and only when necessary (for example, to provide documentation to the client) do you automatically create a file in the desired format (for example, MS Word).


    I think for many the formula given above is enough to understand how the process of working with documentation can now be organized. Nevertheless, since I usually write for network engineers, I will outline in general how the process can look and how it differs from editing MS Word files.

    For definiteness, we will choose GitHub as the platform for working with Git. You create a repository and put the Markdown file or files you plan to work with in the master branch.

    We will look at a simple process based on GitHub flow. Its description can be found both on the Internet and on Habr.

    Suppose that four people are working on the documentation and you are one of them. Then four additional branches are created, for example, named after these people. Each person works locally in their own branch and makes changes with all the necessary Git commands.

    Having completed a finished piece of work, you open a pull request, thus initiating a discussion of your changes. Perhaps during the discussion it turns out that you should add or change something else. In this case, you make the necessary changes and create an additional pull request. In the end, your changes are accepted and merged into the master branch (or rejected).
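
    To make one round of this workflow concrete, here is a small sketch that lists the Git commands a participant would typically run before opening a pull request on GitHub. The branch name, file name, commit message, and the helper names are hypothetical; by default the script only prints the commands (a dry run) and does not touch any repository.

```python
import shlex
import subprocess


def feature_branch_commands(branch, files, message, remote="origin"):
    """Return the Git commands for one round of work on a personal branch."""
    return [
        ["git", "checkout", "-b", branch],      # create the branch and switch to it
        ["git", "add", *files],                 # stage the edited Markdown files
        ["git", "commit", "-m", message],       # record the change locally
        ["git", "push", "-u", remote, branch],  # publish the branch to the remote
    ]


def run_all(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for command in commands:
        print(shlex.join(command))
        if not dry_run:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical names; the pull request itself is then opened on GitHub.
    run_all(feature_branch_commands("alice", ["design.md"], "Update addressing table"))
```

    On later rounds the branch already exists, so the first command would be a plain git checkout of that branch instead of creating it.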

    Of course, this is a fairly general description. I suggest contacting your developers or finding knowledgeable people to design a detailed process. But I want to note that the threshold for entering Git is quite low. This does not mean that Git is simple, but you can start with simple things. Even if you know nothing at all, I think that after spending several hours, or maybe days, on study and installation, you can start using it.

    What is the benefit of this approach compared, for example, with the process described in the example above?

    Actually, the processes are quite similar. You have just replaced:

    copying the file -> creating a branch
    copying the text to the final file -> merge
    copying the latest changes to yourself -> git pull / fetch
    discussion by correspondence -> pull requests
    Track Changes mode -> git diff
    latest approved version -> master
    backup (copying to a remote server) -> git push

    So you have automated everything that you already had to do manually.

    At a higher level, this allows you to:

    • create a clear, simple, and controlled process for changing documentation
    • reduce the likelihood of formatting errors, because the final document (MS Word in our example) is created automatically


    In view of the above, I think it is obvious that even if you work on documentation alone, using Git can greatly facilitate your work.

    All this improves the quality of documentation and reduces the time needed to create it. And a little bonus: you learn Git, which will help you in automating your network :)

    How to switch to a new process?

    At the beginning of the article, I wrote that you can start working in the new way tomorrow. How do you move your work onto the new track?

    Here is the sequence of steps that you most likely will have to complete:

    • if your document is very large, break it into pieces
    • convert each part to Markdown (using Pandoc, for example)
    • install one of the Markdown editors (I use Typora)
    • most likely you will have to tweak the formatting of the created Markdown documents
    • start applying the process described in the previous section
    • in parallel, start modifying the conversion script to fit your task (or create something of your own)
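
    The conversion step from the list above could be sketched as follows. This assumes the large document has already been split into several .docx files in one folder; the folder name "parts", the media folder name, and the function names are hypothetical. The --extract-media option asks Pandoc to save embedded images as files, and gfm (GitHub-flavored Markdown) is just one possible output flavor.

```python
import subprocess
from pathlib import Path


def docx_to_markdown_command(docx_path, media_dir="media"):
    """Build the Pandoc command that converts one .docx part to Markdown."""
    docx_path = Path(docx_path)
    markdown_path = docx_path.with_suffix(".md")
    return [
        "pandoc", str(docx_path),
        "--from", "docx",
        "--to", "gfm",
        f"--extract-media={media_dir}",  # save embedded pictures as files
        "--output", str(markdown_path),
    ]


def convert_folder(folder, media_dir="media"):
    """Convert every .docx file in the folder; Pandoc must be on PATH."""
    converted = []
    for docx_path in sorted(Path(folder).glob("*.docx")):
        subprocess.run(docx_to_markdown_command(docx_path, media_dir), check=True)
        converted.append(docx_path.with_suffix(".md"))
    return converted


if __name__ == "__main__":
    # Hypothetical file name; this only prints the command, it does not run it.
    print(" ".join(docx_to_markdown_command("parts/chapter1.docx")))
```

    The resulting Markdown usually still needs the manual formatting tweaks mentioned in the list.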

    You do not have to wait until you have created and fully debugged the mechanism that converts Markdown to the required output format. Even if you cannot quickly automate the conversion of your Markdown files completely, you can still convert them with Pandoc and then bring the result to its final form manually. Usually you need to do this not often but only at the end of certain stages, and this manual work, although inconvenient, is in my opinion quite acceptable at the debugging stage and should not slow down the process much.

    Everything else (Markdown, Git, Pandoc, Typora) is ready to use and does not require special effort or time to start working with.
