Sir Markdown: a Yandex lecture

    When developing documentation, we are guided not only by standards but also by how convenient the documentation is to use. Standards determine the composition and form of documentation, while the format is chosen for convenience. Developer Sergey Bocharov talks about the path a Markdown document takes and the problems that have to be solved in exchange for the ease of use of this format.


    I sometimes get the impression that the format does not serve us, but we serve the format. Hence, Sir Markdown.

    - We know what lies in the balance now
    and what is happening now.
    The hour of courage has struck on our clocks,
    and courage will not desert us.

    It is not frightening to fall dead under bullets,
    it is not bitter to be left homeless.
    And we will preserve you, Russian speech,
    the great Russian word.

    We will carry you free and pure,
    give you to our grandchildren, and save you from captivity
    forever.

    Good evening, friends, my name is Sergey Bocharov. I began with Anna Akhmatova's poem “Courage”. It was written in 1942, during the Great Patriotic War, and even then the poet understood that it would be much easier to rebuild factories destroyed by bombing than to recover the spiritual wealth squandered and trampled during the years of war.

    Poetry is one of the few means that help us learn to feel beauty. Today we are talking about technical documentation, a field that would seem to be far from poetry, but preserving beauty and maintaining a single style are very important, especially when working with this format. Markdown has become the de facto standard for writing technical documentation in the open source world, and it has united people of different specialties.

    I have a difficult and at the same time pleasant mission. I want to give you a full picture of how Markdown documentation is built at Yandex, and also to share the tools that we have developed. I hope you will use them in your projects too.

    The tools are written in JS because at Yandex we love JavaScript, and all our development revolves around this language. There is no need to be upset if you have a different development environment or use the much-hyped React: I think you can find analogues on GitHub.

    Many are probably wondering why Markdown became a sir. This is a metaphor, and it has to do with the fact that at Yandex we try to automate every process.

    A large number of tools are being developed for Markdown as well. We have already built many tools and continue to build more. I sometimes get the impression that the format does not serve us, but we serve the format. Hence, Sir Markdown.



    In the first, broader part, we will talk about why we write in this format and why we develop tools for it. In the second part, using our library as an example, we will talk in more detail about how to maintain the quality of content, how to translate it, and how to build one site from many repositories.

    Markdown was created in 2004 by John Gruber and Aaron Swartz. The idea was to have a simple plain-text syntax that could then be converted into richer, valid HTML.



    Here we have a first-level heading, a second-level heading and a paragraph of text.
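
    To make the idea of "simple text in, HTML out" concrete, here is a minimal sketch. It handles only `#`-style headings and plain paragraphs, and the function names are made up for this illustration; it is not how any real Markdown parser is implemented.

```javascript
// Minimal sketch: converts only ATX headings ("#", "##", ...) and paragraphs.
// This is NOT a full Markdown implementation.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

function markdownToHtml(markdown) {
  // Blocks are separated by one or more blank lines.
  const blocks = markdown.trim().split(/\n\s*\n/);
  return blocks
    .map((block) => {
      const heading = /^(#{1,6})\s+(.*)$/.exec(block.trim());
      if (heading) {
        const level = heading[1].length;
        return `<h${level}>${escapeHtml(heading[2])}</h${level}>`;
      }
      // Anything else is treated as a paragraph; line breaks become spaces.
      return `<p>${escapeHtml(block.trim().replace(/\s*\n\s*/g, " "))}</p>`;
    })
    .join("\n");
}

const source = "# Title\n\n## Subtitle\n\nSome paragraph of text.";
console.log(markdownToHtml(source));
// <h1>Title</h1>
// <h2>Subtitle</h2>
// <p>Some paragraph of text.</p>
```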

    Why a new format when there is DITA with richer tools? Why create new tools for Markdown? Let's try to answer together.



    DITA has a more complex syntax, and it is advisable to have a dedicated authoring environment to work with it. Of course, it is an XML format, so it also opens in a text editor. But SVG opens in a text editor too, and yet no one draws there: everyone uses Photoshop or Sketch.

    Markdown, on the contrary, has a lighter syntax, which is why so many developers like it. As a result, documentation in Markdown is written and maintained by a technical writer with the active participation of contributors and developers, whereas documentation in DITA is often developed by a technical writer alone, without active participation from developers and contributors.

    A vivid example of a site with Markdown documentation is the npm site. Today it contains 475 thousand modules, and every day there are more.


    Here are the most popular ones. If you go to the site of any of them, for example Gulp, and open the documentation section, you immediately land on GitHub, where you can see that the gulp.js API is described in Markdown.



    So if for some reason you are not using Markdown yet, or are avoiding it, please start using it and make your developers happy.

    Style and syntax. I suggest we look at the example of our internal Lego library, which is top secret. Now I will demonstrate it.



    Unexpected, right? All of these blocks are different: here is the logo block, the teaser, and so on. They are stored on GitHub, the de facto standard.

    Here is a general description of the library. There are also block directories, and each directory contains a description of its block. We consider the documentation to be part of the code, so we name it accordingly. This is also convenient for search and replace. At one time, a technical writer worked on each document. A translator also worked on the English versions, and ideally the Russian and English versions of the documentation should be consistent, that is, their structure and content should be the same.

    The developers themselves also actively work on the documentation, and there are a lot of them. The process we are trying to build in the company is as follows: a developer who has built new functionality or a new block assigns a task to the technical writer in the form of a pull request or an issue.

    The technical writer describes the functionality and then hands it over for translation if another language version of the document is needed. Everyone is happy, but that is the ideal world. In the real world the situation is often as follows: the developer comes and edits the documentation himself.

    Here we face the first problem: the loss of consistency. The next problem is that the writing style of the documentation changes.

    You might think: so what, the main thing is that the functionality is described. It turns out that this is not so. When the document had just been written by a technical writer, the developers were happy.

    Then, when several dozen developers had come in with their commits, they became upset and eventually burst into tears. They said: the document needs to be rewritten, it has become incomprehensible and impossible to read, it is full of obscure constructions, clumsy phrasing, and markers of problematic text.

    We need to be able to fight these somehow. Technical writers know them and know how to deal with them, but developers often let them into the documentation, and such documentation is uncomfortable to read.

    For example, is everyone comfortable with this? Does everyone understand what it is about? Obviously, it is about git, and this is found in our documentation. Here is a more understandable version.

    Developers who have little experience with GitHub sometimes struggle when reading documentation written by development gurus. So we add the next problem: preserving style and terminology. The developers commit a lot, the technical writer is almost invisible, and the unity of style is broken.



    The next problem with this approach is that the syntax breaks down. Markdown lets you write the same thing in completely different ways and get the same result. Technical documentation authors have an agreement for each case on how to write headings and lists, insert screenshots, and so on.
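
    As an illustration of "different ways, same result", a heading can be written with `#` marks or underlined with `=` or `-`, and a bullet can start with `*`, `-` or `+`. The sketch below only reports which spellings a document mixes; the function name and the output shape are assumptions made for this example, not part of any tool mentioned in the talk.

```javascript
// Sketch: report which of several equivalent Markdown spellings a document uses.
// Fenced code blocks are not skipped, so this is only a rough detector.
function detectStyles(markdown) {
  const lines = markdown.split("\n");
  const found = { headings: new Set(), bullets: new Set() };
  lines.forEach((line, index) => {
    // ATX heading: one to six "#" characters followed by text.
    if (/^#{1,6}\s+\S/.test(line)) found.headings.add("atx");
    // Setext heading: a text line followed by a line made only of "=" or "-".
    if (index > 0 && /^(=+|-+)\s*$/.test(line) && lines[index - 1].trim() !== "") {
      found.headings.add("setext");
    }
    const bullet = /^\s*([*+-])\s+\S/.exec(line);
    if (bullet) found.bullets.add(bullet[1]);
  });
  return {
    headings: [...found.headings].sort(),
    bullets: [...found.bullets].sort(),
  };
}

const mixed = "Title\n=====\n\n## Section\n\n* one\n- two\n";
console.log(detectStyles(mixed));
// { headings: [ 'atx', 'setext' ], bullets: [ '*', '-' ] }
```

    More than one entry in either list means the document mixes styles that an agreement would normally reduce to one.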

    Indeed, reality often differs from what we want, and we have to be able to deal with this problem too. It would seem that the rendered result is the same, but if the problem is not solved now, it will have to be solved at the build stage. There is often a task such as finding all third-level headings with backticks and enlarging them by two pixels. If we do not solve this at the linting stage, we will have to solve it at the build stage by writing large scripts.

    So we add the next problem: Markdown syntax. That gives us three main challenges to deal with. We also have open source projects, in particular BEM. Besides developers, technical writers and translators, open source projects also have contributors. Contributors help make our products better, for which we are grateful, and there are a lot of them. They send us their pull requests, and we share quality content with them. So we definitely need to look for solutions.

    The next section is devoted to automated testing, or linting: what can be done to check Markdown syntax consistently and to find grammatical errors and markers of problematic text. This is my favorite section. I think that leveling up the linting also levels up the technical writer.

    Let's start with a tool called remark-lint. It checks syntax and writing style. Remark itself is publicly available; we did not develop it, we just use it. It has its own set of more than 50 rules. On top of these we wrote our own rules and built our style guide into remark.



    How does it work? Suppose there is a test file with some content: a first-level heading, a second-level heading and a list.



    We enter a command in the terminal, and it processes the file and shows the result. When a technical writer commits and the document is in order, a message is displayed that there are no errors, and the commit goes to GitHub. Now suppose we have errors: for example, we make the first heading second-level, and add “Hello” and an exclamation mark to the second heading. We run the same command and get three errors.
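
    The kind of check described here can be sketched in a few lines. The two rules below are hypothetical stand-ins written for this illustration (remark-lint itself works on a parsed syntax tree, and its real rule set is different), so the sketch reports two problems rather than the three from the talk.

```javascript
// Hypothetical rules for illustration only; not the rules used in the talk.
const rules = [
  {
    id: "first-heading-level",
    check(headings) {
      const first = headings[0];
      return first && first.level !== 1
        ? [{ line: first.line, message: "First heading should be level 1" }]
        : [];
    },
  },
  {
    id: "no-exclamation-in-heading",
    check(headings) {
      return headings
        .filter((h) => h.text.includes("!"))
        .map((h) => ({ line: h.line, message: "Do not use exclamation marks in headings" }));
    },
  },
];

function lint(markdown) {
  // Collect "#"-style headings together with their 1-based line numbers.
  const headings = [];
  markdown.split("\n").forEach((text, index) => {
    const match = /^(#{1,6})\s+(.*)$/.exec(text);
    if (match) headings.push({ level: match[1].length, text: match[2], line: index + 1 });
  });
  // Run every rule and tag each problem with the id of the rule that found it.
  return rules.flatMap((rule) =>
    rule.check(headings).map((problem) => ({ rule: rule.id, ...problem }))
  );
}

const good = "# Title\n\n## Section\n\n* item";
const bad = "## Title\n\n## Hello section!\n\n* item";
console.log(lint(good).length); // 0
console.log(lint(bad).map((p) => `${p.line}: ${p.message} (${p.rule})`).join("\n"));
```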



    Leveling up the linting levels up the technical writer. The technical writer recalls that we agreed not to put exclamation marks in headings, fixes it, and everything is fine. How are these rules connected? At the root of the project there is a .remarkrc file, in which we define a set of our own rules (I have shortened the list) and a set of rules borrowed from remark itself.
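
    For orientation, a configuration of this kind might look roughly like the sketch below, written in the `.remarkrc.js` form of the config file. The choice of rules is an assumption made for this example, and the local rule path is hypothetical; the actual Yandex configuration is not shown in the talk.

```javascript
// .remarkrc.js - a sketch of a config; the rule selection is an assumption.
module.exports = {
  plugins: [
    // Borrowed rules published by the remark-lint project.
    "remark-preset-lint-recommended",
    ["remark-lint-first-heading-level", 1],
    "remark-lint-no-heading-punctuation",
    // Hypothetical path to a project-specific rule, shown only for illustration.
    "./lint-rules/no-exclamation-in-headings.js",
  ],
};
```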

    The next tool is yaspeller. It checks the documentation for grammar and spelling errors. Its documentation is on Yandex.Technologies and, by the way, is kept in git. You can read it; everything works on the same principle: if there is a spelling error, a message is displayed. Contributors and developers who try to send you corrections in a pull request will not be able to send them with spelling errors or with inaccuracies in Markdown syntax. These tools are very easy to plug in, and they run on pre-commit.
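
    The pre-commit idea can be sketched as a small gate: run every check on every file and return a non-zero exit code if anything is found, which is what makes git abort the commit. The two checks and the typo dictionary below are toy stand-ins invented for this example; they are not yaspeller or remark-lint.

```javascript
// Each check takes the file text and returns a list of problem messages.
function runChecks(files, checks) {
  const problems = [];
  for (const [name, text] of Object.entries(files)) {
    for (const check of checks) {
      for (const message of check(text)) problems.push(`${name}: ${message}`);
    }
  }
  // A non-zero exit code from a pre-commit hook makes git abort the commit.
  return { problems, exitCode: problems.length === 0 ? 0 : 1 };
}

const knownTypos = { documantation: "documentation" }; // hypothetical dictionary
const spellCheck = (text) =>
  Object.keys(knownTypos)
    .filter((typo) => text.toLowerCase().includes(typo))
    .map((typo) => `typo "${typo}", did you mean "${knownTypos[typo]}"?`);
const headingCheck = (text) =>
  /^#{1,6}\s.*!\s*$/m.test(text) ? ["exclamation mark in a heading"] : [];

const result = runChecks(
  { "README.md": "# Hello!\n\nSee the documantation." },
  [spellCheck, headingCheck]
);
console.log(result.problems.join("\n"));
console.log(result.exitCode); // 1
// In a real hook script this would be followed by process.exit(result.exitCode).
```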

    The next section is about translation. We have developed the md2xliff tool. We translate a lot of open source documentation and a little internal documentation. In the case of open source documentation, we have contributors who send us pull requests. To make that easier for them, we add panels to our website that offer a link for editing either through the GitHub interface or with the prose.io service. They follow the link, make changes, click OK, and a pull request arrives.

    How do we support all this? Suppose a technical writer wrote a document, a translator translated it, and then a user came and corrected something in the original Russian version. What should be done with the English version? Does anything need to be edited there or not? It is unclear. How to find the typo that was fixed is also unclear. You can go to GitHub and look at the diff, but that is still a task that has to be assigned to the translator. We need to look for a solution.

    There is a second situation. For example, a developer wrote the second version of the library and, instead of rewriting the whole 30-page document, deleted a piece here and added a piece there. And if something was deleted, it is not clear what to do. We have to go and somehow verify it in the diff on GitHub.

    What can be done? This looks like a hopeless situation. However, if any of you have worked with translations, you probably know that there are a lot of standards, and on closer examination the solution looks something like this. There is a test file with some documentation text that lives on GitHub. What should be done with it? We need to generate two files from it: a skeleton and an XLIFF translation file.



    The skeleton holds the block formatting: we replace the pieces of text with numbered placeholders.
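
    A rough sketch of the extraction step follows. It works line by line, keeps heading and list markers in the skeleton, and swaps the remaining text for a numbered placeholder. The `%%%N%%%` placeholder form and the function name are assumptions made for this example, not a description of how md2xliff is implemented.

```javascript
// Sketch: split a Markdown text into a skeleton and a list of text segments.
// The "%%%N%%%" placeholder form is an assumption made for this example.
function extract(markdown) {
  const segments = [];
  const skeleton = markdown
    .split("\n")
    .map((line) => {
      // Keep block markers (heading hashes, bullets, list numbers) in the skeleton.
      const match = /^(\s*(?:#{1,6}|[*+-]|\d+\.)\s+)?(.*)$/.exec(line);
      const marker = match[1] || "";
      const text = match[2];
      if (text.trim() === "") return line; // blank lines stay as they are
      segments.push(text);
      return `${marker}%%%${segments.length}%%%`;
    })
    .join("\n");
  return { skeleton, segments };
}

const { skeleton, segments } = extract("# Title\n\nSome text.\n\n* item");
console.log(skeleton);
// # %%%1%%%
//
// %%%2%%%
//
// * %%%3%%%
console.log(segments); // [ 'Title', 'Some text.', 'item' ]
```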



    XLIFF is a special format. It is documented and has a specification, and everything in it is simple. Most importantly, XLIFF has units, and each unit id corresponds to a segment that we replaced in the skeleton.



    Each unit also has two tags: source and target. The source tag contains exactly the piece of text that we replaced in the skeleton, and the target tag is initially empty. We give this XLIFF to the translator, who fills in the target fields. After the translation, we run the reverse generation and get the English version of the document.
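
    The two directions can be sketched as follows: one function writes units with an empty target, the other puts translated targets back into a skeleton by unit id. The XML here is deliberately simplified (it is modelled on XLIFF 1.2 element names but is not schema-valid), and the `%%%N%%%` placeholders and function names are assumptions made for this example.

```javascript
function escapeXml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Build a simplified XLIFF-like document: one unit per segment, empty targets.
function buildXliff(segments, sourceLang, targetLang) {
  const units = segments
    .map(
      (text, index) =>
        `      <trans-unit id="${index + 1}">\n` +
        `        <source>${escapeXml(text)}</source>\n` +
        `        <target></target>\n` +
        `      </trans-unit>`
    )
    .join("\n");
  return (
    `<xliff version="1.2">\n` +
    `  <file source-language="${sourceLang}" target-language="${targetLang}">\n` +
    `    <body>\n${units}\n    </body>\n` +
    `  </file>\n</xliff>`
  );
}

// Reverse generation: put translated targets back into the skeleton by unit id.
// Placeholders without a translation are left untouched.
function fillSkeleton(skeleton, targetsById) {
  return skeleton.replace(/%%%(\d+)%%%/g, (placeholder, id) =>
    id in targetsById ? targetsById[id] : placeholder
  );
}

console.log(buildXliff(["Заголовок", "Некоторый текст."], "ru", "en"));
console.log(fillSkeleton("# %%%1%%%\n\n%%%2%%%", { 1: "Title", 2: "Some text." }));
// # Title
//
// Some text.
```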



    The translation does not disappear anywhere; it is saved in TMX, a special standardized XML file. It holds two values: source and target. How does this help us? Let us return to the previous situation. Contributors, developers, or another technical writer came in and corrected something in the original document, say in the Russian version.

    We generate XLIFF as before and give it to the translator, who uses the translation database stored in their program and translates only the segments that have changed. Lines that match one hundred percent are not translated again: they are filled in automatically. So there is no longer any need to search for what has changed. We guarantee that every line that has changed in any way will show up in the translation. Then we generate the English version of the document, and everything is simple. It would seem that ready-made solutions must exist, simply because they surely should.
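
    The effect of a translation memory can be sketched like this: segments with an exact match get their target filled in automatically, and only the changed segments are left for the translator. A `Map` stands in for the TMX file here, and the names are made up for this example.

```javascript
// Sketch of translation memory reuse; a real TMX file is XML, a Map stands in here.
function pretranslate(segments, memory) {
  return segments.map((source, index) => {
    const exact = memory.get(source); // only one hundred percent matches are reused
    return {
      id: index + 1,
      source,
      target: exact === undefined ? "" : exact,
      needsTranslation: exact === undefined,
    };
  });
}

const memory = new Map([
  ["Заголовок", "Title"],
  ["Некоторый текст.", "Some text."],
]);

// The second segment was edited after the first translation.
const units = pretranslate(["Заголовок", "Некоторый исправленный текст."], memory);
console.log(units.filter((u) => u.needsTranslation).map((u) => u.id)); // [ 2 ]
console.log(units[0].target); // Title
```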



    There is smartcat.com from ABBYY, there is a solution from Google, and there is Matecat. But the common flaw of these solutions is that they do not support Markdown, which has no unified standard for how to write it. They avoid it and support only standardized formats. Last week I tried Markdown in Matecat, and everything turned red, although the Markdown was simple.

    Take our tool and complex nesting, for example. If you have code with three backticks inside it and JSDoc inside that, it copes 99% of the time, and the level of nesting can be arbitrary.

    The second fatal flaw of these services is that they do not integrate with GitHub. We want a user to be able to follow a link to us and correct something, but these services will not integrate with that.

    We have all discussed this. When the source document is in Russian and we translate it into another language, we have a fixed language pair, a rigid tie to the Russian language. We are working to get rid of this tie so that we can edit TMX on the fly, no matter where the user comes in. The user may come to the Russian version or to the English one, and we have to reverse the TMX right during generation. This has not been solved yet.

    I suggest we look at building the site as part of an overview of the path a Markdown document takes from the moment it is written to the moment it is published on the site.



    What does the workflow look like? Suppose a work plan has been drawn up and all the information collected. With Markdown, it is important to follow the syntax conventions. Then comes the automated testing: our linters run on pre-commit. Next, the document goes to GitHub. If an English version is needed, we localize the document. After that comes the build, and there are two scenarios. In one, a document maps one-to-one onto a page; in the other, various inline examples have to be built, an iframe embedded in the page, and so on. We have a tool that can do all this, a pot. It can replace links, combine several Markdown documents into one, and build inline examples. Finally, the result is published to the site.

    Why do we need a website? Why not keep all the documentation in Markdown, as gulp.js does?

    The answer is obvious - you need a single entry point. We have more than a hundred repositories, and we want these documents to be collected in one place. Search, navigation and live examples are also needed. Live examples look like this.



    The same document is rendered differently on GitHub and on the site. On the site we can open an example in a new window, click a button and see its HTML. It is very convenient.



    What are our recipes? What should you do? First of all, determine your needs. If they are similar to ours, then introduce restrictions on Markdown syntax, keep an eye on terminology, run automatic checks and use Translation Memory. Tools: remark-lint, yaspeller, md2xliff. Thank you.
