Ease of Being: Antiword, reST

    When preparing documents, how do you avoid slow office suites, keep using your favorite text editor, separate content from presentation, keep documents readable and transparent to a VCS, and compare versions of a text easily?

    I recently commented on the release announcement of the new LibreOffice and decided that those thoughts deserved a more coherent write-up.

    Imagine that we receive a document in MS Word format by e-mail and have to fill it in or correct it, and then print or forward it. Most likely the document will be sent to us again from time to time, and/or we will need to keep preparing updated texts based on it.

    Problem No. 1: naturally, there is no MS Office on our GNU/Linux system, and OpenOffice and its descendants are terribly slow (especially if we are used to Vim and other lightweight programs).

    Problem No. 2: very often the layout of incoming files is catastrophic (alignment done with runs of spaces, formatting without styles, etc.), so you practically have to redo the entire document before you can make ordinary changes.

    Problem No. 3: the MS Word format is opaque, and OpenDocument is only conditionally transparent. In other words, even the open format cannot be read easily with simple tools: you have to unzip a bundle of files and parse XML. This means that such documents are not transparent to version control.

    What to do? The Unix way comes to the rescue, in the form of simple programs that work with plain text.

    Tools

    • Antiword - a utility for extracting text from the MS Word format;
    • reStructuredText (reST) - a very simple yet quite powerful language for semantic markup of text;
    • Docutils tools (rst2latex, rst2html, rst2odt, rst2xml) and rst2pdf - utilities for exporting text from reST to common formats for typesetting, the web and printing;
    • Bonus: rst2a (an online converter with an API!)
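
    To give a feel for the reST markup mentioned in the list above, here is a minimal sketch. The file name estimate.txt matches the example used later; the content itself (customer, items, prices) is invented purely for illustration. The HTML export at the end is attempted only if a Docutils front end is found, since the command is installed as rst2html on some systems and as rst2html.py on others.

```shell
# Write a small, made-up estimate in reST markup (hypothetical content).
cat > estimate.txt <<'EOF'
========
Estimate
========

:Customer: Example Ltd.

Work items
==========

1. Site survey
2. Installation

=========  ========
Item       Price
=========  ========
Survey     100
Install    400
=========  ========
EOF

# Try an HTML export if a Docutils front end is available under either name.
for tool in rst2html rst2html.py; do
    if command -v "$tool" >/dev/null 2>&1; then
        "$tool" estimate.txt > estimate.html
        break
    fi
done
```

    The source stays readable as it is: a title, a field list, a section heading, a numbered list and a simple table, all in plain text.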

    Workflow

    1. antiword reads the .doc file and prints plain text;
    2. we edit the plain text;
    3. the rst2* utilities convert the text from reST markup to whatever format is needed.

    For example, we have received a document in MS Word format and want to fix something in it quickly and keep the text as a template for the future:

    $ antiword estimate.doc > estimate.txt
    $ vim estimate.txt
    $ rst2pdf estimate.txt -o estimate.pdf

    Done: a good-looking PDF is ready to view and print. By the way, it is convenient to keep the PDF open in, say, Okular while editing the source. Each time the reST is exported to PDF (and this can be automated), Okular immediately reloads the content without losing the current page, which gives an almost instant preview. You can also print the document straight from Okular.
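
    One possible way to automate the export is a small polling helper; this is only a sketch, not the setup described above. The function name rebuild_if_stale is made up for this example, and it relies on the -nt ("newer than") file test, which bash, dash and ksh provide although POSIX does not require it.

```shell
# rebuild_if_stale SRC OUT CMD...
# Run CMD when OUT does not exist yet or SRC has been modified after OUT.
rebuild_if_stale() {
    src=$1
    out=$2
    shift 2
    if [ ! -e "$out" ] || [ "$src" -nt "$out" ]; then
        "$@"
    fi
}

# Typical use (commented out because it loops until interrupted):
# check once a second and re-export only after the source was saved.
#
# while sleep 1; do
#     rebuild_if_stale estimate.txt estimate.pdf \
#         rst2pdf estimate.txt -o estimate.pdf
# done
```

    Leave the loop running in a spare terminal; every save in the editor then produces a fresh PDF within about a second, and the viewer picks it up.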

    I usually add a custom stylesheet as well (the same one suits nearly all documents, and it can be extended for a particular document). Stylesheets for rst2pdf are written in JSON (see the documentation).
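
    As a rough illustration of such a stylesheet, the sketch below writes a small JSON file and passes it to rst2pdf with the -s option. The file name house.json and all the values are invented for this example, and the key names (pageSetup, styles, base, heading1) follow the layout described in the rst2pdf manual as far as I recall it, so check them against the documentation for your version.

```shell
# Write a minimal, hypothetical rst2pdf stylesheet.
cat > house.json <<'EOF'
{
  "pageSetup": {
    "size": "A4",
    "margin-top": "2cm",
    "margin-bottom": "2cm",
    "margin-left": "2cm",
    "margin-right": "2cm"
  },
  "styles": [
    ["base", {"fontSize": 11}],
    ["heading1", {"fontSize": 18}]
  ]
}
EOF

# Apply it only when both rst2pdf and the source text are present.
if command -v rst2pdf >/dev/null 2>&1 && [ -f estimate.txt ]; then
    rst2pdf -s house.json estimate.txt -o estimate.pdf
fi
```

    Because the stylesheet is an ordinary text file, it can live in the same repository as the documents and be diffed like everything else.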

    Results


    Problem No. 1 is solved: the tools used are cross-platform, lightweight, fast and unobtrusive.

    Problem No. 2 is solved: the nightmarish source layout is discarded at once, and what remains is the text itself, which is easy to mark up in reST. If necessary, the result can be taken on to LaTeX.

    Problem No. 3 is solved: all documents (and styles) are completely transparent to a VCS and can be read without special tools, exactly like program code. So if something has changed in an official document, you will always have readable diffs between any two dates. For incoming documents you can also store the original versions (preferably as antiword output), so that you can diff them and easily carry just the changes over to the proper reST files.
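
    A small sketch of the diffing step: the two text files below stand in for antiword dumps of two incoming versions of the same document (with real files they would come from antiword old.doc > incoming-v1.txt and so on). The file names and contents are invented for the example.

```shell
# Stand-ins for two antiword dumps of the same incoming document.
printf 'Estimate\n\nSurvey: 100\nInstall: 400\n' > incoming-v1.txt
printf 'Estimate\n\nSurvey: 100\nInstall: 450\n' > incoming-v2.txt

# diff exits with status 1 when the files differ; "|| true" keeps a
# script running under "set -e".
diff -u incoming-v1.txt incoming-v2.txt > incoming.diff || true

# Show only the changed lines, skipping the "---" and "+++" header lines.
grep '^[-+][^-+]' incoming.diff
```

    Only the changed price line shows up, once as removed and once as added, so the same change can be carried over to the reST source by hand in a few seconds.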

    Notes

    • The proposed tool stack will not cover every case. The beauty of the Unix way is that components can be swapped out freely.
    • By the way, Antiword “is able to convert Word documents to plain text, to PostScript, to PDF and to XML / DocBook”, so in some cases you can even do without reST.
    • This note is written in reST and exported via rst2html. ;-)
    • UPD: thanks to ingspree for the correction: the proper spelling is reST, not ReST :)
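
    For the other Antiword output formats mentioned in the notes, a sketch along these lines may help. The helper name antiword_flags and the file names are made up for this example; the options themselves (-t for text, -p for PostScript, -a for PDF, both with a paper size, and -x db for DocBook XML) are taken from the antiword manual page, so verify them against your installed version.

```shell
# antiword_flags OUTFILE
# Print the antiword options matching the extension of OUTFILE;
# return 1 for an extension this sketch does not know about.
antiword_flags() {
    case "$1" in
        *.txt) echo "-t" ;;
        *.ps)  echo "-p a4" ;;
        *.pdf) echo "-a a4" ;;
        *.xml) echo "-x db" ;;
        *)     return 1 ;;
    esac
}

doc=estimate.doc   # hypothetical input file
for out in estimate.ps estimate.pdf estimate.xml; do
    if command -v antiword >/dev/null 2>&1 && [ -f "$doc" ]; then
        # $flags is left unquoted on purpose so "-p a4" splits into two words.
        flags=$(antiword_flags "$out") && antiword $flags "$doc" > "$out"
    fi
done
```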
