We brought this day closer as best we could: Notepad in Windows 10 now understands Unix line endings

    Notepad in Windows 10 now understands Unix line endings, not just the Windows format.

    For decades, anyone who tried to open text documents prepared on other operating systems in Windows has run into a garbled jumble instead of readable text. Now that changes overnight. The change is as small as it is epic in its practical results and ideological consequences: Microsoft is once again making a move toward cross-platform integration and support for open standards.

    For many years, Windows Notepad could correctly display only text documents whose lines ended in the Windows End of Line (EOL) format: a "carriage return" (CR) followed by a "line feed" (LF). As a result, Notepad could not correctly display text files created on Unix, Linux, and macOS, where the LF character alone is used as the line terminator.

    For example, here is a screenshot of Notepad trying to display the contents of a Linux .bashrc text file, which contains only Unix LF EOL characters:

    image

    And here is a screenshot of the recently updated Notepad displaying the contents of the same UNIX/Linux .bashrc file, this time with correct line breaks:

    image
    Please note that the status bar indicates the detected EOL format of the currently open file.
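
    How Notepad detects the format is not described here, but the general idea can be sketched by counting each kind of terminator in the raw bytes. This is a minimal illustration, not Notepad's actual algorithm; the helper name detect_eol and the returned labels are assumptions made for this example.

```python
def detect_eol(data: bytes) -> str:
    """Return a label for the most frequent line ending in data.

    Hypothetical helper for illustration; the labels are assumptions.
    """
    crlf = data.count(b"\r\n")
    # A CR or LF that is part of a CR+LF pair must not be counted again.
    lone_cr = data.count(b"\r") - crlf
    lone_lf = data.count(b"\n") - crlf
    counts = {
        "Windows (CRLF)": crlf,
        "Unix (LF)": lone_lf,
        "Macintosh (CR)": lone_cr,
    }
    # On a tie, max() keeps the first entry in the dictionary order above.
    label, hits = max(counts.items(), key=lambda item: item[1])
    return label if hits else "none"


if __name__ == "__main__":
    print(detect_eol(b"export PATH=$PATH:~/bin\nalias ll='ls -l'\n"))
```

    A real editor would also have to decide what to do with files that mix several formats; this sketch simply reports the most frequent one.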


    In addition, to allow flexible control over the new feature, two additional values have been added under the [HKEY_CURRENT_USER\Software\Microsoft\Notepad] registry key:

    image

    In terms of passion, the debate about how to start a new line in electronic documents is comparable to the dispute over spaces versus tabs in program source code. This line-ending standoff had many causes, some rooted in old standards and traditions, others in the design of typewriters and teletypes. An equally important role was played by the desire of some programmers to interpret commands and control characters literally, and of others to follow common sense.

    What Wikipedia tells us about the problem



    Historically, mechanical typewriters had a lever that returned the carriage to the left edge of the page and rotated the platen, moving the paper up one line. On teletypes and later alphanumeric printing devices, a print head replaced the carriage, and in laser printers it ceased to be a physical part at all, but the term "carriage return" kept the word "carriage" so that the name would not have to change. On teletypes, carriage return and line feed were separate operations, which is where the tradition of representing a line break as CR + LF in text files comes from.

    Systems based on ASCII or a compatible character set use either LF (line feed, 0x0A) or CR (carriage return, 0x0D) on its own, or the sequence CR + LF. The names come from printer commands: a line feed means that the paper should be advanced by one line, and a carriage return means that the carriage of the printing device should return to the beginning of the current line.

    • CR (ASCII 0x0D) was used in Commodore 8-bit machines, TRS-80, Apple II, Mac OS up to version 9, and OS-9;
    • LF (ASCII 0x0A) is used in Multics, UNIX, UNIX-like operating systems (GNU/Linux, AIX, Xenix, Mac OS X, FreeBSD), BeOS, Amiga UNIX, RISC OS, and others;
    • CR + LF (ASCII 0x0D 0x0A) is used in DEC RT-11 and most other early non-UNIX and non-IBM systems, as well as in CP/M, MP/M, MS-DOS, OS/2, Microsoft Windows, Symbian OS, and Internet protocols.
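
    Converting a file between the three conventions listed above can be sketched as a single substitution over the raw bytes. This is only an illustration: the helper name convert_eol is hypothetical, and the sketch assumes an ASCII-compatible encoding in which 0x0D and 0x0A never occur inside a multi-byte character (so it is not suitable for UTF-16 data).

```python
import re

# CR+LF is listed first so that the pair is replaced as one terminator
# instead of being matched as a CR followed by a separate LF.
_ANY_EOL = re.compile(rb"\r\n|\r|\n")


def convert_eol(data: bytes, eol: bytes = b"\n") -> bytes:
    """Rewrite every CR+LF, CR, or LF terminator in data as eol.

    Hypothetical helper for illustration only.
    """
    # A function is used as the replacement so that eol is inserted
    # literally, with no escape processing by the regex engine.
    return _ANY_EOL.sub(lambda match: eol, data)


if __name__ == "__main__":
    mixed = b"one\r\ntwo\rthree\nfour"
    print(convert_eol(mixed))             # Unix style
    print(convert_eol(mixed, b"\r\n"))    # Windows style
```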


    According to the standard, any Unicode-conformant application should treat each of the following characters as a line break:
    • LF (U+000A): line feed;
    • CR (U+000D): carriage return;
    • NEL (U+0085): next line;
    • LS (U+2028): line separator;
    • PS (U+2029): paragraph separator.

    Moreover, the sequence CR + LF (U+000D U+000A) should be treated as a single line break, not two.
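
    The rule above can be sketched as a regular expression that tries the CR + LF pair before any single character. This is a minimal illustration limited to the five characters listed here; the function name split_lines is hypothetical.

```python
import re

# CR+LF is tried first so the pair counts as one break; the character
# class then covers LF, CR, NEL, LS, and PS individually.
_UNICODE_EOL = re.compile("\r\n|[\n\r\x85\u2028\u2029]")


def split_lines(text: str) -> list[str]:
    """Split text at every line break recognized by the rule above.

    Hypothetical helper for illustration only.
    """
    return _UNICODE_EOL.split(text)


if __name__ == "__main__":
    sample = "first\r\nsecond\nthird\u2028fourth"
    print(split_lines(sample))
```

    Python's built-in str.splitlines() follows a similar approach but also treats several other control characters, such as form feed and vertical tab, as line boundaries, which is why the sketch spells out its own character set.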

    But as everyone knows, standards are one thing, and implementations often turn out differently. The need to correctly display legacy documents created before the Unicode era adds fuel to the fire. For a long time, the lack of a single, generally accepted representation of line breaks across operating systems complicated the exchange of text data between them.

    Unicode tries to reconcile these differences by treating CR, LF, and CR + LF as equivalent. However, this conflicts with its ASCII heritage when interpreting an LF + CR sequence that is not preceded by a CR: according to ASCII, this is one line break, while according to Unicode it is two.
