Notepad++: Cyrillic characters that slip into code, and a way to deal with them

Yesterday I spent almost two hours hunting for an error in code that looked perfectly correct. The cause turned out to be trivial: a Cyrillic "е" had somehow ended up in the array key "text". It looks exactly like the Latin "e", which made the problem very hard to spot. I am sure that most programmers, and people who work with text in general, run into this kind of trouble from time to time. The usual suspects are the Latin "c" and the Russian "с" ("es"), which sit on the same key in the English and Russian layouts. This was far from the first time for me, so I decided to look seriously for a solution. And I found one: not very elegant, but quite workable.

For historical reasons I use Notepad++ a lot, both for work in general and for writing PHP scripts in particular. In it, the variable names $iicuxiphametod and $іiсuхiрhametоd (ignore the strange names, this is just an example) look exactly the same, although half of the characters in the second one are Cyrillic.

image
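To confirm that two names which look identical are in fact different strings, their code points can be printed side by side. The following is a minimal Python sketch, not part of the Notepad++ workflow described here. The helper name describe is made up for illustration, and the mixed name is written with \u escapes on the assumption that the letters і, с, х, р and о in the example are the Cyrillic ones.

```python
import unicodedata


def describe(text):
    """Return (character, code point, Unicode name) for every character in text."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]


# Pure Latin name.
latin = "iicuxiphametod"
# Look-alike name; the escaped letters are Cyrillic (assumed reconstruction of the example).
mixed = "\u0456i\u0441u\u0445i\u0440hamet\u043ed"

for a, b in zip(describe(latin), describe(mixed)):
    marker = "" if a[0] == b[0] else "  <-- differs"
    print(f"{a[1]} {a[2]:<28} | {b[1]} {b[2]}{marker}")
```

Every row marked as differing is a position where the two names carry different characters, even though an ordinary editor font draws them the same way.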

My first thought was to use a regular-expression search to find every lowercase Cyrillic character that sits immediately to the left or right of a Latin character, and then replace them by hand or, again, with a regular expression.

Search example (the pattern is (?<=[A-Za-z])[а-яёі]|[а-яёі](?=[A-Za-z]); the "і" in the character classes is the Ukrainian letter):

image

Search result:

image

For simplicity, I did not limit the character classes to the Cyrillic letters that resemble Latin ones; I included them all (the Russian and Ukrainian alphabets, except for a few Ukrainian letters). I only wanted to show the principle.
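The same search can be tried outside the editor. Below is a small Python sketch of that pattern, with the Cyrillic ranges written as \u escapes so that the pattern itself cannot be mistyped. The function name find_suspects and the sample variable name are illustrative assumptions, not something Notepad++ provides.

```python
import re

# A lowercase Cyrillic letter (а-я, ё, Ukrainian і) directly after or directly before a Latin letter.
CYR = "[\\u0430-\\u044f\\u0451\\u0456]"
MIXED = re.compile(f"(?<=[A-Za-z]){CYR}|{CYR}(?=[A-Za-z])")


def find_suspects(line):
    """Return (zero-based column, character) pairs for Cyrillic letters touching Latin ones."""
    return [(m.start(), m.group()) for m in MIXED.finditer(line)]


# Hypothetical sample: the look-alike variable name, with its Cyrillic letters escaped.
sample = "$\u0456i\u0441u\u0445i\u0440hamet\u043ed"
for column, ch in find_suspects(sample):
    print(f"column {column}: {ch!r} (U+{ord(ch):04X})")
```

Like the editor search, this only catches lowercase Cyrillic letters that are adjacent to a Latin letter; an identifier written entirely in Cyrillic would pass unnoticed.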

This works as a fallback, but then every file has to be checked each time the code misbehaves, which is not convenient.
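One way to make that check less tedious is to run it over many files at once. The script below is only a sketch under assumptions: it reuses the same pattern, assumes the files are UTF-8, and the name scan_file and the output format are invented for illustration.

```python
import re
import sys
from pathlib import Path

# Same idea as the editor search: lowercase Cyrillic next to a Latin letter.
CYR = "[\\u0430-\\u044f\\u0451\\u0456]"
MIXED = re.compile(f"(?<=[A-Za-z]){CYR}|{CYR}(?=[A-Za-z])")


def scan_file(path):
    """Return (line number, column, character) for each suspicious letter; both numbers are 1-based."""
    hits = []
    # Assumption: files are UTF-8; undecodable bytes are replaced instead of raising.
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    for line_no, line in enumerate(text.splitlines(), start=1):
        for m in MIXED.finditer(line):
            hits.append((line_no, m.start() + 1, m.group()))
    return hits


if __name__ == "__main__":
    # Usage (hypothetical script name): python find_cyrillic.py file1.php file2.php
    for name in sys.argv[1:]:
        for line_no, col, ch in scan_file(name):
            print(f"{name}:{line_no}:{col}: suspicious Cyrillic {ch!r} (U+{ord(ch):04X})")
```

This still finds the problem after the fact rather than at the moment of typing, so it complements the font-based approach rather than replacing it.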

My second thought was: "Is it possible to set a separate font, or a separate font size, for Cyrillic, so that Cyrillic and Latin characters look different as they are typed, and mistyped characters stand out and can be fixed right away rather than later?" Notepad++ has no such option. You can set fonts, sizes and colors per programming language and per token type (variables, strings, reserved words and so on), but not for Cyrillic.

Then I thought there might be a plugin that does this. But searching for one did not succeed either.

And then I had a bright idea: find a font in which Cyrillic letters look different from Latin ones, and set it for keywords, variables and a few other problematic categories. Such fonts, albeit with exotic names, do exist, although I did not find many of them.

For example, this is how the names above look if you set the SimSun-ExtB font for variable names (Settings -> Style Configurator -> Font Style):

image

More examples:

Font MingLiU-ExtB:

image

Font NSimSun:

image

Going further, you can use a font in which Cyrillic differs from Latin (for example, SimSun-ExtB) for string data, and, for categories such as variables, where Cyrillic is normally not needed, a font that has no Cyrillic glyphs at all, for example Miriam Fixed. With such fonts, other symbols are displayed in place of the Russian letters, and they catch the eye immediately.

image

Compare the same names in the Courier New font:

image

and in the Miriam Fixed font:

image

The fonts are very similar, but in the second case mistyping a Cyrillic character unnoticed is practically ruled out.

This solution works for Notepad++, but I think the same can be done in other editors and IDEs.

I hope this method helps someone save time and avoid these elementary but very annoying mistakes in the future.
