torrie April 15, 2010 at 09:32

Recognize text using Hamming distance

This article was prompted by an article by Alex Povetkin - “ Pattern Recognition by the Method of Potential Functions ”

So, we are going to write a program in Delphi (I use version 6), which can translate characters from a picture into text. The task is quite popular on the Internet, and for each post “ I want to implement character recognition !!! Help "the most common answers" read on the Internet "or" do not handle, use a file reader "and the like.

I, like many others, began by studying the basic algorithms. Of course, monsters such as FineReader spend a lot of money on the algorithmic component, and we don’t know their secrets, but a decent amount of other information was found to understand the basic methods. But let's start from afar.

The essence of recognition is comparison with the image.

In childhood, looking at the text, we see only a picture. We have not yet been endowed with the necessary criteria for its analysis in order to understand that the picture is not just a picture, but a text, and not just a text, but some kind of story. We grow, and begin to receive images of symbols, we find out that the picture with the letter "A" means the letter A, and not the steamer. As we get older, we are gaining a base of images (or a base of image criteria), according to which, by comparison and analysis, we can tell which symbol is in front of us.
Exactly the same situation with the computer. First, he simply receives a picture in his memory, then searches for some areas on it, and then compares these areas with his base and conducts an analysis, the result of which will be the determination of the symbol in the image. In words, everything is simple - we transfer the image to the memory, look for some areas in the image, compare it with the database, but the methods for comparing the image are still not clear. Go to the Internet.

Comparison methods

Modern methods of image comparison are divided into two categories:

1. Comparison with a standard
2. Comparison by criteria

And the neural network method stands apart between them, since it is applicable in both cases. We will not use it and consider its work, we will try to do everything as simple and clear as possible.

Comparison by criteria is the most time-consuming, because it is difficult to perform the algorithmic task of determining the characteristic features of a symbol, and the effectiveness of such an algorithm will probably be small. This is due to the similarity of characters, and if we recognize low-resolution text, then the cost of honing such an algorithm will be several times greater. However, for such a difficult task, a source with a solution was found .

Sources of simple programs were also found that check the extreme points and bend points for the presence of a black area:
Recognition of postal codes
Recognition of Yandex (CY) images Recognition

problems of these programs are evident - if the images do not match (as well as the presence of noise, foreign objects, etc.) these programs are ineffective. However, they cope with a narrow circle of tasks assigned to them. We want to get a more universal example of recognition implementation.

Comparison with the standard is a simpler and more efficient task. Here everything is as simple as possible - create a reference image and compare them. There are a lot of comparison methods - pixel by pixel, overlay, overlay overlay, and others. The Internet is full of articles on this topic.The most interesting works:
Image recognition by a mobile robot.
The use of neural networks in image recognition.
Implementation of the simplest algorithm for recognizing graphic images.

We will use the simplest comparison method - pixel-by-pixel . The bottom line is to alternately compare two corresponding pixels of two black and white pictures. Since the two pixels must be appropriate, the pictures must be the same in size.

Comparison is a very long algorithm, so we need to optimize the work of our program to compare not two pictures, say 100x100 pixels (10,000 comparison operations), but something smaller, but still relevant (comparing pictures of 8x8 pixels, distinguishing a letter in them is quite difficult even to a person, therefore, the more pixels we take, the greater the probability of correct recognition and the more time our algorithm requires). Experimentally, the optimal size was chosen - 16x16 pixels. But before resizing, you still need to find what we will change.

Selection

see function TForm1.PreParse1 (Pic: TPicture): string;
So, we need to find symbols in the picture, and find them all. To do this, the picture is converted to black and white ( see procedure TForm1.Mono (Bmp: TBitmap) ), and all objects are searched through comparison with the extreme pixels of the hypothetical character. A red frame is built around each character found, it is on this frame that the image is resized (StretchBlt function) and copied for comparison ( see function TForm1.Parce3 (Pic: TPicture; x1, y1, x2, y2: Integer): TBitmap ; )

Comparison algorithm

See function TForm1.ParceBMP (bmp1: Tbitmap): string;
The algorithm is simple - we compare pixel by pixel ( see function TForm1.Compare (b1, b2: TBitmap): integer ), if two pixels are different (one is black, the other is white), then increase the counter R. After we completely compare the two pictures, we substitute the counter in the formula

Value , obtained as a result of comparison, was called " Hamming distance ."
Then we form an array of results of calculations using this formula for different comparisons (our picture with all the standards). We take the maximum value of the array and the corresponding character. This is our result!

Drawing up the base

The next step in the work of our program is to compile a database (an array of symbol records and its graphical representation). We will make a standard database of images of Russian characters and numbers without the use of graphic editors - in the source code. To do this, you need to set the image of the desired size, open the canvas mode, and write the necessary letter in the image.
Immediately it is necessary to process the resulting image in the same way that we process the original one, so that the standard is as close as possible to the recognized image and vice versa (read about it below).

Var

 myarray:array of record //задаем наш массив

 ch:char; //символ

 bmp:TBitmap; //соответствующее ему изображение

 end;

 i : integer; //счетчик

 Img:TBitmap; //промежуточное(буферное) изображение

 im:TPicture; //то же изображение, для передачи на обработку

begin

 SetLength(myarray, 73);//выделяем массиву память

 im:=TPicture.Create; //создаем необходимые объекты

 Img:=TBitmap.Create; 

 for i := 0 to 31 do

 with Img.Canvas do //открываем канву

 begin

 Img.Height:=16; //задаем ширину и высоту

 Img.Width:=16;

 Brush.Color := clWhite; //чертим белый фон

 Pen.Color := clWhite;

 Rectangle(0,0,16,16); //используя инструмент квадрат

 Pen.Color := clBlack; //ставим ручке черную краску

 Font.Color := clBlack;

 Font.Charset:=form1.FontDialog1.Font.Charset; //выбираем шрифт (пользователь выбирает шрифт для базы) и его параметры

 Font.Size := Form1.FontDialog1.Font.Size;

 Font.Style := Form1.FontDialog1.Font.Style;

 Font.Name := Form1.FontDialog1.Font.Name;

 TextOut(0, 0, CHR(ORD('А')+i));//пишем в нашем изображении

 myarray[i].bmp:=TBitmap.Create;//создаем объект в массиве

 myarray[i].ch:=CHR(ORD('А')+i);//задаем что это за символ соответственно

 im.Bitmap.Assign(Img);//задаем картинке изображение

 myarray[i].bmp:=PreParse2(Im);//посылаем на обработку и присваиваем выходную картинку элементу массива(описание будет немного позже

 end;

Mistakes

Where without them? We will also need a method for handling recognition errors. To do this, before recognition, we must make a special button that will show the characters and the result of the program, and ask us if the recognition is correct. If yes, then do nothing, otherwise enter it into the database, the structure of which is the following - an indexer file in which the number of characters in the database is specified, the next lines are the character, and a link to the image file ( see procedure TForm1.Learn (Pic: TPicture); ).

Brief block diagram

Implementation

We define the tasks and draw the form:

The resulting source is here .

Screen running program:

What is missing or plans for the future

- The program cannot recognize letters consisting of two figures: Y, Y.
- When turning / noise / other recognition is not correct.
- Sometimes it recognizes small letters as large, large as small (which is actually logical for this method)

Conclusion

So the next "recognizer" of the text is written. Who will use it? Virtually no one. This is not FineReader, nor Cuneiform. Here, everything is much simpler and more understandable, and less effective. But despite this I hope this article will be useful to the IT community. This system can be changed for any other needs.

In the near future I want to supplement the story with graphs of recognition results, as well as comparing this recognition method with others.

During the writing, the following materials were used:
Image recognition by the method of potential functions
Postal code recognition (image selection algorithm was taken)

UPD: there were problems with the database, I decided the source was reloaded, thanks AmoN '

Tags: