How Evernote Finds Text in Images


    Searching for text in images is a fairly popular feature of Evernote. We would like to tell how it works and answer some common questions.

    How is image processing

    When a note is sent to Evernote during synchronization, any resources included in this note, with the PNG, JPG or GIF MIME types, are sent to a separate set of servers that are busy exclusively with recognizing text in the attached images and returning the results found. These results are added to the note as a hidden (or rather not visible when viewing the note) metadata set called recoIndex . The whole recoIndex can be viewed if you export the note to an ENEX file.

    For example, here is an old post in one of our accounts with a snapshot of a beer bottle.


    If you export this note as an ENEX file — the portable XML export format for Evernote notes — and go to the end of the file, you can see the recoIndex section . In recoIndex contains several elements item . Each such element corresponds to a rectangle within which the Evernote recognition system suspects the presence of text.

    Each element of item contains four attributes: x and y , corresponding to the coordinates of the upper left corner of the rectangle, and w and h , meaning its width and height.

    When evaluating an image for textual content, a set of possible matches is created in the form of child elements associated with one or another item . Each match is assigned a weight (attribute w ), a numerical value that reflects the probability, the proposed text corresponds to the real one in the image.

    Recognition results are embedded in the note and subsequently synchronized with the client applications of the user. From this moment on, the text found in the image becomes searchable.

    Here is the recoIndex in the note shown above.

    <?xml version="1.0"?><recoIndexdocType="handwritten"objType="image"objID="a284273e482578224145f2560b67bf44"engineVersion=""recoType="service"lang="en"objWidth="1936"objHeight="2592"><itemx="853"y="1278"w="14"h="17"><tw="31">II</t><tw="31">11</t><tw="31">ll</t><tw="31">Il</t></item><itemx="501"y="635"w="770"h="254"><tw="32">LONG</t><tw="25">LONG</t><tw="23">GOV</t><tw="23">NOV</t><tw="19">Lang</t><tw="18">lane</t><tw="18">CONN</t><tw="17">bono</t><tw="17">mono</t><tw="15">LONON</t><tw="15">LONGE</t></item><itemx="757"y="2001"w="337"h="63"><tw="23">APONTE</t><tw="23">HAWK</t><tw="19">Armas</t><tw="19">Haas</t><tw="18">AMBRY</t><tw="18">AMATO</t><tw="18">AMMO</t><tw="18">areas</t><tw="18">rads</t><tw="18">pads</t><tw="18">lads</t><tw="18">fads</t><tw="18">dads</t><tw="18">cads</t><tw="18">wads</t><tw="18">WONKS</t><tw="18">AMULET</t></item><itemx="0"y="1284"w="128"h="730"><tw="18">TWELFTH</t></item><itemx="184"y="1274"w="72"h="172"><tw="32">MIL</t><tw="31">MIT</t><tw="30">lot</t><tw="28">It***</t><tw="25">It**»</t><tw="25">Ii**»</t><tw="25">WHO</t><tw="21">oil</t><tw="20">LOL</t><tw="19">TWX</t><tw="19">TWI</t><tw="19">mill</t><tw="19">Ott</t><tw="19">list</t><tw="19">LWT</t><tw="19">Mtl.</t><tw="19">limit</t><tw="19">tfiii</t><tw="19">tot</t><tw="18">Tim</t><tw="18">pol</t><tw="18">NHL</t><tw="17">this</t><tw="15">MID</t></item><!-- ... --></recoIndex>

    The recoIndex set contains the item and t elements (matches). As you can see, most item elements have several t elements , each of which has a specific weight. When the user begins to search for text in the Evernote client, the search also occurs by the content of t elements :


    How does PDF processing work?

    Evernote's text recognition system also works with PDF files, but their processing is different from images. In this case, an additional PDF document is created containing the recognized text. This document is added to the note with the original PDF. It is not visible to the user and is present only for search needs. It is also not taken into account in the monthly available space for downloading user data.

    For PDF to be recognizable, it must meet two criteria:
    • contain a bitmap
    • not have highlighted text (or at least have a minimal amount of it).

    In practice, many PDFs generated from text formats by other applications, such as text editors, do not satisfy these requirements. PDF files created by scanners typically meet the above criteria. However, if the scanner software performs its own text recognition in the resulting PDF, the file will also be excluded from processing by our service.

    If you export a note from a PDF that went through text recognition, you will notice two nodes in the document: data and alternate-data . The data contains the version of the original PDF encoded in 64-bit code, and the alternative- data contains the searchable version.


    What text can be recognized?

    Almost any, the main thing is that the recognition system understands that this is text. Both typewritten (for example, street signs or posters), and handwritten (even if you do not have the most accurate handwriting in the world), the text will be detected by the system.

    What can affect recognition is the orientation of the text. The text found in the image is evaluated for matches in three orientation options:
    1. 0 ° - normal horizontal orientation,
    2. 90 ° - vertical orientation
    3. 270 ° - vertical orientation.

    Text located in the image at some other angle will be ignored. This also applies to text diagonally and mirrored.

    It is also important to remember that there is no perfect text recognition system, and perhaps the text will not be recognized contrary to your expectations. At the same time, the recognition engine is constantly being improved and improved in order to achieve greater accuracy.

    Can I use Evernote to create a text version from an image with text?

    Not. As mentioned above, the match checking system provides several options. In most cases, there will be several potential matches of varying degrees of accuracy for a given rectangle.

    How fast is recognition?

    When synchronizing a note with an image, the picture is sent for processing to a group of text recognition servers. The system is based on the principle of the queue, that is, each new image takes a place at the end and will be processed after all standing before it.

    The speed depends on the size of the queue at the time of sending the image. As a rule, the waiting time does not exceed an hour.

    How many languages ​​does the Evernote recognition system support?

    Currently, Evernote can index texts in 28 languages ​​in handwritten and 24 in handwritten. New languages ​​are added regularly, and the quality of working with existing ones is improving. Users can specify the languages ​​used for which to search for matches during indexing by changing the corresponding parameters in the personal settings .

    Where can I learn more about the Evernote recognition system?

    We previously published a couple of articles on this topic:

    You can also find an overview of the specifics of handwriting recognition in European and Asian languages.

    Also popular now: