Create an Android text recognition application in 10 minutes. Mobile Vision CodeLab


Video version of the tutorial



Optical Character Recognition (OCR) allows a computer to read text in an image, letting applications understand signs, articles, leaflets, pages of text, menus, or anything else in text form. The Mobile Vision Text API gives Android developers a powerful and reliable OCR capability that works on most Android devices and does not increase the size of your application.


In this tutorial, you will create an application that recognizes all text that appears in the camera frame during live video and reads it aloud.


We have also published articles about other Mobile Vision features.



The source code can be downloaded here.


Or clone the GitHub repository from the command line:


$ git clone https://github.com/googlesamples/android-vision.git

The visionSamples repository contains many example projects related to Mobile Vision. This lesson uses only two of them:


  •  ocr-codelab/ocr-reader-start is the starter code you will use in this tutorial.
  •  ocr-codelab/ocr-reader-complete is the complete code of the finished application. You can use it for troubleshooting or jump straight to the working application.

Update Google Play Services


You may need to update the installed version of the Google Repository in order to use the Mobile Vision Text API.


Open Android Studio and open the SDK Manager:



 
Make sure the Google Repository is up to date. It must be at least version 26.



 


Add the Google Play Services dependency and launch the app


Now you can open the starting project:


  1. Select the ocr-reader-start directory from the downloaded code (File > Open > ocr-codelab/ocr-reader-start).


  2. Add the Google Play Services dependency to the application. Without this dependency, the Text API will not be available.



The project may complain that the integer/google_play_services_version resource is missing and raise an error. This is expected; we will fix it in the next step.


Open the build.gradle file in the app module and change the dependencies block to include the play-services-vision dependency. When everything is ready, the block should look like this:


dependencies {
    implementation fileTree(dir: 'libs', include: ['*.jar'])
    implementation 'com.android.support:support-v4:26.1.0'
    implementation 'com.android.support:design:26.1.0'
    implementation 'com.google.android.gms:play-services-vision:15.0.0'
}

  1. Click the Gradle sync button.


  2. Click the Run button.



After a few seconds you will see the “Read Text” screen, but it is just a black screen for now.



 
Nothing happens yet because the CameraSource is not configured. Let's do that.


If something goes wrong, you can open the ocr-reader-complete project and make sure it works correctly. That project is the finished version of this lesson; if it does not work either, check that everything is fine with your device and your Android Studio settings.


Configure TextRecognizer and CameraSource


To get started, we will create our TextRecognizer. This detector object processes images and determines what text appears inside them. Once initialized, a TextRecognizer can be used to detect text in all types of images. Find the createCameraSource method and create the TextRecognizer:


OcrCaptureActivity.java


private void createCameraSource(boolean autoFocus, boolean useFlash) {
    Context context = getApplicationContext();

    // TODO: Create the TextRecognizer
    TextRecognizer textRecognizer = new TextRecognizer.Builder(context).build();
    // TODO: Set the TextRecognizer's Processor.
    // TODO: Check if the TextRecognizer is operational.
    // TODO: Create the mCameraSource using the TextRecognizer.
}

The TextRecognizer is now built. However, it may not be usable yet. If the device has insufficient storage, or Google Play Services cannot download the OCR dependencies, the TextRecognizer object will not work. Before we start using it for text recognition, we need to check that it is ready. We will add this check to createCameraSource after the TextRecognizer is initialized:


OcrCaptureActivity.java


// TODO: Check if the TextRecognizer is operational.
if (!textRecognizer.isOperational()) {
    Log.w(TAG, "Detector dependencies are not yet available.");

    // Check for low storage.  If there is low storage, the native library will not be
    // downloaded, so detection will not become operational.
    IntentFilter lowstorageFilter = new IntentFilter(Intent.ACTION_DEVICE_STORAGE_LOW);
    boolean hasLowStorage = registerReceiver(null, lowstorageFilter) != null;

    if (hasLowStorage) {
        Toast.makeText(this, R.string.low_storage_error, Toast.LENGTH_LONG).show();
        Log.w(TAG, getString(R.string.low_storage_error));
    }
}

Now that we have checked that the TextRecognizer is operational, we can use it to recognize individual frames. But we want to do something more interesting: read text in live video mode. To do this we will create a CameraSource, which is pre-configured to control the camera. We set a high capture resolution and enable autofocus to cope with the task of recognizing small text. If you are sure your users will be looking at large blocks of text, such as signs, you can use a lower resolution, and then frames will be processed faster:


OcrCaptureActivity.java


// TODO: Create the cameraSource using the TextRecognizer.
cameraSource =
        new CameraSource.Builder(getApplicationContext(), textRecognizer)
        .setFacing(CameraSource.CAMERA_FACING_BACK)
        .setRequestedPreviewSize(1280, 1024)
        .setRequestedFps(15.0f)
        .setFlashMode(useFlash ? Camera.Parameters.FLASH_MODE_TORCH : null)
        .setFocusMode(autoFocus ? Camera.Parameters.FOCUS_MODE_CONTINUOUS_VIDEO : null)
        .build();

This is how the createCameraSource method should look when you're done:


OcrCaptureActivity.java


private void createCameraSource(boolean autoFocus, boolean useFlash) {
    Context context = getApplicationContext();

    // Create the TextRecognizer
    TextRecognizer textRecognizer = new TextRecognizer.Builder(context).build();
    // TODO: Set the TextRecognizer's Processor.

    // Check if the TextRecognizer is operational.
    if (!textRecognizer.isOperational()) {
        Log.w(TAG, "Detector dependencies are not yet available.");

        // Check for low storage.  If there is low storage, the native library will not be
        // downloaded, so detection will not become operational.
        IntentFilter lowstorageFilter = new IntentFilter(Intent.ACTION_DEVICE_STORAGE_LOW);
        boolean hasLowStorage = registerReceiver(null, lowstorageFilter) != null;

        if (hasLowStorage) {
            Toast.makeText(this, R.string.low_storage_error, Toast.LENGTH_LONG).show();
            Log.w(TAG, getString(R.string.low_storage_error));
        }
    }

    // Create the cameraSource using the TextRecognizer.
    cameraSource =
            new CameraSource.Builder(getApplicationContext(), textRecognizer)
            .setFacing(CameraSource.CAMERA_FACING_BACK)
            .setRequestedPreviewSize(1280, 1024)
            .setRequestedFps(15.0f)
            .setFlashMode(useFlash ? Camera.Parameters.FLASH_MODE_TORCH : null)
            .setFocusMode(autoFocus ? Camera.Parameters.FOCUS_MODE_CONTINUOUS_VIDEO : null)
            .build();
}

If you run the application now, you will see that the video preview has started! But to process images from the camera, we need to finish the last TODO in createCameraSource: create a Processor that handles text as soon as it becomes available.


Create an OcrDetectorProcessor


Now your application can detect text in individual frames using the detection method of TextRecognizer. This way you could find text in, for example, a photo. But to read text live during video, you need to implement a Processor, which will process text as soon as it appears on the screen.


Go to the OcrDetectorProcessor class and implement the Detector.Processor interface:


OcrDetectorProcessor.java


public class OcrDetectorProcessor implements Detector.Processor<TextBlock> {

    private GraphicOverlay<OcrGraphic> graphicOverlay;

    OcrDetectorProcessor(GraphicOverlay<OcrGraphic> ocrGraphicOverlay) {
        graphicOverlay = ocrGraphicOverlay;
    }
}

To implement this interface, two methods must be overridden. The first, receiveDetections, receives TextBlocks from the TextRecognizer as they are detected. The second, release, is used to free resources when the TextRecognizer is destroyed. In our case we just need to clear the graphic overlay, which removes all OcrGraphic objects.


We will receive the TextBlocks and create an OcrGraphic object for each text block detected by the processor. We will implement their drawing logic in the next step.


OcrDetectorProcessor.java


@Override
public void receiveDetections(Detector.Detections<TextBlock> detections) {
    graphicOverlay.clear();
    SparseArray<TextBlock> items = detections.getDetectedItems();
    for (int i = 0; i < items.size(); ++i) {
        TextBlock item = items.valueAt(i);
        if (item != null && item.getValue() != null) {
            Log.d("Processor", "Text detected! " + item.getValue());
            OcrGraphic graphic = new OcrGraphic(graphicOverlay, item);
            graphicOverlay.add(graphic);
        }
    }
}

@Override
public void release() {
    graphicOverlay.clear();
}

Now that the processor is ready, we need to configure the textRecognizer to use it. Return to the last remaining TODO in the createCameraSource method in OcrCaptureActivity:


OcrCaptureActivity.java


// Create the TextRecognizer
TextRecognizer textRecognizer = new TextRecognizer.Builder(context).build();
// TODO: Set the TextRecognizer's Processor.
textRecognizer.setProcessor(new OcrDetectorProcessor(graphicOverlay));

Now run the application. At this stage, when you point the camera at text, you will see “Text detected!” debug messages in the Android Monitor Logcat. But that is not a very visual way of seeing what the TextRecognizer sees, right?


In the next step, we will draw this text on the screen.


Drawing text on the screen


Let's implement the draw method in OcrGraphic. We need to check whether the graphic has any text, translate the coordinates of its bounding box into the canvas frame, and then draw both the box and the text.


OcrGraphic.java


@Override
public void draw(Canvas canvas) {
    // TODO: Draw the text onto the canvas.
    if (text == null) {
        return;
    }

    // Draws the bounding box around the TextBlock.
    RectF rect = new RectF(text.getBoundingBox());
    rect = translateRect(rect);
    canvas.drawRect(rect, rectPaint);

    // Render the text at the bottom of the box.
    canvas.drawText(text.getValue(), rect.left, rect.bottom, textPaint);
}

Run the application and test it on this sample text:



You should see a box appear on the screen with the text inside it! You can play with the text color using TEXT_COLOR.


How about this?



The box around the text looks correct, but all the text sits at the bottom of it.



 
This happens because the engine returns all the text it recognizes in a TextBlock as a single sentence, even when it sees a sentence split across several lines. If you want whole sentences, that is very convenient. But what if you want to know where each individual line of text is located?


You can get the Lines from a TextBlock by calling getComponents, and then, iterating over each line, easily get its location and the text inside it. This lets you draw the text in the place where it actually appears.


OcrGraphic.java


@Override
public void draw(Canvas canvas) {
    // TODO: Draw the text onto the canvas.
    if (text == null) {
        return;
    }

    // Draws the bounding box around the TextBlock.
    RectF rect = new RectF(text.getBoundingBox());
    rect = translateRect(rect);
    canvas.drawRect(rect, rectPaint);

    // Break the text into multiple lines and draw each one according to its own bounding box.
    List<? extends Text> textComponents = text.getComponents();
    for (Text currentText : textComponents) {
        float left = translateX(currentText.getBoundingBox().left);
        float bottom = translateY(currentText.getBoundingBox().bottom);
        canvas.drawText(currentText.getValue(), left, bottom, textPaint);
    }
}

Try this text again:



Great! Depending on your needs, you can break the detected text into even smaller components: call getComponents on each Line to get the Elements (words, in Latin script). You can also adjust textSize so the drawn text takes up as much space as the actual text on the screen.
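To illustrate, here is a minimal sketch (not part of the codelab) of how the drawing loop could go one level deeper and render each word separately; it assumes the same text field, translateX/translateY helpers, and textPaint used in the draw method above:

// Sketch: going one level deeper than Lines.
// Each Line can itself be broken into Elements (words in Latin script).
List<? extends Text> lines = text.getComponents();
for (Text line : lines) {
    for (Text element : line.getComponents()) {
        float left = translateX(element.getBoundingBox().left);
        float bottom = translateY(element.getBoundingBox().bottom);
        canvas.drawText(element.getValue(), left, bottom, textPaint);
    }
}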



 


Speaking the text when it is tapped


Now the text from the camera is converted to structured lines, and these lines are displayed on the screen. Let's do something else with them.


Using the TextToSpeech API built into Android, together with the contains method in OcrGraphic, we can teach the application to speak aloud the text you tap on.


First, let's implement the contains method in OcrGraphic. We just need to check whether the point (x, y) falls within the bounding box of the displayed text.
OcrGraphic.java


public boolean contains(float x, float y) {
    // TODO: Check if this graphic's text contains this point.
    if (text == null) {
        return false;
    }
    RectF rect = new RectF(text.getBoundingBox());
    rect = translateRect(rect);
    return rect.contains(x, y);
}

You may notice that this has a lot in common with the draw method! In a real project you should factor out the shared code, but here we leave everything as is for the sake of the example; a possible refactoring is sketched below.
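If you do want to remove the duplication, one possible approach (a sketch only; the getTranslatedRect helper name is hypothetical and not part of the codelab) is to extract the shared rectangle computation into a private method that both draw and contains call:

// Hypothetical helper shared by draw() and contains().
// Returns null when there is no text to display.
private RectF getTranslatedRect() {
    if (text == null) {
        return null;
    }
    return translateRect(new RectF(text.getBoundingBox()));
}

public boolean contains(float x, float y) {
    RectF rect = getTranslatedRect();
    return rect != null && rect.contains(x, y);
}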


Now let's turn to the onTap method in OcrCaptureActivity and handle a tap on the text, if there is any at that location.


OcrCaptureActivity.java


private boolean onTap(float rawX, float rawY) {
    // TODO: Speak the text when the user taps on screen.
    OcrGraphic graphic = graphicOverlay.getGraphicAtLocation(rawX, rawY);
    TextBlock text = null;
    if (graphic != null) {
        text = graphic.getTextBlock();
        if (text != null && text.getValue() != null) {
            Log.d(TAG, "text data is being spoken! " + text.getValue());
            // TODO: Speak the string.
        }
        else {
            Log.d(TAG, "text data is null");
        }
    }
    else {
        Log.d(TAG, "no text detected");
    }
    return text != null;
}

You can launch the application and use the Android Monitor Logcat to make sure that taps on text are really being handled.


Let's make our application talk! Go to the top of the Activity and find the onCreate method. When the application launches, we must initialize the TextToSpeech engine for later use.


OcrCaptureActivity.java


@Override
public void onCreate(Bundle bundle) {
    // (Portions of this method omitted)
    // TODO: Set up the Text To Speech engine.
    TextToSpeech.OnInitListener listener =
            new TextToSpeech.OnInitListener() {
                @Override
                public void onInit(final int status) {
                    if (status == TextToSpeech.SUCCESS) {
                        Log.d("TTS", "Text to speech engine started successfully.");
                        tts.setLanguage(Locale.US);
                    } else {
                        Log.d("TTS", "Error starting the text to speech engine.");
                    }
                }
            };
    tts = new TextToSpeech(this.getApplicationContext(), listener);
}

Even though we initialize TextToSpeech correctly here, in general you still need to handle common errors, for example the engine not yet being ready when the text is first tapped.
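One simple way to guard against that (a sketch only; the ttsReady flag is hypothetical and not part of the codelab) is to record the initialization status and check it before speaking:

// Hypothetical field, set from the OnInitListener above:
//     ttsReady = (status == TextToSpeech.SUCCESS);
private boolean ttsReady = false;

// Then guard the call in onTap:
if (ttsReady && text != null && text.getValue() != null) {
    tts.speak(text.getValue(), TextToSpeech.QUEUE_ADD, null, "DEFAULT");
} else {
    Log.d(TAG, "TTS engine not ready yet");
}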


TextToSpeech also depends on the spoken language. You can change the language based on the language of the recognized text. Language detection is not built into the Mobile Vision Text API, but it is available via the Google Translate API. One option is to use the language of the user's device as the text recognition language.
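For example, instead of hard-coding Locale.US in onInit, you could pass the device's default locale. This is a sketch under that assumption; note that setLanguage returns a status code worth checking:

// Sketch: use the device's default locale for speech output.
int result = tts.setLanguage(Locale.getDefault());
if (result == TextToSpeech.LANG_MISSING_DATA || result == TextToSpeech.LANG_NOT_SUPPORTED) {
    Log.w("TTS", "Device locale is not supported; falling back to US English.");
    tts.setLanguage(Locale.US);
}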


Great, all that remains is to add the code that speaks the text in the onTap method.


OcrCaptureActivity.java


private boolean onTap(float rawX, float rawY) {
    // TODO: Speak the text when the user taps on screen.
    OcrGraphic graphic = graphicOverlay.getGraphicAtLocation(rawX, rawY);
    TextBlock text = null;
    if (graphic != null) {
        text = graphic.getTextBlock();
        if (text != null && text.getValue() != null) {
            Log.d(TAG, "text data is being spoken! " + text.getValue());
            // Speak the string.
            tts.speak(text.getValue(), TextToSpeech.QUEUE_ADD, null, "DEFAULT");
        }
        else {
            Log.d(TAG, "text data is null");
        }
    }
    else {
        Log.d(TAG, "no text detected");
    }
    return text != null;
}

Now, when you launch the application and tap detected text, your device will speak it aloud. Try it!


Conclusion


Now you have an application that can recognize the text from the camera and pronounce it out loud!


You can apply this text recognition knowledge in your other applications: for example, reading addresses and phone numbers from business cards, or searching the text of photographed documents. In short, use OCR wherever you need to recognize text in an image; one small example is sketched below.
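For instance, here is a hedged sketch of scanning a recognized TextBlock value for phone numbers using Android's built-in android.util.Patterns; the findPhoneNumbers helper is hypothetical:

import android.util.Patterns;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;

// Hypothetical helper: collect phone numbers found in recognized text.
static List<String> findPhoneNumbers(String recognizedText) {
    List<String> numbers = new ArrayList<>();
    Matcher matcher = Patterns.PHONE.matcher(recognizedText);
    while (matcher.find()) {
        numbers.add(matcher.group());
    }
    return numbers;
}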


Source

