Create an Android Text Recognition Application in 10 Minutes: Mobile Vision Codelab
A video version of this tutorial is also available.
Optical Character Recognition (OCR) allows a computer to read text in an image, letting applications make sense of signs, articles, leaflets, pages of text, menus, or anything else that contains text. The Mobile Vision Text API gives Android developers a powerful and reliable OCR capability that works on most Android devices and does not increase the size of your application.
In this tutorial, you will build an application that recognizes all text that appears in the camera frame and reads it aloud.
We have also published articles about other Mobile Vision features.
The source code can be downloaded here, or you can clone the GitHub repository from the command line:
$ git clone https://github.com/googlesamples/android-vision.git
The visionSamples repository contains many sample projects related to Mobile Vision. Only two are used in this lesson:
ocr-codelab/ocr-reader-start is the starting code you will use in this tutorial.
ocr-codelab/ocr-reader-complete is the complete code of the finished application. You can use it for troubleshooting, or jump straight to the working application.
Update Google Play Services
You may need to update your installed version of the Google Repository in order to use the Mobile Vision Text API.
Open Android Studio and open the SDK Manager:

Make sure the Google Repository is up to date. It must be at least version 26.

Add the Google Play Services dependency and launch the app
Now you can open the starting project: select the ocr-reader directory from the downloaded code (File > Open > ocr-codelab/ocr-reader-start).
Add the Google Play Services dependency to the application. Without this dependency, the Text API will not be available.
The project may report that the integer/google_play_services_version value is missing and show an error. This is normal; we will fix it in the next step.
Open the build.gradle file in the app module and change the dependencies block to include the play-services-vision dependency. When done, the block should look like this:
dependencies {
    implementation fileTree(dir: 'libs', include: ['*.jar'])
    implementation 'com.android.support:support-v4:26.1.0'
    implementation 'com.android.support:design:26.1.0'
    implementation 'com.google.android.gms:play-services-vision:15.0.0'
}
Click the Gradle sync button, then click the run button.
After a few seconds you will see the “Read Text” screen, but for now it is just a black screen.

Nothing happens yet because the CameraSource is not configured. Let's do that now.
If something goes wrong, you can open the ocr-reader-complete project and make sure it works correctly. That project is the finished version of the lesson; if it does not work either, check that your device and Android Studio settings are fine.
Configure TextRecognizer and CameraSource
To get started, we will create our TextRecognizer. This detector object processes images and determines what text appears inside them. Once initialized, a TextRecognizer can be used to detect text in all kinds of images. Find the createCameraSource method and create the TextRecognizer:
OcrCaptureActivity.java
private void createCameraSource(boolean autoFocus, boolean useFlash) {
    Context context = getApplicationContext();

    // TODO: Create the TextRecognizer
    TextRecognizer textRecognizer = new TextRecognizer.Builder(context).build();
    // TODO: Set the TextRecognizer's Processor.
    // TODO: Check if the TextRecognizer is operational.
    // TODO: Create the mCameraSource using the TextRecognizer.
}
Now the TextRecognizer is ready to go. However, it may not be operational yet: if the device is low on storage, or Google Play Services cannot download the OCR dependencies, the TextRecognizer will not work. Before we start using it for text recognition, we need to check that it is ready. We will add this check to createCameraSource right after the TextRecognizer is initialized:
OcrCaptureActivity.java
// TODO: Check if the TextRecognizer is operational.
if (!textRecognizer.isOperational()) {
    Log.w(TAG, "Detector dependencies are not yet available.");

    // Check for low storage. If there is low storage, the native library will not be
    // downloaded, so detection will not become operational.
    IntentFilter lowstorageFilter = new IntentFilter(Intent.ACTION_DEVICE_STORAGE_LOW);
    boolean hasLowStorage = registerReceiver(null, lowstorageFilter) != null;

    if (hasLowStorage) {
        Toast.makeText(this, R.string.low_storage_error, Toast.LENGTH_LONG).show();
        Log.w(TAG, getString(R.string.low_storage_error));
    }
}
Now that we have verified that the TextRecognizer is operational, we could use it to recognize individual frames. But we want to do something more interesting: read text in live video mode. To do this, we will create a CameraSource that is preconfigured to control the camera. We set a high capture resolution and enable autofocus to handle the task of recognizing small text. If you are sure your users will be looking at large blocks of text, such as signs, you can use a lower resolution, and frames will then be processed faster:
OcrCaptureActivity.java
// TODO: Create the cameraSource using the TextRecognizer.
cameraSource =
        new CameraSource.Builder(getApplicationContext(), textRecognizer)
                .setFacing(CameraSource.CAMERA_FACING_BACK)
                .setRequestedPreviewSize(1280, 1024)
                .setRequestedFps(15.0f)
                .setFlashMode(useFlash ? Camera.Parameters.FLASH_MODE_TORCH : null)
                .setFocusMode(autoFocus ? Camera.Parameters.FOCUS_MODE_CONTINUOUS_VIDEO : null)
                .build();
This is how the createCameraSource method should look when you're done:
OcrCaptureActivity.java
private void createCameraSource(boolean autoFocus, boolean useFlash) {
    Context context = getApplicationContext();

    // Create the TextRecognizer
    TextRecognizer textRecognizer = new TextRecognizer.Builder(context).build();
    // TODO: Set the TextRecognizer's Processor.

    // Check if the TextRecognizer is operational.
    if (!textRecognizer.isOperational()) {
        Log.w(TAG, "Detector dependencies are not yet available.");

        // Check for low storage. If there is low storage, the native library will not be
        // downloaded, so detection will not become operational.
        IntentFilter lowstorageFilter = new IntentFilter(Intent.ACTION_DEVICE_STORAGE_LOW);
        boolean hasLowStorage = registerReceiver(null, lowstorageFilter) != null;

        if (hasLowStorage) {
            Toast.makeText(this, R.string.low_storage_error, Toast.LENGTH_LONG).show();
            Log.w(TAG, getString(R.string.low_storage_error));
        }
    }

    // Create the cameraSource using the TextRecognizer.
    cameraSource =
            new CameraSource.Builder(getApplicationContext(), textRecognizer)
                    .setFacing(CameraSource.CAMERA_FACING_BACK)
                    .setRequestedPreviewSize(1280, 1024)
                    .setRequestedFps(15.0f)
                    .setFlashMode(useFlash ? Camera.Parameters.FLASH_MODE_TORCH : null)
                    .setFocusMode(autoFocus ? Camera.Parameters.FOCUS_MODE_CONTINUOUS_VIDEO : null)
                    .build();
}
If you run the application, you will see that the camera preview has started! But to process the images from the camera, we need to finish the last TODO in createCameraSource: create a Processor that handles text as it becomes available.
Create an OcrDetectorProcessor
Your application can now detect text on individual frames using the TextRecognizer's detection method, so you could, for example, find text in a photo. But to read text directly from the video stream, you need to implement a Processor, which will handle text as soon as it appears on the screen.
Go to the OcrDetectorProcessor class and implement the Detector.Processor interface:
OcrDetectorProcessor.java
public class OcrDetectorProcessor implements Detector.Processor<TextBlock> {

    private GraphicOverlay<OcrGraphic> graphicOverlay;

    OcrDetectorProcessor(GraphicOverlay<OcrGraphic> ocrGraphicOverlay) {
        graphicOverlay = ocrGraphicOverlay;
    }
}
To implement this interface, two methods must be overridden. The first, receiveDetections, receives TextBlocks from the TextRecognizer as they are detected. The second, release, is used to free resources when the TextRecognizer is destroyed. In our case, we just need to clear the graphic overlay, which removes all OcrGraphic objects.
We will take the TextBlocks and create an OcrGraphic object for each text block detected by the processor. We will implement their drawing logic in the next step.
OcrDetectorProcessor.java
@Override
public void receiveDetections(Detector.Detections<TextBlock> detections) {
    graphicOverlay.clear();
    SparseArray<TextBlock> items = detections.getDetectedItems();
    for (int i = 0; i < items.size(); ++i) {
        TextBlock item = items.valueAt(i);
        if (item != null && item.getValue() != null) {
            Log.d("Processor", "Text detected! " + item.getValue());
            OcrGraphic graphic = new OcrGraphic(graphicOverlay, item);
            graphicOverlay.add(graphic);
        }
    }
}

@Override
public void release() {
    graphicOverlay.clear();
}
Now that the processor is ready, we need to configure the textRecognizer to use it. Return to the last remaining TODO in the createCameraSource method of OcrCaptureActivity:
OcrCaptureActivity.java
// Create the TextRecognizer
TextRecognizer textRecognizer = new TextRecognizer.Builder(context).build();
// TODO: Set the TextRecognizer's Processor.
textRecognizer.setProcessor(new OcrDetectorProcessor(graphicOverlay));
Now run the application. When you point the camera at text, you will see “Text detected!” debug messages in the Android Monitor Logcat. But that is not a very visual way to see what the TextRecognizer sees, is it?
In the next step, we will draw this text on the screen.
Drawing text on the screen
Let's implement the draw method in OcrGraphic. We need to check whether there is text to draw, convert its bounding-box coordinates to the canvas frame, and then draw both the box and the text.
OcrGraphic.java
@Override
public void draw(Canvas canvas) {
    // TODO: Draw the text onto the canvas.
    if (text == null) {
        return;
    }

    // Draws the bounding box around the TextBlock.
    RectF rect = new RectF(text.getBoundingBox());
    rect = translateRect(rect);
    canvas.drawRect(rect, rectPaint);

    // Render the text at the bottom of the box.
    canvas.drawText(text.getValue(), rect.left, rect.bottom, textPaint);
}
Run the application and test it on this sample text:

You should see a box appear on the screen with the text inside it! You can play with the text color using TEXT_COLOR.
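For reference, here is a minimal sketch of how such a constant is typically wired into the paints in OcrGraphic; treat it as an illustration, since the exact initialization in your copy of the project may differ.
// Changing this constant changes the color of both the bounding box and the drawn text.
private static final int TEXT_COLOR = Color.RED;

rectPaint = new Paint();
rectPaint.setColor(TEXT_COLOR);
rectPaint.setStyle(Paint.Style.STROKE);
rectPaint.setStrokeWidth(4.0f);

textPaint = new Paint();
textPaint.setColor(TEXT_COLOR);
textPaint.setTextSize(54.0f);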
How about this?

The box around the text looks correct, but the text all sits at the bottom.

This is because the engine returns all the text it recognizes in a TextBlock as a single sentence, even if that sentence is split across several lines. If you just need the whole sentence, that is very convenient. But what if you want to know where each line of text is located?
You can get the Lines from a TextBlock by calling getComponents, and then, iterating over each line, easily get its location and the text inside it. This lets you draw the text in the place where it actually appears.
OcrGraphic.java
@Override
public void draw(Canvas canvas) {
    // TODO: Draw the text onto the canvas.
    if (text == null) {
        return;
    }

    // Draws the bounding box around the TextBlock.
    RectF rect = new RectF(text.getBoundingBox());
    rect = translateRect(rect);
    canvas.drawRect(rect, rectPaint);

    // Break the text into multiple lines and draw each one according to its own bounding box.
    List<? extends Text> textComponents = text.getComponents();
    for (Text currentText : textComponents) {
        float left = translateX(currentText.getBoundingBox().left);
        float bottom = translateY(currentText.getBoundingBox().bottom);
        canvas.drawText(currentText.getValue(), left, bottom, textPaint);
    }
}
Try this text again:

Great! Depending on your needs, you can break the detected text down into even smaller components: calling getComponents on each Line returns its Elements (words, in Latin-based scripts). You can also adjust the textSize so that the drawn text takes up as much space as the actual text on the screen.
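As an illustration, here is a sketch of how the per-line loop above could be extended to draw individual words and scale the paint to each line's height; the scaling heuristic is our own assumption, not part of the codelab.
// Sketch: draw each word (Element) separately, sizing the paint to the line height.
List<? extends Text> textComponents = text.getComponents();
for (Text line : textComponents) {
    // Heuristic (assumption): match the font size to the line's pixel height.
    textPaint.setTextSize(line.getBoundingBox().height());
    for (Text element : line.getComponents()) {
        float left = translateX(element.getBoundingBox().left);
        float bottom = translateY(element.getBoundingBox().bottom);
        canvas.drawText(element.getValue(), left, bottom, textPaint);
    }
}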

Speaking the text when you tap it
Now the text from the camera is converted to structured lines, and these lines are displayed on the screen. Let's do something else with them.
Using Android's built-in TextToSpeech API and the contains method in OcrGraphic, we can teach the application to speak the text aloud when you tap it.
First, let's implement the contains method in OcrGraphic. We just need to check whether the x and y location falls within the bounding box of the displayed text.
OcrGraphic.java
public boolean contains(float x, float y) {
    // TODO: Check if this graphic's text contains this point.
    if (text == null) {
        return false;
    }
    RectF rect = new RectF(text.getBoundingBox());
    rect = translateRect(rect);
    return rect.contains(x, y);
}
You may notice that this has a lot in common with the draw method! In a real project you should factor out the shared code, but here we leave everything as is for the sake of the example.
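If you did want to factor out the duplication, a minimal sketch might look like this; the helper name getTranslatedBoundingBox is our own and does not appear in the codelab:
// Hypothetical helper that both draw() and contains() could call.
private RectF getTranslatedBoundingBox() {
    RectF rect = new RectF(text.getBoundingBox());
    return translateRect(rect);
}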
Now let's turn to the onTap method in OcrCaptureActivity and handle the tap on the text, if there is text at that location.
OcrCaptureActivity.java
private boolean onTap(float rawX, float rawY) {
    // TODO: Speak the text when the user taps on screen.
    OcrGraphic graphic = graphicOverlay.getGraphicAtLocation(rawX, rawY);
    TextBlock text = null;
    if (graphic != null) {
        text = graphic.getTextBlock();
        if (text != null && text.getValue() != null) {
            Log.d(TAG, "text data is being spoken! " + text.getValue());
            // TODO: Speak the string.
        } else {
            Log.d(TAG, "text data is null");
        }
    } else {
        Log.d(TAG, "no text detected");
    }
    return text != null;
}
You can run the application and use the Android Monitor Logcat to make sure that tapping on the text is really being handled.
Let's make our application talk! Go to the top of the Activity and find the onCreate method. When the application launches, we need to initialize the TextToSpeech engine for later use.
OcrCaptureActivity.java
@Override
public void onCreate(Bundle bundle) {
    // (Portions of this method omitted)

    // TODO: Set up the Text To Speech engine.
    TextToSpeech.OnInitListener listener =
            new TextToSpeech.OnInitListener() {
                @Override
                public void onInit(final int status) {
                    if (status == TextToSpeech.SUCCESS) {
                        Log.d("TTS", "Text to speech engine started successfully.");
                        tts.setLanguage(Locale.US);
                    } else {
                        Log.d("TTS", "Error starting the text to speech engine.");
                    }
                }
            };
    tts = new TextToSpeech(this.getApplicationContext(), listener);
}
Even though we have initialized TextToSpeech correctly, in a real application you should still handle common failure cases, for example when the engine is not yet ready the first time you tap on text.
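A minimal sketch of such a guard, assuming a ttsReady field of our own (not part of the codelab) that the OnInitListener sets to true on success:
// Hypothetical field, set to true in onInit when status == TextToSpeech.SUCCESS.
private boolean ttsReady = false;

private void speakIfReady(String value) {
    if (!ttsReady) {
        Log.d("TTS", "Engine not ready yet; ignoring tap.");
        return;
    }
    tts.speak(value, TextToSpeech.QUEUE_ADD, null, "DEFAULT");
}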
TextToSpeech also depends on the language being spoken. You could change the language based on the language of the recognized text. Language detection is not built into the Mobile Vision Text API, but it is available through the Google Translate API. Alternatively, you could use the language of the user's device as the speech language.
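For example, here is a sketch of using the device locale in onInit instead of the hard-coded Locale.US, with a fallback that is our own assumption:
// Use the device's current locale for speech; fall back to US English if unsupported.
int result = tts.setLanguage(Locale.getDefault());
if (result == TextToSpeech.LANG_MISSING_DATA || result == TextToSpeech.LANG_NOT_SUPPORTED) {
    tts.setLanguage(Locale.US);
}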
Great, all that remains is to add the code that speaks the text in the onTap method.
OcrCaptureActivity.java
private boolean onTap(float rawX, float rawY) {
    // TODO: Speak the text when the user taps on screen.
    OcrGraphic graphic = graphicOverlay.getGraphicAtLocation(rawX, rawY);
    TextBlock text = null;
    if (graphic != null) {
        text = graphic.getTextBlock();
        if (text != null && text.getValue() != null) {
            Log.d(TAG, "text data is being spoken! " + text.getValue());
            // Speak the string.
            tts.speak(text.getValue(), TextToSpeech.QUEUE_ADD, null, "DEFAULT");
        } else {
            Log.d(TAG, "text data is null");
        }
    } else {
        Log.d(TAG, "no text detected");
    }
    return text != null;
}
Now, when you run the application and tap on detected text, your device will speak it. Try it!
Conclusion
You now have an application that can recognize text from the camera and speak it aloud!
You can apply what you have learned about text recognition in your other applications: reading addresses and phone numbers from business cards, searching the text of photographed documents, and so on. In short, use OCR wherever you need to recognize text in an image.