Google Cloud Vision API. Has the future of Computer Vision as a service arrived?

A year ago, Google rolled out the Cloud Vision API platform. The idea of the platform is to offer Computer Vision technologies, in which Google is the undisputed leader, as a service. A couple of years ago, each task had its own technology. It was impossible to take one general-purpose tool and get an algorithm that solves everything. But Google took a swing at it. Now a year has passed, and the technology is still not widely talked about. There is one article on Habr, and even that one is not about the Cloud Vision API but about the Face API, its predecessor. The English-language Internet is not full of articles either, apart from those by Google itself. Is this a failure?



I wanted to see what it was back in the spring, but I never had the energy to sit down and study it properly. Now and then I tested individual pieces. Periodically, customers came and asked why they could not simply use the Cloud API, and I had to give an answer. Or, the other way round, I sent them in that direction straight from the doorstep. At some point I realized that I already had enough material for an article. Let's go.

What is included in the Cloud Vision API


Google says a lot of fine words about the platform, but that is not interesting. Here is everything it can do, according to the price list:



  • Label Detection - detecting which class the image belongs to and what is shown in it: cats, dogs, elephants, serenity, etc.
  • OCR - text recognition
  • Explicit Content Detection - detection of all kinds of objectionable content and other horrors of life
  • Facial Detection - detection of faces, facial features, and feature points on faces
  • Landmark Detection - determining the location from a photo
  • Logo Detection - detecting logos and emblems
  • Image Properties - I did not understand what this means

In this article I will talk about Label Detection, OCR, and Facial Detection as the most natural Computer Vision tasks, the ones for which I know analogues and alternatives. I will also briefly touch on Landmark Detection.
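To make the feature list concrete, here is a minimal sketch of how a request body for the public REST endpoint `images:annotate` could be assembled. The mapping from the price-list names to the API's feature type strings and the helper name `build_annotate_request` are illustrative assumptions, not something taken from the article; the sketch only builds the JSON and does not send it (a real call would also need an API key or OAuth token).

```python
import base64
import json

# Public REST endpoint of the Vision API (v1).
ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"

# Assumed mapping from the price-list names to API feature types.
FEATURES = {
    "Label Detection": "LABEL_DETECTION",
    "OCR": "TEXT_DETECTION",
    "Explicit Content Detection": "SAFE_SEARCH_DETECTION",
    "Facial Detection": "FACE_DETECTION",
    "Landmark Detection": "LANDMARK_DETECTION",
    "Logo Detection": "LOGO_DETECTION",
    "Image Properties": "IMAGE_PROPERTIES",
}


def build_annotate_request(image_bytes, feature_names, max_results=10):
    """Build the JSON body for one image and several features (hypothetical helper)."""
    return {
        "requests": [
            {
                # The image is sent inline as base64 text.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [
                    {"type": FEATURES[name], "maxResults": max_results}
                    for name in feature_names
                ],
            }
        ]
    }


if __name__ == "__main__":
    body = build_annotate_request(b"fake image bytes", ["Label Detection", "OCR"])
    print("POST", ENDPOINT)
    print(json.dumps(body, indent=2))
```

Several features can be requested for the same image in one call, which is convenient when comparing them side by side as done in the rest of the article.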

Label detection


One way or another, everyone already uses this Google feature: image search on google.com relies on it. In essence, Label Detection assigns characteristics to an image. A characteristic may be the object shown in the photo. It may be the style of the photograph, for example “Macro”, “Portrait”, or “Black and White”. Or it may be something very general: a "botanical object" or an "atmospheric phenomenon".
Besides search, this function can be used for tasks such as:

  • Sorting image databases
  • Tagging images in any stock photo library
  • Analyzing interests from photos
  • Etc.
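As an illustration of the tagging use case, the sketch below pulls labels out of a Label Detection response and keeps only the confident ones. It assumes the JSON response shape of the REST API (`responses[].labelAnnotations[]` with `description` and `score`); the function name, the threshold, and the sample response are made up for the example.

```python
def top_labels(annotate_response, min_score=0.7):
    """Return (description, score) pairs above min_score, best first."""
    labels = []
    for result in annotate_response.get("responses", []):
        for annotation in result.get("labelAnnotations", []):
            if annotation.get("score", 0.0) >= min_score:
                labels.append((annotation["description"], annotation["score"]))
    # Highest-confidence labels first, so they can be used directly as tags.
    return sorted(labels, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical response, not real API output.
    sample = {
        "responses": [
            {
                "labelAnnotations": [
                    {"description": "mammal", "score": 0.91},
                    {"description": "cat", "score": 0.98},
                    {"description": "carpet", "score": 0.42},
                ]
            }
        ]
    }
    for description, score in top_labels(sample):
        print(f"{description}: {score:.2f}")
```

The threshold is the main tuning knob: a low value gives more tags for search, a high value gives cleaner tags for sorting.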

Analogs. Strangely enough, Google has many competitors in this area:

  • Microsoft. Describes quite well what is happening in the image as a whole, not just its individual components. There is no online demo to compare with.
  • IBM. Much poorer in features and weaker at recognition.
  • CloudSight. A murky scam. They pretend to have an automatic system that recognizes everything 100% correctly. In reality, it appears that human operators do the labeling. They want 50 bucks for 800 images. It recognized my images very badly. But maybe everyone had just stepped out for a smoke.
  • Clarifai. It works impressively; I could hardly believe it. It recognizes and captions images best of all, within 2-3 seconds. Sometimes, however, Google won.
  • There are several smaller players with worse recognition quality.
  • There are open networks trained on ImageNet that you can set up yourself. Cheap and cheerful, but they will not work very well.

A large and complete comparison is available here. I will give just a few examples:



  • According to Google.
  • According to IBM.
  • According to CloudSight.
  • According to Clarifai.
  • According to GoogLeNet trained on ImageNet, which is publicly available for Caffe, this is a Siberian Husky.

An example where Google could not cope: 1. An example where it gave what I consider an inaccurate description: 1. The word "apple" is missing; only CloudSight managed it. The word “man” is missing: 2. The word “crow” is missing: 3.

And only on this picture, of all those I tried, did Google beat everyone and find the cat:



The rest did not manage: 1, 2, 3.

Conclusion: it works well. There are more than enough competitors keeping pace. It is almost impossible to reproduce at home.

Facial detection


Face recognition. What Google can do:

  • Find a face
  • Find facial feature points
  • Identify the facial expression

What it cannot do, but competitors can:

  • Determine gender
  • Determine age
  • Compare two faces and decide whether they are the same or not.

Competitors can be found, for example, here: 1, 2, 3. In general, there are dozens of them.

What does it make sense to compare Google with? For example, with Haar cascade face detection, as in OpenCV, or HOG, as in dlib. Google beats both. It also finds facial feature points better than dlib:

Dlib:



More and more.

Google:



More and more.

But Google is paid, while dlib is free. Setting either one up takes about the same number of lines. And if you make the effort, you can take something state-of-the-art instead of dlib and get accuracy almost as good as Google's.

Overall, Google clearly lost this round.

Landmark detection


But in this category Google has no equal, and there are no analogues. When I understood what this function does, I thought, “Well, it will recognize the Kremlin.” Not so. Besides the Kremlin, it successfully recognizes every more or less significant tourist site. Two examples that stunned me:

Borovsk:



Well, fine, it is a more or less popular tourist spot, and there are plenty of photos of it. Let's call that luck.

Conclusion:


A manor turned pioneer camp, lost between Moscow and St. Petersburg near Borovichi.

I don’t know how it does this. Here are a bunch of examples: 1, 2, 3, 4, 5, 6, 7.

It failed only once, but in the most epic way:



OCR - Optical Character Recognition


And finally, we come to the most interesting part, the one I started digging for: how well Google recognizes text. This is the most industrial application; dozens of vendors and customers would be interested in ready-made solutions here:

  • Recognize books
  • Recognize price tags
  • Recognize signs
  • Recognize license plates, train numbers, house numbers, ...
  • Etc.; the applications are countless

Let's compare what Google has achieved with existing solutions, and with its only competitor as a "general-purpose recognizer", Microsoft.

Books, texts
For texts, Google has a strong competitor in Abbyy. From what I tested, their character recognition quality seems to be at about the same level:



Google
Perform a wireless installation (wireless models only
Before starting the installation, verify that the
wireless access point is working correctly, the computer is connected to the
network, and the product is turned on
If there is not a solid blue light on the top of the to process A
If there is a solid blue light on the top of process ⟨B 1⟩
3. Connect the USB cable between the computer and the product.
The ⟨HP⟩1 ⟨Smart Install⟩4 program (see picture should start
automatically within 30 seconds. Note: If ⟨HP⟩1 Smart Install does not
start automatically. ⟨AutoPlay⟩6 might be disabled on your computer 
⟨Browse My Computer⟩5and double-click the ⟨HP⟩1 Smart Install CD drive.
Double-click the ⟨SISetup.⟩7exe file to run the program to install the
product. If you cannot find the ⟨HP⟩1 Smart Install CD drive,
use the software CD to install the product.
2. Follow the onscreen instructions.
3. When prompted to select a connection type, select the "Configure" 11
 to print over the "Wireless Network" 10 option.
1. From the product control panel, press and hold the cancel button X
for 5 seconds, and then release it to print a Configuration page. This
page will have an ⟨lP⟩8address in the ⟨Network Information⟩9 section.
2. At the computer, open a Web browser, type the product IP address
in the address field, and press the Enter key to open the product
embedded web server page.
3. Click the ⟨HP⟩1 Smart Install tab, and then click the Download button.
4. Follow the onscreen instructions.

Abbyy
Perform a wireless installation (wireless models only)
Before starting the installation, verify that the
point is working correctly, the computer is connected to the
network, and the product is turned on.
If there is not a solid blue light on the top of? Tu pfod-; to process A.
If there is a solid blue light on the top of the proa ct, g <f process B.
A.
1. Connect the USB cable between the computer and the product. The HP Smart Install program (see picture above) should start automatically within 30 seconds Note: If HP Smart Install does not start automatica ly AutoPlay might be disabled on your computer Browse My Computer and double-click the HP Smart Install CD drive. Double-click the SISetup.exe file to run the program to instal the product. If you cannot find the HP Smart Install CD drive, use the software CD to install the product
2. Follow the onscreen instructions.
3. When prompted to select a connection type, select the Configure to print over Wireless Network option
1. From the product control panel, press and hold the cancel button X for 5 seconds, and then release it to print a Configuration page. This page will have an IP address in the Network Information section
2. At the computer, open a Web browser, type the product IP address in the address field, and press the Enter ke> »to open the product embedded web server page
3. Click the HP Smart Install tab, and then dick the Download button
4. Follow the onscreen instructions.

Microsoft
Perform a wireless instanotion (wir & ss models 0+)
Before starting the installation. verify that the
pant is working correctly, the cornputer • s connected the
network. and the product is turned on.
If there is not a solid blue light on the top product.
to process A.
If there is 0 sohd blue light on the bp the product. 90 to
process B.
l. Connect the USB cable between the computer and
the product. The HP Smart Install program (see p • dure
above) should skirt automatically within 30 seconds-
Note: If HP Smart Install does not start automatically
AutoPlay rmght be disabled on your computer
Browse My Computer and double-click the
HP Smart Install CD drive. Double-click the
SISetup.exe file to run the program the
product. If you cannot find the HP Smart Install CD
drive, use the software CD to install the product.
2.
Follow the onscreen instructions.
3.
When prompted to select a connection type, select the
Configure to print over Wireless Network opt.on.
B
.. From the product control panel, press and hold the
cancel button X for 5 seconds, and then release it to
print a Configuration page. This page will have an IP
address in the Network Information section.
2. At the computer, open a Web browser, type
IP address in the address field. and press the Enter key
to open the product embedded web server page.
3. Click the HP Smart Install tab, and then click the
Download button.
4. Follow the onscreen instructions.

It can be seen that only Google and Abbyy really compete. But as soon as it comes to a whole page of text, Abbyy wins: it knows how to structure the text, convert tables, footers, etc. Google returns the text as one undifferentiated mass. In addition, Google supports few languages.

I predict that startups will soon appear that use the Google API for recognition and add all the structural analysis and text assembly on top. Given that Abbyy charges 10 times more than Google for recognition, this is a rather juicy niche.

Clearly, in the text segment there is not a single good piece of software that you can run at home. So let's move on to the next OCR task.
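The comparison above is done by eye. One common way to put a number on it is the character error rate: the Levenshtein edit distance between the recognized text and a reference transcription, divided by the reference length. The sketch below is a plain-Python illustration, not the method used in the article; the reference string in the usage example is an assumption about what the printed page says.

```python
def levenshtein(a, b):
    """Edit distance: minimal number of insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_error_rate(reference, hypothesis):
    """Edit distance normalized by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Assumed ground truth for one line of the scanned page.
    reference = "and then click the Download button"
    outputs = {
        "exact match": "and then click the Download button",
        "Abbyy-style error": "and then dick the Download button",
    }
    for name, text in outputs.items():
        print(f"{name}: CER = {char_error_rate(reference, text):.3f}")
```

A per-line score like this says nothing about layout, which is exactly where the engines differ most, so it complements rather than replaces the visual comparison.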

Price tags and other signs
An important point: Google does not support other languages, unlike Microsoft. But in general, both work when the text is good: not blurry, not tilted, not noisy, and free of glare:



Microsoft:
175766
AHAHAC B AHAHACE
tıııışıııııııı 175T

Google:
175 AHAHAC BAHAHACE 340167 000000 900708249



Microsoft:

Google did not recognize anything :
K) cpeacrao ainocya. AOS Aqua Aqua ban.aasa an03 Depa Pocowe 100or - 912 mov



Microsoft:

Google Zero :



Overall, text is recognized on roughly 60% of the price tags. In my opinion, this is simply an awesome result. What is unclear is how this text should then be assembled into something usable.

Moreover, well-shot signs are also read well. Not all of the text, of course, but the large text for sure:

original


Google


Microsoft


Still, text in mixed formats is not recognized well:

Google


Microsoft


Technical Information
Here both Google and Microsoft are, so far, decidedly bad. Of the dozens of license plates I checked, Microsoft recognized 20 percent and Google 60 percent, and even those without the region code. The region was recognized only in the ideal case: a large plate with no dirt. As soon as there is dirt, everything is recognized as separate fragments.



Plus regular errors of 1-2 characters:



Dedicated plate recognition systems rely on a priori information and therefore work better. For some applications, of course, Google may still be enough: it does recognize an ideal, close-up, head-on shot of the plate.

Without a priori information, things go badly. Another machine vision case is train numbers. Microsoft failed completely. Google got only 50 percent right. On the rest it kept making mistakes:



So for quality control and other tasks that need stable text recognition, Google and Microsoft are suitable only in the simplest cases.

Of course, there are no ready open-source solutions for such problems, but often you can solve them yourself: enumerate simple hypotheses, search for contours, and so on. License plate localization, for example, works quite stably with OpenCV, which offers Haar cascades and contour extraction. You can also train an LBP or HOG detector.
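To show what "a priori information" can mean in practice, here is a small sketch that post-processes raw OCR output using knowledge of the plate layout. It assumes the standard Russian civilian format (one letter, three digits, two letters, then a 2-3 digit region code, with letters restricted to the 12 that have Latin lookalikes); the article does not state which format was tested, and the confusion tables and the function name `fix_plate` are hypothetical.

```python
import re

# Assumption: letters allowed on the plate (Latin lookalikes of Cyrillic).
LETTERS = "ABEKMHOPCTYX"
PLATE_RE = re.compile(rf"^[{LETTERS}]\d{{3}}[{LETTERS}]{{2}}\d{{2,3}}$")

# Hypothetical tables of typical OCR confusions.
TO_DIGIT = {"O": "0", "B": "8", "I": "1", "Z": "2", "S": "5"}
TO_LETTER = {"0": "O", "8": "B"}

LETTER_POSITIONS = (0, 4, 5)  # every other position must be a digit


def fix_plate(raw):
    """Use the known layout to repair OCR output; return None if it cannot fit."""
    text = re.sub(r"[^A-Z0-9]", "", raw.upper())
    if not 8 <= len(text) <= 9:
        return None
    fixed = []
    for index, char in enumerate(text):
        if index in LETTER_POSITIONS:
            fixed.append(TO_LETTER.get(char, char))
        else:
            fixed.append(TO_DIGIT.get(char, char))
    candidate = "".join(fixed)
    return candidate if PLATE_RE.match(candidate) else None


if __name__ == "__main__":
    for raw in ["A123BC77", "a1z3bc 77", "0123BC777", "A123BC"]:
        print(repr(raw), "->", fix_plate(raw))
```

A general-purpose OCR service cannot apply this kind of position-dependent correction, because it does not know it is looking at a plate; a dedicated system can reject or repair anything that does not fit the layout.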

Summary


  • Label Detection - Google is ahead, but the fight is close and crowded.
  • Facial Detection - Google is behind. It is not clear what their solution is for.
  • Landmark Detection - Delightful! Nobody else has anything like it!
  • OCR - A real battle is unfolding here. Google has started treading on the heels of the serious solutions, but so far cannot overtake them. In areas where the problem has no clear specification, however, it is ahead. Microsoft is fairly far behind, but is trying to catch up.

For now, the giants are still far from a stable CV solution. But they are slowly and steadily taking over the whole market. Yes, their solutions work only with Internet access. But it will often be easier to provide Internet access than to build the solution yourself.
