saul October 2, 2017 at 14:17

Computer vision. Intel Expert Answers

Two weeks ago, we invited readers of Habra to ask their questions to the creators of the computer vision library OpenCV. Many questions were asked, and interesting ones, which means that this topic is of interest not only to Intel, but also to the broad masses of developers. Without further ado, we proceed to the publication of answers and invite to their discussion. We also announce the authors of the best questions! At the very end of the post.

Question noonv

How do you see the future of computer vision? Watching the development of machine learning, what prospects do you see?

Anatoly Baksheev . A few of my thoughts:

everything learnable, minimum handcrafting
the new is the well-forgotten old.

I think after some time there will be some return to traditional computer vision (when all low-hanging fruits on DL will be picked), but at a slightly different level of development and capabilities.

Vadim Pisarevsky. My vision is pretty standard. The near future (actually already the present) is in deep / deep learning, and it will be applied in an increasingly sophisticated and non-trivial way, which was proved by the last CVPR 2017 conference. 6 years ago, deep learning appeared, or rather, was revived after an article by Kryzhevsky (Alexnet), and then he well solved only one problem - recognition of the class of an object, provided that one dominant object was in the frame, without determining its position. 2 years ago almost everyone in our area already spoke about him. They came up with the first grids for detecting objects and grids for semantic segmentation. Prior to this, the task of semantic segmentation was considered a hopeless, unsolvable task, as a proof of Fermat's theorem.

There was a big problem with speed - everything worked very slowly. Now the grids have been compressed, the implementations have been optimized, transferred to the GPU, the specialized hardware is on the way and the issue of speed has for the most part disappeared and will completely disappear in the next couple of years - the grids are already working as fast as the traditional approaches, and much better. Now the main areas for research:

try to apply deep learning for new tasks, more and more complex,
in particular, to apply to tasks where it is difficult to collect huge training facilities.

The first general task will be the main trend of the coming decade, at least the second task, I think, will be solved in the next few years, in particular cases it is already being solved. The prospects are such that computer vision from a highly specialized field with a bunch of artisanal methods will turn into an industrial field and will greatly affect the lives of many people. Actually, this process is already happening at a high speed, and we are more likely talking not just about vision, but about artificial intelligence.

Question IliaSafonov

Are there any plans to add the ability to process 3D (volumetric) images in OpenCV? I work with tomographic images of the size of the order of 4000x4000x4000. Existing open-source libraries for 3D are, to put it mildly, poor and slow compared to OpenCV.

Vadim . Basic element-wise functions can already work with such data. 3-dimensional filtering and some other more complex algorithms yet. But the truth is, there are deep grids that can do some transformations on 3D data arrays. If there is a list of necessary operations, I invite you to submit a request for enhanced functionality . If there is a good detailed request with a description of the task, with links, then it is quite possible that this will become one of our projects for the next Google Summer of Code (summer 2018).

Question MaximKucherenko

The camcorder hangs on the bridge, under which there is a stream of cars. Excellent lighting was installed for the camcorder, in normal weather at night you can even see the faces of drivers. When the blizzard begins, the pictures are almost white (due to the large number of small moving objects, snowflakes). Can you tell me how to overcome this "noise"?

Anatoly . It is difficult to answer without the images themselves. You can try to make CNN which would somehow restore the picture. Check out CNN's impainting work, where the grid “thinks through” corrupted parts of the image. Or CNN debluring, where the grid is essentially trying to learn the classic Debluring algorithm. You can try to do the same for you.
In your case, the grid may be recursive in some place to take into account previous frames for the synthesis of a “clean” image.

Vadim. We need some kind of temporal filtering taking into account the movement of cars and cameras - i.e. we need to collect frames from several, we are talking about a certain variation on the theme of video superresolution, but without increasing the resolution. The time neighborhood of each frame is taken, the dense optical flux between the central frame and the neighboring ones is calculated, a certain penalty function is compiled for the resulting “improved” image — it should simultaneously be smooth and similar to all images from the neighborhood, taking into account the compensated movement. Then the iterative optimization process starts. I'm not sure that such an algorithm will work wonders, especially in extreme conditions (blizzard), but in situations of moderate complexity, it may be possible to improve the image at the output. But at the very beginning, without such an algorithm, you can try the functioncv :: equalizeHist () , maybe it will give something.

Question uzh13

Which language is best for experimenting with CV? Should I deal with Erlang for this?
Is there a canonical set of books or a series of articles for a quick start with technical vision? Is there anyone to chat with?

Vadim . Currently the most preferred options are:

C ++ with a good development environment,
Python

Depending on the circumstances, Python may be preferable to C ++ or vice versa. Erlang is too exotic for this area. Perhaps you will write something on it, but then it will be difficult to find like-minded people to discuss and develop this code together. From books on classical computer vision, the book R. Szeliski can be recommended. Computer Vision, a draft of which is available here . There are many books on OpenCV , usually also in English. You can start dating with deep learning using the following tutorial . As for communication, the question is more complicated. Well, in fact, everyone has the Internet now, you can join any project.

Anatoly. C ++ and Python are, in my opinion, classics for rapid prototyping and for serious solutions. There is no escape from this.

In addition to Vadim's answer, I recommend awesome repositories on github:

Generally seen awesome repositories supported by enthusiasts for many areas.

Question ChaikaBogdan

How to start your career in the field of CV, if you do not have experience in it? Where to gain experience in solving real problems and experience in this area that HRs love so much?

A little TL; DR which is the background to the question
I studied at a university where there was no such direction and, naturally, went to work in another field (software engineering, automation). If I had the opportunity and understanding how potentially cool it is to work in the field of CV, I would have entered it even in another university, but, alas, I learned about it too late. Retraining for a second higher education is somehow prohibitively long.

On duty, I came across the task of detecting using Python + OpenCV, I somehow solved it through template match (since the subject allowed). It was fun, new and in general everyone liked it, especially me.

He began to study the possibility of self-training, took the Introduction to Computer Vision (Udacity-Georgia Tech) course and began practical training from PyImageSearch .
At the same time, I looked at vacancies at Upwork and PyImageSearch Jobs, Fiverr and was upset, because there was clearly not enough knowledge to solve real problems (for example, light / shadow / angle conditions interfere almost everywhere). I’m not sure that even completing, say, the Guru course from PyImageSearch will help you find a decent job, because the examples are very “ideal” and rarely work, as they were thought in real conditions.

On exchanges such as Fiverr , Upwork , PyImageJobs there is a lot of competition and tasks are required to be completed very quickly. And I want something with a small threshold for entry and take-off learning-curve. I am silent about remote work. Plus, everywhere else they want deep / machine learning to follow.

I don’t want to give up my main job in order not to find a job in CV. But give up too. This is a cool and interesting area to develop professionally, no matter how you look).

Anatoly . I think you are on the right track.

In general, if a person wants to work in this field, I think that any normal leader will take him to him even without skills. The main thing is to demonstrate the desire to work, expressed in concrete actions: show the algorithms that you made for OpenCV, for caffe / tf / torch, show your projects on github, show your rating on Kaggle. I have an engineer who left his previous boring non-CV job, went to Thailand and hasn't worked anywhere for a year. Six months later, he got bored there, and he began to participate in Kaggle competitions. Then when he came to me, his good rating on Kaggle also played a role even without experience in CV. Now it is one of my strongest engineers.

Vadim. I have a “success story” for you. At one time, OpenCV filed a series of patches with the addition of face recognition functionality. Of course, now the face recognition problem is solved with the help of deep learning, and then it was quite simple algorithms, but not the point. The author of the code was a man from Germany named Philip. He then at the main work was engaged in boring projects, in his own words, programmed DSP. I found time after work to do face recognition, prepared patches, we accepted them. Naturally, he was listed there as the author. After some time, he wrote me a joyful letter that, among other things, thanks to such a clear “resume,” he found a job related to computer vision.

Of course, this is far from the only way. Just if you really like computer vision, get ready to do it overtime on a voluntary basis, gain practical experience. And as for education, how many people from the OpenCV team have received education in this area? Zero. We are all mathematicians, physicists, engineers. Common skills (which are developed by practice) are important to learn new material, mainly in English, to program, communicate, solve mathematical and engineering problems. And concrete knowledge is transitory. With the advent of deep learning a few years ago, most of our knowledge has become obsolete, and in a few years, deep learning may become obsolete technology.

Question aslepov78

Do not you think that you succumbed to mass hysteria about neural networks, deep learning?

Vadim . To paraphrase Winston Churchill, perhaps [modern] deep learning is a bad way to solve computer vision problems, but everything else we know is even worse. But no one has a monopoly on research, thank God, invent your own. And in fact, people come up with. I myself was a great skeptic of this approach several years ago, but, firstly, the results are obvious, and secondly, it turned out that deep learning can be applied stupidly (I took the first architecture I got, scored a million training examples, launched a cluster and a week later received the model or not received), but can be applied creatively. And then it becomes a truly magical technology, and tasks begin to be solved, which before that it was generally not clear how to approach. For example, the definition of 3D poses of players on the field with one camera.

Question aslepov78

OpenCV has become a warehouse of algorithms from various fields (computational geometry, signal processing, machine learning, etc.). Meanwhile, there are more advanced libraries in the same computational geometry (not to mention neural networks). It turns out that the meaning of OpenCV is only in one - all the dependencies in one bottle?

Vadim. We make the tool primarily for ourselves and our colleagues, and also integrate patches from the user community (not all, though, but most), i.e. what users find useful for themselves and others. It would be nice, of course, if C ++ had some kind of common model - how to write libraries so that they are compatible with each other, and they could be easily used together and there would be no problems with building and converting data structures. Then, perhaps, OpenCV could be painlessly replaced by a series of more specialized libraries. But there is no such model yet, and may not be. In Python, there is a similar model built around numpy and a system of modules and extensions, and Python wrappers for OpenCV, it seems to me, are pretty organically built into it. I think if you have practically worked in the field of CV for several years, then you will understand why OpenCV is needed and why it is designed the way it works. Or will not come.

Question aslepov78

Why are there so few turnkey solutions? For example, if I am new to CV, and I want to look for a black square on a white background, then opening the OpenCV dock, I will drown in it. Instead, I would like to scroll through the list of the most common and simple tasks and select, or combine. Those. OpenCV has virtually no declarative approach.

Vadim . In truth, OpenCV has no turnkey solutions at all. Ready-made solutions in computer vision cost a lot of money and are written for a specific customer to solve specific, very clearly defined tasks. The process of creating such solutions differs from combining blocks in approximately the same way as the process of designing, building and arranging an individual house to order differs from assembling a toy house from lego blocks.

Question vlasenkofedor

Please tell us about the most interesting projects with an original solution - OpenCV with microcomputers (Raspberry, ASUS ...)

Anatoly . We have little experience with these devices.

Question killla

Are there any small boards (of the Raspberry Pi level with a processor sharpened for OpenCV video processing) and a video camera connected directly to the microprocessor (microcontroller) without any intermediaries in the form of USB and its large delays? So that you could take it and on your knee quickly make a device for counting crows on a garden bed or a device for tracking an object (the simplest image processing + reaction with minimal delays to stimuli).

My own experience
Last time I tried to solve a similar problem about 4 years ago. 1) All the popular affordable development boards did not delay processing a good video stream faster than 1-2 times per second, it was unrealistic to use DSP without programming at low levels, and it was not easy to get a controller with a powerful, well-documented DSP and software 2) all cameras in all examples cling to USB, respectively, from scratch huge delays + software processing of the camera by a low-power main processor. There is almost no time left for processor recognition.

Vadim . Raspberry Pi, starting with the second generation, contains an ARM CPU with NEON vector instructions. OpenCV should work quite quickly on such a piece of hardware. Regarding the speed of video capture - we somehow squeezed 20-30 frames / sec from USB 2, it's not very clear what this is about.

Question killla

Are there any ready-made distributions and software “out of the box” for such glands that you can immediately start working with without finishing for weeks?

Vadim . OpenCV is built under any ARM Linux and is largely optimized using NEON. I think it’s worth looking at the Raspberry Pi first, for example, here is the experience of an enthusiast .

Question killla

Summarizing, I will ask the question this way: is it possible in 2017-2018 for a 2-3-year student of an IT specialty with basic programming skills, having laid in 10,000 rubles, to get a piece of iron at the level of a 2-3-year-old telephone, on which for 2-4 weeks of studying OpenCV and writing code to create the simplest device: a camera on a motor suspension with a pair of motion axes that will hang on the balcony and monitor the movement of your favorite dog in the yard?

Vadim . On the iron part of the answer I will not give, explore. About tracking a dog. The beacon will solve this problem easier, cheaper, more reliable. If the goal is not to solve the problem, but you want to practice computer vision, then please. For 2-4 weeks, you can indulge and at the same time begin to think about questions like:

how to handle the movement / wobble of the camera itself, what behavior is expected in the dark, fog, rain, snow, how to handle different seasons,
how to handle different lighting conditions - overcast, the sun at its zenith, the sun at sunrise with great shadows,
how the system should handle the appearance of another object in the field of visibility (car, person, cat, another dog, another dog of the same breed),
what quality is considered acceptable (the system gives you false messages about the dog missing every 5 minutes, the system reports the dog missing one day after its loss)
etc.

According to my modest estimates, if you take this task seriously, you can definitely take yourself for a year or two. Learn more about computer vision than they teach anywhere else.

Question almator

The function model = cv2.ANN_MLP () on python does not work.

Function code

import cv2
import numpy as np
import math
class NeuralNetwork(object):
def __init__(self):
self.model = cv2.ANN_MLP()
def create(self):
layer_size = np.int32([38400, 32, 4])
self.model.create(layer_size)
self.model.load('mlp_xml/mlp.xml')
def predict(self, samples):
ret, resp = self.model.predict(samples)
return resp.argmax(-1)
model = NeuralNetwork()
model.create()

Error AttributeError: 'module' object has no attribute 'ANN_MLP'

Vadim . See the example letter_recog.py from the OpenCV distribution.

Question almator

How will OpenCV evolve towards neural networks, machine learning? Where are simple examples for machine learning beginners? Preferably in Russian.

Anatoly . OpenCV does not plan network training, only a quick optimized inference. We already have a CNN Face Detector that can run more than 100fps on a modern Core i5 (though we cannot put it in public access). I think many current algorithms will be gradually instrumented by small (> 5000fps) auxiliary grids, whether it be featues or optical flow, or RANSAC, or any other algorithm.

Vadim . OpenCV will evolve towards deep learning. Ordinary neural networks are a special case and now we are of little interest. I can not advise anything in Russian, but I will be grateful if you find and let me know. In English, there are online courses and books on the net, the same deep learning tutorial mentioned above.

Question almator

What algorithm is better to look for relatively complex logos in the photo - for example, logos of various markings, where usually there is text, and drawings, and everything fits in the form? I tried it through Haar Cascade - this algorithm looks for solid pieces well, but it does not find such a complex multi-component object as a logo. I tried MatchTemplate - it does not look if there is minimal mismatch - reduction, rotation relative to the original image. Do not tell me in which direction to look?

Vadim . Deep nets + augmentation of the training base. That is, you need to collect a database of images of this logo, and then artificially expand it many times. Here , for example, is immediately located through Google.

Question WEBMETRICA

Is it possible to use computer vision to simulate the analogs of the vision of various biological organisms, for example, animals and insects and create an application that makes it possible to see the world through the eyes of other creatures?

Anatoly . I think this will not happen soon. Moreover, there is no method to reliably say how other creatures see the world.

Vadim . CVPR 2017 had an interesting article on the use of human-readable signals for pattern recognition. The authors promised an interesting continuation. Perhaps soon our younger brothers will reach.

Question WEBMETRICA

If you go even further, you can create many models of vision of various living beings, pass all this diversity through the neural network and create something new? Is the synthesis of vision of various biological systems possible?

Vadim . Everything is possible. I need to go from a specific task, it seems to me.

Question barabanus

Why is it impossible to multiply the matrix by a vector (cv :: Vec_) in OpenCV, but can it be multiplied by a point? (cv :: Point_) It turns out that it is easier to manipulate points when mathematically these are not points, but vectors. For example, the direction of a line is easier to store as a point, and not as a vector — fewer type conversions in the chain of operations.

Anatoly . I have known this problem for about 8 years. As far as I remember, it is impossible to implement - you can try it yourself. It turns out something like the ambiguity of calling a constructor for a service intermediate type - the compiler cannot decide for itself which constructor to call and generates an error. You will have to manually convert to a point via cv :: Mat * Point _ <...> (Vec _ <...>).

Vadim . I propose to submit a request . It is possible that in this particular case they simply skipped this function, or deliberately turned it off so as not to confuse the C ++ compiler in the whole set of overlapping '*' operators - sometimes this happens.

Question barabanus

Why so far in OpenCV there is not a single implementation of Hough Transform that would return a battery. After all, sometimes you need to find, say, the only maximum! Will project holders be allowed to add a new implementation that would return a battery?

Vadim . Yes, that would be helpful. After reviewing and necessary refinement, if necessary, such a patch can be accepted .

Question perfect_genius

Is Intel trying to create hardware neural networks for image processing and are there any results?

Anatoly . Hardware networks make little sense, because progress moves forward very quickly, and such a piece of hardware will become obsolete before it goes on sale. But the creation of accelerating instructions for networks (a la MMX / SSE / AVX) or even coprocessors, in my opinion, is a very logical step. But we do not own the information.

Vadim. At this stage, we are aware of the attempts, and our colleagues are actively involved in them, using the available hardware (CPU, GPU) to speed up the execution of grids. Attempts are quite successful. Accelerated solutions for the CPU (MKL-dnn library, and Intel Caffe compiled with it) and for the GPU (clDNN) allow you to run a large number of popular networks, such as AlexNet, GooLeNet / Inception, Resnet-50, etc. in real time on a regular computer without a powerful discrete card, on a regular laptop. Even OpenCV, although it does not yet use these optimized libraries, allows you to run some grids for classification, detection and semantic segmentation in real time on a laptop without discrete graphics. Try our examplesand see for yourself. Effective networking is closer than it seems to many.

Question Mikhail063

I have been using OpenCV for several years, but I came across such an interesting contraption. There is a camera that transmits a signal, and there is telemetry that receives a signal, and there is also a tuner that decodes the signal in video to a computer. So image capturing programs work with a bang, but the OpenCV library when you try to display the image displays a black screen, and when you try to exit the program, the blue screen falls out) QUESTION Why does this happen?

Device Features: EasyCap USB 2.0 TV tuner, 5.8GHz RC832 FPV video receiver, FPV camera with 5.8GHz 1000TVL transmitter.

Error video

Vadim . Because somewhere there is some kind of mistake, obviously :) We must start with localization.

Question KOLANICH

Does it make sense in modern realities in video analysis programs based on features designed by humans, or is it better not to fool around and immediately train the neural network?

Anatoly . Sometimes classic features can be a quick fix.

Vadim . It’s better to train for analysis. For simpler tasks, such as gluing panoramas, classic features such as SIFT are still competitive.

Question vishnerevsky

I used version of OpenCV 3.1.0, used cv2.HOGDescriptor () and .setSVMDetector (cv2.HOGDescriptor_getDefaultPeopleDetector ()) , I was impressed. But I want to reduce the number of false positives and therefore I want to find out which dataset was used to train the SVM classifier and can I access this set? I would also like to know if it is planned to create OpenCV modules for recognizing various objects based on YOLO or Semantic Segmentation?

Vadim . Databases took the standard, are in the public domain. Now specific configuration files with file lists have been lost, many years have passed. The patch with the addition of YOLO v.2 hangs, by the time these answers are published, I think we’ll fill it already. An example with MobileNet_SSD is already there . There you can find examples with segmentation.

Question iv_kovalyov

“Advise how you can recognize the stamp on the left in the image on the right? The stamps are not identical, but there are common elements.

I tried find_obj.py from the opencv examples, but in this situation this example does not help.

Vadim . See the tip above for finding logos. Only here, most likely, two grids will be needed - detection and subsequent recognition.

Intel experts recognized IliaSafonov 's best questions about using OpenCV for 3D objects and ChaikaBogdan about building a career in computer vision for a beginner. The authors of these questions get prizes from Intel. Congratulations to the winners and thank Anatoly and Vadim for the informative answers!

Tags:

Computer vision. Intel Expert Answers

Also popular now: