Introduction to the course "Image and Video Analysis". Lectures from Yandex

    We are starting to publish the lectures of Natalya Vasilyeva, a senior researcher at HP Labs and head of HP Labs Russia. Natalya Sergeevna taught a course on image analysis at the St. Petersburg Computer Science Center, which was created as a joint initiative of the Yandex Data Analysis School, JetBrains, and the CS Club.

    The program consists of nine lectures in total. The first explains how image analysis is used in medicine, security systems, and industry; which tasks it has not yet learned to solve; and what advantages human visual perception has. A transcript of this part of the lecture is below the cut. Starting from the 40th minute, the lecturer discusses Weber's experiment, color representation and perception, the Munsell color system, color spaces, and digital representations of images. The full lecture slides are available here.

    Images are all around us, and the volume of multimedia information grows every second. Films and sports broadcasts are shot, video surveillance equipment is installed, and every day we ourselves take a large number of photos and videos: almost every phone can do this now.

    For all these images to be useful, we need to be able to do something with them. We could simply put them in a box, but then there would be no point in creating them in the first place. We need to be able to search for the pictures we need and to process video data, solving the tasks specific to each particular domain.

    Our course is called "Image and Video Analysis", but it will mainly be about images: it is impossible to start processing video without knowing what to do with a single picture. Video is, after all, a sequence of still images. Of course, there are tasks specific to video, such as tracking objects or extracting key frames, but all video algorithms are built on image processing and analysis algorithms.

    What is image analysis? It is largely related to, and overlaps with, computer vision. There is no exact, unique definition; here are three examples.

    Computing properties of the 3D world from one or more digital images. (Trucco and Verri)

    This definition implies that, whether we are present or not, there is some surrounding world and there are images of it, and by analyzing those images we want to understand something about that world. It fits not only machine analysis of digital images but also the analysis we perform in our own heads: we have a sensor, the eyes, and a transforming device, the brain, and we perceive the world by analyzing the pictures we see.

    Make useful decisions about real physical objects and scenes based on the sensed images. (Shapiro)

    This one is probably closer to robotics. We want to make decisions and draw conclusions about the real objects around us based on the images captured by sensors. This definition is ideal, for example, for describing what a robot vacuum cleaner does: it decides where to go next and which corner to vacuum based on what it sees.

    The construction of explicit, meaningful descriptions of physical objects from images.

    This is the most general definition of the three. Following it, we simply want to describe the phenomena and objects around us based on image analysis.

    Summing up, we can say that image analysis essentially comes down to extracting meaningful information from images. What counts as meaningful will differ from one situation to another.

    If we look at a photograph of a little girl eating ice cream, we can describe it in words: this is how the brain interprets what we see. This is what we want to teach the machine to do. To describe an image with text, we need operations such as object and face recognition, determining a person's sex and age, selecting regions uniform in color, recognizing actions, and extracting textures.

    Connections with other disciplines

    As part of the course, we will talk about image processing algorithms. They are used when we increase contrast, remove color or noise, apply filters, and so on. In essence, image processing is about transforming pictures: an image goes in, and a modified image comes out.
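    As an illustration, here is a minimal sketch of one such processing operation, a linear contrast stretch. The tiny list-of-lists "image" is a toy stand-in for real pixel data, and the function itself is just one possible implementation of the idea:

```python
def stretch_contrast(image):
    """Linearly stretch gray levels so the darkest pixel maps to 0
    and the brightest to 255: a classic image-processing operation."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in image]
    scale = 255.0 / (hi - lo)
    return [[round((p - lo) * scale) for p in row] for row in image]

# A tiny 2x3 "image" with gray levels squeezed into [100, 150]
img = [[100, 120, 150],
       [110, 130, 140]]
print(stretch_contrast(img))  # darkest pixel -> 0, brightest -> 255
```

    Note that the operation fits the definition above exactly: an image goes in, a modified image comes out, and no interpretation happens.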

    Next come image analysis and computer vision. There are no exact definitions for them either, but, in my opinion, they are characterized by the fact that given an image at the input, we get a certain model or a set of features at the output, that is, numerical parameters describing the image. One example is a histogram of the distribution of gray levels.
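    A gray-level histogram of this kind can be sketched in a few lines. The bin count and the toy image below are arbitrary choices for illustration:

```python
def gray_histogram(image, bins=8):
    """Histogram of gray levels (0..255): a simple feature vector
    describing an image, of the kind mentioned in the lecture."""
    hist = [0] * bins
    width = 256 // bins
    for row in image:
        for p in row:
            hist[min(p // width, bins - 1)] += 1
    return hist

img = [[0, 10, 200],
       [255, 100, 40]]
print(gray_histogram(img, bins=4))  # 4 coarse bins over 0..255
```

    The output here is no longer an image but a fixed-length vector of numbers, which is precisely what makes it a feature in the sense described above.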

    Image analysis, then, yields a feature vector as its result. Computer vision solves broader tasks; in particular, it builds models. For example, a set of two-dimensional images can be used to construct a three-dimensional model of a room. There is also a related field, computer graphics, in which an image is generated from a model.

    All of this is impossible without knowledge and algorithms from a whole range of fields, such as pattern recognition and machine learning. In principle, image analysis can be seen as a special case of data analysis, an area of artificial intelligence. Neuropsychology is also a related discipline: to understand what is possible and how the perception of pictures works, it helps to understand how our brain solves these problems.

    What is image analysis for?

    There are huge archives and collections of images, and one of the most important tasks is image indexing and search. Collections come in several kinds:

    • Personal. On vacation, for example, a person can take a couple of thousand photos that then need to be dealt with somehow.
    • Professional. These number in the millions of photos, and here too there is a need to organize them and find what is required.
    • Collections of reproductions. These are also millions of images. Many museums now have virtual versions for which reproductions are digitized, i.e. we get pictures of pictures. Searching for all reproductions by the same artist is still a utopian task: from the style, a person can guess that they are looking at, say, paintings by Salvador Dalí, and it would be great if a machine could learn this too.

    What can be done with all these pictures? The simplest thing is to build smart navigation over them by classifying them by topic: bears in one place, elephants in another, oranges in a third, so that the user can later browse the collection conveniently.

    A separate task is finding duplicates. Among two thousand vacation photos there are not that many unique ones: we love to experiment, shooting with different shutter speeds, focal lengths, and so on, which leaves us with a large number of fuzzy duplicates. Duplicate search can also help you discover illegal use of a photo you once posted on the Internet.
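    One common approach to finding fuzzy duplicates (a standard technique, not necessarily the one the course will use) is a perceptual "average hash": each pixel is reduced to a single bit depending on whether it is brighter than the image mean, so slightly re-exposed copies of the same shot produce nearly identical bit strings:

```python
def average_hash(image):
    """Perceptual 'average hash': 1 where a pixel is above the mean
    brightness, 0 otherwise. Fuzzy duplicates get similar hashes."""
    flat = [p for row in image for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]

def hamming(h1, h2):
    """Number of differing bits between two hashes."""
    return sum(a != b for a, b in zip(h1, h2))

a = [[10, 200], [220, 30]]
b = [[12, 198], [225, 28]]   # a slightly re-exposed copy of a
c = [[200, 10], [30, 220]]   # a genuinely different picture
print(hamming(average_hash(a), average_hash(b)))  # small: duplicate
print(hamming(average_hash(a), average_hash(c)))  # large: different
```

    In practice the image is first downscaled to something like 8×8 pixels, and pairs whose Hamming distance falls below a threshold are flagged as candidate duplicates.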

    A great challenge is choosing the best photo. An algorithm can work out which picture the user will like best: if it is a portrait, the face should be lit, the eyes open, the image sharp, and so on. Modern cameras already have this feature.
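    One simple proxy for the "image should be sharp" criterion (a sketch of a common blur measure, not what any specific camera actually implements) is the strength of the image's second derivatives: crisp edges produce large Laplacian responses, while blur flattens them:

```python
def sharpness(image):
    """Mean squared response of a discrete Laplacian over interior
    pixels: a crude blur/sharpness score (higher means sharper).
    Assumes the image is at least 3x3."""
    responses = []
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            lap = (image[y - 1][x] + image[y + 1][x] +
                   image[y][x - 1] + image[y][x + 1] - 4 * image[y][x])
            responses.append(lap * lap)
    return sum(responses) / len(responses)

sharp  = [[0, 255, 0], [255, 0, 255], [0, 255, 0]]       # hard edges
blurry = [[100, 110, 100], [110, 120, 110], [100, 110, 100]]
print(sharpness(sharp) > sharpness(blurry))  # True
```

    Scores like this one can be combined with face, eye, and lighting detectors to rank a burst of shots and keep the best frame.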

    A related search task is creating collages, i.e. selecting photos that will look good next to each other.

    Application of image analysis algorithms

    Now absolutely amazing things are happening in medicine.

    • Identification of anomalies. Already a well-known and solvable problem. For example, from an X-ray image one tries to understand whether the patient is healthy, i.e. whether this image differs from the image of a healthy person. It can be an image of the whole body, or of the circulatory system specifically, in order to pick out abnormal vessels. The search for cancer cells is part of this task.
    • Diagnosis of diseases. Also done from images. If you have a database of patient images and it is known that one anomaly also occurs in healthy people while another means the person has cancer, then doctors can be assisted in diagnosis based on the similarity of images.
    • Modeling the body and predicting the effects of treatment. This is what is now called cutting edge. Although we are all alike, each organism is built individually: for example, blood vessels may differ in placement or thickness. If a broken vessel needs to be bridged with a shunt, its placement can be chosen based on a doctor's expert opinion, or the circulatory system can be modeled from the image and the shunt "inserted" into that model. This gives us the chance to see how blood flow changes and to predict how the patient will feel under different options.

    Another application area is security systems. Besides using fingerprints and the retina for authorization, there are still unsolved problems. One is **detection of "suspicious" items**: its difficulty is that you cannot describe in advance what makes an item suspicious. Another interesting task is **identification of suspicious behavior** in video surveillance systems. It is impossible to enumerate all possible examples of abnormal behavior, so recognition has to be based on detecting deviations from what is marked as normal.

    There are many more areas where image analysis is used: the military industry, robotics, film production, computer games, and the automotive industry. In 2010, an Italian company equipped a truck with cameras, and, using maps and a GPS signal, it drove autonomously from Italy to Shanghai. The route passed through Siberia, not all of whose roads are on maps; on that stretch, a human-driven car traveling in front served as its guide. The truck itself recognized traffic signs and pedestrians and understood when it could change lanes.


    So why do we still drive cars ourselves, and why does a person still have to watch over video surveillance systems? One of the key issues is the semantic gap.

    A person looking at a picture understands its semantics. A computer understands the colors of pixels; it can distinguish textures, tell a brick wall from a carpet, and recognize a person in a photograph, but it still cannot determine whether that person is happy. We ourselves cannot always tell. Automatically understanding whether students are bored at a lecture is the next level still.

    In addition, our brain is a unique system for understanding and processing the images we see. It is inclined to see what it wants to see, and how to teach a computer the same thing is an open question.

    We are very good at generalizing: from an image we can guess that we are looking at a lamp. We do not need to know every variant of an object class in order to assign a sample to it. This is harder for a computer, because lamps of the same class can look very different.

    There are a number of difficulties that image analysis has not yet overcome.

    Human visual perception

    Our brain often "completes" the picture and adds semantics: we can all see "something" or "someone" in the shape of a cloud. The visual system is self-taught. It is difficult for a European to tell Asian faces apart, since he rarely meets them in everyday life: his visual system has learned to capture the differences between European faces, while Asians, whom it has seen little of, all seem "to have one face." And vice versa. Colleagues from Palo Alto once developed a face detection algorithm together with Chinese colleagues; in the end it detected Asian faces wonderfully but could not see Europeans.

    In every picture, we first of all look for familiar shapes. For example, here we see squares and circles.

    The eye can perceive very large ranges of brightness, but it does so in a cunning way. The visual system adapts to a range of brightness values on the order of 10^10, but at any given moment we can distinguish only a small range of brightnesses: the eye picks a point, adapts to the brightness there, and distinguishes only a narrow band around it. Everything darker looks black, everything lighter looks white. But the eye moves very quickly and the brain completes the picture, so we see well.

    Subjective brightness is the logarithm of physical brightness: if we take a light source and change its brightness linearly, our eye perceives the change logarithmically.
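    This is the Weber–Fechner relation, often written S = k · log(I / I0), where I is physical intensity, I0 a reference threshold, and k a scaling constant (the symbols here are the conventional ones, not notation from the lecture). A tiny sketch shows its key consequence: each doubling of physical intensity adds the same perceived step:

```python
import math

def perceived_brightness(intensity, i0=1.0, k=1.0):
    """Weber-Fechner law: subjective brightness grows as the
    logarithm of physical intensity relative to a threshold i0."""
    return k * math.log(intensity / i0)

# Doubling physical intensity (1, 2, 4, 8) adds a constant
# perceived increment of k * log(2) each time:
steps = [perceived_brightness(2 ** n) for n in range(4)]
diffs = [round(b - a, 6) for a, b in zip(steps, steps[1:])]
print(diffs)  # three equal increments
```

    This is also why digital images are usually stored with a nonlinear (gamma-encoded) brightness scale rather than raw linear intensity.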

    Two types of receptors are responsible for visual perception: cones and rods. Cones are responsible for color perception and give a very sharp picture, but only when it is not too dark; this is called photopic vision. In the dark, scotopic vision takes over: the rods switch on, which are smaller than cones and do not perceive color, so the picture is blurry.
