Familiar Faces: Algorithms for Creating a “Typical” Portrait
Posted by Andrei Sorokin, Senior Developer at DataArt
At the end of last year, we completed an R&D project dedicated to machine vision methods in image processing. As a result, we created a number of averaged portraits of IT specialists working with different technologies. In this article I will talk about the images of “typical” Java and .NET programmers, the frameworks suited to this task, and how we optimized the process.
I have been interested in machine vision since graduate school - my Ph.D. thesis was devoted to the recognition of handwritten texts. Over the past few years, the methodology and software for machine vision have changed significantly, and new tools and frameworks have appeared that I wanted to try. In this project we did not claim to invent a unique solution - our main contribution was optimizing the image-processing pipeline.
The idea of creating a portrait of an “average representative” is, of course, not unique. For example, in October the Reddit user osmutiar posted averaged portraits of a professional baseball player, basketball player, golfer, and others.
In this picture: the faces osmutiar created from 1,800 portraits of American MLB players and from portraits of the 500 best active soccer players in the world.
Four years ago, a study of female and male attractiveness was widely discussed, in which scientists modeled the averaged faces of representatives of different nationalities.
Admittedly, the portraits in that illustration ultimately underwent additional artistic processing.
The main material for our research was photographs of colleagues, which we could group both by appearance and by formal characteristics related to their professional competencies.
The resulting portrait of an “average” male colleague from DataArt.
In total, we analyzed 1,541 male and 512 female faces taken from our internal time-tracking system. The first problems we encountered were the small size of the photographs - only 80 by 120 pixels - and the lack of any shooting standard: head rotation and tilt varied from photo to photo, so initially the program detected faces in only 927 male and 85 female portraits. The first step, therefore, was to normalize the position of the faces.
Photos before and after head alignment.
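The article does not spell out the alignment mechanics, so here is a minimal sketch of one common approach (an assumption on our part, not necessarily the exact method used): rotate each photo by the angle of the line between the eye centers using OpenCV. The function name and the source of the eye coordinates are illustrative.

```cpp
#include <opencv2/imgproc.hpp>
#include <cmath>

// Rotate a portrait so that the line through the eye centers becomes
// horizontal. The eye coordinates would come from a landmark detector;
// here they are passed in as parameters.
cv::Mat alignByEyes(const cv::Mat& src, cv::Point2f leftEye, cv::Point2f rightEye) {
    // Angle of the eye line relative to the horizontal, in degrees.
    double angle = std::atan2(rightEye.y - leftEye.y,
                              rightEye.x - leftEye.x) * 180.0 / CV_PI;
    cv::Point2f center((leftEye.x + rightEye.x) / 2.0f,
                       (leftEye.y + rightEye.y) / 2.0f);

    // In-plane rotation around the midpoint between the eyes.
    cv::Mat rotation = cv::getRotationMatrix2D(center, angle, 1.0);
    cv::Mat dst;
    cv::warpAffine(src, dst, rotation, src.size());
    return dst;
}
```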
After upscaling and interpolation, the face detector, based on the Histogram of Oriented Gradients (HOG) method, started working on the images.
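For reference, a detection pass with dlib, whose stock frontal face detector is exactly an fHOG-based sliding-window classifier, might look like this (the file name and the number of upscaling steps are illustrative):

```cpp
#include <dlib/image_processing/frontal_face_detector.h>
#include <dlib/image_io.h>
#include <dlib/image_transforms.h>
#include <iostream>
#include <vector>

int main() {
    // dlib's stock frontal detector is a sliding-window fHOG classifier.
    dlib::frontal_face_detector detector = dlib::get_frontal_face_detector();

    dlib::array2d<dlib::rgb_pixel> img;
    dlib::load_image(img, "portrait.jpg");  // hypothetical file name

    // The source photos are only 80 x 120 pixels, so upscale (with
    // interpolation) until faces are large enough for the detector window.
    dlib::pyramid_up(img);
    dlib::pyramid_up(img);

    std::vector<dlib::rectangle> faces = detector(img);
    std::cout << "faces found: " << faces.size() << std::endl;
    return 0;
}
```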
To merge the faces pre-processed by our algorithm, we used the method proposed by Satya Mallick, a researcher of Indian origin who works at the University of California, San Diego. We identified 68 key points on each face in the sample: the coordinates of the corners of the eyes, the eyebrows, the lips, and the nose. We then triangulated each face by these key points, that is, divided it into triangles (a sketch of this step follows the figure below).
This is how face patterns look after triangulation.
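The triangulation itself can be sketched with OpenCV's Subdiv2D, which performs Delaunay triangulation over the landmark points - this mirrors the approach in Mallick's tutorial; the helper name is ours:

```cpp
#include <opencv2/imgproc.hpp>
#include <vector>

// Delaunay triangulation of the facial landmarks inside the image rectangle.
// Each returned cv::Vec6f holds one triangle as (x1, y1, x2, y2, x3, y3).
std::vector<cv::Vec6f> triangulate(const cv::Size& imgSize,
                                   const std::vector<cv::Point2f>& landmarks) {
    cv::Rect rect(0, 0, imgSize.width, imgSize.height);
    cv::Subdiv2D subdiv(rect);
    for (const cv::Point2f& p : landmarks)
        subdiv.insert(p);

    std::vector<cv::Vec6f> all, inside;
    subdiv.getTriangleList(all);

    // Subdiv2D adds virtual outer vertices, so filter out triangles
    // whose corners fall outside the image.
    auto inRect = [&rect](float x, float y) {
        return rect.contains(cv::Point(cvRound(x), cvRound(y)));
    };
    for (const cv::Vec6f& t : all)
        if (inRect(t[0], t[1]) && inRect(t[2], t[3]) && inRect(t[4], t[5]))
            inside.push_back(t);
    return inside;
}
```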
This is what a real portrait looks like after transformation in accordance with the average face model.
Finally, for all faces in the sample, the pixel colors inside the corresponding triangles were averaged.
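The mechanics behind these two steps are typically implemented triangle by triangle: an affine transform maps each source triangle onto its counterpart in the average layout, and a mask restricts the copy to the triangle's interior. A sketch under those assumptions (function and variable names are ours):

```cpp
#include <opencv2/imgproc.hpp>
#include <vector>

// Warp the contents of triangle srcTri in src onto triangle dstTri in dst.
// This is the building block for reshaping a face to the averaged landmarks.
void warpTriangle(const cv::Mat& src, cv::Mat& dst,
                  std::vector<cv::Point2f> srcTri,
                  std::vector<cv::Point2f> dstTri) {
    // Work inside the triangles' bounding boxes to keep the warp cheap.
    cv::Rect srcBox = cv::boundingRect(srcTri);
    cv::Rect dstBox = cv::boundingRect(dstTri);
    for (int i = 0; i < 3; ++i) {
        srcTri[i] -= cv::Point2f(static_cast<float>(srcBox.x),
                                 static_cast<float>(srcBox.y));
        dstTri[i] -= cv::Point2f(static_cast<float>(dstBox.x),
                                 static_cast<float>(dstBox.y));
    }

    // Affine map taking the source triangle to the destination triangle.
    cv::Mat warpMat = cv::getAffineTransform(srcTri, dstTri);
    cv::Mat patch;
    cv::warpAffine(src(srcBox), patch, warpMat, dstBox.size(),
                   cv::INTER_LINEAR, cv::BORDER_REFLECT_101);

    // Copy only the pixels that fall inside the destination triangle.
    cv::Mat mask = cv::Mat::zeros(dstBox.size(), CV_8UC1);
    std::vector<cv::Point> poly;
    for (int i = 0; i < 3; ++i)
        poly.emplace_back(cvRound(dstTri[i].x), cvRound(dstTri[i].y));
    cv::fillConvexPoly(mask, poly, cv::Scalar(255));

    cv::Mat dstRoi = dst(dstBox);
    patch.copyTo(dstRoi, mask);
}
```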
Additionally, it was interesting to look at clustering of the source images. To isolate groups of faces, we used spectral analysis of image descriptors, selecting the N principal components. The feature matrix (M x N, where M is the number of samples and N is the length of the feature vectors) undergoes SVD decomposition. The eigenvectors corresponding to the largest eigenvalues are selected as cluster centers, and each sample is assigned to the cluster whose center it is “closest” to. In other words, the five most dissimilar groups are selected from the presented samples, and three images are then taken from each group.

This gives a more contrasting average face from fewer samples, while every cluster is still represented in the resulting image. As a result, we got a number of typical portraits selected by the algorithm. The images did not turn out “faceless” or too similar to each other, which would have been almost inevitable with a simple merge of a sufficiently large number of images.
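A rough sketch of this scheme, with illustrative names and an assumed assignment rule (each sample is labeled with the singular vector it projects onto most strongly); descriptors are the rows of a CV_32F matrix:

```cpp
#include <opencv2/core.hpp>
#include <cmath>
#include <vector>

// Sketch: SVD of the M x N descriptor matrix; the top-k right singular
// vectors act as cluster "centers", and each sample is labeled with the
// center it projects onto most strongly.
std::vector<int> clusterByTopComponents(const cv::Mat& descriptors, int k) {
    cv::Mat w, u, vt;  // descriptors = u * diag(w) * vt
    cv::SVD::compute(descriptors, w, u, vt);

    std::vector<int> labels(descriptors.rows);
    for (int i = 0; i < descriptors.rows; ++i) {
        int best = 0;
        double bestProj = -1.0;
        for (int c = 0; c < k; ++c) {
            // Magnitude of the projection of sample i onto singular vector c.
            double proj = std::abs(descriptors.row(i).dot(vt.row(c)));
            if (proj > bestProj) { bestProj = proj; best = c; }
        }
        labels[i] = best;
    }
    return labels;
}
```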
Results
| | Java girl | Java boy | .NET girl | .NET boy |
| --- | --- | --- | --- | --- |
| Portraits obtained using cluster analysis (by merging the top three faces in each of the five clusters) | | | | |
| Images obtained by simply merging portraits of all .NET and Java programmers | | | | |
Most of the image manipulations (working with image points, affine transformations, and color handling) use algorithms implemented in the OpenCV framework. To detect faces and highlight their key points, we use the dlib framework together with a pre-trained face-landmark model (here we acted purely as users of Davis King’s model), shape_predictor_68_face_landmarks.dat, trained on the iBUG 300-W image collection. That is, we simply feed a 150 x 150 pixel image to the model’s input and get 68 points at the output - a vector of fixed length.
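Put together, using that model through dlib's standard API looks roughly like this (the input file name is illustrative):

```cpp
#include <dlib/image_processing.h>
#include <dlib/image_processing/frontal_face_detector.h>
#include <dlib/image_io.h>

int main() {
    dlib::frontal_face_detector detector = dlib::get_frontal_face_detector();

    // Davis King's pre-trained landmark model (iBUG 300-W training set).
    dlib::shape_predictor predictor;
    dlib::deserialize("shape_predictor_68_face_landmarks.dat") >> predictor;

    dlib::array2d<dlib::rgb_pixel> img;
    dlib::load_image(img, "face.jpg");  // hypothetical input file

    for (const dlib::rectangle& face : detector(img)) {
        // The output is a fixed-length vector of 68 (x, y) landmarks.
        dlib::full_object_detection shape = predictor(img, face);
        for (unsigned long i = 0; i < shape.num_parts(); ++i) {
            dlib::point p = shape.part(i);  // landmark i: eye corners, brows, lips, nose...
            (void)p;                        // use the point here
        }
    }
    return 0;
}
```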
Satya Mallick implemented the final part in Python, but we essentially rewrote it in C++. This allowed us to increase processing speed, reduce memory consumption, and ensure the integrity of the solution.
Another problem was high memory consumption: more than 4 GB when merging as few as 300 images. Analyzing the merge code proposed by Mallick, we found the cause - all source images were read into memory before merging began. That could not suit us: in our case there were 1,541 files to read, and if the images were even slightly larger, 32 GB would not have been enough. We solved the problem by rewriting that piece of code so that each image is merged incrementally as soon as it is read. Now memory usage does not exceed 100 MB: we store only the averaged coordinates of the facial key points, a single image being processed, and the loaded classifiers and models (fHOG and the landmark model).
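A minimal sketch of such an incremental merge; warpToAverage is a stand-in for the alignment, triangulation, and warping steps described above (here it simply resizes):

```cpp
#include <opencv2/core.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc.hpp>
#include <string>
#include <vector>

// Placeholder for the real per-image processing (alignment, triangulation,
// warping onto the average landmark layout); here it just resizes.
static cv::Mat warpToAverage(const cv::Mat& img, const cv::Size& outSize) {
    cv::Mat out;
    cv::resize(img, out, outSize);
    return out;
}

// Fold each photo into a running float accumulator right after reading it,
// so only one source image is ever held in memory at a time.
cv::Mat mergeIncrementally(const std::vector<std::string>& files,
                           const cv::Size& outSize) {
    cv::Mat acc = cv::Mat::zeros(outSize, CV_32FC3);
    int count = 0;
    for (const std::string& path : files) {
        cv::Mat img = cv::imread(path);
        if (img.empty()) continue;  // skip unreadable files

        cv::Mat f;
        warpToAverage(img, outSize).convertTo(f, CV_32FC3);
        acc += f;                   // incremental merge
        ++count;
    }
    if (count > 0) acc /= static_cast<double>(count);

    cv::Mat result;
    acc.convertTo(result, CV_8UC3); // back to a displayable 8-bit image
    return result;
}
```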