Determining the dominant features of classification and developing a mathematical model of facial expressions


    1. Search for and analysis of the color space optimal for extracting objects of interest in a given class of images
    2. Determination of the dominant classification features and development of a mathematical model of facial expressions
    3. Synthesis of an optimal algorithm for recognizing facial expressions
    4. Implementation and testing of the facial expression recognition algorithm
    5. Creation of a test database of lip images of users in various states to increase the accuracy of the system
    6. Search for an optimal open-source audio speech recognition system
    7. Search for an optimal closed-source speech recognition system with open APIs, for integration
    8. Experiment on integrating a video extension into an audio speech recognition system using a speech test protocol


    Identify the dominant classification features of the localization object and develop a mathematical model for analyzing facial expressions.


    Search for and analysis of face localization methods, determination of the dominant classification features, and development of a mathematical model optimal for recognizing facial expressions.


    In addition to determining the optimal color space for extracting objects of interest in a given class of images, which was done at the previous stage of the study, determining the dominant classification features and developing a mathematical model of facial expressions also play an important role.

    To solve this problem, it is first necessary to define the specific features of the face detection task as modified for a system working with a video camera, and then to localize the movement of the lips.


    As for the first task, two variants should be distinguished:
    • Face localization;
    • Face tracking [1].
    Since our task is to develop an algorithm for recognizing facial expressions, it is reasonable to assume that the system will be used by a single user who does not move his head much. Therefore, lip movement recognition can be built on a simplified version of the detection task in which exactly one face is present in the image.

    This means that face detection can be run relatively infrequently (about 10 frames per second or even less). At the same time, the speaker's lips move quite actively during speech, so their contour must be estimated more frequently.
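    The two different update rates can be illustrated with a minimal Python sketch. It assumes a 30 fps camera and two hypothetical callbacks, `detect_face` and `track_lips`, standing in for whatever face detector and lip-contour estimator are actually used. Face detection runs only on every `face_every`-th frame (with `face_every=3` at 30 fps this gives the 10 detections per second mentioned above), while the lip contour is re-estimated on every frame.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical face box: (x, y, width, height).
Box = Tuple[int, int, int, int]


def process_stream(
    frames: Iterable[object],
    detect_face: Callable[[object], Optional[Box]],
    track_lips: Callable[[object, Box], object],
    face_every: int = 3,
) -> List[object]:
    """Detect the single face only periodically, but track the lips on every frame."""
    face: Optional[Box] = None
    contours: List[object] = []
    for i, frame in enumerate(frames):
        # Head motion is assumed to be slow, so the face is re-detected only
        # every `face_every` frames (or on every frame until it is first found).
        if face is None or i % face_every == 0:
            found = detect_face(frame)
            if found is not None:
                face = found
        # Lips move quickly during speech: re-estimate the contour every frame.
        if face is not None:
            contours.append(track_lips(frame, face))
    return contours
```

    Keeping the last known face box between detections is the design choice that makes the cheap per-frame lip tracking possible.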

    The task of finding a face in an image can be solved with existing tools. Today there are several methods for detecting and localizing a face in an image, which can be divided into two categories:
    1. Empirical recognition;
    2. Modeling of facial images [2].

    The first category includes top-down recognition methods based on invariant features of facial images. They rest on the assumption that an image contains certain signs of the presence of a face that are invariant to the shooting conditions. These methods can be divided into two subcategories:
    1.1. Detection of elements and features characteristic of a face image (edges, brightness, color, the characteristic shape of facial features, etc.) [3], [4];
    1.2. Analysis of the detected features and a decision on the number and location of faces (empirical algorithms, statistics of the relative position of features, modeling of visual processes, rigid and deformable templates, etc.) [5], [6].
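    As an illustration of subcategory 1.1, the sketch below applies a well-known empirical RGB skin-color rule (Kovač, Peer and Solina, 2003, for uniform daylight). It is a swapped-in example rather than the method used in this work, and `skin_ratio` is only a crude, hypothetical stand-in for the decision stage 1.2.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255


def is_skin_rgb(pixel: Pixel) -> bool:
    """Empirical skin rule for uniform daylight (after Kovac, Peer and Solina)."""
    r, g, b = pixel
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def skin_mask(image: List[List[Pixel]]) -> List[List[bool]]:
    """Stage 1.1: mark every pixel whose color satisfies the skin rule."""
    return [[is_skin_rgb(p) for p in row] for row in image]


def skin_ratio(mask: List[List[bool]]) -> float:
    """Crude stage 1.2 cue: the fraction of skin pixels in the region."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total if total else 0.0
```

    Fixed thresholds like these are exactly where the empirical approach becomes sensitive to lighting and to individual users.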

    For the algorithm to work correctly, a database of facial features must be created and then tested. For a more accurate implementation of empirical methods, models that take face transformations into account can be used; such models have either an expanded set of reference data for recognition or a mechanism for modeling the transformation on the basic elements. However, the difficulty of building a classifier database that covers a wide variety of users with individual facial features reduces the recognition accuracy of this method.

    The second category includes methods of mathematical statistics and machine learning. These methods rely on pattern recognition tools and treat face detection as a special case of the recognition task. Each image is mapped to a feature vector, which is used to classify images into two classes: face / not face. The most common way to obtain a feature vector is to use the image itself: each pixel becomes a component of the vector, turning an n × m image into a vector in the space $\mathbb{R}^{n \times m}$, where n and m are positive integers [7]. The disadvantage of this representation is the extremely high dimension of the feature space. Its advantages are that no human participation is required in constructing the classifier and that the system can be trained for a specific user. Therefore, image modeling methods are optimal for building a mathematical model of face localization in our problem.
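    A minimal sketch of the raw-pixel representation, assuming NumPy and grayscale images: `to_feature_vector` flattens an n × m image into a vector with n·m components, and the hypothetical `NearestMeanFaceClassifier` is a deliberately simple stand-in (nearest class mean) for the face / not-face classifier, not the classifier used in [7] or in this study.

```python
import numpy as np


def to_feature_vector(image: np.ndarray) -> np.ndarray:
    """Flatten an n x m grayscale image into a vector with n*m components."""
    return image.astype(np.float64).reshape(-1)


class NearestMeanFaceClassifier:
    """Hypothetical minimal face / not-face classifier on raw-pixel vectors."""

    def fit(self, images, labels):
        X = np.stack([to_feature_vector(im) for im in images])
        y = np.asarray(labels, dtype=bool)
        # One mean vector per class, learned from the training images.
        self.face_mean_ = X[y].mean(axis=0)
        self.nonface_mean_ = X[~y].mean(axis=0)
        return self

    def predict(self, image: np.ndarray) -> bool:
        x = to_feature_vector(image)
        d_face = np.linalg.norm(x - self.face_mean_)
        d_nonface = np.linalg.norm(x - self.nonface_mean_)
        return bool(d_face < d_nonface)
```

    For a 64 × 64 image the vector already has 4096 components, which shows why the dimension of the feature space is a practical problem; retraining `fit` on one user's images corresponds to the per-user training mentioned above.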

    As for segmenting the face contour and tracking the positions of lip points over a sequence of frames, mathematical modeling methods should also be used. There are several ways to track the movement of facial expressions; the best known of them uses a mathematical model based on active contours:

    Localization of facial expressions based on active contour models

    An active contour (snake) is a deformable model whose template is given as a parametric curve, manually initialized by a set of control points lying on an open or closed curve in the input image.
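    Since this section later uses the polar parameterization v(s) = [r(s), θ(s)], a closed snake can be initialized, for example, with control points on an ellipse around the mouth center. The sketch below assumes the center and the semi-axes `a`, `b` are already known (hypothetical inputs, e.g. taken from a mouth bounding box) instead of being placed manually.

```python
import math
from typing import List, Tuple


def init_snake_polar(a: float, b: float, n_points: int = 32) -> List[Tuple[float, float]]:
    """Control points (r, theta) of a closed elliptical snake around the mouth center.

    a, b: hypothetical horizontal and vertical semi-axes of the initial ellipse.
    """
    points = []
    for k in range(n_points):
        s = k / n_points                  # curve parameter s in [0, 1)
        theta = 2.0 * math.pi * s
        # Polar equation of an ellipse about its own center.
        r = a * b / math.hypot(b * math.cos(theta), a * math.sin(theta))
        points.append((r, theta))
    return points


def to_cartesian(points: List[Tuple[float, float]], cx: float, cy: float) -> List[Tuple[float, float]]:
    """Convert (r, theta) control points to image coordinates around (cx, cy)."""
    return [(cx + r * math.cos(t), cy + r * math.sin(t)) for r, t in points]
```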

    To fit the active contour to the image of facial expressions, the object under study must first be binarized, that is, converted into a binary digital raster image; then the parameters of the active contour are estimated and the feature vector is computed.
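    A minimal NumPy sketch of the binarization step, assuming a per-pixel lip score in [0, 1] is already available and using a hypothetical threshold of 0.5. The three-component feature vector (width, height and area of the lip region) is likewise only an illustrative assumption, not the feature set of this study.

```python
import numpy as np


def binarize(score_map: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Convert a lip score map in [0, 1] into a binary raster image (True = lips)."""
    return score_map >= threshold


def mouth_features(mask: np.ndarray) -> np.ndarray:
    """Hypothetical feature vector [width, height, area] of the binary lip region."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return np.zeros(3)
    width = xs.max() - xs.min() + 1
    height = ys.max() - ys.min() + 1
    return np.array([width, height, xs.size], dtype=float)
```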


    The active contour model is defined by:
    • a set of N points;
    • an internal elastic energy term;
    • an external edge-based energy term.
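    For a closed snake of N points, the internal term is usually discretized with finite differences. The sketch below assumes NumPy, points stored as an (N, 2) array, and the common elasticity and stiffness weights `alpha` and `beta` (not specified in the text); the external term depends on the image and is omitted here.

```python
import numpy as np


def internal_energy(points: np.ndarray, alpha: float, beta: float) -> float:
    """Discrete internal energy of a closed snake given as an (N, 2) array.

    Elastic part:  alpha * sum |v_{i+1} - v_i|^2
    Bending part:  beta  * sum |v_{i+1} - 2 v_i + v_{i-1}|^2
    Indices wrap around because the contour is closed.
    """
    nxt = np.roll(points, -1, axis=0)
    prv = np.roll(points, 1, axis=0)
    elastic = np.sum((nxt - points) ** 2)
    bending = np.sum((nxt - 2.0 * points + prv) ** 2)
    return float(alpha * elastic + beta * bending)
```

    For a unit square the elastic part equals 4 (four edges of length 1) and the bending part equals 8, so larger `beta` penalizes sharp corners more strongly.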

    To improve recognition quality, two color classes are distinguished: skin and lips. The membership function of a color class takes values in the range from 0 to 1.
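    The text does not specify the membership function, so the sketch below swaps in one plausible choice: a logistic function of the pseudo-hue R / (R + G), which tends to be higher on lips than on the surrounding skin. The midpoint `h0` and steepness `k` are hypothetical placeholders that would have to be tuned on the lip image database.

```python
import math
from typing import Tuple

RGB = Tuple[float, float, float]


def pseudo_hue(r: float, g: float) -> float:
    """Pseudo-hue R / (R + G); lips usually contain more red relative to green."""
    return r / (r + g) if (r + g) > 0 else 0.0


def lip_membership(rgb: RGB, h0: float = 0.6, k: float = 40.0) -> float:
    """Membership of a pixel in the 'lips' class, always in [0, 1].

    h0 and k are hypothetical placeholder parameters, not values from this study.
    """
    r, g, _ = rgb
    return 1.0 / (1.0 + math.exp(-k * (pseudo_hue(r, g) - h0)))


def skin_membership(rgb: RGB) -> float:
    """With only two classes, skin membership is taken as the complement."""
    return 1.0 - lip_membership(rgb)
```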

    The active contour model (snake) v(s) is described by its energy E, a functional consisting of four terms. The first two terms describe the regularity (internal) energy of the active contour. In our polar coordinate system, v(s) = [r(s), θ(s)], where s ranges from 0 to 1. The third term is the energy of the external force derived from the image, and the fourth is the pressure energy.
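    Assuming the classical snake formulation of Kass, Witkin and Terzopoulos, with elasticity and stiffness weights α and β (the weights are not given in the text), the four terms can be written as:

```latex
E_{\text{snake}} = \int_{0}^{1} \Big[ \alpha \, \lvert v'(s) \rvert^{2}
  + \beta \, \lvert v''(s) \rvert^{2}
  + E_{\text{ext}}\big(v(s)\big)
  + E_{\text{pres}}\big(v(s)\big) \Big] \, ds,
\qquad v(s) = [\, r(s), \theta(s) \,], \quad s \in [0, 1].
```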

    The external force is determined from the characteristics described above. It moves the control points toward a given intensity value and is calculated as follows. The gradient (derivative) is computed at the snake's points along the corresponding radial line. The force increases if the gradient is negative and decreases otherwise. The coefficient in front of the gradient is a weight factor that depends on the image topology. The compressive (pressure) force is simply a constant equal to half of the minimum weight. The optimal shape of the snake is obtained by minimizing the energy functional over a certain number of iterations.
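    A minimal pure-Python sketch of this radial iteration, under assumptions not stated in the text: the external force is taken as $F_{\text{ext}} = -w \, \partial f / \partial r$, where f is a color-class membership map given as a hypothetical callable `membership(r, theta)`; a single constant weight w is used, so the pressure constant ½·min w is simply w/2 and acts inward; the internal energy is approximated by an elastic pull toward neighboring radii with weight `alpha`. Each control point moves only along its fixed radial line θ_i, and `evolve_radial_snake` is a hypothetical name.

```python
import math
from typing import Callable, List


def evolve_radial_snake(
    radii: List[float],
    membership: Callable[[float, float], float],
    weight: float = 4.0,
    alpha: float = 0.2,
    step: float = 0.1,
    iterations: int = 200,
    h: float = 0.5,
) -> List[float]:
    """Iteratively update radii r_i of control points on radial lines theta_i = 2*pi*i/N."""
    n = len(radii)
    thetas = [2.0 * math.pi * i / n for i in range(n)]
    pressure = 0.5 * weight  # half of the minimum weight (a single weight here)
    r = list(radii)
    for _ in range(iterations):
        new_r = []
        for i in range(n):
            t = thetas[i]
            # Radial derivative of the membership map by central differences.
            grad = (membership(r[i] + h, t) - membership(r[i] - h, t)) / (2.0 * h)
            external = -weight * grad  # positive (outward) when the gradient is negative
            # Elastic pull toward the neighbors on the closed contour.
            smooth = alpha * (r[i - 1] + r[(i + 1) % n] - 2.0 * r[i])
            new_r.append(max(0.0, r[i] + step * (external - pressure) + smooth))
        r = new_r
    return r
```

    Started outside the mouth, the constant pressure shrinks the contour until the outward external force at the lip boundary balances it. For a soft disc of radius 10 (`membership = lambda r, t: 1 / (1 + math.exp(4 * (r - 10)))`) all radii settle near 10.48, slightly outside the nominal edge. A too large `step` makes the iteration oscillate instead of converging.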

    Let us consider the basic image processing operations in more detail. For simplicity, suppose that the area of the speaker's mouth has already been identified. In this case, the main operations that need to be performed on the received image are presented in Fig. 3.



    To determine the dominant features of image classification, this research identified the specific features of the face detection task as modified for a video camera. Among all methods of face localization and detection of the facial-expression region under study, face image modeling methods are the most suitable for creating a universal recognition system for mobile devices.
    The mathematical model of facial expressions is based on binarization of the object under study followed by active contour modeling. After the color space is converted from RGB to YCbCr, this model makes it possible to effectively transform the object of interest for subsequent analysis with active contour models and, after the corresponding iterations over the image, to identify clear boundaries of facial expressions.
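    The RGB to YCbCr step can be sketched with the full-range ITU-R BT.601 coefficients used in JPEG/JFIF; whether this study used exactly this variant (rather than the studio-range one) is an assumption. Separating luminance Y from the chroma components Cb and Cr is what makes color-class decisions less sensitive to brightness changes.

```python
from typing import List, Tuple

Triple = Tuple[float, float, float]


def rgb_to_ycbcr(r: float, g: float, b: float) -> Triple:
    """Full-range BT.601 (JPEG/JFIF) conversion for 8-bit inputs in 0..255."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def image_to_ycbcr(image: List[List[Triple]]) -> List[List[Triple]]:
    """Convert a row-major RGB image pixel by pixel."""
    return [[rgb_to_ycbcr(*p) for p in row] for row in image]
```

    Gray pixels map to Cb = Cr = 128, so any chroma offset from 128 reflects color rather than brightness.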

    List of sources used

    1. Vezhnevets V., Diagtereva A. Detection and localization of the face in the image. CGM Journal, 2003
    2. Ibid.
    3. E. Hjelmas and B. K. Low, Face detection: A survey, Computer Vision and Image Understanding, vol. 83, pp. 236-274, 2001.
    4. G. Yang and T. S. Huang, Human face detection in complex background, Pattern Recognition, vol. 27, no. 1, pp. 53-63, 1994.
    5. K. Sobottka and I. Pitas, A novel method for automatic face segmentation, facial feature extraction and tracking, Signal Processing: Image Communication, vol. 12, no. 3, pp. 263-281, June 1998.
    6. F. Smeraldi, O. Carmona, and J. Bigün, Saccadic search with Gabor features applied to eye detection and real-time head tracking, Image and Vision Computing, vol. 18, pp. 323-329, 200
    7. Gomozov A.A., Kryukov A.F. Analysis of empirical and mathematical algorithms for recognizing a human face. Network-journal. Moscow Power Engineering Institute (Technical University). No. 1 (18), 2011

    To be continued
