Neural network trained to recognize depression from a person's arbitrary speech, without context


    The neural network evaluates the emotional tone of a 30-second fragment of the speaker’s speech. Illustration from the authors' previous paper

    In recent years, machine learning has increasingly been used as a diagnostic tool. Existing models can identify words and intonations that may indicate depression. But these models usually work only when the patient answers a doctor's specific questions: for example, about mood, lifestyle, or medical history. In that case the neural network works no differently from an ordinary psychotherapist talking with the patient.

    But for a new generation of medicine, a system that detects depression from an arbitrary set of words, without a specific set of questions, is much more effective. In theory, such a system could automatically monitor the mental health of an entire population in real time (across all voice traffic) and get patients to hospital quickly. An automatic depression detection module could also be built into mobile applications and games.

    The model was developed by scientists at the Massachusetts Institute of Technology, MIT News reports. The paper will be presented at the Interspeech 2018 conference, which will be held September 2-6 in India.

    “If you want to deploy [depression-detection] models in a scalable way ... then you need to minimize the number of restrictions on the data used. A model should extract data from any ordinary conversation and natural interaction between people,” said Tuka Alhanai, a researcher at the Massachusetts Institute of Technology's Computer Science and Artificial Intelligence Laboratory (CSAIL) and lead author of the paper.

    The researchers hope the new method will be used to detect signs of depression in natural conversation. For example, the model could underpin mobile applications that monitor a user's text and voice for signs of mental disorders and send alerts. This would be especially useful for people who cannot get to a doctor for an initial diagnosis because no doctor is available, because a consultation costs too much, or simply because they do not know they have a mental health problem.

    Depression is a very dangerous mental illness, accompanied by lowered self-esteem and a loss of interest in life and habitual activities. In some cases, a person suffering from it may begin to abuse alcohol or other substances.

    The key innovation of the new technology is its ability to detect patterns that indicate depression and then apply those patterns to new people without additional information, that is, without prior training on a specific person. “We call it working ‘without context’ because you do not impose any restrictions on the types of questions you are looking for or the types of answers to those questions,” Alhanai explains.

    The neural network was trained with a technique called “sequence modeling”, which is often used for speech processing. The model learns from sequences of text and audio data taken from the questions and answers of people with and without depression. Gradually, it picks up general patterns, such as how certain words are paired with different sounds in healthy and depressed people. In addition, people with depression may speak more slowly and leave longer pauses between words. These textual and acoustic markers of mental disorders have been examined in earlier studies. Ultimately, the model itself determines whether or not the speech shows signs of depression.
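
    As a rough illustration of what sequence modeling means here, the sketch below runs a plain recurrent cell over a variable-length sequence of per-word feature vectors and turns the final hidden state into a single score between 0 and 1. It is not the authors' model: the cell type, the random untrained weights, the two features (pitch variation and pause length) and the names `make_weights` and `predict` are all assumptions made for this example.

```python
import math
import random


def make_weights(n_in, n_hidden, seed=0):
    """Small random weights; a trained model would learn these from labeled data."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_in": mat(n_hidden, n_in),       # input -> hidden
        "W_rec": mat(n_hidden, n_hidden),  # hidden -> hidden (carries context forward)
        "w_out": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
    }


def predict(sequence, weights):
    """Run a plain recurrent cell over the sequence and return a score in (0, 1)."""
    n_hidden = len(weights["w_out"])
    h = [0.0] * n_hidden  # hidden state starts empty
    for x in sequence:    # one feature vector per word
        h = [
            math.tanh(
                sum(w * xi for w, xi in zip(weights["W_in"][j], x))
                + sum(w * hi for w, hi in zip(weights["W_rec"][j], h))
            )
            for j in range(n_hidden)
        ]
    logit = sum(w * hi for w, hi in zip(weights["w_out"], h))
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid squashes the logit into (0, 1)


# Hypothetical per-word features: [pitch variation, pause before the word in seconds]
short_answer = [[0.2, 0.1], [0.1, 0.9]]
long_answer = [[0.8, 0.1], [0.7, 0.2], [0.9, 0.1], [0.6, 0.3]]

weights = make_weights(n_in=2, n_hidden=3)
scores = [predict(seq, weights) for seq in (short_answer, long_answer)]
```

    Because the hidden state is carried from step to step, the same function handles answers of any length and needs no information about which question was asked. A real system would learn the weights from labeled interviews rather than draw them at random.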

    The model was tested on a data set of 142 speech fragments from the Distress Analysis Interview Corpus (audio, text, video). Precision was 71% (that is, 29% of the positive diagnoses were false positives), and recall was 83%, meaning the model detected 83% of all patients with the condition in the sample. In most tests, the model outperformed all previous models for diagnosing depression. The researchers consider these preliminary results very encouraging.
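
    The 71% and 83% figures are what machine-learning literature usually calls precision and recall. A minimal sketch of how they are computed from a confusion matrix follows; the counts are hypothetical, chosen only so that the results round to the reported percentages, and are not taken from the paper.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp)  # share of positive diagnoses that were correct
    recall = tp / (tp + fn)     # share of actual patients that were detected
    return precision, recall


# Hypothetical counts, NOT from the paper
tp, fp, fn = 25, 10, 5

precision, recall = precision_recall(tp, fp, fn)
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
```

    Here precision is 25/35 and recall is 25/30. The F1 score (their harmonic mean) is included only as a common way to summarize both numbers and is not a figure reported in the article.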

    In a previous paper from 2017, the authors described a neural network that recognizes the speaker’s mood from the following signals:

    • voice characteristics;
    • a set of words;
    • pulse.


    The illustration shows the distribution of emotional content over five-second intervals. Negative segments are those that show signs of sadness, disgust, anger, fear, or boredom. Positive segments contain signs of happiness, interest, or enthusiasm.

    In addition to depression, the researchers intend to train the neural network to recognize other conditions, such as dementia.
