
What to expect at the "Dialogue" of linguists and data analysis specialists

This year, Dialogue will have several key topics:
- The use of neural networks for language analysis. Deep learning is commonly seen as a transformation of raw data into a result (so-called end-to-end learning), where the "logic" behind that result is rather hard to interpret in meaningful linguistic terms. But why not use neural networks to gain knowledge about the language itself?
- The use of more complex language models in deep learning. Another important trend for Dialogue: distributional models (embeddings) are clearly evolving from crude "average temperature across the hospital" methods of construction toward the use of contextual, syntactic, and semantic information.
- Applying big data analysis methods to tasks for which there is little data. 2019 has been declared the International Year of Indigenous Languages, so participants in one of the Dialogue sessions will discuss how machine learning can be used to describe and preserve "low-resource" languages (for example, Evenki or Selkup).
- Multichannel corpora: there is a growing tendency to study a speech act in its entirety, including the verbal component, intonation, facial expressions, and gestures. Such research is especially important for training robots, intelligent assistants, and chatbots.
Well-known international experts in computational linguistics are traditionally invited to Dialogue. This year's invited speakers are:
Chris Biemann of the University of Hamburg, one of the leading researchers in computational semantics. He will talk about adaptive machine learning technologies that take individual experience into account. May 31 (Friday), 3 p.m. - 4 p.m.
Piek Vossen of Vrije Universiteit Amsterdam, founder and president of the Global WordNet Association. His main area of interest is verbal interaction between a person and a computer. Piek Vossen will give a talk on "A communicative robot that studies people and the world." He will describe a model of a robot that learns about the world and its interlocutors through natural language communication. The robot learns from everything people tell it, from what it observes in different situations, and from everything it finds on the Internet. May 30 (Thursday), 3 p.m. - 4 p.m.
In total, Dialogue will feature 102 main-track talks and about 20 student talks. On May 29, the first day of the conference, the speakers include:
Andrey Kibrik, Director of the Institute of Linguistics, RAS. He will present the new corpus methods his research group has created for recording the speech and gesture elements of communication. May 29 (Wednesday), 10:30-11:50.
Igor Boguslavsky, professor at the Technical University of Madrid, and his colleagues will talk about how a computer can be taught to correctly analyze so-called Winograd schemas, a newer and more demanding alternative to the traditional Turing test for evaluating how well artificial intelligence systems understand language. May 29, 12:20-13:30.
Valentina Apresyan, professor at the HSE School of Linguistics. Her talk is about implications: meanings and assumptions that are not explicitly expressed but are derived from the text. Studying implications, especially false ones, makes it possible, for example, to identify dishonest publications in the media. May 29, 12:20-13:30.
There will be plenty of interest on the other days as well. By tradition, Dialogue pays great attention to new expressive devices in the language. For example, Maria Polinskaya from Harvard University and Irina Levontina from the Russian Language Institute will analyze emotional expressions that have recently become popular, such as "We got to use the infinitive" (which, by the way, is also the title of the talk; you can hear it on May 30, 10:00-13:30).
Antonina Laposhina from the Pushkin Institute, in her talk "Make it Easy?", analyzes the vocabulary of Russian language textbooks for elementary schools from the perspective of a modern corpus linguist (May 29, 3 p.m. - 6:30 p.m.).
Naturally, many talks are devoted to the hot topic of applying neural networks to language analysis. For example, on May 31 a special section of Dialogue covers such important research areas as language models in deep learning, transfer learning, and others.
- On May 30, at 19:00, a round table will be held on the prospects for modeling a speech act in human-computer interaction. This field is developing rapidly, and it is not easy for analytical multimodal linguistics to keep up with what modern methods for analyzing huge volumes of audiovisual data make possible.
- On May 31, at 19:00, we invite you to the round table "Brave New DL Word: where is the place of NLP?". The panelists will discuss the "provocative" thesis that NLP today has "dissolved" into deep machine learning technologies and is losing its status as an independent scientific discipline. Of course, many researchers will disagree with this statement, and we expect lively rebuttals from its opponents.
One of the key events of Dialogue is the announcement of the results of Dialogue Evaluation, the technology competitions among developers of systems for the linguistic analysis of texts. This year the competitions covered four tasks:
- automatic generation of news headlines;
- automatic analysis of low-resource languages (when there is very little data for machine learning);
- automatic resolution of anaphora and identification of coreference chains (different mentions of the same object in a text);
- automatic recovery of omitted words from context (certain kinds of ellipsis).
As usual, running such competitions required specially prepared data (datasets) for training the algorithms under test. This is not the first time ABBYY technologies have been used to create such datasets for some of the natural language text analysis competitions. Because much of the preprocessing was done by computer, this allowed us to make the corpora much larger. We will describe this in more detail soon on Habr. The Dialogue Evaluation results will be announced at Dialogue:
- May 30, 10:00-13:30: a special session on the results of evaluating automatic ellipsis resolution systems.
- May 31, 10:00-13:30: a special session on the results of evaluating anaphora analysis systems and a special session on the results of evaluating news headline generation systems.
- June 1, 10:00-13:30: a special session on the results of evaluating systems for describing low-resource languages.
The working languages of the conference are Russian and English. A detailed conference program is available here.
The conference proceedings will be published in the yearbook "Computational Linguistics and Intellectual Technologies", which is indexed in the international citation database Scopus.
You can register here; registration is open until May 28. Terms of participation.
Elizaveta Titarenko, editor of the ABBYY corporate blog,
with the participation of Vladimir Selegey, director of linguistic research at ABBYY