Moscow Data Science Major: announcement and registration
On September 1, Mail.Ru Group and the Open Data Science community will hold their largest meetup, Moscow Data Science Major. The event consists of five thematic blocks of talks, one ML training session, and a whole hall for networking and meeting people.
Check out the program and register! Admission is free, subject to approved registration.
Talks at Moscow Data Science Major will run in two parallel tracks. The table contains the schedule grid, and the talk descriptions follow below.
“Speaker Diarization Problem”, Gregory Sterling, NeurodataLab LLC
I will briefly cover speech processing in general and the speaker diarization task: given a recording of a dialogue, determine who spoke and when. I'll go over the history of the problem, why it matters, the cocktail party problem, who has tackled it and how, and what makes it hard. The main part of the talk is devoted to results from 2017-2018, for example a Google paper describing how to solve the problem for video (where the neural network appears to be trying to read lips). I will finish with what can be done when there is no video, only audio (a phone conversation, for example), walking through the relevant papers and our own approach.
“Neural network vocoders”, Sergey Dukanov, Mail.Ru Group
First, there will be a brief overview of modern approaches to speech synthesis, then we will talk about vocoders, and finally we will focus on one of the most interesting of them (in terms of both theory and practice).
“Pizza a la semi-supervised”, Arthur Kuzin, Dbrain
Using product control at Dodo Pizza as an example, I will talk about methods of working with data when training models. In particular, I will show how to get from bounding boxes to semantic segmentation of objects, and how to train a model and obtain labels for the whole dataset by labeling only a few samples.
“OCR and TD Architecture in Recognizing Photos of Printed Documents”, Alexey Goncharov, Ilya Zharikov, Philip Nikitin, MIPT Machine Intelligence Laboratory
The talk describes the structure of the OCR (character recognition) and TD (text detection, i.e. locating regions that contain text) systems that our team uses in projects for recognizing photographs of printed documents of various types. We will cover both the architecture and the training of these systems.
“How to do domain adaptation, and ideas to improve its quality”, Renat Bashirov, Samsung AI
The talk is a digest of ideas from a couple of dozen papers. The papers were selected by how useful they are for domain adaptation on images: given one labeled dataset, how to obtain or improve labels on another, similar dataset.
The talk will include:
- many GANs,
- several architectures with a dozen loss functions,
- a look at how many different things can serve as a loss function,
- style transfer,
- applications of domain adaptation to different tasks: classification, segmentation.
You will be able to follow the talk if you understand, for example:
- what a loss function is,
- how backprop works,
- why batchnorm is needed and how it works,
- what size the tensor has after global average pooling.
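As a quick self-check for the last item, here is a minimal sketch in plain Python. It assumes feature maps stored as nested lists in [N][C][H][W] order (batch, channels, height, width); the function name and the toy sizes are made up for illustration and do not come from the talk.

```python
def global_average_pool(batch):
    """Average every channel over its spatial grid.

    batch: nested lists shaped [N][C][H][W] (assumed layout).
    Returns nested lists shaped [N][C]: one number per channel.
    """
    pooled = []
    for sample in batch:
        pooled.append([
            sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
            for channel in sample
        ])
    return pooled


# Toy input: N=2 samples, C=3 channels, 4x5 spatial grid.
# Every value in channel c equals c, so the average of channel c is c.
N, C, H, W = 2, 3, 4, 5
batch = [
    [[[float(c) for _ in range(W)] for _ in range(H)] for c in range(C)]
    for _ in range(N)
]

pooled = global_average_pool(batch)
pooled_shape = (len(pooled), len(pooled[0]))  # spatial axes are gone
```

The spatial dimensions H and W collapse, so a tensor of size N x C x H x W becomes N x C (some frameworks keep it as N x C x 1 x 1).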
“Search by product - organization of work”, Dmitry Dremov, Analysis of checks
About the task, the approach to organizing the work, and the results.
“Showcases in a social network: how and what to show”, Sergey Boytsov, Odnoklassniki
We will trace the whole path from the user to the specific showcase item they see: data collection, preprocessing, analytical processing, and A/B testing.
“Recommender systems for transport tickets”, Artem Prosvetov and Konstantin Kotochigov, CleverDATA
The talk describes the use of recommender systems in an unusual area: the sale of transport tickets. We will cover which traditional approaches help with this problem, which heuristics perform well, and what we discovered while working on the project.
“Tuning Jupyter Notebook”, Alexander Lifanov, MarketGuard
How to set up a Jupyter Notebook for productive and convenient work.
“BigArtm is not just for text”, Maxim Statsenko, Mail.Ru Group
Many people are used to thinking of embeddings as something for text: we build embeddings of words, sentences, and so on. In a sense, topic modeling is embedding too. In my talk, I want to show that with Python and some ingenuity, topic modeling and embedding approaches can be applied to tasks that involve no text at all, namely clustering users by sources of income and by interests.
“PID Controller intro, or How to brew beer with PyData”, Anton Lebedevich
A step-by-step introduction to the most popular automatic controller, using malt mashing for beer as the example, with animation and Python code. Besides the basic PID controller, there will be a couple of tricks that improve its behavior in real life. Automatic control is often needed in practice, and almost any implementation contains PID elements along with their flaws, which you need to know about and be able to fix.
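For readers who have not met a PID controller before, here is a minimal sketch of the idea. It is not the speaker's code: the class, the toy mash-tun model, and all numbers (gains, heater power, heat loss, the 65 °C target) are assumptions chosen for illustration. It includes two widely used refinements, taking the derivative on the measurement and freezing the integral while the output is saturated (anti-windup), which may or may not be the tricks covered in the talk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    kp: float  # proportional gain
    ki: float  # integral gain
    kd: float  # derivative gain
    out_min: float = 0.0  # heater fully off
    out_max: float = 1.0  # heater fully on
    integral: float = 0.0
    prev_measurement: Optional[float] = None

    def update(self, setpoint: float, measurement: float, dt: float) -> float:
        error = setpoint - measurement

        # Derivative on the measurement (not the error) avoids a spike
        # in the output when the setpoint is changed abruptly.
        if self.prev_measurement is None:
            derivative = 0.0
        else:
            derivative = -(measurement - self.prev_measurement) / dt
        self.prev_measurement = measurement

        candidate = self.integral + error * dt
        raw = self.kp * error + self.ki * candidate + self.kd * derivative
        output = min(max(raw, self.out_min), self.out_max)

        # Anti-windup: do not accumulate the integral while the output is
        # saturated and the error keeps pushing further into saturation.
        winding_up = (raw > self.out_max and error > 0) or (
            raw < self.out_min and error < 0
        )
        if not winding_up:
            self.integral = candidate
        return output


def simulate(setpoint: float = 65.0, steps: int = 600, dt: float = 1.0):
    """Toy mash tun: the heater warms the mash, the mash loses heat to the room.

    All constants below are hypothetical and only make the demo behave sensibly.
    """
    ambient = 20.0      # room temperature, deg C
    heater_gain = 1.0   # deg C per second at full power
    loss_rate = 0.01    # fraction of (T - ambient) lost per second
    temperature = ambient
    pid = PID(kp=0.5, ki=0.02, kd=0.05)

    temps, outputs = [], []
    for _ in range(steps):
        power = pid.update(setpoint, temperature, dt)
        temperature += dt * (
            heater_gain * power - loss_rate * (temperature - ambient)
        )
        temps.append(temperature)
        outputs.append(power)
    return temps, outputs
```

Calling `simulate()` returns the temperature and heater-power traces; plotting `temps` against the setpoint is an easy way to see how changing `kp`, `ki`, and `kd` affects rise time and overshoot.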
Networking area. In this room you can talk with colleagues and other attendees in an informal setting.
To participate you must register. Do not forget your passport or driver's license.
Arrival of participants and registration: 10:00 - 11:00.
Start of talks: 11:00.
Approximate end of the event: 17:00.
Address: Moscow, Aeroport metro station, Leningradsky Prospect, 39, bldg. 79.