Accurate to the hundredth: the top 10 talks of SmartData 2017

People who attend SmartData love working with data, so we can assume they put real thought into rating the talks after last year’s conference.

Using those ratings, we have compiled the top 10 videos. And to please the data lovers, each of the ten talks comes with all the accompanying numbers: its place in the top 10, its exact audience rating, and the number of viewers.

Generally speaking, neighboring positions in such a ranking often differ only slightly. So perhaps one should not attach much importance to “who comes after whom”: what matters more is that all of these talks received high marks. On the other hand, how can you not pay attention to the numbers when they are so exciting!
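To make “differ only slightly” concrete, here is a minimal sketch that checks whether the rating ranges of neighboring places overlap. It uses the numbers listed for each talk in this post. The post does not say what the ± value stands for, so treating it as a symmetric margin around the mean is an assumption, and the function names are purely illustrative.

```python
# (place, mean rating, margin) as listed for each talk in this post.
# Assumption: "±" is a symmetric margin around the mean rating.
RATINGS = [
    (1, 4.51, 0.08),
    (2, 4.36, 0.08),
    (3, 4.32, 0.12),
    (4, 4.31, 0.17),
    (5, 4.28, 0.08),
    (6, 4.26, 0.18),
    (7, 4.24, 0.17),
    (8, 4.22, 0.14),
    (9, 4.21, 0.09),
    (10, 4.21, 0.09),
]


def interval(mean, margin):
    """Return the (low, high) range implied by mean ± margin."""
    return mean - margin, mean + margin


def overlaps(a, b):
    """True if the ranges of two (mean, margin) pairs intersect."""
    lo_a, hi_a = interval(*a)
    lo_b, hi_b = interval(*b)
    return lo_a <= hi_b and lo_b <= hi_a


def adjacent_overlaps(ratings):
    """For each pair of neighboring places, report whether their ranges overlap."""
    return [
        (p1, p2, overlaps((m1, e1), (m2, e2)))
        for (p1, m1, e1), (p2, m2, e2) in zip(ratings, ratings[1:])
    ]


for first, second, flag in adjacent_overlaps(RATINGS):
    print(f"places {first} and {second}: ranges overlap = {flag}")
```

Under that assumption, the order of neighboring talks says little on its own, while talks further apart (for example, places 1 and 5) have clearly separated ranges.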

Neurona: why did we teach the neural network to write poems in the spirit of Kurt Cobain?

Speaker: Ivan Yamshchikov
Place: 1
Rating: 4.51 ± 0.08
Audience: ~200
Slides from the talk

The keynote from the creator of the “Neural Defense” and Neurona projects became the clear leader of the conference. It is an accessible talk that does not require extensive preparation from the viewer, yet it is not just the hundred-thousandth explanation of “how neural networks work”. The format may look like pure entertainment (what you hear is unlikely to affect your current work project right away), but in the long run it can prove not only very interesting but also useful. So is it any wonder that we invited Ivan to speak at the upcoming SmartData 2018?

From click to forecast and back: Data Science pipelines in Odnoklassniki

Speaker: Dmitry Bugaychenko
Place: 2
Rating: 4.36 ± 0.08
Audience: ~140
Slides from the talk

And here is the opposite case. First, this is not a general “what machine learning can give us” but the specifics of “how exactly we implement everything”. Second, the talk is not about ML itself (the personalization of the news feed serves merely as an example) but about everything around it: “what needs to be done for all this ML beauty to work”. So while Yamshchikov’s talk may interest even a wide audience, this one will interest only those who work with machine learning themselves, but they can take a lot away from it.

CatBoost - the next generation of gradient boosting

Speaker: Anna Veronika Dorogush
Place: 3
Rating: 4.32 ± 0.12
Audience: ~100
Slides from the talk

If gradient boosting is not your specialty and the topic makes you suspect that “this is probably about nuances for those who already do it”, set your fears aside. The talk is beginner-friendly: it does not dive in headfirst but starts by explaining the basics. And given that over the past year the CatBoost library from Yandex has become both better and more popular, it is useful to have an idea of it even if you do not need to work with it right now, and the talk can serve as a good introduction.

Back to the future of the modern banking system

Speaker: Vladimir Krasilshchik
Place: 4
Rating: 4.31 ± 0.17
Audience: ~80
Slides from the talk

What if, because of eventual consistency, the data in your quarterly report differs from the monthly data, and the auditors and regulators start asking questions? Vladimir Krasilshchik explains that the key concept here is bitemporality: there is “when the event happened” and there is “when the system found out about it”, and you need to work with both of these time axes and show both to third-party reviewers at once. The talk is not limited to this, and there is much more in it. For example, did you expect to hear the phrase “there is no justice, and you should not try to create it” at an IT conference?
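The two time axes can be illustrated with a small sketch. This is not code from the talk: the `Entry` type, the `total` function, and the ledger values are all hypothetical, chosen only to show how a report for the same period can change depending on what the system knew at the time.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Entry:
    """A hypothetical ledger entry carrying both time axes."""
    amount: int
    valid_on: date      # when the event happened
    recorded_on: date   # when the system found out about it


def total(entries, valid_from, valid_to, known_by):
    """Sum entries that happened in [valid_from, valid_to]
    and that the system already knew about on `known_by`."""
    return sum(
        e.amount
        for e in entries
        if valid_from <= e.valid_on <= valid_to and e.recorded_on <= known_by
    )


# Illustrative data: the second entry happened in March
# but reached the system only in April.
LEDGER = [
    Entry(100, valid_on=date(2017, 3, 10), recorded_on=date(2017, 3, 10)),
    Entry(50, valid_on=date(2017, 3, 30), recorded_on=date(2017, 4, 5)),
]

march = (date(2017, 3, 1), date(2017, 3, 31))

# The March figure as it looked when a report was built on April 1 ...
as_reported = total(LEDGER, *march, known_by=date(2017, 4, 1))
# ... and the same period recomputed with everything known by April 30.
as_known_later = total(LEDGER, *march, known_by=date(2017, 4, 30))

print(as_reported, as_known_later)
```

Because nothing is overwritten, both answers stay reproducible: one shows a reviewer what was reported at the time, the other shows what is known now.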

Name is a feature

Speaker: Vitaliy Khudobakhshov
Place: 5
Rating: 4.28 ± 0.08
Audience: ~280
Slides from the talk

The most paradoxical talk of the conference, one that leaves you scratching your head. On the one hand, it seems completely obvious to any reasonable person that there is no apparent reason for a person’s name (among popular Russian names) to correlate with whether that person is in a relationship. On the other hand, Vitaliy presents data showing the opposite. He has no exact explanation himself, but no one has offered a truly convincing objection either. You can try to look for one yourself.

No data? No problem! Deep Learning on CGI

Speaker: Ivan Drokin
Place: 6
Rating: 4.26 ± 0.18
Audience: ~40
Slides from the talk

As you know, algorithms alone are not enough for deep learning: you also need source data to train on. As a result, a good dataset has become a valuable resource. But what do you do if you don’t have one right now, you aren’t Google, and you can’t invest huge resources? It turns out that you do not always need “real” data from the real world: under certain conditions it can literally be generated. The talk covers one specific case of this kind.

Deep convolutional networks for object detection and image segmentation

Speaker: Sergey Nikolenko
Place: 7
Rating: 4.24 ± 0.17
Audience: ~80
Slides from the talk

If you are still far from machine learning and deep learning in general, the first 20 minutes of this talk may suit you well: they give a thorough introduction to the topic, with a historical overview starting from the 1950s. And if you understand the field as a whole but not the specific subtopic of deep convolutional networks, you can skip the intro and focus on the second half of the talk, which moves on to convolutional neural networks.

Hadoop high availability: Badoo experience

Speaker: Alexander Krasheninnikov
Place: 8
Rating: 4.22 ± 0.14
Audience: ~100
Slides from the talk

It seems that, alongside the concept of “big data”, a concept of “growing data” would be useful, because growth dictates its own specifics. Badoo once had orders of magnitude less data and one approach to handling it; then the volumes grew and changes were needed. And you have to keep in mind that tomorrow everything could grow even more, so everything has to be built “with a margin”.

The company became interested in combining “Hadoop” and “realtime” back when people usually put the word “incompatible” between those two words, and in this talk it shares its experience with Hadoop and with ensuring its high availability. Bonus: some artwork by Vasily Lozhkin on the slides.

We segment 600 million users in real time every day

Speaker: Artyom Marinov
Place: 9
Rating: 4.21 ± 0.09
Audience: ~120
Slides from the talk

This project is very different from Badoo: it is not dating but a DMP (data management platform), where you need to pick out audience segments such as “housewives with a car more than five years old”. But, first, the scale is also large (about one hundred thousand events per second). And second, here you need to be even more prepared for growth: “the data sources include pixel installations, and if a hugely popular website suddenly installs your pixel, you will get an enormous stream that you will have to cope with”. Which technologies are used, and how? The answers are in the talk.

Distributed ML on big data: the experience of building a recommendation system in ivi

Speaker: Boris Shminke
Place: 10
Rating: 4.21 ± 0.09
Audience: ~100
Slides from the talk

Finally, the last talk is also about infrastructure rather than algorithms, and is also based on the experience of a large product. ivi first introduced recommendations by using a third-party service that provided recommendations as a service. Then the company outgrew it and started building its own system. The company wrote about this on Habr back in 2014, and from the talk you can learn about the current state of affairs.

If these talks interest you, take note: SmartData 2018 will take place this fall. Some speakers from this top 10 will return with new talks, and there will also be completely new names. The most up-to-date information about the program is always available on the conference site, where you can also buy tickets. Their price is gradually increasing, so it is worth thinking about now.
