Big Data Week Moscow margin notes
Following up on our previous post with the presentations from Big Data Week Moscow, we have collected several statements by Russian and international speakers that stuck with us and seemed worth attention.
I wrote these down by ear, so please forgive any inaccuracies in advance. Also, if you think one of the statements is old news (a “bayan”), say so in the comments; it would be interesting to know where it originally comes from!
1. “Knowledge of the subject area does not particularly help the data scientist in his work”
Mikhail Levin, Chief Data Scientist, Yandex Data Factory
Context: Yandex Data Factory is a high-profile Yandex project that was presented last December at the Le Web conference in Paris. It is being developed as a startup aimed at the international market, and its developers build large business-oriented products based on Big Data. Sberbank, for example, has been named among the Russian pilot customers.
Why it’s interesting: Traditionally, data scientists claim that knowledge of the subject area determines up to 50% of the success of applying machine learning. Mikhail Levin, by contrast, puts the “black box” front and center: a model that looks for correlations between parameters without regard to the physical meaning of the values.
2. “The evolution of the Hadoop ecosystem repeats the evolution of Linux.”
Joseph Curto, Data Scientist, Professor, IE Business School Madrid
Context: IE Business School is one of the 20 best business schools in the world. It recently launched a Master's program in Big Data and has begun building expertise in this area. Joseph Curto is the director of Delfos Research and a data scientist who specializes in implementing data analysis techniques in various business areas.
Why it’s interesting: Comparing Hadoop and Linux seems unexpected at first, but it turns out to be productive. It suggests how widespread Hadoop could become and also counters predictions of the “death of Hadoop” (for example, in the Hadoop vs. Spark debate). Curto speaks of Hadoop as a paradigm and predicts development, not death, for this ecosystem. Incidentally, contrasting Hadoop with Spark is not quite correct; it is more accurate to compare Spark with Hadoop MapReduce.
3. “Beeline made a strategic decision to focus on the development of Big Data for external customers, and not for internal optimization tasks.”
Aleksandr Krot, Leading Data Scientist, VimpelCom
Context: Vimpelcom (the company that owns the Beeline brand) has long been successfully developing its Big Data business for solving internal problems. Moreover, Vimpelcom has as many as two divisions that work with big data: the management information department and a dedicated Data Science laboratory. In autumn 2013, a new CEO, Mikhail Slobodin, joined Vimpelcom, and major changes in the telecom's strategy are associated with his arrival.
Why it’s interesting: Vimpelcom has one of the strongest Big Data teams in Russia (among those that are not part of large Internet companies). For “traditional” (that is, non-Internet) businesses, it is generally accepted that Big Data helps primarily to increase revenue from their main activities: finding new customers, raising the average bill, solving security issues and stopping fraud. The transition to a new strategy, in which Beeline will earn money from data by providing services to external customers, is connected with the arrival of the new CEO, Mikhail Slobodin (this does not mean handing over subscriber data, as the company has stated clearly and repeatedly). The Russian telecom market has long since passed its explosive growth phase,
4. “The conversion of Internet advertising campaigns can be increased by about 20% if you tailor them to the psycho-segmentation of the audience”
Kirill Chistov, Development Director, Data-Centric Alliance
Context: Data-Centric Alliance is a Russian company specializing in Big Data and high-load systems. The company's products are in the field of digital marketing, from programmatic buying of online advertising to technical integration with the databases of client companies.
Why it’s interesting: With data on user behavior on the Internet, you can target advertising campaigns by location, gender and age. With a little more work from the analyst, you can also learn a lot about a person's intentions and preferences: what they read and watched, where they vacationed, which car they drive. But today, this is not enough for many marketers.
DCA is learning to divide the audience into psychotypes (rational / irrational, extraverts / introverts, and anxious). “Psychotyping” is a complex analytical process that requires both machine learning and human effort.
When a brand understands the nature of the consumer, it can adapt not only the meaning of the message but also its form of presentation, which greatly increases conversion. DCA shared a case from its own practice: in the “anti-aging cosmetics” category, targeting women who are anxious about age-related changes (the “anxious” psychotype) increased the flow of target visitors to the promo site 2.5 times, while each such visit came to cost the advertiser 60% less.
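The two figures in this case can be combined with simple arithmetic. The sketch below is only an illustration under assumptions that are not in the talk: that total spend is simply visits multiplied by cost per visit, and that “60% less” is measured against the original cost per visit. The function name `relative_spend` and its parameters are hypothetical.

```python
def relative_spend(visitor_multiplier: float, cost_reduction: float) -> float:
    """Return the new total spend as a multiple of the original spend.

    Assumption (not from the talk): total spend = visits * cost per visit.
    """
    # A 60% reduction means each visit costs 40% of what it used to.
    cost_multiplier = 1.0 - cost_reduction
    return visitor_multiplier * cost_multiplier


# Figures quoted in the DCA case: 2.5x the visitors, each visit 60% cheaper.
ratio = relative_spend(visitor_multiplier=2.5, cost_reduction=0.60)
print(f"Total spend relative to baseline: {ratio:.2f}x")
```

Under these assumptions the advertiser's total spend stays roughly the same while the number of target visitors grows 2.5 times.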
Precise targeting of advertising campaigns is becoming more and more popular. In March, Sberbank bought RuTarget, the developer of the Segmento advertising platform, a service that uses artificial intelligence and big data processing for narrowly targeted advertising.
5. “The use of Big Data technologies for the analysis of social networks does not have indisputable business applications, and to that extent it is an R&D task”
Alexey Natekin, Director of Data Mining Labs
Context: Data Mining Labs is engaged in data mining, student training, design and development work, and research in the theory of data analysis.
Why it’s interesting: The ability to use open sources of information is one of the advantages of working with big data. In connection with social networks, people often mention ad optimization and credit scoring, but in these cases social “features” are used to serve an external task, says Natekin.
PS Big Data Week Moscow was organized by the Laboratory of New Professions and Digital October Center.