Data Science Week 2016: Overview of the Third and Fourth Days
Hello, Habr! We are publishing a review of the third and fourth days of Data Science Week 2016: Sberbank Data Day and the day devoted to artificial intelligence.

Day 3
The third day of Data Science Week focused mainly on Sberbank's experience in solving specific problems with big data technologies, although some presentations were more general and conceptual.
The speakers announced Sberbank's ambition to become a data-driven organization: a flexible structure in which business processes change and decisions are made in response to changes in incoming data. Sberbank expects this to give it a competitive advantage in how quickly it can bring to market the new solutions its customers demand.
Sberbank has built an effective infrastructure for storing and processing big data, based on Hadoop, Spark, and NoSQL solutions.
In collecting and using data, Sberbank focuses primarily on the client; the principle is to "combine data around the client." To solve business problems, the bank analyzes a wide range of internal and external data.
Extended customer profiles are built from internal data: customer records and applications, transaction history, and the use of bank services. Clients are segmented by socio-demographic parameters, needs, and preferences in order to understand which offers will interest them and through which channels they are best reached.
Credit scoring uses not only traditional data, such as socio-demographic parameters, credit history, transaction history, and financial statements, but also a number of other sources. For example, the bank uses data from mobile operators both in credit scoring and in fraud detection. Signs of a propensity for fraud include a large number of active SIM cards that each stay in use only briefly, numerous small account top-ups, and the geography of calls. Scoring also uses graphs of customer connections built from money-transfer data and social network data. For credit scoring of companies, the bank uses news texts that mention them and runs automatic sentiment analysis on those texts.
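The fraud signals listed above can be read as simple rules over aggregated mobile-operator data. The sketch below is a minimal illustration, not Sberbank's model: the `SubscriberActivity` fields, the thresholds, and the reading of "call geography" as the number of distinct call regions are all assumptions. In a real scoring system such indicators would more likely enter a trained model as features than act as hard cut-offs.

```python
from dataclasses import dataclass, field

@dataclass
class SubscriberActivity:
    # Hypothetical aggregates for one customer over an observation window.
    active_sims: int               # SIM cards currently registered to the customer
    avg_sim_lifetime_days: float   # how long each SIM stays in use, on average
    topup_amounts: list = field(default_factory=list)  # account top-ups in the window
    call_regions: set = field(default_factory=set)     # distinct regions calls came from

# Illustrative thresholds only (assumptions); a real system would learn them from labelled data.
MAX_ACTIVE_SIMS = 3
MIN_SIM_LIFETIME_DAYS = 30
SMALL_TOPUP = 100
MAX_SMALL_TOPUPS = 10
MAX_CALL_REGIONS = 5

def fraud_flags(a: SubscriberActivity) -> list:
    """Return the names of the fraud indicators this activity triggers."""
    flags = []
    if a.active_sims > MAX_ACTIVE_SIMS:
        flags.append("many_active_sims")
    if a.avg_sim_lifetime_days < MIN_SIM_LIFETIME_DAYS:
        flags.append("short_sim_lifetime")
    small_topups = [x for x in a.topup_amounts if x <= SMALL_TOPUP]
    if len(small_topups) > MAX_SMALL_TOPUPS:
        flags.append("many_small_topups")
    if len(a.call_regions) > MAX_CALL_REGIONS:
        flags.append("wide_call_geography")
    return flags

suspicious = SubscriberActivity(6, 12, [50] * 15, {"A", "B"})
print(fraud_flags(suspicious))
# ['many_active_sims', 'short_sim_lifetime', 'many_small_topups']
```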
The bank's underwriting process (decision-making for the basic categories) is currently largely automated. Rebuilding the scorecard is also automated, although an expert decides whether to accept the automatically rebuilt scorecard.
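The talk did not describe how a rebuilt model is turned into scorecard points. One widely used convention, shown here as an assumption, anchors the scale with a target score at target odds and a fixed number of points to double the odds (PDO): score = offset + factor × ln(odds), with factor = PDO / ln 2. The anchor values (600 points at 50:1 good/bad odds, PDO = 20) are hypothetical.

```python
import math

def scorecard_scaling(target_score: float, target_odds: float, pdo: float):
    """Return (factor, offset) such that score = offset + factor * ln(odds).

    With this scaling, doubling the good/bad odds adds exactly `pdo` points.
    """
    factor = pdo / math.log(2)
    offset = target_score - factor * math.log(target_odds)
    return factor, offset

def score(log_odds_good: float, factor: float, offset: float) -> float:
    """Map a model's log-odds of a 'good' outcome onto the points scale."""
    return offset + factor * log_odds_good

# Hypothetical anchor: 600 points at 50:1 odds, 20 points to double the odds.
factor, offset = scorecard_scaling(600, 50, 20)
print(round(score(math.log(50), factor, offset)))   # 600
print(round(score(math.log(100), factor, offset)))  # 620
```

Keeping the scale fixed means a rebuilt scorecard produces scores comparable with the previous one, which should make the expert's accept-or-reject review easier.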
Alexander Kulikov from Segmento talked about how analyzing transaction sequences and payment patterns lets the company identify important events in customers' lives (for example, a large expense on medical treatment or a car purchase) and predict which transactions, and in which categories, a client is likely to make in the near future. This makes it possible to offer customers the most relevant products. Analyzing customer data and behavior also makes it possible to generate pre-approved loan offers and present them exactly when customers need them most.
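The talk did not say which models Segmento uses. As a simple stand-in for predicting what a client will buy next, here is a first-order Markov sketch that counts category-to-category transitions across customers' histories; the category names and histories are made up.

```python
from collections import Counter, defaultdict

def fit_transitions(histories):
    """Count how often category `nxt` directly follows category `prev`."""
    counts = defaultdict(Counter)
    for seq in histories:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, last_category, k=2):
    """Return up to k most likely next categories with estimated probabilities."""
    following = counts.get(last_category)
    if not following:
        return []
    total = sum(following.values())
    return [(cat, n / total) for cat, n in following.most_common(k)]

# Made-up transaction histories, already mapped to spending categories.
histories = [
    ["groceries", "fuel", "car_service"],
    ["car_dealer", "car_insurance", "fuel"],
    ["car_dealer", "car_insurance", "car_service"],
    ["car_dealer", "fuel"],
]
counts = fit_transitions(histories)
print(predict_next(counts, "car_dealer"))  # car_insurance ~0.67, fuel ~0.33
```

A large purchase such as `car_dealer` then points to follow-up categories (insurance, fuel) where an offer may be timely; a production system would use longer context and richer features.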
Search query data is used to personalize the Sberbank website. For example, a client interested in tourism will be offered insurance for trips abroad.
The bank also applies deep learning to image analysis. Some time ago, Sberbank introduced SAFI, a photo analysis system for customer identification and the prevention of document fraud. As a result, losses from this type of fraud fell tenfold.
A separate presentation was devoted to the risks of using models. The speaker identified three main areas of risk: data, models, and processes. Data risks stem from inconsistency, incompleteness, non-representativeness, and outliers. If these problems go unnoticed and unfixed, errors become very costly. As for models and their application, errors can arise from invalid assumptions, from blindly transferring a model developed for one domain to another, and from the human factor (fraud, conflicts of interest within the organization). To limit model risk, the bank relies on user feedback, clear standards for modeling and data preparation, and procedures for testing whether models are applicable.
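Two of the data risks named above, incompleteness and outliers, can be screened automatically before any model is fitted. Below is a minimal sketch using only the Python standard library and the common 1.5 × IQR rule; the rule and the sample column are assumptions, and inconsistency or non-representativeness need domain-specific checks that are not shown.

```python
import statistics

def data_quality_report(values, iqr_k=1.5):
    """Report the share of missing entries (None) and IQR-based outliers in one numeric column."""
    present = [v for v in values if v is not None]
    missing_share = 1 - len(present) / len(values) if values else 0.0
    if len(present) < 4:
        # Too few observations for meaningful quartiles.
        return {"missing_share": missing_share, "outliers": []}
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - iqr_k * iqr, q3 + iqr_k * iqr
    outliers = [v for v in present if v < low or v > high]
    return {"missing_share": missing_share, "outliers": outliers}

# Made-up monthly incomes (thousands); None marks a missing value, 900 looks like an entry error.
incomes = [42, 45, 47, None, 50, 51, 49, None, 48, 900]
report = data_quality_report(incomes)
print(report["outliers"])                 # [900]
print(round(report["missing_share"], 2))  # 0.2
```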
The last talk of the day was devoted to the eToro social trading platform, with which Sberbank has begun active cooperation. The system is built like a social network: it aggregates data from its successful traders, such as analytics and transaction history, and presents it in an accessible form. Successful traders effectively form analogues of trust funds. The leverage available to a user changes with their profile, experience, and risk appetite, and the platform automatically suggests suitable assets and traders whose behavior can be copied. The platform aims to give everyone simple, understandable access to financial markets, including Sberbank customers who want to manage their assets through it.
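The talk did not detail how copying works. A common description of copy trading is proportional sizing: a copied position takes the same share of the copier's allocation as the original takes of the trader's equity. The sketch assumes that rule, plus hypothetical leverage caps tied to the user's risk profile; the cap values are illustrative, not eToro's.

```python
# Hypothetical leverage caps by declared risk tolerance (not eToro's actual limits).
LEVERAGE_CAPS = {"low": 1, "medium": 2, "high": 5}

def max_leverage(risk_tolerance: str, experienced: bool) -> int:
    """Leverage available to a user; inexperienced users get the lowest cap (assumption)."""
    cap = LEVERAGE_CAPS[risk_tolerance]
    return cap if experienced else LEVERAGE_CAPS["low"]

def copied_position(trader_equity: float, trader_position: float, allocation: float) -> float:
    """Size a copied position proportionally to the copier's allocation."""
    if trader_equity <= 0:
        raise ValueError("trader_equity must be positive")
    return allocation * trader_position / trader_equity

# A trader puts 5,000 of 50,000 equity (10%) into a position;
# a copier who allocated 1,000 to this trader gets a 100 position.
print(copied_position(50_000, 5_000, 1_000))    # 100.0
print(max_leverage("high", experienced=False))  # 1
```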
Day 4
The last day of Data Science Week was devoted to artificial intelligence. Little was said about AI in the broad sense; the talks focused mainly on the prospects for chatbots and personal assistants.
The talk by Konstantin Savenkov of Inten.to was devoted to exactly this topic. According to the speaker, a number of trends point to rapid development of this area.
First, people now spend more time in messengers than on social networks, and businesses want to reach their customers through this channel too. Bots could be one way to do that.
Second, almost all of the largest messenger developers are creating platforms for bots and personal assistants, although almost no one uses them yet. Huge investments are being made in this direction. Connector services are appearing that let a bot written once run on different platforms.
Finally, the API market is growing, so personal assistants now have something to manage.
Speaking about the prospects for bots and assistants, the speaker noted that attempts to replace convenient graphical interfaces with a bot achieve nothing and only complicate the process (for example, when booking airline tickets). However, where the interaction rests on a limited input of information, as in communication between people, chatbots can be effective (examples: concierge services, running errands, legal services). Intelligent applications will help users avoid mistakes and advise them when choosing and making decisions, much as a waiter does.
According to the speaker, the most promising paradigm in this area today is a personal assistant that uses sophisticated technologies to understand speech and message context but provides a simple service. Understanding speech and context is followed by a decision-making phase, for example, choosing a wine for a dish based on its ingredients. Then the service platform comes into play and fulfills the user's order.
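The three phases described above (understanding, decision, fulfillment) can be sketched as a chain of functions. This is a toy under stated assumptions: keyword matching stands in for real speech and context understanding, the pairing table is invented, and fulfillment is a stub that returns an order record rather than calling any real service platform.

```python
import re
from dataclasses import dataclass, field

# Invented pairing rules used by the decision phase.
PAIRINGS = {"beef": "Cabernet Sauvignon", "fish": "Sauvignon Blanc", "mushrooms": "Pinot Noir"}

@dataclass
class Intent:
    action: str
    ingredients: list = field(default_factory=list)

def understand(message: str) -> Intent:
    """Phase 1 (stand-in for speech/context understanding): pick out known ingredients."""
    words = re.findall(r"[a-z]+", message.lower())
    return Intent(action="pick_wine", ingredients=[w for w in words if w in PAIRINGS])

def decide(intent: Intent):
    """Phase 2: choose a wine from the first recognised ingredient, if any."""
    return PAIRINGS[intent.ingredients[0]] if intent.ingredients else None

def fulfil(wine):
    """Phase 3: hand the decision to a service platform (stubbed here)."""
    if wine is None:
        return {"status": "need_more_info"}
    return {"status": "ordered", "item": wine}

def assistant(message: str) -> dict:
    return fulfil(decide(understand(message)))

print(assistant("What should I drink with beef and mushrooms?"))
# {'status': 'ordered', 'item': 'Cabernet Sauvignon'}
```

Keeping the phases separate mirrors the speaker's point: the understanding layer can be sophisticated while the service it drives stays simple.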
Today, the methods for fulfilling specific orders are usually specified manually by the company or selected through crowdsourcing. Inten.to sees its niche in building a tool that lets a personal assistant automatically select the APIs it needs to solve a task.
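Inten.to's actual selection method was not described. One simple way to frame automatic API selection is as a filter-then-rank problem over a registry of providers; the sketch below picks the cheapest API that offers the needed capability at an acceptable quality. The `ApiOption` fields, provider names, and numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ApiOption:
    name: str        # hypothetical provider name
    capability: str  # what the API does, e.g. "translate" or "order_wine"
    quality: float   # assumed quality score in [0, 1], e.g. from the assistant's own benchmarks
    price: float     # assumed cost per call

def select_api(registry, capability: str, min_quality: float = 0.0) -> Optional[ApiOption]:
    """Cheapest API offering `capability` at or above `min_quality`; ties go to higher quality."""
    candidates = [a for a in registry if a.capability == capability and a.quality >= min_quality]
    if not candidates:
        return None
    return min(candidates, key=lambda a: (a.price, -a.quality))

REGISTRY = [
    ApiOption("translate_a", "translate", quality=0.82, price=10.0),
    ApiOption("translate_b", "translate", quality=0.90, price=20.0),
    ApiOption("wine_shop_x", "order_wine", quality=0.75, price=1.0),
]
print(select_api(REGISTRY, "translate").name)                    # translate_a
print(select_api(REGISTRY, "translate", min_quality=0.85).name)  # translate_b
print(select_api(REGISTRY, "book_taxi"))                         # None
```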
Evgeny Legky of Segmento spoke about the role of artificial intelligence in technological development and about the main trends that could help avoid a decline in labor productivity in the future. According to the speaker, the sphere of human labor will change greatly. The on-demand economy (examples: Uber, GetTaxi), in which we order and receive a service exactly when we need it, will expand. Freelancing will grow, and more and more people will work on side projects alongside their main job. Flexible teams will be assembled for specific projects, and ordering a workforce on demand will become common. More people will perform small tasks (microtasking), and productivity in these small operations (micro-productivity) will increase. Finally, AI-based technologies will enter our lives.

The talk by NVIDIA representative Anton Dzhoraev was devoted not to artificial intelligence itself but to hardware and computing platforms for deep learning, which is widely used in this field.
Today, neural networks such as Baidu Deep Speech 2 already match humans in speech recognition quality. However, this came at the cost of far more complex computations and far more data. At the same time, applications built on such technologies must respond quickly, because users will not wait long. NVIDIA has therefore focused on creating software and hardware that generate an execution strategy for an already trained neural network and deliver high performance. The company has developed its own analogue of the TensorFlow deep learning framework; because it is designed for specific hardware, it runs faster and can perform logical optimizations.
The last speaker represented Riftman, which plans to use bots in its Xor system for hiring IT staff. The system analyzes code samples that developers post on GitHub, StackOverflow, and other resources to find specialists with the required skills, and it uses similar mechanisms to validate résumés. The bot then communicates with the candidate, whether or not he or she is currently looking for work.
According to Nikolai Manolov, a great many specialists have already outgrown their current positions and are waiting for interesting offers, yet they fall out of HR specialists' sight. It is easier to reach a person through a bot: an email may end up in spam, and a phone call can provoke a negative reaction. If the candidate does not like an offer, the bot collects feedback to further improve the selection model and to learn which conditions to offer and to whom. The bot can also schedule interviews and send test assignments. Thus, almost all processes in this area can be automated.
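The talk did not explain how Xor extracts skills from code. A deliberately crude sketch of the idea: infer languages from the file extensions in a developer's public files, then score how many required skills are backed by enough files. The extension map, the `min_files` threshold, and the sample paths are assumptions; real analysis would look at code content and quality, not just extensions.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical extension-to-skill map.
EXTENSION_SKILLS = {".py": "Python", ".go": "Go", ".java": "Java", ".ts": "TypeScript", ".sql": "SQL"}

def skill_profile(file_paths):
    """Count a developer's public files per recognised language."""
    skills = Counter()
    for path in file_paths:
        skill = EXTENSION_SKILLS.get(PurePosixPath(path).suffix.lower())
        if skill:
            skills[skill] += 1
    return skills

def match_score(profile, required, min_files=3):
    """Share of required skills backed by at least `min_files` files."""
    if not required:
        return 0.0
    covered = sum(1 for skill in required if profile[skill] >= min_files)
    return covered / len(required)

files = ["svc/main.go", "svc/handler.go", "svc/db.go", "scripts/etl.py", "schema.sql"]
profile = skill_profile(files)
print(dict(profile))                        # {'Go': 3, 'Python': 1, 'SQL': 1}
print(match_score(profile, ["Go", "SQL"]))  # 0.5
```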
All presentations are available here.
Video recordings of the talks are available here.