12 big data cases: proven industry examples of big data making money
Hello, Habr! We reviewed cases in which big data technologies helped companies work with customers more effectively or optimize internal processes.
By the way, the first cohort of the Big Data for Executives program starts very soon. Its goal is to prepare a manager or business owner to use data in their work. Read more about it here.
Customer focus
1. Company: Bookmate.
Industry: Subscription content - e-books.
Bookmate is a Russian subscription service for reading e-books on mobile devices, with more than 3 million users worldwide. Together with E-Contenta, the company solved the "cold start" problem: making recommendations for new users who have not yet selected any books in the app. To offer books to these users, a recommendation system was built on external data: social network data and DMP data (click history, web search queries, and other data on user behavior).
Result: views of recommended books by new users increased 2.17 times, and conversion to paid users increased 1.4 times.
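To make the cold-start idea more concrete, here is a minimal sketch of how external interest signals could be matched against a catalog. It is only an illustration under our own assumptions, not the E-Contenta system: the function names, topic tags, weights, and book titles are all hypothetical.

```python
from math import sqrt


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend_for_new_user(external_profile, book_topics, top_n=2):
    """Rank books by similarity to interests inferred from external data.

    The user has no in-app history, so the profile comes only from
    outside signals (hypothetically: social network and DMP interests).
    Books with zero similarity are not recommended.
    """
    scored = [(cosine(external_profile, topics), title)
              for title, topics in book_topics.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for score, title in scored[:top_n] if score > 0]


# Hypothetical catalog and external profile, for illustration only.
book_topics = {
    "Space Opera Classics": {"sci-fi": 1.0, "adventure": 0.5},
    "Startup Finance 101": {"business": 1.0, "finance": 0.8},
    "Cooking at Home": {"cooking": 1.0},
}
external_profile = {"sci-fi": 0.9, "business": 0.2}

print(recommend_for_new_user(external_profile, book_topics))
# ['Space Opera Classics', 'Startup Finance 101']
```

A production system would most likely learn such profiles from data rather than use hand-assigned tags, but the matching step would look broadly similar.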

2. Company: BikeBerry.
Industry: Retail, online store.
BikeBerry.com is an American online store selling bicycles, motorcycles, spare parts, and accessories. Together with Retention Science, it introduced sophisticated machine learning algorithms and statistical models to track and predict consumer behavior. The models identified and used patterns of on-site behavior, along with purchase history and demographic and behavioral information. As a result, the store was able to recommend the most relevant products to each customer and make personalized discount offers only to customers who really needed them, which increased profitability, more than doubled sales, and improved a number of other indicators.
Result: sales up 133%, user activity up 200%, twice as many customers making repeat purchases, and a 30% increase in the average order value of such customers.
3. Company: Red Roof Inn.
Industry: Hotel business.
In the winter of 2014, the American hotel chain Red Roof Inn faced a drop in tourist traffic because of severe winter weather. The same weather, however, caused a large number of flights to be canceled every day, leaving passengers stranded at airports and in need of a hotel. Using open data on weather conditions and flight cancellations, the company sent passengers whose flights were canceled or delayed personalized offers with the contact details of the chain's hotel nearest to the airport, exactly when those offers were most in demand.
Result: additional revenue growth of 10% compared with the previous year, even with reduced tourist traffic.
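One small building block of such a campaign is picking the hotel closest to the airport where flights were canceled. The sketch below does this with the haversine great-circle distance. The coordinates, hotel names, and data layout are hypothetical, and we do not know how Red Roof Inn actually implemented this step.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against rounding slightly above the domain of asin.
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(a)))


def nearest_hotel(airport, hotels):
    """Return the hotel record closest to the airport (lat, lon) tuple."""
    return min(
        hotels,
        key=lambda h: haversine_km(airport[0], airport[1], h["lat"], h["lon"]),
    )


# Hypothetical airport and hotel coordinates, for illustration only.
airport = (41.98, -87.90)
hotels = [
    {"name": "Hotel A", "lat": 42.00, "lon": -87.88},
    {"name": "Hotel B", "lat": 41.80, "lon": -87.60},
]

print(nearest_hotel(airport, hotels)["name"])  # Hotel A
```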
4. Company: Skillsoft.
Industry: Education.
Skillsoft is an American developer of educational software and content and one of the world leaders in corporate training programs. In partnership with IBM, the company used internal data on how users interact with the system, both directly in the program and through e-mail newsletters, to personalize their experience, increase engagement, and improve learning outcomes. Data on user behavior in the program was used to monitor engagement and to determine the best time and communication channel for getting a user's attention. In addition, a recommendation system for educational content was built on the preferences of the given user and of other users (84% of users rated the recommendations as relevant).
Result: user engagement with content increased by 128%.

5. Company: Huffington Post.
Industry: Media, journalism.
The Huffington Post is a popular American online publication, news aggregator, and blog with many localized editions for different regions and languages. The company uses A/B testing to choose the best headlines and studies the behavior and preferences of its target audience so that it can publish material of interest to particular groups during their hours of peak activity (for example, material for parents is published late in the evening on weekdays, when the children are already asleep). The company also analyzes user behavior in the browser and uses recommendation systems (Gravity technology) to offer users the most interesting content and make it as accessible and attractive as possible, starting from the site's home page.
Result: in August 2014 the site passed 100 million unique visitors per month and ranked first in popularity among online publications in the United States; the average number of articles viewed per session rose to 10-12.
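For readers who have not run a headline A/B test, here is a minimal sketch of one common way to compare two headlines: a two-proportion z-test on click-through rates. The click and view counts are made up, and the article does not say which statistical procedure the Huffington Post actually uses.

```python
from math import erf, sqrt


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Compare the click-through rates of headlines A and B.

    Returns (z, p_value) for a two-sided test of the hypothesis that
    both headlines have the same underlying click-through rate.
    """
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    if se == 0.0:
        # No clicks at all (or only clicks): the data cannot separate A and B.
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)


# Hypothetical numbers: headline A got 200 clicks from 10,000 views,
# headline B got 260 clicks from 10,000 views.
z, p_value = two_proportion_z_test(200, 10_000, 260, 10_000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Headline B performs differently from A at the 5% level")
```

With these made-up counts z comes out at roughly 2.8, so headline B would be kept. In practice the sample size and the stopping rule should be fixed before the test starts.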
6. Company: VidiMax.
Industry: Content provision - films.
VidiMax is a Russian service that provides licensed access to feature and documentary films, TV series, cartoons, sports, and television programs. It is available on smart TVs and has about 1 million users. To increase user loyalty during the free two-week trial, a recommendation system was introduced together with E-Contenta, and a block of personal recommendations appeared in the service.
Result: films in the personal recommendations block are watched 2.5 times more often than films in the selection of most popular titles.
Internal optimization
1. Company: Sberbank.
Industry: Banking.
Sberbank uses big data and machine learning in many areas, including credit scoring. For this task the bank uses not only traditional data, such as socio-demographic parameters, credit history, transaction history, and financial statements, but also a number of other sources. It builds customer relationship graphs from money transfer data and social network data. For credit scoring of companies, it uses news texts that mention them, processed with automatic sentiment analysis. In 2015, the bank added mobile operator data to the model, which improved the quality of the classifier by 7 percentage points of the Gini coefficient. A large number of active SIM cards with short lifetimes, small and frequent account top-ups, and a suspicious geography of calls indicate fraud and reduce the likelihood that a loan application is approved. For retail customers, machine learning algorithms improved the quality of scoring models by 4 percentage points of the Gini coefficient through more accurate selection of factors.
Result: a steady improvement in the quality of scoring models, including from the latest innovations.
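In credit scoring, the Gini coefficient of a model is commonly defined through the ROC AUC as Gini = 2 * AUC - 1, so a gain of 7 percentage points means, for example, moving from 0.50 to 0.57. The sketch below computes it on a toy sample. The labels and scores are invented, and we assume (the article does not say) that this is the definition Sberbank reports.

```python
def auc(labels, scores):
    """Probability that a random defaulter is scored above a random non-defaulter.

    labels: 1 for default, 0 for no default; scores: predicted risk.
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini(labels, scores):
    """Gini coefficient of a scoring model: 2 * AUC - 1."""
    return 2 * auc(labels, scores) - 1


# Toy data, for illustration only: three defaulters, three good borrowers.
labels = [1, 1, 1, 0, 0, 0]
old_scores = [0.9, 0.4, 0.6, 0.5, 0.3, 0.2]   # one defaulter is under-scored
new_scores = [0.9, 0.55, 0.6, 0.5, 0.3, 0.2]  # extra data fixes that case

print(round(gini(labels, old_scores), 3))  # 0.778
print(round(gini(labels, new_scores), 3))  # 1.0
```

The pairwise loop is quadratic and only suitable for small samples; on real portfolios a rank-based AUC computation would be used instead.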
2. Company: Union Pacific Railroad.
Industry: Transport.
Union Pacific Railroad is the largest railway company in the United States: it has more than 8 thousand locomotives and owns the country's largest rail network. Thermometers, acoustic and visual sensors, and other sensors were installed on the underside of each of the company's trains. Their data is transmitted to a processing center over fiber optic cables laid along the railway network. The processing center also receives weather data, data on the state of the brakes and other systems, and the GPS coordinates of trains. The collected data and the predictive models built on it make it possible to track the condition of the wheels and the track and to predict a derailment several days or even weeks before a possible incident. That is enough time to fix the problems promptly and avoid damage to the train and delays of other trains.
Result: the company reduced the number of derailments by 75% and avoided significant losses (previously, losses from a single derailment could reach $40 million).
3. Company: Los Angeles Police Department.
Industry: Public sector - police.
Using solutions developed by PredPol, the Los Angeles police were able to obtain the most probable times and areas (with high precision, on the order of 50 sq. m) for various types of crimes and to send additional police forces there to prevent them. The system takes historical data on the time, type, and location of crimes and processes it with clustering algorithms in space and time. Predictive modeling relies on mathematical models of point processes (Self-Exciting Point Process Modeling). No personal data about people in the city or their whereabouts is used, which makes it possible to comply with privacy requirements. The drop in crime reduced costs for the police, the judiciary, and the penal system.
Result: thefts down 33%, violent crimes down 21%.
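The idea behind a self-exciting point process is that each event temporarily raises the expected rate of further events nearby. Below is a minimal time-only sketch with an exponential kernel (a Hawkes process). It is not PredPol's actual model, which also works in space, and the parameter values `mu`, `alpha`, and `beta` are arbitrary assumptions chosen for illustration.

```python
from math import exp


def intensity(t, past_events, mu, alpha, beta):
    """Conditional intensity of a Hawkes process with an exponential kernel.

    lambda(t) = mu + sum over past events t_i < t of
                alpha * beta * exp(-beta * (t - t_i))

    mu    : background rate of events
    alpha : expected number of follow-up events triggered by one event
    beta  : how quickly the triggering effect decays
    """
    triggered = sum(
        alpha * beta * exp(-beta * (t - t_i))
        for t_i in past_events
        if t_i < t
    )
    return mu + triggered


# Hypothetical event times (in days) and parameters.
events = [1.0, 1.5]
params = dict(mu=0.2, alpha=0.5, beta=1.0)

before = intensity(0.5, events, **params)      # no events yet: background only
right_after = intensity(2.0, events, **params)  # elevated by two recent events
much_later = intensity(10.0, events, **params)  # decayed almost to background

print(before, right_after, much_later)
```

Shortly after the two events the rate is more than three times the background level, and it then decays back toward `mu`. This is the property that lets such models flag short-lived hot spots.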
4. Company: Entro.py.
Industry: Building maintenance.
St. Vincent's is a large Australian network of public and private clinics located mainly in Sydney and Melbourne. Entro.py, a building management company, together with BuildingIQ, implemented a solution there that analyzes current data on room usage, temperature, and weather conditions, along with building characteristics and historical energy consumption data, to reduce the cost of heating and cooling the buildings.
Result: in 2014, climate control costs decreased by 12%.
5. Company: United Parcel Service (UPS).
Industry: Logistics.
UPS is an American logistics company, the world's largest in parcel delivery and supply chain management, delivering more than 16.9 million shipments per day in more than 220 countries. UPS uses big data to optimize routes and to reduce fuel consumption and environmental impact. The company uses radar to track cargo, collects and analyzes readings from many sensors to monitor vehicle condition and driver behavior, and uses mobile CRM data to monitor delivery and the quality of customer service. To optimize routes and cut costs, the company introduced the ORION system, one of the world's largest systems based on results from operations research. Optimal routes are built in real time using enormous computing power. To do this, the system uses map data and data on origin and destination points, package sizes, and required delivery times.
Result: savings of about 6 million liters of fuel per year, carbon emissions reduced by 13 thousand tons per year, and faster delivery.
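To give a feel for what route optimization means, here is a toy sketch that orders delivery stops with the nearest-neighbor heuristic and compares the result with the order in which the stops were given. It is far simpler than ORION and ignores delivery windows, package sizes, and road maps; the coordinates are hypothetical points on a plane.

```python
from math import dist


def route_length(route):
    """Total straight-line length of a route given as a list of (x, y) points."""
    return sum(dist(a, b) for a, b in zip(route, route[1:]))


def nearest_neighbor_route(depot, stops):
    """Greedy heuristic: always drive to the closest unvisited stop.

    Returns a route that starts and ends at the depot. The result is
    usually decent but is not guaranteed to be optimal.
    """
    route = [depot]
    remaining = list(stops)
    while remaining:
        nxt = min(remaining, key=lambda p: dist(route[-1], p))
        remaining.remove(nxt)
        route.append(nxt)
    route.append(depot)
    return route


# Hypothetical depot and stops along one street.
depot = (0.0, 0.0)
stops = [(5.0, 0.0), (1.0, 0.0), (3.0, 0.0)]

naive = [depot, *stops, depot]
better = nearest_neighbor_route(depot, stops)

print(route_length(naive), route_length(better))  # 14.0 10.0
```

Even on three stops the greedy order saves almost a third of the distance, which hints at why savings add up at the scale of millions of deliveries per day.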

6. Company: ThyssenKrupp AG.
Industry: Mechanical engineering.
ThyssenKrupp AG is one of the world's leading elevator manufacturers and services more than 1.1 million elevators worldwide. In partnership with Microsoft, the company launched the MAX system, which uses the Internet of Things to collect data from a variety of sensors installed in the company's elevators (they monitor cabin speed, door operation, motor temperature, and so on) and builds predictive models on that data on the Azure Machine Learning platform. The models make it possible to prevent an incident before it occurs and to give the technician a specific fault code, one of 400 possible, which shortens maintenance time. As a result, maintenance and repair costs go down (one breakdown costs at least $300) and customers get additional value: elevators become more reliable and safer, owners of shops located in buildings, ...
Result: elevator uptime increased by an average of 50%.
Find out about our Big Data for Executives program here. A new cohort of the "Big Data Specialist" program is also coming up, with a 15% discount until November 15.