12 more big data cases
Hello, Habr. Today we have prepared 12 more examples of how big data technologies make money for companies.
Customer focus
1. Company: Beeline.
Industry: telecom.
Beeline holds a huge volume of data about its subscribers, which the company plans to use not only for internal optimization and customer work (increasing sales, retaining customers, combating fraud), but also for bringing new analytical products to market (providing data for credit scoring, targeted digital advertising, geoanalytical reports, IPTV analytics, external consulting). The company has implemented many big data projects. For example, it segmented its subscriber base using an extended customer profile that includes gender and age prediction and social graph construction; ran a project to detect money fraud and viral activity and protect subscribers from them; and identified subscribers who use communication services on several types of devices, as well as subscribers who are at an airport and about to fly abroad, in order to offer them suitable services and tariff plans. And this is not a complete list. The company uses HDFS and Apache Spark for data storage and processing, and RapidMiner and Python, including the scikit-learn library, for data analysis.
Result: by 2018, revenue from big data is projected to exceed 20% of the company's total revenue.
2. Company: Caesars Entertainment.
Industry: entertainment, gambling.
Caesars Entertainment, one of the leaders of the US gambling business, operates the famous Caesars Palace casino in Las Vegas and more than 50 gambling establishments around the world. The company itself was recently fined more than $20 million for money laundering, and its casino-operating unit went through bankruptcy. However, over 17 years the company has accumulated an impressive amount of data and analytics through its Total Rewards loyalty program, which is now one of its most valuable assets and has been valued at $1 billion. In the gambling industry, data on each customer matters: some players in the highest, seventh tier of the company's classification spent about $500 thousand a year in the casino, and one businessman in the database left $200 million in Las Vegas in a single year. It is very important to be able to greet such a customer by name, escort him to his favorite game and offer complimentary extras to his taste: entertainment, airline tickets, limousine rides, hotel accommodation and so on. Using the accumulated data on customers' tastes and behavior together with advanced analysis methods, the company was able to find an approach that makes customers leave as much money as possible and be sure to come back. The company uses Hadoop and cloud solutions, processing more than 3 million records per hour. Data analysis is also used to segment customers and improve security standards.
Result: the company's most valuable single asset, worth more than $1 billion, was created; profitability and security standards improved.
3. Company: eHarmony.
Industry: dating sites.
eHarmony is a dating site focused on building long-term relationships. In the site's questionnaire, users can provide very detailed information about themselves, specifying more than 1,000 parameters. Using this data, along with the history of likes and of how relationships between users developed, the system recommends the most suitable matches, and not only on the basis of similar interests and beliefs. The system also analyzes user photos and looks for characteristics in two people that either coincide or complement each other. For example, it turned out that vegetarians are more likely to form a stable couple than hamburger lovers, and that the area the face occupies in a photo affects mutual attractiveness in a certain way.
The company also analyzes the effectiveness of its marketing campaign channels, uses personalized advertising, manages user loyalty and counteracts customer churn.
For data analysis, the company uses SPSS solutions, recommender systems, Hadoop, Hive and cloud technologies.
Result: every day the system makes about 100 million predictions that two people may suit each other; $10 million a year is saved by reducing customer churn and inefficient marketing spending.
4. Company: Nippon Paint.
Industry: chemical industry, production of paints and varnishes.
Nippon Paint is a Japanese company ranked 7th in the world by turnover among paint manufacturers and the leader in the Chinese market. The iColor website launched by the company, which lets users try different paint colors on real interiors, has become widely popular with private customers as well as with designers and decoration companies. Analysis of the data collected through the site allows the company to track new trends in colors and design in order to develop new products and plan production. The company also uses the platform to work with designers and design firms and promote its products through them, to segment consumers and to create personalized offers for them. For data processing and analysis, the company uses solutions based on SAP HANA and Hadoop.
Result: the company obtained a powerful tool for identifying and tracking market trends, which makes it possible to plan demand and develop new products, as well as a platform for a number of other targeted customer-interaction tasks.
5. Company: Hitachi Consulting, Vital Connect, ClearStory Data.
Industry: healthcare.
In the United States, sepsis ranks 10th among causes of death from disease. About 1 million Americans develop sepsis each year, and between 28% and 50% of them die. About $20 billion is spent annually on treating sepsis.
The main cause of these deaths is insufficient medical supervision. Patients are discharged from hospital or receive first aid and are not monitored afterwards. Yet this is when the risk of developing sepsis is high, and its symptoms (fever, chills, rapid breathing and pulse, rash, confusion and disorientation) resemble those of other common diseases. Patients often see a doctor too late, or the disease is not diagnosed correctly in its early stages. As a result, septic shock develops quickly, often with irreversible damage to many organs.
To monitor patients' condition, the project proposes using a certified HealthPatch device from Vital Connect, which collects the patient's basic vital signs and even posture and movement (these change with sepsis). The information is then sent to ClearStory Data servers, where it is combined with other medical data about the patient and analyzed in real time using an Apache Spark-based solution. In the future, such devices are to be given to all patients who leave hospital or receive first aid for conditions that can be followed by sepsis. A similar system, though with a lower level of data analysis, has already been successfully deployed in Singapore.
Result: a solution was created that should allow the US healthcare system to significantly reduce mortality from sepsis (blood poisoning).
6. Company: JJ Food Service.
Industry: b2b food delivery.
JJ Food Service is one of the largest British b2b food delivery companies, with more than 60 thousand customers including cafes, restaurants, and school and office canteens. In 2010 the company took almost all orders through call centers; today 60% of orders come through its web portal. This made the work more efficient but led to a loss of personal contact with the customer. Over the phone, customers used to be offered more expensive or complementary goods and services and were told about trends in their market segment. These capabilities had to be recreated at a new level with the help of big data technologies. To do this, the company turned to Microsoft specialists to build a solution based on Azure Machine Learning cloud services. Recommendations generated by predictive models built on this platform are now used not only on the web portal but also by call center employees. When a customer calls the call center or logs in to the site, his basket is already filled on the basis of purchase history and recommendations (recipes and similar orders from other users are taken into account, and new items the customer does not yet know about are added). Customers actually keep and buy about 80% of these products.
Such a solution is possible in b2b because customers often need fairly regular deliveries of a fixed set of products. Immediately before the order is placed, the type of establishment and the recipes it uses are taken into account to determine whether the customer has forgotten to buy something necessary. Implementing the system took 3 months.
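To make the idea of a pre-filled basket more concrete, here is a minimal sketch in Python. It is not the Azure Machine Learning solution described above: it is a simple frequency heuristic, and the function name prefill_basket, the min_share threshold and the sample orders are all assumptions made for illustration.

```python
from collections import Counter


def prefill_basket(order_history, recipe_items=(), min_share=0.6):
    """Suggest a basket from past orders (illustrative heuristic only).

    An item is suggested if it appears in at least `min_share` of past
    orders; ingredients from the customer's recipes are always added so
    that nothing necessary is forgotten.
    """
    if not order_history:
        return sorted(set(recipe_items))
    # Count in how many distinct orders each item appears.
    counts = Counter(item for order in order_history for item in set(order))
    regular = {
        item for item, n in counts.items()
        if n / len(order_history) >= min_share
    }
    return sorted(regular | set(recipe_items))


# Hypothetical purchase history of one restaurant.
history = [
    ["flour", "oil", "tomatoes"],
    ["flour", "oil", "cheese"],
    ["flour", "tomatoes", "cheese"],
    ["flour", "oil"],
]
basket = prefill_basket(history, recipe_items=["yeast"])
print(basket)
```

Here flour and oil are regular purchases, and yeast is added because the (hypothetical) recipe needs it. A production system would replace the frequency rule with a trained recommendation model.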

Result: customers buy 80% of the goods in pre-filled baskets; sales, speed of service and customer satisfaction have grown.
Internal optimization
1. Company: Sberbank.
Industry: banking.
In the previous post we already described some big data cases at Sberbank; here we cover another one, AS SAFI.
This photo analysis system for identifying customers and preventing document fraud was developed and deployed at Sberbank by the beginning of 2014. It uses computer vision technologies to compare photos from a database with images captured by webcams at service desks. As a result, losses from this type of fraud fell tenfold.
The AS is built on the Cascade-Search biometric platform from Technoserv. The platform was originally developed for operational, reference and expert work, but it was adapted to Sberbank's needs and integrated with the automated loan application review system. The system is very fast: thanks to a number of innovative solutions, such as in-memory processing, comparing a camera image with the images in the database takes only a few seconds.
Result: losses from fraud involving individuals' documents fell tenfold.
2. Company: FarmLogs.
Industry: agriculture.
FarmLogs provides big data analytics and convenient services that help farmers plan and optimize their work. The company's mobile and web applications are already used by more than a third of US farmers. FarmLogs services draw on open geodata on soil types and detailed data on weather conditions, precipitation and solar activity. Satellite image analysis is also used extensively to automatically identify which crops are planted where and monitor their condition, and historical data is factored in for forecasting and generating recommendations. The company leases farmers devices that are installed in combine harvesters and automatically record data on all operations, routes and fuel consumption in the system.
Result: more than a third of the US farming sector, which is relatively far from high technology, is now covered by big-data-driven optimization.
3. Company: Dubai Airports.
Industry: transport, airports.
Dubai International Airport is the world's busiest airport by international passenger traffic and one of the largest in the world. Big data on airport operations, flights and passenger movements is widely used to optimize the airport's work and increase passenger satisfaction.
The airport uses sophisticated optimization algorithms to dynamically assign gates for departures and arrivals. In particular, if many passengers are transferring from one flight to another, the two flights are assigned nearby gates.
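The gate assignment idea can be illustrated with a toy model. The sketch below is not the airport's actual algorithm, which the source does not describe: it assumes gates sit along one corridor, measures distance as the difference between gate numbers, and tries every assignment exhaustively, which only works for a handful of flights. The flight labels, transfer counts and the function name assign_gates are hypothetical.

```python
from itertools import permutations


def assign_gates(flights, gates, transfers):
    """Find the flight-to-gate assignment with the least transfer walking.

    `transfers` maps a pair of flights to the number of passengers
    changing between them. Cost is passengers times gate distance.
    Exhaustive search: suitable only for very small examples.
    """
    def cost(assignment):
        return sum(
            n * abs(assignment[a] - assignment[b])
            for (a, b), n in transfers.items()
        )

    candidates = (
        dict(zip(flights, perm))
        for perm in permutations(gates, len(flights))
    )
    best = min(candidates, key=cost)
    return best, cost(best)


# Hypothetical flights and transfer volumes.
flights = ["A", "B", "C", "D"]
gates = [1, 2, 3, 4]
transfers = {("A", "C"): 120, ("B", "D"): 80, ("A", "B"): 5}

assignment, total = assign_gates(flights, gates, transfers)
print(assignment, total)
```

In this example the flights with the most connecting passengers (A and C, B and D) end up at neighboring gates. A real airport would need a scalable optimization method and many more constraints, such as aircraft size and gate availability over time.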
Many passengers visit Dubai for shopping, and the airport has many shops in its duty-free area. However, enthusiasm for shopping means passengers are often late for their flights. Many of them speak neither English nor Arabic, the languages in which announcements are made. Under a new notification program, passengers' boarding passes are scanned in every airport shop, and they are told, in their own language, which gate to go to, by which route, and how long it will take.
Result: gate assignment for departures and arrivals has been optimized, and the number of passengers late for their flights has been significantly reduced.
Industry: retail, clothing, shoes and accessories.
Macy's is a large department store chain founded in 1858 that today has more than 840 stores in 45 US states. 70% of Americans visit the chain's stores at least once a year.
The company analyzes large volumes of data on demand, inventory and out-of-stock items in specific stores, combines this with the preferences of customers living in the area, and thus optimizes the assortment of every product category at each outlet.
Using SAS Institute technology, the retailer adjusts prices for 73 million items in near real time, based on demand and available inventory.
The company uses big data not only for internal optimization but also for customer focus. For example, the Macys.com online store, like other online retailers, uses personalization, targeted banner ads and email newsletters, and site search optimization so that shoppers can easily find the product they want. Advertising messages and offers are highly personalized: a single mailing can have up to 500,000 unique variations.
Result: high sales growth (up to 50% per year), of which at least 10%, according to the company, is the net effect of big data.
Industry: data bank, reference system.
Ancestry's main asset is a huge database of modern and historical documents that makes it possible to reconstruct family ties between people and build family trees. Today the company's data bank contains more than 5 billion profiles of people who lived at different times (from the 16th century onward) in many countries, and more than 45 million family trees that establish kinship between them.
Historical data is usually not available in machine-readable form; it may consist, for example, of handwritten entries in books. Such data can also be inconsistent, inaccurate and incomplete. Data mining and machine learning algorithms, including fuzzy matching algorithms, help process, supplement and verify the company's data.
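As an illustration of what fuzzy matching means for inconsistent historical records, here is a small Python sketch using the standard library's difflib. It is only an example of the general technique, not Ancestry's method; the names, the 0.8 threshold and the helper functions similarity and best_match are assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0..1 similarity ratio between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def best_match(name, candidates, threshold=0.8):
    """Return (candidate, score) for the closest candidate, or (None, score)
    if even the closest one scores below the threshold."""
    if not candidates:
        return None, 0.0
    score, match = max((similarity(name, c), c) for c in candidates)
    return (match, score) if score >= threshold else (None, score)


# Hypothetical records with spelling variants of similar names.
records = ["Johann Schmidt", "John Smith", "Joan Smythe"]

match, score = best_match("Jon Smith", records)
print(match, round(score, 3))
```

A misspelled "Jon Smith" is linked to the record "John Smith", while a name unlike any record yields no match. Real genealogical matching would also compare dates, places and relatives rather than names alone.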
Big data technologies are used to store and analyze the data. Ancestry's data is processed on three clusters running MapR (a Hadoop-based distribution). The first compares DNA test results against the samples in the database (which already holds more than 120 thousand samples); users can get a test for just $99 by spitting into a tube and mailing the saliva sample to the company. The second runs machine learning algorithms, and the third handles data mining.
Preliminary analysis of the database simplifies the search for relatives and generates hints that make family history research easier. Today a large share of users' discoveries comes from the system's pre-linking of profiles and ranking of results, based on analysis of profile data and previous search history. A few years ago, all such discoveries were the work of the users alone. User behavior in the system is also analyzed continuously to identify the stages where users have difficulty, so that additional content can be added in those places or the ranking of search results can be improved.

Result: a huge database of historical data has been accumulated, with cleaned, verified and pre-linked records, which makes the system easy and efficient to use.
Industry: mechanical engineering, engine manufacturing.
Rolls-Royce Holdings is a British multinational that produces engines and turbines for aerospace, marine vessels and the energy sector. Its engines are very large and expensive, and miscalculations and errors in their production can cost millions and lead to loss of life. Rolls-Royce uses big data technology to design and manufacture its engines and to support them after sale.
Engine development relies heavily on computer simulation, which generates terabytes of data. This data is analyzed and visualized on high-performance computing clusters. The company's production systems communicate with one another through the Internet of Things.
Rolls-Royce engines are fitted with hundreds of sensors that record the smallest details of their operation. This data is processed on the fly using machine learning algorithms and passed to engineers when deviations are detected.
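The source does not say which algorithms Rolls-Royce uses, so the following Python sketch only shows the general idea of flagging deviations in a sensor stream. It uses a simple rolling statistical rule as a stand-in for machine learning; the readings, window size, threshold and the function name find_deviations are all assumptions.

```python
from statistics import mean, stdev


def find_deviations(readings, window=5, threshold=3.0):
    """Return indices of readings that differ from the mean of the
    preceding `window` readings by more than `threshold` standard
    deviations of that window."""
    flagged = []
    for i in range(window, len(readings)):
        recent = readings[i - window:i]
        mu, sigma = mean(recent), stdev(recent)
        if sigma > 0 and abs(readings[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged


# Hypothetical temperature readings with one abnormal spike at index 6.
temps = [600.0, 601.5, 599.0, 600.5, 600.0, 601.0, 640.0, 600.5, 599.5]
alerts = find_deviations(temps)
print(alerts)
```

Only the spike is reported, which is the kind of event that would be passed on to an engineer. A real monitoring system would combine many sensors and learned models of normal engine behavior.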
The company does not give a precise estimate of the effect of adopting big data technologies, but according to its management they have significantly reduced costs. Big data has also changed the company's business model: it allowed the company to offer customers a new service model, “Total Care”, under which customers pay by the hour for monitoring of engine condition during operation.
Result: significantly lower development and production costs, higher reliability, and a new business model.
We welcome your comments.
And we look forward to seeing you at the “Big Data Specialist” program, starting March 15th.
Customer focus
1. Company: Beeline.
Branch: telecom.
Beeline has a huge set of data about its subscribers, which the company plans to use not only for internal optimization and work with customers (increasing sales, customer retention, combating fraud), but also for introducing new analytical products to the market (providing data for credit scoring targeted digital advertising, creating geoanalytical reports, IPTV analytics, external consulting). The company has implemented many big data projects. For example, a subscriber base was segmented based on an expanded client profile, including gender and age prediction, and construction of social graphs; implemented a project to recognize and protect subscribers from money fraud and viral activities; subscribers who use communication services on several types of devices were allocated, as well as subscribers who are at the airport and flying abroad to offer them suitable services and tariff plans. And this is not a complete list. The company uses HDFS and Apache Spark for data storage and processing, rapidminer and python for data analysis, including the scikit-learn library.
Result: by 2018, projected revenues from big data will amount to more than 20% of the company's revenue.
2. Company: Caesars Entertainment.
Industry sector: Entertainment, Gambling.
Caesars Entertainment, one of the leaders in the US gambling business, operates the famous Caesar's Palace casino in Las Vegas and more than 50 gambling establishments around the world. The company itself has recently been fined more than $ 20 million for money laundering, and its unit, managing the casino, was subjected to bankruptcy. However, over 17 years, the company has accumulated an impressive amount of data and analytics on its Total Rewards loyalty program, which is now one of its most valuable assets and has been valued at $ 1 billion. In the gaming industry, data on each client is important: some players of the highest, seventh category according to the classification of the company spent annually about $ 500 thousand in a casino, and one businessman in the base left $ 200 million in Las Vegas for the year. It’s very important to be able to meet such a client by name, spend on your favorite game and offer free additional options to your taste: entertainment, airline tickets, limousine rides, hotel accommodations, etc. With the help of accumulated data about the tastes and behavior of customers and advanced methods of data analysis, the company could find such an approach to customers that they left as much money as possible and were sure to return again. The company uses Hadoop and cloud solutions, processing more than 3 million records per hour. Data analysis is also used to segment customers and improve security standards. so that they leave as much money as possible and are sure to return again. The company uses Hadoop and cloud solutions, processing more than 3 million records per hour. Data analysis is also used to segment customers and improve security standards. so that they leave as much money as possible and are sure to return again. The company uses Hadoop and cloud solutions, processing more than 3 million records per hour. Data analysis is also used to segment customers and improve security standards.
Result: the most valuable individual asset of the company worth more than $ 1 billion was created, the growth of profitability and safety standards was achieved.
3.Company: eHarmony.
Branch: dating sites.
eHarmony is a dating site focused on building long-term relationships. In the questionnaire on the site, the user can leave very detailed information about himself, specify more than 1000 parameters. Further, using this data, as well as the history of sympathies and the development of relationships between site users, the system recommends the most suitable dating options, and not just based on the similarity of interests and beliefs. The user photos are analyzed, complementing each other and coinciding, but at the same time causing characteristics between people. For example, it was found out that vegetarians are more likely to form a stable couple than hamburger lovers, and the face area in the photo in a certain way affects mutual attractiveness.
The company also analyzes the effectiveness of the channels of marketing companies, uses personalized advertising, manages user loyalty, and counteracts the outflow of customers.
In the analysis of data, the company uses SPSS solutions, recommendation systems, Hadoop, Hive and cloud technologies.
Result: Every day, the system makes about 100 million assumptions that two people can suit each other, $ 10 million is saved annually by counteracting the outflow of customers and by reducing inefficient marketing costs.
4. Company: Nippon Paint.
Branch: chemical industry, production of paints and varnishes.
Nippon Paint is a Japanese company, ranked 7th in the world by turnover among paint companies, and the leading company in the Chinese market. Using the iColor website launched by the company, which allows you to test different colors of paint on real interiors, has gained wide popularity among both private clients and among designers and decoration companies. Analysis of the data obtained using this site allows the company to track new trends in colors and design in order to develop new products and plan production. Also, with the help of this platform, the company interacts with designers and design companies to promote the company's products through them, segment consumers and create personalized offers for them. For data processing and analysis, the company uses solutions based on SAP HANA and Hadoop.

Result:the company received a powerful tool for identifying and tracking market trends, which allows you to plan demand and develop new products, as well as a platform for solving a number of other tasks of targeted interaction with customers.
5. Company: Hitachi Consulting, Vital Connect, ClearStory Data.
Branch: healthcare.
In the United States, sepsis is ranked 10th in the ranking of causes of death among diseases. About 1 million Americans develop sepsis each year, between 28% and 50% of them die. About $ 20 billion is spent annually on the treatment of sepsis.
Moreover, the main cause of deaths is insufficient medical supervision. Patients are discharged from the hospital or receive first aid, and after that they are not monitored. However, after this there is a high risk of developing sepsis, the symptoms of which - fever, chills, rapid breathing and pulse, rash, confusion and disorientation - are similar to the symptoms of other common diseases. Often patients go to the doctor too late or the disease cannot be correctly diagnosed in the early stages. As a result, septic shock quickly develops and often irreversible damage to many organs occurs.
To monitor the condition of patients, it is proposed to use a certified HealthPatch device from Vital Connect, which will collect basic indicators of the patient's condition, including even postures and movements (they change with sepsis). Further, the information goes to ClearStory Data servers, where it is combined with other medical data about patients and analyzed in real time using an Apache Spark-based solution. In the future, such devices will be available to all patients who leave hospitals and receive first aid for conditions that can be followed by sepsis. A similar system, but with a lower level of data analysis, has already been successfully implemented in Singapore.
Result: created a solution that will allow the US health system to significantly reduce mortality from sepsis (general blood poisoning).
6. Company: JJ Food Service.
Branch: food delivery b2b.
JJ Food Service is one of the largest British b2b food delivery companies with more than 60 thousand customers in the form of cafes, restaurants, school and office canteens, etc. In 2010, the company accepted almost all orders through call centers, today 60% of orders are accepted through the Internet portal. This increased the efficiency of work, but led to the loss of personal contact with the client. By telephone, the client was offered to purchase more expensive or complementary goods and services, informed him of trends in his market segment. These capabilities needed to be realized at a new level with the help of big data technologies. To solve these problems, the company turned to Microsoft specialists to build a solution based on Azure Machine Learning cloud services. Recommendations generated by predictive models built on this platform, Now they are used not only on the Internet portal, but also by the employees of the call center. When a client calls the call center or logs into the site, his basket is already filled based on the purchase history and recommendations (recipes, similar orders of other users are taken into account, new items are added that customers do not yet know about). About 80% of these products, buyers really leave in the basket and purchase. Such a solution is possible in the b2b industry, as often you need fairly regular deliveries of a certain set of products. Immediately before placing the order, the type of institution and the recipes used by it are taken into account to determine if the client has forgotten to buy something necessary. The implementation of the system took 3 months. his basket is already filled on the basis of the purchase history and recommendations (recipes, similar orders of other users are taken into account, new items are added that customers do not yet know about). About 80% of these products, buyers really leave in the basket and purchase. 
Such a solution is possible in the b2b industry, as often you need fairly regular deliveries of a certain set of products. Immediately before placing the order, the type of institution and the recipes used by it are taken into account to determine if the client has forgotten to buy something necessary. The implementation of the system took 3 months. his basket is already filled on the basis of the purchase history and recommendations (recipes, similar orders of other users are taken into account, new items are added that customers do not yet know about). About 80% of these products, buyers really leave in the basket and purchase. Such a solution is possible in the b2b industry, as often you need fairly regular deliveries of a certain set of products. Immediately before placing the order, the type of institution and the recipes used by it are taken into account to determine if the client has forgotten to buy something necessary. The implementation of the system took 3 months. Such a solution is possible in the b2b industry, as often you need fairly regular deliveries of a certain set of products. Immediately before placing the order, the type of institution and the recipes used by it are taken into account to determine if the client has forgotten to buy something necessary. The implementation of the system took 3 months. Such a solution is possible in the b2b industry, as often you need fairly regular deliveries of a certain set of products. Immediately before placing the order, the type of institution and the recipes used by it are taken into account to determine if the client has forgotten to buy something necessary. The implementation of the system took 3 months.

Result: 80% of the goods in pre-filled consumer baskets are acquired by customers, sales growth, speed of service and customer satisfaction.
Internal optimization
1. Company: Sberbank.
Branch: banking.
In the previous post we described several cases of Big Data use at Sberbank; here we cover one more, AS SAFI.
This photo analysis system for identifying customers and preventing document fraud was developed and deployed at Sberbank by the beginning of 2014. Using computer vision technologies, the system compares photos from the database with images captured by webcams at service desks. As a result, losses from this type of fraud fell tenfold.
The system is built on the Cascade-Search biometric platform from Technoserv. The platform was originally developed for operational, reference and expert work, but was adapted to Sberbank's needs and integrated with the automated loan application review system. It works very quickly: thanks to a number of innovative solutions, such as In-Memory Processing, comparing a camera image with the images in the database takes only a few seconds.
Result: losses from fraud involving individuals' documents fell tenfold.
2. Company: FarmLogs.
Branch: agriculture.
FarmLogs provides big data analytics and convenient services that help farmers plan and optimize their work. The company's mobile and web applications are already used by more than a third of US farmers. FarmLogs services draw on open geodata about soil type and detailed data on weather conditions, precipitation and solar activity. Satellite imagery is also widely analyzed to automatically identify where various crops are sown, monitor their condition and factor in historical data for forecasting and recommendations. The company leases farmers devices that are installed in combines and automatically record data on all operations, routes and fuel consumption in the system.
Result: more than a third of the US farming sector, which is relatively far from high technology, is covered by big data optimization.
3. Company: Dubai Airports.
Branch: transport - airports.
Dubai International Airport is the world's busiest airport by international passenger traffic and one of the largest in the world. Big data on airport performance, flights and passenger movements is widely used to optimize airport operations and increase passenger satisfaction.
The airport uses sophisticated optimization algorithms to dynamically assign gates to departing and arriving flights. In particular, if a large number of passengers transfer between two flights, those flights are assigned neighboring gates.
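The article does not describe the airport's actual algorithms, so the following is only a toy sketch of the underlying idea: choose the assignment that minimises the total distance connecting passengers have to walk. The function name assign_gates, the flight and gate identifiers and the one-dimensional gate positions are assumptions made for the example, and the brute-force search is practical only for a handful of flights.

```python
from itertools import permutations


def assign_gates(flights, gate_positions, transfers):
    """Brute-force the flight-to-gate assignment with the least transfer walking.

    flights: list of flight ids (hypothetical)
    gate_positions: dict gate -> position along the terminal, in metres
    transfers: dict (flight_a, flight_b) -> number of connecting passengers
    Returns (assignment, cost); (None, None) if there are fewer gates than flights.
    """
    best_cost, best_assignment = None, None
    # Try every way of placing the flights on distinct gates.
    for gates in permutations(gate_positions, len(flights)):
        assignment = dict(zip(flights, gates))
        # Cost = passengers multiplied by the distance between their two gates.
        cost = sum(
            pax * abs(gate_positions[assignment[a]] - gate_positions[assignment[b]])
            for (a, b), pax in transfers.items()
        )
        if best_cost is None or cost < best_cost:
            best_cost, best_assignment = cost, assignment
    return best_assignment, best_cost
```

With three flights and gates at 0, 100 and 900 metres, the two flights sharing 120 connecting passengers end up on the neighboring gates. A real airport would need a proper optimization solver and many more constraints, such as aircraft size and schedule overlap.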
Many passengers visit Dubai for shopping, and the airport has many shops in the duty free area. However, the passion for shopping means that passengers are often late for their flights. Many of them speak neither English nor Arabic, the languages in which announcements are made. Under a new notification program, passengers' boarding passes are scanned in every airport store, and passengers are told, in the language they speak, which gate they need, which route to take and how long it will take.
Result: gate assignment for departures and arrivals has been optimized, and the number of flight delays has been significantly reduced.
4. Company: Macy's.
Industry sector: Retail, Clothing, Shoes, and Accessories.
Macy's is a large chain of department stores founded in 1858; today it has more than 840 stores in 45 US states. About 70% of Americans visit one of the chain's stores at least once a year.
The company analyzes large volumes of data on demand, stock levels and out-of-stock items in specific stores, combines this data with the preferences of customers living in the area, and thus optimizes the assortment of every product category in each outlet.
Using SAS Institute technology, the retailer adjusts prices for 73 million items in near real time, based on data on demand and available inventory.
The company uses big data not only for internal optimization but also for customer focus. For example, the Macys.com online store, like other online retailers, uses personalization, targeted advertising banners and e-mail newsletters, and optimizes the site's search engine so that buyers can easily find the product they are interested in. Advertising messages and offers are highly personalized: a single mailing can have up to 500,000 unique variations.
Result: high sales growth rates (up to 50% per year), at least 10% of which, according to the company, is the net effect of big data.
5. Company: Ancestry.
Industry sector: Databank, Reference System.
The main asset of Ancestry is a huge database of modern and historical documents that makes it possible to reconstruct family ties between people and build family trees. Today, the company's data bank contains more than 5 billion profiles of people who lived at different times (starting from the 16th century) in a large number of countries, and more than 45 million family trees have been built that establish the ties between them.
Historical data is generally not available in a machine-readable format: it may consist, for example, of handwritten entries in books. In addition, such data may be inconsistent, inaccurate and incomplete. Data mining and machine learning algorithms, including fuzzy matching algorithms, help to process, supplement and verify the company's data.
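As a rough illustration of fuzzy matching (not Ancestry's actual algorithm, which the article does not describe), the sketch below uses Python's standard difflib module to score how similar two transcribed names are and to link a record only when the score reaches a threshold. The helper names, the sample names and the 0.8 threshold are assumptions chosen for the example.

```python
from difflib import SequenceMatcher
from typing import Optional, Sequence


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0 and 1, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def best_match(name: str, candidates: Sequence[str], threshold: float = 0.8) -> Optional[str]:
    """Return the most similar candidate, or None if none reaches the threshold."""
    if not candidates:
        return None
    # Score every candidate and keep the highest-scoring one.
    score, candidate = max((similarity(name, c), c) for c in candidates)
    return candidate if score >= threshold else None
```

For instance, best_match("Jon Smith", ["John Smith", "Jane Smart"]) links the misspelled entry to "John Smith", while a name with no close counterpart returns None. Real genealogical matching would also compare dates, places and relatives, not just the spelling of a name.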
Big data technologies are used to store and analyze the data. Ancestry data is processed on three MapR clusters (a Hadoop-based distribution). The first compares the results of DNA tests with the samples already in the database (more than 120 thousand so far); users can get a test for just $99 by spitting into a tube and mailing the saliva sample to the company. The second runs machine learning algorithms, and the third runs data mining.
Preliminary analysis of the database simplifies the search for relatives and generates hypotheses that make it easier to investigate family history. Today, a huge number of users' discoveries come from the system linking profiles in advance and ranking results based on profile data and previous search history. A few years ago, all such discoveries were made by users alone. User behavior in the system is also constantly analyzed to identify stages where users run into difficulty, so that additional content can be added there or the ranking of search results adjusted.

Result: a huge database of historical data has been accumulated, with cleaned, confirmed and pre-linked records, making the system easy and efficient to use.
6. Company: Rolls-Royce Holdings.
Industry sector: Mechanical Engineering, Engine Manufacturing.
Rolls-Royce Holdings is a British multinational company that produces engines and turbines for the aerospace, marine and energy industries. The engines it produces are very large and expensive; miscalculations and errors in their production can cost millions and lead to deaths. Rolls-Royce uses big data technology to design and manufacture its engines and to support them after sale.
Computer simulation is widely used in engine development and produces terabytes of data. This data is analyzed and visualized on high-performance computing clusters. The company's production systems interact with each other through the Internet of Things.
Rolls-Royce engines are equipped with hundreds of sensors that record the smallest details of their operation. This data is processed quickly using machine learning algorithms and passed to engineers when deviations are detected.
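The article does not say which models Rolls-Royce uses, so the sketch below substitutes a much simpler statistical rule for them: readings are compared with a baseline of normal operation, and anything more than a few standard deviations away is flagged for an engineer. The function name find_deviations, the baseline size and the threshold of 3 are assumptions made for the example.

```python
from statistics import mean, stdev


def find_deviations(readings, baseline_size=20, z_threshold=3.0):
    """Return (index, value) pairs for readings far from the baseline.

    The first `baseline_size` readings are assumed to describe normal operation;
    baseline_size must be at least 2 so that a standard deviation exists.
    """
    baseline = readings[:baseline_size]
    mu, sigma = mean(baseline), stdev(baseline)
    later = enumerate(readings[baseline_size:], start=baseline_size)
    if sigma == 0:
        # Perfectly flat baseline: any different value counts as a deviation.
        return [(i, x) for i, x in later if x != mu]
    # Flag readings whose z-score against the baseline exceeds the threshold.
    return [(i, x) for i, x in later if abs(x - mu) / sigma > z_threshold]
```

Given twenty temperature readings between 600 and 604 followed by a reading of 650, only the 650 value is reported. A learned model would replace the fixed threshold with patterns estimated from many engines and many sensors at once.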
The company does not give a precise assessment of the effect of big data technologies, but according to its management they have significantly reduced costs. Big data also changed the company's business model: thanks to it, the company was able to offer customers a new service model, “Total Care”, under which customers pay by the hour for monitoring of the engines' condition during operation.
Result: a significant reduction in development and production costs, increased reliability, the introduction of a new business model.
We welcome your comments.
And we are waiting for you at the “Big Data Specialist” program, starting March 15th.