Hadoop Big Data Subscription in SAP Cloud

    Today we will talk about one of the SAP services that reflects our new approach to building products and working with customers: SAP Cloud Platform Big Data Services, which lets customers work with big data in Hadoop under a cloud subscription model.

    In this first article, we review how big data analysis helps a business in practice, how cloud and on-premise Hadoop hosting differ, and what the main functions, services and technologies in SAP Cloud Platform Big Data Services are. In the following articles we will take a closer look at the technological features and individual services within this solution.

    Big Data in business

    SAP's customers include many large Russian and international companies in manufacturing, metallurgy, oil and gas and other "conservative" industries, for which we develop and implement IT solutions and systems. These companies are now investing more and more in new technologies such as the Internet of Things, machine learning and big data, in particular to extract new value from that data. For metallurgical companies, for example, finding new sources of profit or ways to cut costs is critical in the current economic and geopolitical conditions. One way to do this is to look for new insights in big data that describes the business, its work processes and the outside world as a whole.

    There are many solutions on the market for storing and working with big data, both free open source and commercial. The most popular is Hadoop together with its additional components. The reasons for its popularity include:

    • Reliability
    • Scalability
    • Optimal cost of information storage
    • A large number of additional open source software components for data processing in Hadoop - Spark, Hive and others
    • A large pool of specialists on the market who can work with Hadoop

    The popularity of free open source solutions is easy to understand. However, when Hadoop is deployed for production use, the free open source versions are rarely used in their pure form. Commercial distributions of the open source Hadoop products, supplied by Cloudera, Hortonworks and other vendors, have gained popularity in the business world. In this case, the vendors are responsible for the reliability of the software and for the interaction of all components. There are also alternative services that let you work with big data in the cloud, by subscription.

    Businesses often face a dilemma: which approach to big data to choose, on-premise (local) or in the cloud. Most internal IT departments, of course, vote for the first option because of the usual concerns about the cloud.

    The research company Forrester surveyed companies that work with big data about how they run their Hadoop solutions: in the cloud or on-premise. 37% of respondents said they plan to increase investment in cloud services for big data by 5% to 10%. Another 14% of participants said they would increase spending on cloud-based Hadoop solutions by more than 10%. Why do they opt for the cloud?

    Running Hadoop on your own servers is only a first step in working with big data, suited to experimenting with data and testing hypotheses. Putting the solution into production is another story, because it brings specific requirements: an availability SLA of 99.9%, highly reliable storage of huge data sets, and meeting target performance KPIs.
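
    To make an availability figure such as 99.9% more concrete, it can be converted into a downtime budget. The sketch below is only illustrative arithmetic: the function name and the assumption of a 365-day year are ours, not part of any SAP tooling.

```python
def allowed_downtime_minutes(availability_percent: float, period_hours: float) -> float:
    """Return the downtime budget, in minutes, for an availability target."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * period_hours * 60.0


HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year

for target in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(target, HOURS_PER_YEAR)
    print(f"{target}% availability allows about {minutes / 60:.2f} hours of downtime per year")
```

    Under these assumptions, a 99.9% target leaves a budget of under nine hours of downtime per year, which an in-house team has to guarantee with its own hardware and staff.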

    If you choose to run Hadoop on-premise in production, you have to solve the following tasks:

    • Find and hire experienced IT professionals
    • Buy the necessary equipment
    • Purchase the necessary distributions, then install and configure the software
    • Launch the solution in production
    • Maintain the solution with regular operating costs (staff salaries, equipment maintenance, etc.)

    Keep in mind that this preparatory phase takes considerable time, which is why companies weigh an on-premise deployment against a cloud service.

    One report by the consulting firm Bain & Co. cites Netflix as an example. In 2016, the company said it had to run thousands of data nodes under huge load to process its big data. Every day it processes 350 billion user events and petabytes of data from its services. At that scale it is impossible to cope with your own servers alone, unless you keep building new data centers.

    Another example, from the more "traditional" industries, is General Electric. In 2013, the company began moving from its own data centers to the cloud. First the oil and gas division switched to the new service, then the migration of more than 9,000 of the company's infrastructure applications began. As a result, General Electric reduced the number of its own data centers from 30 to 4, and with it the costs of personnel, equipment and so on.

    SAP did not stay away from the cloud trend. In 2016, we were joined by the team of Altiscale, one of the world's leading Big Data-as-a-Service providers. Their solution became a new product, SAP Cloud Platform Big Data Services, which is available to SAP customers under the cloud subscription model and has been integrated into SAP's overall cloud landscape.

    The solution was developed by the former Chief Technology Officer (CTO) of Yahoo and his colleagues who worked on Hadoop at that company. Over 7 years at Yahoo, they turned a small Hadoop project into a production system with more than 42,000 data nodes.

    What is SAP Cloud Platform Big Data Services - cloud Hadoop service from SAP

    SAP Cloud Platform Big Data Services is a set of tools for working with big data using the SaaS (Software-as-a-Service) model.

    Let us look at the architecture of SAP Cloud Platform Big Data Services.

    image

    The service includes three main parts:

    Apache Hadoop Cluster

    The cluster uses an ODPi-certified Hadoop distribution. This means that applications and scripts that run in the ODPi environments of other services will also run on SAP Big Data Services.

    * For reference, ODPi (Open Data Platform initiative) is a non-profit organization that standardizes Hadoop and its components. In addition to SAP, ODPi includes such well-known vendors as Hortonworks, IBM, SAS and many others.

    The cluster includes three types of nodes (management nodes, service nodes and data nodes), among them the NameNode, the secondary NameNode and the resource manager (YARN is included in the initial configuration of the service).

    The secondary NameNode also hosts additional services such as Oozie and the Hive Metastore. On connection, each customer is given a separate cluster with the necessary resources. Resources are described by storage volume and the number of machine hours. If necessary, cluster resources can be expanded flexibly, either for the duration of critical calculations or permanently.

    Workbench

    The Workbench is the single access point to the Big Data Service.

    For security reasons, direct access to the Hadoop cluster is limited to service personnel and the Workbench. The customer only gets access to the Workbench, which includes a local Hadoop installation as well as Hive, Spark, Oozie, Pig and other components needed for data science and data engineering, including SAP Lumira and SAP Predictive Analytics:
    image
    You can find more information about the composition of the service on the service's website.

    Using Workbench, a client can run scripts, examine data using Business Intelligence tools, and solve other tasks. In turn, Workbench works closely with the Hadoop cluster over a high-speed channel.

    Big Data Service Portal

    The portal is used to manage users, generate access keys for the Big Data Service, view cluster usage statistics and perform the customer's other operational tasks.

    A jump host connects the Big Data Service to the outside world. All network traffic stays within a space of local IP addresses, a virtual private cloud. The standard way to access the Big Data Service is SSH; other connection options are available on customer request. The Big Data Service also supports Kerberos authentication, which makes Single Sign-On (SSO) possible.
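
    As an illustration of SSH access through a jump host, the sketch below builds an OpenSSH command line that uses the standard `-J` (ProxyJump) option. The host names, the user name and the idea that a customer would reach the Workbench exactly this way are assumptions made for the example; the actual connection details come from the service itself.

```python
import shlex
from typing import List


def build_ssh_command(user: str, jump_host: str, target_host: str, jump_port: int = 22) -> List[str]:
    """Build an ssh command that reaches target_host through jump_host.

    Uses OpenSSH's -J (ProxyJump) option; all host names here are hypothetical.
    """
    jump_spec = f"{user}@{jump_host}:{jump_port}"
    return ["ssh", "-J", jump_spec, f"{user}@{target_host}"]


# Hypothetical names, for illustration only.
cmd = build_ssh_command("analyst", "jumphost.example.com", "workbench.internal")
print(shlex.join(cmd))
```

    Building the command as a list keeps each argument separate, so it can be passed to `subprocess.run` without shell quoting issues.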

    Big Data Service can interact with other SAP cloud services as well as with on-premise solutions. The following options are available for integration:

    • Sensor data collection and processing with Kafka Streaming
    • Extraction of data from relational databases using Kafka Connectors or SAP Data Services
    • Interaction with SAP systems on the SAP HANA platform through Smart Data Access and Smart Data Integration
    • Interaction with on-premise Hadoop at the Hadoop Distributed File System (HDFS) level
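
    One common way to exchange data between two Hadoop clusters at the HDFS level is the standard `hadoop distcp` tool. The article does not say which mechanism the service uses, so the sketch below is an assumption: it only assembles a DistCp command, and the cluster addresses and paths are hypothetical.

```python
from typing import List
from urllib.parse import urlparse


def build_distcp_command(source_uri: str, target_uri: str, update: bool = True) -> List[str]:
    """Assemble a `hadoop distcp` command for copying between two HDFS clusters.

    With update=True the -update flag is added, so only files that are missing
    or different on the target are copied.
    """
    for uri in (source_uri, target_uri):
        if urlparse(uri).scheme != "hdfs":
            raise ValueError(f"expected an hdfs:// URI, got: {uri}")
    command = ["hadoop", "distcp"]
    if update:
        command.append("-update")
    command += [source_uri, target_uri]
    return command


# Hypothetical cluster addresses and paths, for illustration only.
distcp_cmd = build_distcp_command(
    "hdfs://onprem-namenode.example.com:8020/data/events",
    "hdfs://cloud-namenode.example.com:8020/data/events",
)
print(" ".join(distcp_cmd))
```

    The command would be run from a machine that has a Hadoop client and network access to both clusters.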

    The communication channels into the Big Data Service are set up so that data can be loaded from customer source systems at high speed.

    The Big Data Service roadmap for next year includes integration with SAP Vora, SAP's solution for working with big data, and with SAP Data Hub. We will talk about them in more detail in one of the following articles.

    The difference between SAP Cloud Platform Big Data Services and other cloud Hadoop solutions

    The main difference between the SAP solution and the others is that it can be built into business processes through integration with other SAP services and systems. This is a key factor in monetizing big data in practice. If only data scientists see the results of an analysis done in Hadoop, they still have to convince business users to act on the new ideas, and there is no guarantee that the hypotheses will ever be applied. SAP Cloud Platform Big Data Services can be integrated directly with the company's internal IT systems as one step of a business process. In the next article we will describe in more detail how the SAP solution differs from others and how to integrate the work of big data specialists into business processes in practice.

    Customer use cases for SAP Cloud Platform Big Data Services

    Glu Mobile


    Glu Mobile is one of the largest global developers of mobile games, including successful titles such as Cooking Dash, Deer Hunter, Contract Killer, Kim Kardashian: Hollywood and Frontline Commando. The company has development studios around the world, one of which is located in Moscow.

    Glu Mobile develops and supports free-to-play games, which are free to download and are monetized through in-game microtransactions. For such games, it is important to retain players for as long as possible.

    Glu Mobile's games have a daily audience of more than 5 million active users and have been installed more than 1.3 billion times. With an audience of this size, the company has two goals: to keep the games comfortable and interesting to play, and to increase the lifetime value (LTV) of each player.

    To do this, the company collects huge amounts of data from its projects in real time:

    • Over 30 thousand user actions every second
    • About 2 billion user activity reports every day
    • Over 100 million events from various metrics
    • 2 trillion user events stored on SAP Cloud Platform

    Initially, Glu Mobile tried to use an on-premise Hadoop solution, but ran into the following difficulties:

    • The larger the data volumes became, the harder they were to work with
    • Weak in-house Hadoop expertise
    • Poor system reliability and periodic server crashes
    • Poor database query performance

    After switching to SAP Cloud Platform Big Data Services, the Glu Mobile team achieved the following results:

    • The solution meets the company's data processing needs
    • Ability to work with huge volumes of rapidly arriving new data
    • One of the best solutions on the market in terms of performance and reliability
    • The in-house team no longer spends time on operating Hadoop and has switched to data science
    • Easy scaling in line with business needs

    How Glu Mobile uses SAP Cloud Platform Big Data Services:

    image


    Neustar MarketShare DecisionCloud

    Neustar provides its customers with services for analyzing the results of marketing campaigns and user behavior. The company collects a wide variety of data across many industries, including retail, finance, pharmaceuticals, automotive and technology.

    Neustar currently stores about 2.5 petabytes of data in SAP Cloud Platform Big Data Services.

    On its previous big data platform, Neustar had the following problems:

    • Operations took too long to complete
    • Poor service reliability
    • Difficulties with product development
    • High infrastructure maintenance costs
    • Ability to serve only a limited number of customers

    After switching to SAP Cloud Platform Big Data Services, the company gained the following benefits:

    • High performance and service reliability
    • Ability to focus on analytics instead of operating Hadoop
    • More efficient resource allocation and cost management
    • Increased competitiveness of solutions in the market


    First Data

    First Data is a company that processes bank card transactions. It is the largest bank card processing service in the United States (up to 45% of the market).

    In the first stage of the implementation, in 2015, SAP Cloud Platform Big Data Services expanded First Data's functionality for small businesses and reduced costs by 500 thousand US dollars. In the second stage, in 2017, the solution helped implement fraud detection for bank cards and saved the company another two million US dollars ACV.

    The SAP solution also enabled First Data's customers to:

    • Link transaction information with third-party data
    • Analyze customer data and the results of promotional campaigns by geography or demographic factors
    • Compare their results with those of similar businesses
    • Get recommendations, based on big data, for improving sales and marketing activities and increasing customer loyalty

    Problems with the previous infrastructure for working with Big Data:

    • Prohibitive investment needed to scale the proprietary solution
    • Inability to study information in detail
    • Limited number of visualization options available
    • Weak support from the vendor

    In choosing the SAP solution, First Data had the following objectives:

    • Extend the product to more customers
    • Support analysis of more detailed data and a larger set of visualizations
    • Be able to add new features and more interactivity to the product over time

    What are the benefits of switching to SAP?

    • Significant cost reduction
    • Achievement of the production goals that were set
    • Flexibility in analyzing detailed information
    • Ample opportunities for data visualization
    • Broad support from the vendor, including technical specialists
    • Production-grade platform for working with Big Data

    One result of the move to the SAP solution is that SQL queries run 30 times faster on it than expected.

    A short summary of the benefits of SAP Cloud Platform Big Data Services:

    • Quick start of the project
    • Infrastructure ready for a production launch in days, not months
    • A quick return on investment from using a cloud service (confirmed by the opinions of experts and analysts)
    • Reliability and a 99.99% availability SLA, which meets the requirements of production-grade solutions
    • High data processing speed thanks to an innovative architecture and specially developed software versions
    • The cloud Hadoop service coexists with existing on-premise Hadoop clusters and other systems
    • The customer does not need to worry about hardware, Hadoop administration or component updates; these tasks are the responsibility of the provider
    • SAP Big Data Service offers its customers support comparable to the well-known SAP MaxAttention premium support service. Customers can turn to a team of professionals for help on various issues, including recommendations on computation performance

    In the next article, we will talk more about the development plans for SAP Cloud Platform Big Data Services: integration with other SAP services and solutions, new features, new applications and more.

    If you have read this far and would like to try SAP Cloud Platform Big Data Services yourself, write to us to get free trial access to the service.
