Big Data: Big Opportunity or Big Deception?

    We often write about cloud technologies on the 1cloud blog; for example, we recently covered machine learning and all-flash storage arrays. Today we decided to talk about Big Data. The most common definition of big data is the well-known “3Vs” (Volume, Velocity and Variety), introduced by Gartner analyst Doug Laney in 2001.

    At the same time, volume is often treated as the most important characteristic, which is partly due to the name itself, so many people wonder only about what size of data can be considered big. In this article we decided to find out what really matters in big data besides size, how it appeared, why it is criticized, and in what areas it is successfully applied.

    Image: Flickr / Joe Hall / CC BY

    If we talk about the size of Big Data, then David Kanter, president of Real World Technologies, for example, believes that data can be called big when it does not fit into a server’s memory and weighs more than 3 terabytes. However, Gartner’s official definition is much broader and includes not only volume, velocity and variety of formats. Big data is also defined as information assets that require cost-effective and innovative processing methods for deeper insight, informed decision-making and process automation.

    Gartner analyst Svetlana Sicular therefore calls for taking the definition into account as a whole, rather than focusing only on the three “Vs”. Incidentally, the number of these “Vs” has grown over time: today Veracity, Validity, Volatility and Variability are also counted among the characteristics of big data.

    A minute of history


    But the big data story begins much earlier than the term itself. According to one Forbes author, the starting point can be considered 1944, when the American librarian Fremont Rider published his work The Scholar and the Future of the Research Library. In it he noted that the collections of American university libraries were doubling in size every 16 years, and that by 2040 the Yale University library would contain about 200 million volumes, requiring almost 10,000 kilometers of shelving.

    According to another view, awareness of the problem of too much data came even earlier, back in 1880 in that same America, when processing the census data and presenting it in tables took 8 years. Moreover, forecasts suggested that processing the 1890 census would take even longer, and the results would not be ready before the next census. The problem was then solved by the tabulating machine, invented by Herman Hollerith in 1881.

    The term Big Data was first introduced (according to the Association for Computing Machinery digital library) in 1997 by Michael Cox and David Ellsworth at the 8th IEEE Visualization Conference. They described the big data problem as data sets too large for main memory, local disk and even remote disk in visualization workloads. And in 1998 John R. Mashey, head of research at SGI, used the term Big Data in its current sense at a USENIX conference.

    Although the problem of storing large amounts of data had been recognized long before and intensified with the advent of the Internet, the turning point came in 2003, the year in which more information was created than in all previous time combined. Around the same time Google published its papers on the Google File System and the MapReduce computational model, which formed the basis of Hadoop. Doug Cutting had been working on this tool for several years as part of the Nutch project; in 2006 Cutting joined Yahoo, and Hadoop became a separate, complete solution.
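    To give a feel for the MapReduce model that underlies Hadoop, here is a minimal sketch in plain Python (not Hadoop code; the function names and sample documents are purely illustrative): a map step emits key-value pairs, a shuffle step groups them by key, and a reduce step aggregates each group — the classic word-count example.

```python
from collections import defaultdict

def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in the document."""
    for word in document.split():
        yield word.lower(), 1

def shuffle(mapped_pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce step: aggregate the values collected for one key."""
    return key, sum(values)

documents = ["big data is big", "data about data"]

# Map over every document, shuffle the pairs, then reduce each group.
mapped = (pair for doc in documents for pair in map_phase(doc))
grouped = shuffle(mapped)
counts = dict(reduce_phase(k, v) for k, v in grouped.items())
print(counts)  # {'big': 2, 'data': 3, 'is': 1, 'about': 1}
```

    In a real Hadoop cluster the same map and reduce functions run in parallel on many machines, and the framework handles the shuffle, fault tolerance and data distribution.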

    We can say that big data made it possible to build search engines in the form in which they exist now. You can read more about this in an article by Robert X. Cringely or in its translation on Habr. Back then big data really did turn the industry around, making it possible to quickly find the right pages. Another important milestone in the history of Big Data is 2008, when the journal Nature gave it its modern definition as a set of special methods and tools for processing huge amounts of information and presenting the results in a form understandable to the user.

    Big data or big deception?


    There is a big problem with the modern perception and understanding of big data: because of the technology's growing popularity, it is seen as a panacea that any self-respecting company must implement. In addition, for many people big data is synonymous with Hadoop, which leads some companies to think that if they process their data with this tool, the data will immediately become big.

    In fact, the choice of a tool depends not so much on the size of the data (although that can matter) as on the specific task. Moreover, a correct formulation of the problem may show that there is no need to resort to big data at all, and that a simple analysis can be far more effective in terms of time and money. That is why many experts "scold" the Big Data phenomenon for the attention it attracts, pushing companies to follow trends and adopt technologies that not everyone needs.

    Another inflated expectation is that big data is the key to absolutely all knowledge. The fact is that to extract information you need to be able to ask the right questions. Bernard Marr, a big data expert, believes that most Big Data projects fail because companies cannot formulate a precise goal. Merely collecting data means nothing today: storing it has become cheaper than deleting it.

    Some even believe that Big Data could really be called a big mistake or a big trick. A flurry of criticism hit big data after the sensational failure of Google Flu Trends, when the project misjudged the 2013 epidemic and overestimated it by 140%. Scientists from Northeastern, Harvard and Houston universities then criticized the tool, showing that over the previous two years its analysis had often produced incorrect results. One of the reasons was changes to Google's search engine itself, which led to the collection of inconsistent data.

    Big data analysis also often reveals connections between events that could not really have influenced each other in any way. The number of spurious correlations grows with the amount of data analyzed, and too much data can be just as bad as too little, as the sketch below illustrates. This does not mean that big data does not work; it means that alongside machine analysis you need scientists and specialists in a particular narrow field who can determine which data and which results have practical value and can be used for prediction.
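    As a rough illustration of why this happens, here is a small Python sketch on synthetic data (the numbers are purely illustrative): it generates completely independent random series and reports the strongest pairwise correlation found among them, which keeps growing as more series are compared, even though none of them are actually related.

```python
import numpy as np

rng = np.random.default_rng(42)

def max_spurious_correlation(n_series, n_points=100):
    """Largest absolute pairwise correlation among n_series
    completely independent random time series."""
    data = rng.standard_normal((n_series, n_points))
    corr = np.corrcoef(data)          # rows are treated as variables
    np.fill_diagonal(corr, 0.0)       # ignore trivial self-correlations
    return float(np.abs(corr).max())

# The more unrelated series we compare, the stronger the best
# "correlation" that turns up purely by chance.
for n in (10, 100, 1000):
    print(n, round(max_spurious_correlation(n), 2))
# Typical output: roughly 0.3 for 10 series, 0.4 for 100, 0.5 for 1000
```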

    Big Data to the rescue


    Certain problems exist in almost every field: incomplete or missing data, the lack of a uniform recording standard, and inaccuracies in the available information. Despite this, there are already many successful projects that really work. We have already covered some Big Data use cases in this article.

    Today there are several major projects whose goal is to make roads safer. For example, the Tennessee Highway Patrol, together with IBM, has developed a crash-forecasting solution that uses data on previous accidents, arrests of intoxicated or drugged drivers, and local events. And Kentucky introduced a Hadoop-based analytics system that uses data from traffic sensors, social media posts and the Google-owned Waze navigation application; it helps the local administration optimize snow-removal costs and use anti-icing materials more efficiently.

    Experts at the Deloitte Center are confident that by 2020 big data will completely change the field of medicine: patients will know almost everything about their health thanks to smart devices that collect a variety of information, and will take part in choosing the best available treatment, while research conducted by pharmaceutical companies will reach a completely different level. With the help of big data and machine learning it is possible to build a learning healthcare system that, based on electronic medical records and treatment outcomes, can predict how a particular patient will respond to radiation therapy.

    There is also successful experience with big data in HR. For example, Xerox was able to reduce staff turnover by 20% thanks to Big Data. Analysis of the data showed that people without experience, with high activity on social networks and with great creative potential stay in one job much longer. Such cases give experts reason to believe that big data can be used to build an employer brand, select candidates, draw up interview questions, identify employees' talents and choose employees for promotion.

    Big data is also used in Russia: for example, Yandex launched a weather forecasting service that uses data from weather stations, radars and satellites. Moreover, there were even plans to use the barometers built into smartphones to improve forecast accuracy. In addition, many banks and the big three mobile operators work with big data. Initially they used such solutions only for internal purposes, but now, for example, MegaFon cooperates with the Moscow city government and Russian Railways. You can read more about the VimpelCom (Beeline) case on Habr.

    Many companies have realized the potential of data processing. But the real transition to big data is about how all this information can be used for the good of the business. Ruben Sigala, head of research at Caesars Entertainment, said in an interview with McKinsey that the main difficulty in working with big data is choosing the right tool.

    Although awareness of the problem came long ago, and the tools have existed and improved for years, the search for the perfect solution continues to this day. It is also tied to the search for people, on whom the results of big data analysis may depend to a far greater extent than on the technology itself.


