Series: Big Data is like a dream. Series 4: Brain revolution

Previously in the series: Big Data is not just a lot of data. Big Data is a positive feedback process. The Obama Button as the embodiment of rtBD&A.

The world has many great books that have survived centuries and even millennia. The knowledge embodied in them is universal: the Chinese military treatises, the Bible, and the Indian Mahabharata contain, among other things, patterns and canons that apply equally to human relations in the I, XI, XXI, and even the XXXI century. But the industrial revolution of the XIX-XXI centuries (steam locomotives - space - computers - Internet) needed a philosophy of its own.


For over 100 years we have been using the laws of dialectical materialism (the brilliant trinity of Marx-Engels-Lenin not only plotted the overthrow of monarchies, but were also among the greatest thinkers of the late II millennium). The law of the negation of the negation, the unity and struggle of opposites, the transition of quantity into quality, cyclical development - all of this is about Big Data too.

At the turn of the millennium (sounds grandiose, right? Put more simply: at the end of the '90s) search engines were simple - one server. If there was absolutely nowhere else to put the money, then two servers. Once every six months or a year, as the Internet grew (it was then customary to write it with a capital letter), the search engine was migrated to a newer server (from 64 MB of memory and 128 GB of disk to 128/256). Apport and Yandex fit in a few rack units on Krasnokazarmennaya and Smolenka, while the coolest search engine in the world, Altavista, was a real monster: 2 servers from DEC, whose products the search engine was, in effect, advertising.

A few years later a technological crisis arrived: the data no longer fit on 1-2 servers. The "law of the transition of quantity into quality" kicked in and (simplifying to the point of primitivism) voila! - the old paradigm "you need a newer, cooler server (preferably from DEC or Sun)" was replaced by Google's idea of "lots of cheap hardware".

This paradigm is alive and well: there is ever more data, subsystems become systems, and still more data arrives! The law of the transition of quantity into quality, having feasted on the "iron" (hardware), grew new fangs and sank its teeth into the "soft" (software). Fashionable OSs and languages appeared, but new Google operating systems or FreeBSD rewritten by Yandex no longer helped solve the new Big Data processing tasks - and the next revolutionary situation gave birth to Hadoop's "baby elephant": lots of cheap hardware was supplemented by "brains" distributed across all of that hardware.

The technocrat's dream is maximum decentralization! More data? There will simply be more "nodes" in the grid: add more iron with "brains". A new task on other data? Just pour new "thoughts" into the iron brains. Since each node of the grid solves only the simplest tasks, new "thoughts" are quick and easy to assemble from standard neuron-like elements.
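To make this "many simple brains" idea tangible, here is a minimal map/reduce-style sketch in plain Python - not Hadoop's actual API, and the chunk size and four-worker pool are arbitrary assumptions. Each "node" runs the same trivial counting task over its own chunk, and the partial results are merged at the end:

```python
# Toy map/reduce: data is split into chunks, each "node" (worker) runs
# the same trivial task, and a reduce step merges the partial results.
from collections import Counter
from multiprocessing import Pool

def map_node(chunk):
    """Each node does the simplest possible job: count words in its chunk."""
    counts = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts

def reduce_counts(partials):
    """Merge the partial results coming back from all nodes."""
    total = Counter()
    for c in partials:
        total.update(c)
    return total

if __name__ == "__main__":
    lines = ["big data is a dream"] * 1000
    chunks = [lines[i:i + 250] for i in range(0, len(lines), 250)]  # 4 "nodes"
    with Pool(4) as pool:                     # more data? add more workers
        partials = pool.map(map_node, chunks)
    print(reduce_counts(partials).most_common(3))
```

More data? Make the chunk list longer and raise the pool size - exactly the "add more iron with brains" move.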

I am sure you have already continued the chain of dialectical laws of the universe on your own. But a series has to work for all readers, not just the Sherlock Holmeses among them, so let's spell it out: matter is the unity of space and time - there is even a term for it, space-time. And so that humanity would not get bored, there is a built-in limitation: the speed of light. The more data in the Hadoop grid, the sharper the teeth of the time dimension become. The laws of diamat at work.

The most humorous dialectical law is the law of the negation of the negation. No sooner has the latest young scientific shoot defeated its aged adversaries and grown a beard of its own than the grandchildren arrive and smash the fathers - and here is where the humor lies: they do it under the slogans of the grandfathers!

The Hadoop decentralizer cannot cope with the time dimension of space-time matter in rtBD&A (real-time Big Data & Analytics) tasks, where data exhibits such a property (such a "nastiness") as temporal value: recent data is far more important than older data.
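To make "temporal value" concrete, here is a hedged sketch of one common way to express it: an exponential time decay. The one-hour half-life below is an arbitrary assumption for illustration:

```python
# Sketch of "temporal value": weight each event by an exponential decay
# so that recent data dominates. The half-life is an assumed parameter.
import time

HALF_LIFE_SEC = 3600.0  # assumed: an event's value halves every hour

def temporal_weight(event_ts, now=None):
    """Weight in (0, 1]: 1.0 for data arriving right now, near 0 for old data."""
    now = time.time() if now is None else now
    age = max(0.0, now - event_ts)
    return 0.5 ** (age / HALF_LIFE_SEC)

def decayed_average(events, now=None):
    """Average of (timestamp, value) pairs, dominated by the freshest ones."""
    weights = [(temporal_weight(ts, now), v) for ts, v in events]
    total_w = sum(w for w, _ in weights)
    return sum(w * v for w, v in weights) / total_w if total_w else 0.0

now = time.time()
events = [(now - 7200, 100.0), (now - 60, 10.0)]  # old spike vs fresh reading
print(round(decayed_average(events, now), 1))     # ~28.2, pulled toward the fresh 10.0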

Following cyclical development, a centralized solution appeared - IMC (In-Memory Computing) technology: one expensive computer consisting, essentially, of nothing but fast memory. Formally, disk drives (the slowest nodes in the data-stream chain) are still present, but relegated to a supporting role. All the latest (most important) data sits in fast memory, and the analytical brains work with it "at the speed of light".
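For a toy feel of the IMC idea, here is a sketch using Python's sqlite3 in ":memory:" mode: the whole working set lives in RAM, and the slow disk appears only as a backup target. This is an illustration of the principle, of course, not SAP HANA:

```python
# Toy IMC: the "hot" working set lives entirely in RAM; the slow disk
# is relegated to a supporting role as a backup target.
import sqlite3

ram = sqlite3.connect(":memory:")          # all hot data in fast memory
ram.execute("CREATE TABLE readings (ts INTEGER, watts REAL)")
ram.executemany("INSERT INTO readings VALUES (?, ?)",
                [(t, 100.0 + t % 7) for t in range(100_000)])

# Analytics runs against RAM at memory speed...
avg, = ram.execute("SELECT AVG(watts) FROM readings").fetchone()
print(f"avg load: {avg:.2f} W")

# ...while the disk only receives a periodic snapshot.
disk = sqlite3.connect("readings_backup.db")
ram.backup(disk)                           # supporting role: backup only
disk.close()
```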

As an example of the real usefulness of IMC, take a development based on SAP HANA for a popular topic of recent years: intelligent electric power systems (smart grids). The main task is to optimize generation and consumption and, as a result, reduce energy costs - plus operational monitoring and forecasting. Each house is equipped with a smart meter. Measurements are taken every few minutes and processed by the Big Data analytical system, integrated with a GIS. In the system you can see the overall picture of energy consumption and drill down to any district and house: how consumption changes with weather conditions, the season, and the time of day. And based on these real, accurate data you can plan the power supply of one of the liveliest and most energy-hungry districts.
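A hedged sketch of the kind of roll-up such a system performs - the district names, houses, and readings below are invented for illustration. Meter readings arriving every few minutes are aggregated per district and hour (the "overall picture") and per house (the drill-down):

```python
# Sketch of smart-grid roll-up: per-district/hour totals plus a
# per-house drill-down. All sample data here is invented.
from collections import defaultdict
from datetime import datetime

# (district, house, timestamp, kWh since last reading)
readings = [
    ("Manhattan", 12, datetime(2015, 7, 1, 18, 5), 1.4),
    ("Manhattan", 12, datetime(2015, 7, 1, 18, 20), 1.6),
    ("Manhattan", 7,  datetime(2015, 7, 1, 18, 10), 0.9),
    ("Brasilia",  3,  datetime(2015, 7, 1, 18, 15), 2.1),
]

by_district_hour = defaultdict(float)      # the "overall picture"
by_house = defaultdict(float)              # the drill-down view
for district, house, ts, kwh in readings:
    by_district_hour[(district, ts.hour)] += kwh
    by_house[(district, house)] += kwh

for (district, hour), kwh in sorted(by_district_hour.items()):
    print(f"{district} {hour}:00-{hour + 1}:00  {kwh:.1f} kWh")
```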

You need a calculator with a lot of zeros to tally the benefits of such large-scale projects, the size of Manhattan or Brasilia. But the current cost of IMC solutions (hundreds of thousands of dollars) cuts off 99% of those who would like one, which means it is not a mass-market solution, and the search continues.

Where do we go next? Is a Hadoop-IMC "mix" waiting for us, or dynamic "hybrid clouds" assembled from off-the-shelf "nodes", or a switch to molecular-chemical computers (it is no accident that nature chose that approach)? Time will tell.

Here is how the development of the rtBD Platform went in our case:
1. The first 3-4 months (spring-summer 2012) - the cloud; we selected the optimal core/memory configurations. The cost of hosting data in the cloud was very high at the time (the first TB), and finance was, as always, in short supply.
2. The next year (2013) - a one-time purchase of servers of various sizes (HP) for the main subsystems, based on the results of the cloud experiments. We skimped on disks: we took a few fast ones, but the main arrays were slow SATA (10 TB).
3. In 2014 we accelerated and scaled up - purchasing cheap (compared to HP) servers with fast disks. Together with our partners we ran the main branch in parallel with a branch on SAP HANA: the gain was up to 5x in speed, but for our customers our SaaS on clouds cheaper than HANA was enough.
4. 2014-15 - a hybrid distributed scheme, including the client-side "one system - one server" in a distributed network of data streams.
5. Negation of the negation (see item 1): dozens of TB of archived data are now stored in super-cheap clouds :-)

In the following series we will talk about things more pressing for today. To be continued: NoSQL vs. columnar DBMS, where the "Blue Giant" (IBM) is drifting, and where the rumor that "the data is ending" comes from.

Big Data is like a dream. The series:
Series 2: Big Data - negative or positive?
Series 3: The "Obama Button"
