Strata + Hadoop World NYC 2015 - how it went



    Machine learning, cloud computing, visualization, Hadoop, Spark, scalability, analytics, terabytes, petabytes, faster, bigger, more reliable, better - all these words are still spinning around in my head after three days in the exhibition hall of the Strata + Hadoop conference. And, of course, mountains of toy elephants everywhere - the main symbol of the conference.

    My colleagues from DataArt and DeviceHive not only attended the conference but also helped out our friends from Canonical. At the Canonical booth, they demonstrated Juju, a powerful tool that lets you configure and deploy services in the cloud quickly and painlessly. We brought along our favorite demo: a device for monitoring industrial equipment. No tedium, no PowerPoint, everything live: a SensorTag accelerometer mounted on a fan to track its vibration.



    To simulate abnormal vibration, we glued a piece of electrical tape to one of the fan blades. This threw off the balance and made the whole assembly quite unstable. Data from the sensor was transmitted to the DeviceHive server as a time series, processed with Spark Streaming, and displayed on beautiful graphs. All of this was deployed with Juju, which integrates perfectly with Amazon Web Services (AWS).
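
    For the curious, here is a minimal sketch of what the Spark Streaming stage of such a pipeline could look like. It is an illustration only, not the actual demo code: the socket source, the "timestamp,x,y,z" line format, and the window sizes are all assumptions.

        import org.apache.spark.SparkConf
        import org.apache.spark.streaming.{Seconds, StreamingContext}

        object VibrationMonitor {
          def main(args: Array[String]): Unit = {
            val ssc = new StreamingContext(
              new SparkConf().setAppName("VibrationMonitor"), Seconds(1))

            // Hypothetical source: accelerometer readings arrive as
            // "timestamp,x,y,z" lines on a TCP socket.
            val readings = ssc.socketTextStream("localhost", 9999)

            // Vibration magnitude of each reading: sqrt(x^2 + y^2 + z^2).
            val magnitudes = readings.map { line =>
              val Array(_, x, y, z) = line.split(",")
              math.sqrt(x.toDouble * x.toDouble +
                        y.toDouble * y.toDouble +
                        z.toDouble * z.toDouble)
            }

            // Average magnitude over a 10-second window, sliding every second;
            // a spike here means the fan is wobbling.
            val avgVibration = magnitudes
              .map(m => (m, 1L))
              .reduceByWindow((a, b) => (a._1 + b._1, a._2 + b._2),
                              Seconds(10), Seconds(1))
              .map { case (sum, count) => sum / count }

            avgVibration.print()

            ssc.start()
            ssc.awaitTermination()
          }
        }

    In the real demo the data arrived through DeviceHive rather than a raw socket, but the windowed-average idea is the same.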

    Despite the abundance of companies with cool products, the main topic of the conference was, it seems to me, Spark. People discussed Spark, taught Spark, launched Spark, integrated Spark. Spark was here, Spark was there, Spark was everywhere. Almost everyone, regardless of company size, shared their experience of integrating and using Spark in their products.

    In just a few years, Spark has proven itself to be an excellent platform for data processing, machine learning, and distributed computing. Its ecosystem is constantly expanding; it is changing the way we work with data and making development faster.

    The next generation of analytics tools will most likely work with Spark in one way or another, which will let companies use their data more efficiently. And the next generation of parallel computing tools will help businesses, engineers, and data scientists join forces in development.

    Databricks, the company behind Spark, introduced its new data analysis product: an interactive shell for creating Spark jobs, launching them on an AWS cluster, building queries, and visualizing data. Add Spark Streaming to this, and you can run models against real-time data streams. While Databricks hosts the front end with the user interface, the data and the infrastructure that runs Spark live on your own AWS machines. It will be interesting to compare all this with Space Needle, which Amazon promises to unveil at re:Invent 2015 in Las Vegas.
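
    To make the "models on real-time streams" part concrete, here is a minimal sketch using Spark's own MLlib rather than the Databricks product itself. The directory paths and all parameters are placeholders I chose for illustration; the StreamingKMeans API is standard MLlib. The model keeps retraining as new batches arrive and labels live points at the same time:

        import org.apache.spark.SparkConf
        import org.apache.spark.mllib.clustering.StreamingKMeans
        import org.apache.spark.mllib.linalg.Vectors
        import org.apache.spark.streaming.{Seconds, StreamingContext}

        object StreamingModelSketch {
          def main(args: Array[String]): Unit = {
            val ssc = new StreamingContext(
              new SparkConf().setAppName("StreamingModelSketch"), Seconds(5))

            // Placeholder sources: space-separated feature vectors dropped
            // as text files into these directories.
            def vectors(dir: String) = ssc.textFileStream(dir)
              .map(line => Vectors.dense(line.split(' ').map(_.toDouble)))

            val trainingData = vectors("data/train")
            val liveData     = vectors("data/live")

            val model = new StreamingKMeans()
              .setK(2)                  // number of clusters (assumed)
              .setDecayFactor(1.0)      // weight all batches equally
              .setRandomCenters(3, 0.0) // 3-dimensional features (assumed)

            model.trainOn(trainingData)       // update the model on each batch
            model.predictOn(liveData).print() // cluster live points as they arrive

            ssc.start()
            ssc.awaitTermination()
          }
        }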

    Obviously, working with large amounts of data takes more than choosing a specific database or distributed system. Entire BigData platforms are emerging, and the world is starting to think in terms of these platforms: sets of technologies and architectural design patterns developed together to solve a whole range of BigData problems. Data platforms largely determine how we access, store, transmit, process, and search structured, unstructured, and sensor data. A great example of such a platform is the Basho Data Platform, in which Basho takes its Riak database and makes it part of something more than just a key-value store.

    Key takeaways for self-education:
    • Experiment with public data in Spark (a starter sketch follows this list).
    • Continue to learn and use Scala.
    • Functional programming.
    • Functional programming.
    • Functional programming.
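
    As a starting point for the first item, here is a tiny self-contained Spark job. The file path and the comma-separated layout are placeholders for whatever public dataset you pick up; the point is just how little code a first experiment needs:

        import org.apache.spark.{SparkConf, SparkContext}

        object PublicDataExperiment {
          def main(args: Array[String]): Unit = {
            val sc = new SparkContext(
              new SparkConf().setAppName("PublicDataExperiment").setMaster("local[*]"))

            // Placeholder path: point it at any public CSV you have downloaded.
            val lines = sc.textFile("data/public-dataset.csv")

            // Toy analysis: count rows per value of the first column
            // and show the ten most frequent values.
            lines.map(_.split(",")(0))
              .map(key => (key, 1L))
              .reduceByKey(_ + _)
              .sortBy(_._2, ascending = false)
              .take(10)
              .foreach { case (key, n) => println(s"$key\t$n") }

            sc.stop()
          }
        }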
