DZ Club's first meeting - MongoDB, Clojure, MapReduce and Azure

    Yesterday I attended an interesting event and want to share my impressions. It was an informal meeting with open discussion, plenty of conversation, and lots of practical information.

    You can find some statistics in the LiveJournal of the main organizer, Dmitry Zavalishin of Digital Zone.

    In brief, the topics were MongoDB, Clojure, MapReduce, and Azure. During the round of introductions, it turned out that most people had come to hear about the world without SQL, as represented by MongoDB.

    The full program looked like this:
    • Ilya Obshadko, Entarena Inc. “Practical use of MongoDB in conjunction with Clojure”
    • Dmitry Martynov, Microsoft “Microsoft Azure and everything around”
    • Pavel Alyoshin, Alexander Serkov, Yandex "The history of death and the revival of statistics with the departure from Oracle to MapReduce"


    As we were told, Entarena Inc. is an ambitious California startup with part of its development done in Russia. The prototype, in development since last fall, is expected to be completed in 2-3 months.

    Ilya explained that MongoDB and Clojure were chosen because they are convenient for the developers, which lets the team work faster and more efficiently. The audience asked about performance under "combat" conditions, with millions of records and so on. There are no exact test figures at this stage, but judging by the “feel of the architecture” and the experience of other projects, the outlook is optimistic. Ilya promised to share specifics after the prototype launches, which would be really interesting to hear.

    There was a question: why Clojure, and what else was considered? The team looked at languages that run on the JVM, so that all the Java libraries would be available (“that has everything!”). I remember that they compared Clojure with Scala, which seemed too complicated.

    Dmitry Martynov from Microsoft talked about cloud storage, which can be either conventional relational or non-relational NoSQL. As I understand it, the real convenience of the service lies in its integration with other Microsoft technologies: there are convenient interfaces in C# and so on. In general, though, the storage has a RESTful interface and you can work with it “even from curl”.

    The most memorable and best-received talk was the Yandex story from Pavel Alyoshin and Alexander Serkov about their victory over terabytes of statistics. It prompted a flurry of questions from almost everyone. The problem was clear: the data keeps growing, and capacity is “not rubber”, that is, it cannot stretch forever. Over 8 years the amount of data grew 2,000 times, from 2 GB to 4 TB per day (!), while hardware performance grew only 10 times. So what to do?
    Oracle RAC no longer helped; the limit was on the horizon. The team decided to use its own in-house MapReduce implementation (with its developer on staff, this was more practical than adopting an external Hadoop). The most interesting part is that this is not just an idea but an already implemented and tested system that "really works." The most that can be “lost” in a failure is the last few minutes of statistics.
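
    For readers new to the MapReduce model, here is a minimal single-process sketch in Python. It is not Yandex's system, whose internals were not covered in that much detail; the log format, the function names, and the sample records are all hypothetical and only illustrate the map, shuffle, and reduce steps.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: turn one raw log line into (key, value) pairs."""
    # Hypothetical log format: "<date> <service> <hits>"
    date, service, hits = record.split()
    yield (date, service), int(hits)


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single total."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle, and reduce over an iterable of log lines."""
    pairs = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Made-up sample records, not real statistics.
    logs = [
        "2010-04-01 search 3",
        "2010-04-01 mail 5",
        "2010-04-01 search 2",
    ]
    for key, total in map_reduce(logs).items():
        print(key, total)
```

    Because each record is mapped independently and each key is reduced independently, both phases can in principle be spread across many machines, which is the property behind the scaling described here.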

    In general, the developers breathed a sigh of relief and now Yandex feels “dry and comfortable.” In addition, the built system scales linearly and the guys are not afraid of even petabytes.

    In addition to the talks, there were tea, coffee, and buns. In short, everything was as it should be for a pleasant conversation.

    Summing up: interesting, pleasant, useful. Thanks to Dmitry for organizing!
    The next meeting is in two weeks, on Thursday.
