How we train future big data professionals

    This Saturday, our Big Data Specialist program kicks off. It has turned out to be so intense that participants will probably have no free time for the next three months. In this post I will explain how we will train big data specialists and how the first month of the program is organized.



    The central case for this period is building a DMP. A DMP (data management platform) analyzes users' web logs and, based on their online behavior, assigns attributes to them or sorts them into classes. For example, a properly tuned DMP can determine a person's gender and age and find out whether they are a gadget enthusiast or, say, a fan of luxury fashion brands. We are developing this case together with the Data-Centric Alliance, which uses big data to set up advertising campaigns.

    Teamwork

    At our Open Day, we said that students will work on cases in teams. Some were unhappy: “Why do we need this? What if I end up on a team with a weak teammate who drags me down?” Others were worried: “I don’t have enough knowledge. What will I do on a team with strong developers?” Are these concerns reasonable? Yes, but in real life big data analysis is a team sport, not an individual one, and loners have a hard time in it. Our goal is to immerse students in conditions as close as possible to real working conditions.

    Teams change once a month, so all students will gain not only valuable experience of teamwork on a project but also good connections with classmates. Founders of IT companies, technical directors and experienced developers have come to study with us, so these connections will be no less useful than meeting famous professors.

    Data Management Platform in four steps

    During the first four weeks, students will build their own DMP. Each week the group completes one subtask that moves the whole case forward.

    So here is the plan:

    • First week: each student independently deploys a small Hadoop cluster.
    • Second week: pre-process 1 TB of web logs and load them correctly into an HBase table.
    • Third week: begin analyzing the logs, so far without machine learning. We take rules defined in advance and use them to assign users to classes.
    • Fourth week: machine learning on MapReduce. Building a DMP for analyzing web logs!
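
    To make the third-week step more concrete, here is a minimal sketch of rule-based user classification without machine learning. Everything specific in it is an assumption made for illustration: the tab-separated "user_id, URL" log format, the domain names, the class names and the hit threshold are hypothetical and are not taken from the course materials.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse

# Hypothetical rules: class name -> domains that count as evidence for it.
RULES = {
    "gadget_fan": {"gadgets.example.com", "phones.example.org"},
    "luxury_fashion_fan": {"luxury.example.com"},
}

# Assumed threshold: a user needs at least this many matching visits.
MIN_HITS = 2


def classify_users(log_lines):
    """Map user_id -> sorted list of classes, from 'user_id<TAB>url' lines."""
    hits = defaultdict(Counter)
    for line in log_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue  # skip malformed lines instead of failing
        user_id, url = parts
        host = urlparse(url).hostname  # None if the URL has no scheme
        for cls, domains in RULES.items():
            if host in domains:
                hits[user_id][cls] += 1

    result = {}
    for user_id, counts in hits.items():
        classes = sorted(c for c, n in counts.items() if n >= MIN_HITS)
        if classes:  # keep only users that matched at least one class
            result[user_id] = classes
    return result


if __name__ == "__main__":
    sample = [
        "u1\thttp://gadgets.example.com/review",
        "u1\thttp://phones.example.org/news",
        "u2\thttp://luxury.example.com/bags",
    ]
    print(classify_users(sample))
```

    On a real 1 TB log the same counting logic would have to run as a distributed job rather than in a single process. The sketch only shows the shape of the rules.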


    Points, rating, badges

    We have created a system of points and ratings that will help teachers and employers see how hard the students are working.

    Each subtask will have at least two levels of difficulty: the first is simply to solve it, the second is to solve it under additional conditions. For the harder level we will award special badges and extra points. A note to those who aim to find a new job: such “achievements” are of great interest to our partner employers.



    The final DMP will have to build a user profile with a specified accuracy. This is a real challenge that matters to business, and the team at the Data-Centric Alliance is ready to reimburse a quarter of the tuition for a team that builds a well-working solution. A team that shows an outstanding solution will receive compensation of up to 50%. Improving classification accuracy by 5% increases the return on advertising by 30%, said Alexander Petrov, head of R&D at the Data-Centric Alliance, who also leads the first month of the course.

    Classroom and practical days

    Classes are held three times a week: on Tuesdays, Thursdays and Saturdays. Tuesdays and Thursdays are classroom sessions. They are designed to maximize student involvement (unlike an old-fashioned university lecture): everything is built around solving problems and analyzing cases. The purpose of the classroom sessions is to explain the methodology for solving problems, show how the necessary tools work, and take a critical look at the theory and basic concepts. In our experience, this format gives a deep understanding and solid working skills.

    This format, by the way, also makes it possible to actively engage the online audience! We decided that in this course online participants will be required to work with their cameras turned on. Of course, students are shy and even indignant about it, but a camera that is switched on leaves no chance to slack off.

    Every Tuesday, teams receive a task and have one week to solve it. Saturdays are consultation days. The doors of Digital October open at 11 a.m., and teams can come to work on their solution together. At 4:00 p.m. the seminar tutors' office hours begin: tutors can advise teams, answer specific questions and help anyone who is stuck. The tasks themselves can be checked automatically by uploading the code to a special platform.

    Tasks, tests and colloquiums

    We will solve all problems on cloud resources, which will be deployed for each team in AWS (we have become partners of Amazon Web Services and can therefore give students substantial computing power). Each of the four subtasks has to be solved within one week; this is a “soft deadline”. If a team does not submit the task within this period, it can still submit it during one more week, but with a 30% penalty. Anyone who misses several deadlines loses the opportunity to receive a certificate of completion.
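
    The deadline rule can be written down as a small scoring function. This is only a sketch under stated assumptions: the text does not say how the 30% penalty is applied, so here it is assumed to reduce the points for the task, and a submission later than one week after the soft deadline is assumed to earn nothing. The function and constant names are hypothetical.

```python
from datetime import datetime, timedelta

LATE_PENALTY_PERCENT = 30          # from the text: 30% penalty
GRACE_PERIOD = timedelta(weeks=1)  # from the text: one more week to submit


def awarded_points(base_points, soft_deadline, submitted_at):
    """Points for a task given its soft deadline and submission time."""
    if submitted_at <= soft_deadline:
        return float(base_points)  # on time: full points
    if submitted_at <= soft_deadline + GRACE_PERIOD:
        # Late but within the extra week: assumed to cost 30% of the points.
        return base_points * (100 - LATE_PENALTY_PERCENT) / 100
    return 0.0  # assumption: nothing is awarded after the extra week


if __name__ == "__main__":
    deadline = datetime(2000, 1, 4, 23, 59)  # placeholder date, not a real course date
    for days_late in (0, 3, 10):
        when = deadline + timedelta(days=days_late)
        print(days_late, awarded_points(100, deadline, when))
```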

    To keep such unpleasant situations to a minimum, a colloquium on the material covered will be held every two weeks. There you can catch up with the group and earn extra points.

    As you can see, we are not running a prison camp, but everything is pretty tough. We want to graduate specialists whose competence we are confident in and whose achievements we can be proud of in the future. You can still sign up for the program: one and a half places are left.

