The Little Secrets of Big Counts


    If you are interested in what knowledge can be extracted from large data sets, how large graphs are structured, and which social-graph analysis problems companies such as Facebook and Twitter pose, then this article is for you.

    We will consider three tasks in total. The first is Positive Link Prediction from Facebook. To download the data, you need to register at kaggle.com.

    We are given a social graph with 1862220 vertices and 9437519 edges, and the test sample contains 262588 vertices. The size alone is reason enough to get scared ;) The graph was obtained from the real one by removing some of its edges. Objective: for each user in the test sample, predict up to 10 other users whom they should follow.
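    A common first baseline for this kind of task is the "friends of friends" (common neighbours) heuristic: recommend the users that are followed most often by the people you already follow. The sketch below assumes the graph is loaded as (follower, followee) pairs; the function names build_graph and recommend are hypothetical, and this is only a baseline, not the method used by any competition participant.

```python
from collections import Counter, defaultdict


def build_graph(edges):
    """Build an adjacency map: user -> set of users they follow."""
    follows = defaultdict(set)
    for src, dst in edges:
        follows[src].add(dst)
    return follows


def recommend(follows, user, k=10):
    """Rank candidates by how many of `user`'s followees also follow them.

    Users already followed (and the user themself) are excluded.
    """
    already = follows.get(user, set())
    scores = Counter()
    for friend in already:
        for candidate in follows.get(friend, ()):
            if candidate != user and candidate not in already:
                scores[candidate] += 1
    # Sort by descending score, then by id so that ties are deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [candidate for candidate, _ in ranked[:k]]


if __name__ == "__main__":
    # Toy graph (made-up data): 1 follows 2 and 3, who both follow 4.
    edges = [(1, 2), (1, 3), (2, 4), (3, 4), (3, 5), (2, 1)]
    graph = build_graph(edges)
    print(recommend(graph, 1))  # [4, 5]
```

    On a graph with millions of vertices, the inner loop over followees of followees can get expensive for very popular accounts, so in practice one would cap or sample the neighbour lists.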

    The competition was held under the motto “Show them your talent, not just your resume”, and Facebook intended to recruit the best participants.
    Useful links:
    1. cs.stanford.edu/people/jure
    2. www.machinedlearnings.com/2012/06/thought-on-link-prediction.html

    The next task, Community Detection, deals with finding communities on Twitter. You can read the proceedings of the 19th World Wide Web Conference and download the Twitter social graph here. As is often the case, the English Wikipedia is a good introduction to the topic: en.wikipedia.org/wiki/Community_structure. If you want to dig deeper, you will need a more thorough source, for example, this one.
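    A standard way to score a candidate split of a graph into communities, covered in the Wikipedia article above, is Newman–Girvan modularity: the fraction of edges inside communities minus the fraction expected if edges were placed at random with the same degrees. The sketch below computes it for an undirected graph using the equivalent per-community form Q = Σ_c [L_c/m − (D_c/2m)²]; the function name and the input format (each edge listed once, a node-to-community dict) are assumptions for illustration.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity of a partition of an undirected graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community: dict mapping node -> community id.
    Uses Q = sum_c [ L_c / m - (D_c / (2m))**2 ], where L_c is the number
    of edges inside community c and D_c is the total degree of its nodes.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        return 0.0
    inside = defaultdict(int)      # L_c
    degree_sum = defaultdict(int)  # D_c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


if __name__ == "__main__":
    # Two triangles joined by a single edge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
    print(round(modularity(edges, split), 3))  # 0.357
```

    Putting every vertex into one community always gives Q = 0, so a positive score means the split captures more internal structure than chance; many detection algorithms work by searching for a partition with high Q.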

    For those interested in how information spreads, the final task is Cascade Analysis. You can learn about models of competing information in the media from an article by Yang and Leskovec. A full list of references will help you answer many further questions. Experimental data: snap.stanford.edu/data/memetracker9.html and snap.stanford.edu/data/bigdata/twitter7.
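    Before modelling anything, the MemeTracker data has to be read. The sketch below assumes the record layout described on the SNAP page for snap.stanford.edu/data/memetracker9.html as I understand it: each document starts with a tab-separated "P" line (URL), followed by "T" (timestamp), and any number of "Q" (quoted phrase) and "L" (hyperlink) lines. Please check the actual files before relying on it; the function names are hypothetical.

```python
import io
from collections import defaultdict


def parse_memetracker(lines):
    """Yield one dict per document from MemeTracker-style lines (assumed format)."""
    record = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # blank lines separate records
        tag, _, value = line.partition("\t")
        if tag == "P":  # a new document starts
            if record is not None:
                yield record
            record = {"url": value, "time": None, "quotes": [], "links": []}
        elif record is None:
            continue  # skip anything before the first P line
        elif tag == "T":
            record["time"] = value
        elif tag == "Q":
            record["quotes"].append(value)
        elif tag == "L":
            record["links"].append(value)
    if record is not None:
        yield record


def quote_timeline(records):
    """Map each phrase to the (time, url) pairs that mention it, oldest first."""
    timeline = defaultdict(list)
    for rec in records:
        for quote in rec["quotes"]:
            timeline[quote].append((rec["time"], rec["url"]))
    for mentions in timeline.values():
        # Timestamps like "2008-09-01 10:00:00" sort correctly as strings.
        mentions.sort(key=lambda m: (m[0] or "", m[1]))
    return dict(timeline)


if __name__ == "__main__":
    sample = (  # made-up sample in the assumed format
        "P\thttp://example.com/a\nT\t2008-09-01 10:00:00\n"
        "Q\tyes we can\nL\thttp://example.com/b\n\n"
        "P\thttp://example.com/b\nT\t2008-08-31 09:00:00\nQ\tyes we can\n"
    )
    recs = list(parse_memetracker(io.StringIO(sample)))
    print(quote_timeline(recs)["yes we can"])
```

    Grouping mentions of the same phrase by time gives the raw material for studying cascades: which sites picked up a quote first and how quickly it spread.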
    memetracker.org/quotes-kdd09.pdf is an invaluable resource for anyone who wants to model information battles.
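    To get a feel for diffusion models, the sketch below simulates the independent cascade model, a standard textbook model of information spreading: every newly activated node gets exactly one chance to activate each inactive out-neighbour with probability p. It is not necessarily the model used in the articles above, and it describes a single piece of information rather than competing ones; the graph format and function names are assumptions for illustration.

```python
import random
from collections import deque


def independent_cascade(graph, seeds, p, rng=None):
    """Simulate one run of the independent cascade model.

    graph: dict node -> list of out-neighbours (who sees this node's posts).
    seeds: initially active nodes.
    p: activation probability; each edge is tried once, when its source
       becomes active.
    Returns the set of nodes that are active when the process stops.
    """
    if rng is None:
        rng = random.Random()
    active = set(seeds)
    frontier = deque(active)
    while frontier:
        node = frontier.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in active and rng.random() < p:
                active.add(neighbour)
                frontier.append(neighbour)
    return active


def expected_spread(graph, seeds, p, runs=1000, seed=0):
    """Monte Carlo estimate of the average cascade size."""
    rng = random.Random(seed)
    total = sum(len(independent_cascade(graph, seeds, p, rng)) for _ in range(runs))
    return total / runs


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["d"], "c": [], "e": ["a"]}
    print(expected_spread(graph, ["a"], 1.0))  # 4.0: every reachable node
    print(expected_spread(graph, ["a"], 0.5))
```

    With p = 1 the cascade reaches everything reachable from the seeds, and with p = 0 it never leaves them; the interesting behaviour lies in between, which is why the estimate averages many random runs.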

    If you decide to tackle one of the proposed tasks or a similar one, it is a great occasion to write a paper or a poster (depending on your goals and results) and submit it to the conference “Graphs theory and application” at CSEDays'12.
    Good luck, and may your methods converge fast! :)
    Resources:
    // Student Reports
    1. www.stanford.edu/class/cs224w/proj/jbank_Finalwriteup_v1.pdf
    2. www.stanford.edu/class/cs224w/proj/jieyang_Finalwriteup_v3.pdf
    // Datasets, publications, libraries for data analysis in C++, visualization
    3. snap.stanford.edu
    4. odysseas.calit2.uci.edu/doku.php/public:online_social_networks
    5. law.di.unimi.it/datasets.php
    6. rise4fun.com/agl
    // Jure Leskovec
    7. cs.stanford.edu/people/jure
