WIKIUp it, WIKIUp it!

    Good evening, dear friends!

    Recently, while browsing the Internet, I came across the amazing works of Chris Harrison. Still a little in shock, I wondered: how hard would it be to visualize Wikipedia? And I decided to try it!


    So let's get started!


    Devices and tools


    The first step is to decide what to visualize and with which tools. After looking at what is available, I settled on the following: the Wiki API, the Graphviz package, and the PyGraphviz module for Python.


    The developers of the Wiki API documented it very well, so there is no need to translate the description verbatim here to understand how to call one method or another.

    The Graphviz package has already been covered on Habr, so I don't think it needs to be described again. After reading the description of the dot language and the Graphviz tools, though, I very nearly started writing my own generator of .dot files.

    What saved me was a Python module called PyGraphviz, which lets you work conveniently with the graph structure and then write it out to a .dot file.
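    To give a feel for what PyGraphviz takes off your hands, here is a small sketch of producing dot text by hand. It is only an illustration in Python 3 (the article's own listings target Python 2); the function names quote and to_dot are made up for this example, and an undirected graph is assumed.

```python
def quote(name):
    """Wrap a node name in double quotes so spaces are legal in dot.

    Only the double quote itself is escaped in this sketch.
    """
    return '"' + name.replace('"', '\\"') + '"'


def to_dot(edges, graph_name="wiki"):
    """Render (source, target) pairs as an undirected dot graph."""
    lines = ["graph %s {" % graph_name]
    for source, target in edges:
        # "--" is the edge operator for undirected graphs in dot.
        lines.append("    %s -- %s;" % (quote(source), quote(target)))
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot([("Mathematics", "Number"), ("Mathematics", "Geometry")]))
```

    Even this toy version has to worry about quoting and syntax, which is the bookkeeping a graph library handles for you.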

    The basics


    So, we will visualize the cross-references between Wikipedia articles. To do this, we call the API method at the following URL:
    ru.wikipedia.org/w/api.php?action=query&format=xml&titles=ARTICLE_TITLE&prop=links
    where action is the type of method, format is the response format (XML in our case), titles is the title of the article, and prop=links requests the article's cross-reference links.
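    As a rough illustration, the same URL can be assembled from a dictionary of parameters. This is a Python 3 sketch (the article's own listings target Python 2); the name build_links_url, the API_URL constant, and the https scheme are assumptions made for this example.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL above; the https scheme is an assumption.
API_URL = "https://ru.wikipedia.org/w/api.php"


def build_links_url(title):
    """Return the API URL that requests the links of one article."""
    params = {
        "action": "query",  # the type of API method
        "format": "xml",    # response format
        "titles": title,    # article whose links we want
        "prop": "links",    # ask for the article's links
    }
    # urlencode percent-encodes non-ASCII titles as UTF-8.
    return API_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_links_url("Математика"))
```

    Building the query string with urlencode rather than by concatenation keeps titles with spaces or Cyrillic letters valid in the URL.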

    The response is an XML document that lists the links found in the requested article, one element per link, with the name of the linked article in the element's title attribute. It can be processed with any DOM or SAX parser.
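    For comparison, here is a minimal sketch of the DOM route in Python 3. The sample document is hand-written, not a real reply: it assumes that each link arrives as a pl element with a title attribute, which is how the MediaWiki XML format represents links as far as I know. The function name extract_links_dom is hypothetical.

```python
from xml.dom import minidom

# Hand-written sample, assumed to mirror the shape of a prop=links reply.
SAMPLE_RESPONSE = """<?xml version="1.0"?>
<api>
  <query>
    <pages>
      <page pageid="1" ns="0" title="Example">
        <links>
          <pl ns="0" title="First link" />
          <pl ns="0" title="Second link" />
        </links>
      </page>
    </pages>
  </query>
</api>"""


def extract_links_dom(xml_text):
    """Collect the title attribute of every <pl> element."""
    document = minidom.parseString(xml_text)
    return [node.getAttribute("title")
            for node in document.getElementsByTagName("pl")]


if __name__ == "__main__":
    print(extract_links_dom(SAMPLE_RESPONSE))
```

    DOM loads the whole reply into memory before you query it, while SAX streams through it; for replies of this size either works.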

    Programming


    For processing I used SAX, deriving my class from xml.sax.handler.ContentHandler:
    class LinksListHandler(xml.sax.handler.ContentHandler):
    The main callbacks are then overridden:
    • startElement
    • endElement
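    The article does not show the body of the handler, so the following is only a guess at its shape, written for Python 3. It assumes each link is a pl element with a title attribute and that the collected titles live in a links list (the attribute name matches how the handler is used later in the article).

```python
import xml.sax
import xml.sax.handler


class LinksListHandler(xml.sax.handler.ContentHandler):
    """Collects link titles while the parser streams through the reply."""

    def __init__(self):
        super().__init__()
        self.links = []

    def startElement(self, name, attrs):
        # Assumption: each link is a <pl title="..."> element.
        if name == "pl" and "title" in attrs:
            self.links.append(attrs["title"])

    def endElement(self, name):
        # Nothing to finalize in this sketch; kept to mirror the list above.
        pass


# Hand-written sample reply, not real API output.
SAMPLE = (b'<api><query><pages><page title="Example"><links>'
          b'<pl ns="0" title="First link" />'
          b'<pl ns="0" title="Second link" />'
          b'</links></page></pages></query></api>')

if __name__ == "__main__":
    handler = LinksListHandler()
    xml.sax.parseString(SAMPLE, handler)
    print(handler.links)
```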


    The request is handled as follows:
    # Python 2 standard library modules
    import sys
    import urllib
    import urllib2
    import xml.sax

    def get_links(page):
        # See the Wiki API documentation: http://en.wikipedia.org/w/api.php
        query_val = {'action': 'query',
                     'prop': 'links',
                     'titles': page,
                     'format': 'xml'}
        # wiki_url() and verbose_message() are helpers defined elsewhere in the script.
        url = wiki_url() + '?' + urllib.urlencode(query_val)
        request = urllib2.Request(url)

        verbose_message("Wiki url: " + url)
        try:
            response = urllib2.urlopen(request)
        except urllib2.URLError:
            # URLError also covers HTTPError, so connection failures are caught too.
            print("HTTP request error!")
            sys.exit(1)
        #verbose_message("Response xml:\n"+response.read())
        # Stream the XML response through the SAX handler, which collects the links.
        lh = LinksListHandler()
        saxparser = xml.sax.make_parser()
        saxparser.setContentHandler(lh)
        saxparser.parse(response)

        return lh.links


    Graph construction


    With the PyGraphviz module, building the graph is quite simple:
    from pygraphviz import AGraph

    def make_wiki_graph(wiki_page, depth):
        gv = AGraph()

        page_list = [wiki_page]   # pages to expand at the current level
        temp_list = []            # pages discovered for the next level

        verbose_message('Create graph for ' + wiki_page)
        gv.add_node(wiki_page)
        for i in range(depth):
            print('>>>> Get ' + str(i) + ' level')
            for page in page_list:
                links = get_links(page)
                node = gv.get_node(page)
                # MIN_FONT is defined elsewhere in the script; deeper levels get smaller labels.
                node.attr['fontsize'] = "%i" % (MIN_FONT*2*(depth - i))
                for link in links:
                    verbose_message(page + "=>" + link)
                    gv.add_edge(page, link)
                    temp_list.append(link)
            page_list = temp_list
            temp_list = []
        return gv
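    The level-by-level expansion can be tried without the network or Graphviz. The sketch below (Python 3) replaces the Wiki API with a toy dictionary; collect_edges, TOY_WIKI and toy_links are hypothetical names. Unlike the listing above, it also keeps a set of already queued pages so that no page is expanded twice, which is my own addition rather than part of the original script.

```python
def collect_edges(start, depth, get_links):
    """Expand the link graph level by level, `depth` levels deep."""
    edges = []
    seen = {start}            # pages already queued for expansion
    current_level = [start]
    for _ in range(depth):
        next_level = []
        for page in current_level:
            for link in get_links(page):
                edges.append((page, link))
                if link not in seen:
                    seen.add(link)
                    next_level.append(link)
        current_level = next_level
    return edges


# Toy link table standing in for the Wiki API.
TOY_WIKI = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["D"],
    "D": [],
}


def toy_links(page):
    """Return the links of a page from the toy table."""
    return TOY_WIKI.get(page, [])


if __name__ == "__main__":
    for edge in collect_edges("A", 2, toy_links):
        print(edge)
```

    Without the seen set, a page reachable by several paths would be requested once per path, and the number of requests grows quickly with depth.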

    Results


    [Image: link graph for the article "Mathematics", 4 levels of nesting]
    [Image: link graph for the article "Habrahabr", 5 levels of nesting]

    Other:
    Socrates
    Habrahabr 3 levels

    Other examples

    Script itself

