
WIKIUp it, WIKIUp it!
Good evening, dear friends!
Recently, while strolling through the vast Internet, I came across the amazing work of Chris Harrison. Sitting there in mild shock, I wondered: "How hard is it to visualize Wikipedia?" And I decided to try it!

So let's get started!
Devices and tools
The first step is to decide what to visualize and with which tools. After a little research into what is available, I settled on the following:
- Wiki API. Its developers documented it well, so there is no need to translate the description verbatim to understand how to call a given method.
- Graphviz. The package has already been covered on Habr, so I don't think it needs describing again. Still, right after reading the description of the dot language and the Graphviz tools, I nearly started writing my own .dot file generator.
- PyGraphviz. This Python module saved me: it offers a convenient way to work with the graph structure, which is then written to a .dot file.
The basics
We will visualize the cross-links between Wikipedia articles. To get them, we call the API method at the following URL:
ru.wikipedia.org/w/api.php?action=query&format=xml&titles=ARTICLE_TITLE&prop=links
Here action is the type of method, format is the response format (XML in our case), and prop=links requests the links found on the page.
The response is an XML document listing the page's links, which can be processed with any DOM or SAX parser.
Programming
For processing I used SAX and derived my class from xml.sax.handler.ContentHandler:
class LinksListHandler(xml.sax.handler.ContentHandler):
Next, the main callbacks are overridden:
- startElement
- endElement
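The article does not show the bodies of these two callbacks, so here is a minimal sketch of what such a handler might look like. It assumes that the response wraps each link in a `pl` element with a `title` attribute inside a `links` element; the `in_links` flag and the `SAMPLE` string are illustrative and hand-written, not real API output. Only the class name and the `links` attribute come from the original code.

```python
import xml.sax
import xml.sax.handler


class LinksListHandler(xml.sax.handler.ContentHandler):
    """Collects link titles from a prop=links response."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.in_links = False

    def startElement(self, name, attrs):
        # Assumption: links arrive as <pl title="..."/> elements inside <links>.
        if name == "links":
            self.in_links = True
        elif name == "pl" and self.in_links:
            self.links.append(attrs.getValue("title"))

    def endElement(self, name):
        if name == "links":
            self.in_links = False


# Hand-written sample in the assumed response shape, not real API output.
SAMPLE = (
    '<api><query><pages><page pageid="1" ns="0" title="Mathematics">'
    '<links><pl ns="0" title="Algebra"/><pl ns="0" title="Geometry"/></links>'
    '</page></pages></query></api>'
)

handler = LinksListHandler()
xml.sax.parseString(SAMPLE.encode("utf-8"), handler)
print(handler.links)  # ['Algebra', 'Geometry']
```

SAX suits this task because the handler only needs to pick out one kind of element as the parser streams past it, without building the whole document tree in memory.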
The function that performs the request looks like this:
import sys
import urllib.error
import urllib.parse
import urllib.request
import xml.sax

# wiki_url() and verbose_message() are helpers from the rest of the script.
def get_links(page):
    # See the wiki API documentation: http://en.wikipedia.org/w/api.php
    query_val = {'action': 'query',
                 'prop': 'links',
                 'titles': page,
                 'format': 'xml'}
    url = wiki_url() + '?' + urllib.parse.urlencode(query_val)
    request = urllib.request.Request(url)

    verbose_message("Wiki url: " + url)
    try:
        response = urllib.request.urlopen(request)
    except urllib.error.HTTPError:
        print("HTTP request error!")
        sys.exit(1)
    # verbose_message("Response xml:\n" + response.read())
    # Feed the response stream to the SAX parser; the handler collects the links.
    lh = LinksListHandler()
    saxparser = xml.sax.make_parser()
    saxparser.setContentHandler(lh)
    saxparser.parse(response)

    return lh.links
Graph construction
With the PyGraphviz module, building the graph is quite simple:
from pygraphviz import AGraph

# MIN_FONT and verbose_message() come from the rest of the script.
def make_wiki_graph(wiki_page, depth):
    gv = AGraph()

    page_list = [wiki_page]
    temp_list = []

    verbose_message('Create graph for ' + wiki_page)
    gv.add_node(wiki_page)
    for i in range(depth):
        print('>>>> Get ' + str(i) + ' level')
        for page in page_list:
            links = get_links(page)
            node = gv.get_node(page)
            # Pages closer to the root article get a larger font.
            node.attr['fontsize'] = "%i" % (MIN_FONT * 2 * (depth - i))
            for link in links:
                verbose_message(page + "=>" + link)
                gv.add_edge(page, link)
                temp_list.append(link)
        # Links collected on this level become the pages of the next level.
        page_list = temp_list
        temp_list = []
    return gv
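One thing worth noting about the level-by-level walk in make_wiki_graph: a page is queued again every time some article links to it, so popular articles may be requested repeatedly. The sketch below is a variant, not the author's code, that remembers the level at which each page was first seen and queues it only once. The names `expand_levels` and `FAKE_WIKI` are hypothetical, and the in-memory link table stands in for the Wikipedia API so the example runs without network access or Graphviz.

```python
def expand_levels(start, depth, get_links):
    """Walk the link graph level by level, visiting each page at most once.

    Returns (edges, level_of), where level_of maps each page to the
    level at which it was first reached.
    """
    edges = []
    level_of = {start: 0}
    current = [start]
    for level in range(depth):
        next_level = []
        for page in current:
            for link in get_links(page):
                edges.append((page, link))
                # Queue a page only the first time it is seen.
                if link not in level_of:
                    level_of[link] = level + 1
                    next_level.append(link)
        current = next_level
    return edges, level_of


# Hypothetical link table standing in for the Wikipedia API.
FAKE_WIKI = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["D"],
    "D": [],
}

edges, level_of = expand_levels("A", 2, lambda page: FAKE_WIKI.get(page, []))
print(edges)     # [('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'D'), ('C', 'D')]
print(level_of)  # {'A': 0, 'B': 1, 'C': 1, 'D': 2}
```

Every edge is still recorded, so the drawn graph would keep all cross-links; only the repeated requests are skipped. The `level_of` values could also drive the font size in the same way the original uses `depth - i`.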
Results
Article "Mathematics" with 4 levels of nesting

Article "Habrahabr" with 5 levels of nesting

Other:
Socrates
Habrahabr 3 levels
Other examples