Toby Segaran “Programming a Collective Mind”

    You know, I love books about all sorts of interesting algorithms, and recently I got another such book.

    The book "Programming the collective mind" is mainly devoted to classification and clustering algorithms, although there are chapters on other topics such as creating your own search engine, genetic algorithms, and genetic programming. Almost all of the described algorithms are applied in the spirit of Web 2.0, using analysis of user behavior on different sites that provide their API. But what was especially pleasantly surprising was that all the examples are written in Python.


    Here are the algorithms described in the book:


    • Collaborative filtering. Or, in human terms, algorithms that may recommend you some kind of purchases, sites or music depending on the ratings you put to other similar things. For such algorithms, the imposition of purchases in online stores or the selection of music at last.fm works . At the end of the chapter is an example that will recommend you links from the del.icio.us service .
    • Algorithms of grouping (clustering). The created example analyzes the RSS feeds of blogs and tries to automatically divide them into groups in the form of a tree depending on the frequency of words that come across the blog. At the same time, Segaran talks about how you can make the names of blogs arranged on a plane in heaps, depending on their proximity in terms of the topics being considered.
    • A separate chapter is devoted to building search engines - creating a spider and, most importantly, considers link ranking algorithms, including taking into account page links to each other, thus creating an analogue of Google PageRank. It is also interesting that in the same chapter there is an example where a neural network is used to issue the most relevant links, which learns as the user clicks on the links he likes.


    • Another chapter is devoted to optimization methods. Here, in principle, there is nothing special, the algorithms for random search, descent from the mountain, annealing, and the genetic algorithm are briefly described. Searching flights for a rather specific task using the Kayak site API serves as a webdwanol example . But in this chapter, I was more interested in the algorithms for moving students to a hostel to satisfy as many people as possible, and drawing connections between people in such a way that the lines indicating the connections did not intersect.
    • Sorting algorithms are considered using spam filtering as an example. The Bayesian algorithm is presented, which usually comes to mind first when solving a similar problem, and its modification is the Fisher method. An example is the work with the Akismet website .
    • It tells about the algorithms for constructing decision trees, and at the end of the chapter an example is provided that downloads real estate price data from the Zillow website , and then builds a decision tree that shows what parameters of houses affect how the price is, and if you want, you can predict what price will be at home with the given parameters.
    • One chapter is devoted to numerical forecasts. Here we consider the algorithm of k nearest neighbors and the weighted algorithm of k nearest neighbors. The example at the end of the chapter tries to guess the most probable price of goods at the eBay auction according to the specified parameters.
    • A separate chapter is devoted to such classification algorithms as the support vector machine and nuclear methods. Unfortunately, the most interesting thing is that the support vector method is almost not considered. More precisely, it’s literally told on the fingers what he does, but not how he does it, but the example uses the library http://www.csie.ntu.edu.tw/~cjlin/libsvm/ , where this algorithm is already programmed. At the end of the chapter is an example of matching pairs using the Facebook site API .
    • A separate chapter is devoted to identifying independent attributes by data arrays, but here, to be honest, I did not really understand what the described algorithms do.
    • And the last chapter is devoted to genetic programming - a variety of genetic algorithms when classes competing for a place under the sun (in RAM) compete against each other for classes simulating different programs that cross and mutate with each other.

    As a result, I liked the book, not only do I am interested in such topics on their own, but also the examples are written in your favorite language. Of the pleasant things, it can be noted that the author tried and for many examples picked up ready-made data for analysis, which can be downloaded from the author’s site . By the way, he recently published two more books about data processing , but but whether we will see them in Russian is a big question.


    Of the shortcomings, it can be noted that I would still like to read about some of the described algorithms in more detail, but as a starting point, the book is quite suitable.



    This blog post is mine.


    Also popular now: