Google Books has determined the total number of book titles in the world


    As you know, the Google Books project is one of the most ambitious projects of our time. Creating a single database of books in electronic form is a serious undertaking, and it is complicated by the need to negotiate with authors, publishers and other copyright holders. The project is interesting from many angles: social, technological and logistical. It also has some influence on modern society, although that influence is not yet very strong. But this post is about something else. The project's creators have tried to count every book in the world: not the total number of copies, but the total number of distinct book titles. Errors are inevitable in a count like this, but Google's figure is probably as trustworthy as anyone's. The resulting number is huge: 129,864,880 titles.

    Unfortunately, the counting methods the specialists used have not been described in much detail. All that is known is that they drew on various catalogs and queried university libraries, public libraries, private collections, museums and other organizations. Building a robust algorithm to separate the wheat from the chaff is a hard problem, but it looks like Google has managed it. The job required algorithms for sorting, classifying and analyzing the book records. This must be a complex system of algorithms, and I would like to know more about it.

    The count was not made out of idle curiosity. Its purpose was to gauge the real extent of the work already done within the project and to estimate the effort that will be needed to continue and, if at all possible, complete it.

    When counting the books, the corporation relied most heavily on various ISBN catalogs. Book identifiers of this kind have existed since roughly the late 1960s. Interestingly, the analysis turned up errors in the catalogs: about 1,500 books had been assigned the same identifier. Google employees have already notified the libraries whose catalogs contained the error.
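
    Google has not published how it spotted the shared identifiers. The simplest way to catch this kind of catalog error is to group records by ISBN and flag any ISBN attached to more than one distinct title. Below is a minimal sketch of that idea. The (isbn, title) record shape, the function name find_shared_isbns and the sample titles are all hypothetical illustrations, not Google's data or code.

```python
from collections import defaultdict


def find_shared_isbns(records):
    """Return {isbn: sorted distinct titles} for ISBNs used by more than one title.

    `records` is an iterable of (isbn, title) pairs. This record shape is a
    hypothetical simplification, not Google's actual catalog format.
    """
    titles_by_isbn = defaultdict(set)
    for isbn, title in records:
        # Normalize the ISBN: drop hyphens and spaces, uppercase an 'x' check digit.
        key = isbn.replace("-", "").replace(" ", "").upper()
        # Normalize the title lightly so trivial case/whitespace differences don't count.
        titles_by_isbn[key].add(title.strip().lower())
    # Keep only identifiers that point at more than one distinct title.
    return {isbn: sorted(titles) for isbn, titles in titles_by_isbn.items() if len(titles) > 1}


# Made-up sample records: one ISBN is (wrongly) shared by two different titles.
records = [
    ("0-306-40615-2", "Example Title A"),
    ("0306406152", "Example Title B"),
    ("978-3-16-148410-0", "Another Book"),
    ("9783161484100", "Another Book"),
]
print(find_shared_isbns(records))
```

    The same ISBN written with and without hyphens is treated as one identifier. A real catalog would also need fuzzier title matching, because two spellings of the same title would otherwise be flagged as a conflict.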

    Interestingly, Google's initial count came close to a billion. After copies and duplicates were removed, the figure dropped to about 600 million. An even more thorough analysis brought the final figure down to 129,864,880. It would be interesting to know how much information, in quantitative terms, such a mass of books contains. All in all, this was an interesting and successful study by the Google development team. Any book lovers out there? You can already start collecting the complete set in print :-)
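
    Google's actual deduplication method is not public, and it surely used far more signals than this. To make "deleting copies and duplicates" concrete, below is a deliberately naive sketch that normalizes each record and keeps one record per key. The record fields (title, author, year), the helper names normalize and dedupe, and the choice to include the year in the key are all assumptions made for illustration. Because the year is part of the key, different years are treated as separate titles.

```python
import re
import unicodedata


def normalize(text):
    """Fold accents, lowercase, replace punctuation with spaces, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def dedupe(records):
    """Keep the first record seen for each (title, author, year) key.

    Hypothetical key choice: the year stays in the key, so records that differ
    only in year survive as separate entries.
    """
    seen = {}
    for rec in records:
        key = (normalize(rec["title"]), normalize(rec["author"]), rec.get("year"))
        seen.setdefault(key, rec)
    return list(seen.values())


# Made-up raw catalog records containing spelling and punctuation variants.
raw = [
    {"title": "Les Misérables", "author": "Victor Hugo", "year": 1862},
    {"title": "Les Miserables", "author": "Victor Hugo", "year": 1862},
    {"title": "Les misérables.", "author": "Victor  Hugo", "year": 1862},
    {"title": "Les Misérables", "author": "Victor Hugo", "year": 1887},
    {"title": "Moby-Dick", "author": "Herman Melville", "year": 1851},
]
unique = dedupe(raw)
print(len(raw), "->", len(unique))
```

    Exact-key matching like this is cheap but brittle. For example, "Hugo, Victor" and "Victor Hugo" would not match, which is one reason a count at Google's scale needs much more careful analysis than a single normalization pass.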

    More detailed information about the project can be found in the original source.
