Digitizing the World's Book Heritage with Smartphones
There are more than two and a half billion smartphone users on the planet. If each of them digitized at least one book, a single day would be enough to digitize every work ever written. Kalev Litaru, a specialist in data processing systems with twenty years of experience, proposes a new way to digitize books based on crowdsourcing and ordinary smartphones.
In January 2015, a fire damaged 15% of the holdings of the INION scientific library in Moscow. Some 2,000 square meters burned out and part of the roof collapsed. The library housed 14 million books and documents, including rare editions from the 16th to the early 20th century. According to the library's director, Yuri Pivovarov, almost no money had been allocated for digitization. The problem of fully digitizing books, documents, and manuscripts has not been solved anywhere in the world, although some projects are working on it, and libraries in Russia and other countries are trying to convert their existing copies to digital form themselves.
Books are digitized with bulky scanners that cost ten thousand dollars or more. These scanners require professional operators, whose time costs money. The operator turns the pages of a book, and the scanner photographs two pages at a time. The usual speed is up to five hundred pages per hour, which means one employee scans one or two books in an hour.
Kalev Litaru proposes calling on enthusiasts around the world and their smartphones for help. As an example of how effective crowdsourcing can be, he cites the eBird project, which tracks bird migrations. Over thirteen years, more than one hundred thousand volunteers have worked on the project, recording 275 million observations from 2.87 million unique locations. There are now 2.6 billion smartphones in use worldwide, and by 2020 their number is expected to grow to 6.1 billion, driven in part by developing countries.
Employees of the Russian company Elar work on digitizing books
Litaru proposes dividing the project into two stages. The first stage is to compile a list of books to be digitized, using the WorldCat catalog and other tools. The initial list will include every book that is held in a library and has not yet been digitized. Fragile copies and books protected by copyright will be excluded. Part of this work can itself be crowdsourced: libraries will publish a list of books whose copyright status is unknown, and volunteers will check the first pages of those books and send the information back to the libraries.
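The first-stage filtering can be sketched in a few lines of Python. This is only an illustration of the rules described above, not part of Litaru's proposal: the `CatalogRecord` type, its field names, and the `split_catalog` function are hypothetical, and a real WorldCat export would look different.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CatalogRecord:
    # Hypothetical record layout; not the actual WorldCat schema.
    title: str
    digitized: bool
    fragile: bool
    copyrighted: Optional[bool]  # None means the copyright status is unknown


def split_catalog(
    records: List[CatalogRecord],
) -> Tuple[List[CatalogRecord], List[CatalogRecord]]:
    """Return (books to digitize, books needing a volunteer copyright check)."""
    to_digitize = []
    needs_copyright_check = []
    for record in records:
        # Already digitized, fragile, or known-copyrighted books are excluded.
        if record.digitized or record.fragile or record.copyrighted is True:
            continue
        if record.copyrighted is None:
            needs_copyright_check.append(record)
        else:
            to_digitize.append(record)
    return to_digitize, needs_copyright_check


catalog = [
    CatalogRecord("Old atlas", digitized=False, fragile=False, copyrighted=False),
    CatalogRecord("Recent novel", digitized=False, fragile=False, copyrighted=True),
    CatalogRecord("Brittle manuscript", digitized=False, fragile=True, copyrighted=False),
    CatalogRecord("Undated pamphlet", digitized=False, fragile=False, copyrighted=None),
    CatalogRecord("Scanned classic", digitized=True, fragile=False, copyrighted=False),
]
ready, unknown = split_catalog(catalog)
print([r.title for r in ready])    # books volunteers can photograph right away
print([r.title for r in unknown])  # books whose first pages need checking
```

Books in the second list would move to the first one once a volunteer reports that they are out of copyright.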
The first stage alone will show what percentage of the world's books has already been digitized. According to Google’s own data, the company has digitized 6% of all published books as part of the Ngram project, but the exact list of those books is unknown.
Once the complete list of undigitized works has been compiled, the main crowdsourcing part of the project begins. A volunteer comes to the library, picks up a book, takes out a smartphone, and photographs the cover. Optical character recognition, like Google’s, determines the author and title and compares them with the list on the server, then tells the volunteer either that the copy needs to be digitized or that the work has already been done. If the book is to be digitized, the volunteer photographs the first few pages. At this point the system should assess the quality of the photographs: whether the lighting is adequate, whether the characters can be recognized, and whether the photographer’s hands are shaking too much. The user then receives an instruction to continue working or to take another book.
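The decision logic of this workflow might look roughly like the sketch below. Everything in it is an assumption made for illustration: the `Instruction` names, the `after_cover` and `after_sample` functions, and the idea that a failed quality check asks the volunteer to retake the sample pages (the article only mentions continuing or taking another book). The OCR step itself is left out; the functions take its output as plain strings and flags.

```python
from enum import Enum
from typing import Set, Tuple


class Instruction(Enum):
    TAKE_ANOTHER_BOOK = "take another book"
    PHOTOGRAPH_SAMPLE = "photograph the first few pages"
    RETAKE_SAMPLE = "retake the sample pages"  # assumption, not in the article
    CONTINUE = "continue with the rest of the book"


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so small OCR differences still match."""
    return " ".join(text.lower().split())


def after_cover(author: str, title: str, pending: Set[Tuple[str, str]]) -> Instruction:
    """Compare the OCR'd cover against the server's list of undigitized books."""
    key = (normalize(author), normalize(title))
    if key in pending:
        return Instruction.PHOTOGRAPH_SAMPLE
    return Instruction.TAKE_ANOTHER_BOOK


def after_sample(lighting_ok: bool, text_recognizable: bool, steady: bool) -> Instruction:
    """Decide what to do once the first few pages have been assessed."""
    if lighting_ok and text_recognizable and steady:
        return Instruction.CONTINUE
    return Instruction.RETAKE_SAMPLE


# The server's list, keyed by normalized (author, title); the entry is made up.
pending = {(normalize("A. Author"), normalize("An Example Title"))}

print(after_cover("A.  AUTHOR", "An example title", pending).value)
print(after_cover("B. Writer", "Already Scanned", pending).value)
print(after_sample(lighting_ok=True, text_recognizable=True, steady=False).value)
```

A real system would need fuzzy matching, since OCR errors on a cover rarely reduce to case and spacing differences.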
Litaru ran several tests and found that, with some practice, one user can digitize a 600-page book this way in five to ten minutes. In 2004, for his thesis, he manually digitized thirty thousand pages from more than seven hundred documents using an ordinary digital camera and a cheap desk lamp. He completed most of this work in fifteen hours over a single weekend.
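To put these figures next to the scanner speed quoted earlier, here is a short calculation. It uses only numbers from the article; the function and variable names are illustrative, and because only "most" of the 2004 work was done in fifteen hours, that rate is an approximation.

```python
def pages_per_hour(pages: int, minutes: float) -> float:
    """Convert a page count and a duration in minutes to pages per hour."""
    return pages * 60 / minutes


scanner_rate = 500.0                             # professional scanner, upper bound
camera_2004 = pages_per_hour(30_000, 15 * 60)    # digital camera and desk lamp
phone_slow = pages_per_hour(600, 10)             # 600-page book in ten minutes
phone_fast = pages_per_hour(600, 5)              # 600-page book in five minutes

print(f"scanner:     {scanner_rate:.0f} pages/hour")
print(f"2004 camera: {camera_2004:.0f} pages/hour")
print(f"smartphone:  {phone_slow:.0f} to {phone_fast:.0f} pages/hour")
print(f"speedup over scanner: {phone_slow / scanner_rate:.1f}x to {phone_fast / scanner_rate:.1f}x")
```

On these numbers a practiced volunteer works roughly seven to fourteen times faster than a scanner operator, at the cost of lower image quality.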
Images from smartphones will not match the quality achieved with professional equipment. But they will be good enough to read, and optical character recognition will make the text searchable. Litaru's pages were photographed ten years ago, and today's smartphones have better cameras and LED flashes.
Libraries can also let volunteers use the document scanners they already have for this work. All results will be sent to the project's central server, where they will be converted to PDF and other e-book formats and where the text will be processed and made searchable.
Gamification can be built into the digitization process. Volunteers will earn points for the works they digitize, and organizations will be able to hold “digitization days” and give prizes to the best participants. Even schoolchildren can take part. Libraries will receive feedback from users about poorly digitized pages. Volunteers will become something like Wikipedia editors, with libraries coordinating their work.
Wikipedia and other crowdsourcing projects have proven effective, and Kalev Litaru is sure that crowdsourcing can prove itself again in digitizing the world's book heritage. Instead of posting selfies and food photos on Instagram, the users of two and a half billion smartphones can help save a great many works and create a huge database of all the books ever published to leave to our descendants.