
Discover the history of the Bolshoi Theater. Part one

Have you ever collected theatrical programs? If so, you probably have dozens of them in your collection, or maybe a hundred. Now imagine having 120 thousand programs, 48 thousand posters and 100 thousand historical photographs at your disposal. That is how many paper documents the Bolshoi Theater has preserved since the mid-19th century. The oldest and most valuable of them have yellowed and become fragile, and searching the theater archive for information took hours. To preserve these treasures, the theater museum staff began converting the documents into electronic form by hand, but it turned out that this could take years.
Therefore, in September 2016, together with the Bolshoi Theater and with the active support of Thekla Tolstaya, the great-great-granddaughter of Leo Tolstoy, we launched a crowdsourcing project to digitize the history of the country's main theater. In this post we cover the first stage of the project and its technical side: how we digitized unique documents using ABBYY FineReader and how volunteers helped to verify the recognition results.
A bit of history
The Bolshoi Theater was founded by Empress Catherine II on March 28, 1776. The buildings that housed the theater caught fire more than once, and the largest fire broke out in 1853. It blazed for three days and destroyed a significant part of the Bolshoi's historical heritage. The oldest surviving theater document is a poster from 1830. All other posters and programs have been preserved only from 1858 onward.
Poster of the Bolshoi Theater, 1830.

The Bolshoi Theater Museum wanted not only to preserve its most valuable archive by digitizing it, but also to make information about performances, characters, directors, choreographers and much more accessible to everyone. If the Bolshoi staff had retyped the data from programs and posters by hand, it would have taken several decades. So the theater decided to call on intelligent technologies and volunteers for help. The volunteer project "Discover the History of the Bolshoi" was initiated by Thekla Tolstaya. We had already worked with her on the project "All Tolstoy in One Click". Back then, in 2014, using ABBYY Recognition Server and ABBYY FineReader and with the participation of 3 thousand volunteers, we digitized 46 thousand pages of the 90-volume collected works of Leo Tolstoy. All the books are now available in electronic form on the official portal tolstoy.ru. You can read more about the project here.
In the project "Discover the History of the Bolshoi", our task was not only to digitize a collection of documents, but also to extract valuable information from them to create an electronic archive.
Therefore, the project is divided into three stages:

- First, we scanned documents and recognized them with ABBYY FineReader. Then the volunteers helped to check the recognition results in order to eliminate errors that are possible during digitization.
- The second stage started in June 2017 and is still ongoing. Its task is to extract and organize data from the already digitized collection of programs and posters. ABBYY Compreno, an intelligent technology for understanding and analyzing natural-language texts, analyzes the text of the documents and extracts valuable information into database fields, which later make it possible to find data quickly in the electronic archive. Volunteers are now checking the results produced by the artificial intelligence. In the end, the information will be uploaded to the museum database developed by KAMIS, and the legacy of the Bolshoi Theater will be available to everyone.
- At the third stage of the project, volunteers will help categorize 100 thousand historical photographs from the archive of the Bolshoi Theater Museum.
The project was officially launched in October 2016, when volunteers began to check digitized texts of programs and posters. But we started preparations for the launch a little earlier.
Hot August 2016
In August 2016, the scanning team arrived at the Bolshoi Theater. Over 7 months they scanned tens of thousands of programs and photographs from the collections of the Bolshoi Theater Museum. The posters did not have to be scanned, since the museum had already done this on its own.
The friendly scanning team. From left to right: Nikolai Altunin, Irina Andryukhina, Dmitry Nesterov.

Our partner, Fujitsu, provided two Fujitsu fi-6770 and Fujitsu fi-6750S flatbed scanners and two Fujitsu ScanSnap SV600 contactless scanners for the project.
The flatbed scanners helped us digitize programs that were collected in binders and tightly stitched. We also used them to scan photographs on both sides. The backs of the photographs carry valuable information: the titles of productions and the names of artists and photographers.

The contactless ScanSnap SV600 scanners helped us digitize large and worn programs, which had to be handled very carefully.

You can see how the digitization stage went in more detail in the gallery.
The scanning produced photographs as TIFF files at a resolution of 600 dpi and programs as JPEG files at a resolution of 300 dpi.
Recognize and Retrieve
We divided all scanned documents into small parts, or "packages", so that the work would not be too hard for participants. One package is one program or poster. A program can have anywhere from one sheet to 30 sheets, four on average. We grouped the packages by year and numbered them.
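The grouping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tooling actually used in the project: the record layout `(year, document_id, sheet_file)`, the `Package` class and the file names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Package:
    """One program or poster: the unit of work handed to a volunteer."""
    year: int
    number: int        # sequential number within the year
    document_id: str   # hypothetical identifier of the program or poster
    sheets: list = field(default_factory=list)


def build_packages(scans):
    """Group (year, document_id, sheet_file) records into numbered packages.

    Sheets that share a year and document_id land in the same package.
    Packages are numbered from 1 within each year.
    """
    by_document = defaultdict(list)
    for year, document_id, sheet_file in scans:
        by_document[(year, document_id)].append(sheet_file)

    packages = []
    counters = defaultdict(int)
    for year, document_id in sorted(by_document):
        counters[year] += 1
        sheets = sorted(by_document[(year, document_id)])
        packages.append(Package(year, counters[year], document_id, sheets))
    return packages
```

Keeping one document per package means a volunteer always sees a complete program or poster, which matches the "one package is one program or poster" rule.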
Then we had to recognize the scanned documents and create PDF files with a text layer. Why is a text layer needed? So that museum staff can not only view the digitized posters and programs, but also search and copy information. FineReader automatically recognized the scans and marked areas on them: text was highlighted in green, images in red, and tables in purple.

Lists of actors are presented in a tabular format in posters and programs:

Each participant of the first stage of the project registered on the site openbolshoi.ru. They then logged in to their personal account, read the detailed instructions, installed the free version of FineReader, downloaded a package (a FineReader document archived in zip format) and started checking. Volunteers reviewed the markup of the areas, read the text and corrected any recognition inaccuracies that had occurred during digitization.
We called the participants of this stage verifiers. They checked the programs and posters from October 2016 to June 2017, starting from the present day and gradually moving back towards the 19th century.
Below is a brief description of how the project website was built.
Crowdsourcing platform
openbolshoi.ru is a platform where volunteers work together. It runs on the 1C-Bitrix CMS with MySQL as the DBMS. The programming language is PHP. Amazon S3 was used as the repository for programs and posters, and Git for version control. Once the terms of reference were prepared, the project was technically implemented in just one month.
Platform components:
1. Public part (available to all users, contains information about the project).
2. Member's personal account (available to registered users, intended for checking packages and managing personal information). In the personal account, volunteers saw the number of packages they had received, their place in the rating and the points they had been awarded.

3. Personal account of the administrator (available only to platform administrators, designed to verify the work of volunteers).
4. The administrative part of the platform (available only to CMS administrators and needed for global platform management).
5. Amazon file storage (designed to store packages).
Make it in 48 hours
Each volunteer was given 48 hours to check one package. If a person did not manage to check the document in that time, the file went back into the general pool, and another volunteer could take it for checking. If the participant checked the package carefully and on time, the package was accepted and the volunteer was awarded 5 points. If the participant checked the document carelessly, the package was not accepted and the volunteer lost 10 points.
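The rules above can be written down as a small Python sketch. It is an illustration only, not the platform's actual PHP code: the function and constant names are hypothetical, and two behaviors are assumptions the text does not confirm, namely that a missed deadline leaves the score unchanged and that a rejected package also returns to the pool.

```python
from datetime import datetime, timedelta

CHECK_WINDOW = timedelta(hours=48)  # time a volunteer has for one package
POINTS_ACCEPTED = 5                 # awarded for a careful, timely check
POINTS_REJECTED = -10               # deducted for a careless check


def is_expired(taken_at, now):
    """True once the 48-hour window since the package was taken has passed."""
    return now - taken_at > CHECK_WINDOW


def settle(score, taken_at, submitted_at, accepted):
    """Return (new_score, back_to_pool) for one package.

    submitted_at is None when the volunteer never submitted the package.
    Assumption: a missed deadline does not change the score.
    Assumption: a rejected package goes back to the pool as well.
    """
    if submitted_at is None or is_expired(taken_at, submitted_at):
        return score, True
    if accepted:
        return score + POINTS_ACCEPTED, False
    return score + POINTS_REJECTED, True
```

For example, a volunteer with 5 points whose package is rejected ends up with -5 points, since a careless check costs twice what a careful one earns.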
Translation difficulties
Checking posters turned out to be harder than proofreading programs. In old posters and programs, characters were not always recognized correctly because of small and blurry text, complex layout and poor print quality.
For example, a volunteer spent a whole day checking this large, complex 1936 poster set in small print. Every third surname had to be looked up on the Internet:

And on this poster the caption at the bottom is barely legible:

Volunteers often came across old, tattered posters, some of which had to be typed in manually. On this 1883 poster, the volunteer could make out only the title and the first two columns, because part of the document has not survived:

Although FineReader supports old Russian spelling, checking pre-revolutionary posters and programs was unusual for the participants, with their atypical style and long-forgotten letters such as “і”, “ѣ” and “ѳ”. Nevertheless, the volunteers coped with the task successfully and joked about it in the comments: “After checking posters from the 18** years, our hands keep reaching to write ‘action’ in the old spelling...”
Pictured: a program from 1910:

Technical support on call
The project organizing committee answered volunteers' questions around the clock by e-mail, on social networks and by phone. In the VKontakte group, volunteers asked a lot of questions and actively helped each other. It looked like this:

Participants also shared interesting details and unusual facts found in unique documents.

Other finds by the volunteers are collected below.
The smallest poster, things forgotten at the theater and 40 ladies in débardeur costumes







Winning ticket
As you may remember, the organizing committee awarded points for each checked package, and these points formed the volunteer rating. The five most active participants received prizes: tickets to the Bolshoi Theater. First place went to Igor Alimov from Belgorod, who checked 4,349 packages. The other winners were Galina Zarina from Moscow, Alexander Aksenov from St. Petersburg, Natalya Klementyeva from Moscow and Larisa Ogorodnikova from Yekaterinburg. They chose the productions that interested them and attended performances of Don Quixote, The Nutcracker, The Snow Maiden, Iolanta and the premiere of the ballet Romeo and Juliet.

Reviews from other winners of the first stage of the project can be read here and here.
In addition, the ten most active volunteers received ABBYY FineReader as a gift, and participants who checked at least one package received special diplomas:

Some statistics
Four thousand volunteers from 60 countries took part in the first stage of the project, including the USA, Australia, Brazil, India, China, Kazakhstan, Mongolia, many European countries and, of course, Russia.

Top 10 cities where most of the volunteers live:
- Moscow,
- St. Petersburg,
- Kaliningrad,
- Chelyabinsk,
- Novosibirsk,
- Yekaterinburg,
- Perm,
- Samara,
- Omsk,
- Voronezh.
The project brought together programmers, IT specialists, teachers, musicians, photographers, journalists, company executives, historians, retirees, students, performers, housewives, artists and people of many other professions.

Thanks to the volunteers, we managed to digitize and verify all the programs and posters in just 9 months. The programs and posters, in JPEG format and as PDF files with a text layer, as well as the photographs in TIFF format, have already been handed over to the Bolshoi Theater Museum.

The second stage of the project is now under way, with 6,450 volunteers already taking part. They help to extract and organize data from the digitized documents. This stage involves a whole range of ABBYY technologies, from ABBYY Compreno to ABBYY FlexiCapture, and the volunteers help check the work of the artificial intelligence. We will explain how this works in more detail in the next article. In the meantime, you too can become a member of the volunteer project "Discover the History of the Bolshoi". Join now!
Elizaveta Titarenko, editor of the ABBYY corporate blog,
Marina Antropova, lead manager for special projects at ABBYY