How Yandex's crowdsourcing platform helps teach Alice and save money

    We continue our series on how Yandex and other large companies use crowdsourcing. In the previous post, we talked about drones and the quality of product search.

    Today you will learn how Toloka is used to teach Alice, update the Directory (Yandex.Spravochnik), and moderate comments. All subheadings are clickable and lead to recordings of the talks. Let's go!


    Work in the field: collecting and verifying information for Yandex.Spravochnik

    Yandex.Spravochnik (the Directory) is a huge database of organizations with contacts, photos, reviews, and other data. Keeping it up to date means collecting and processing large amounts of information.

    Toloka handles these tasks well: an average of 50 thousand performers per month complete 15 million Directory tasks. Some are desktop tasks that are done at home; others are field tasks that have to be carried out on the street.

    In desktop Toloka, dozens of types of markup are performed for the Directory, such as moderating user photos or transcribing cafe and restaurant menus so that places can be found by dish.

    Not all organizations have phone numbers or websites that allow information to be checked remotely. To update the data on such organizations, tolokers go out into the streets and complete tasks on a smartphone. The map shows the fieldwork completed over the past few months: more than a million points.

    How Toloka helps Alice to be modern and witty

    Several million people talk to Alice every day. Each has their own goal: checking the weather, getting information, or just chatting. For Alice to understand and help everyone, she has to learn to recognize speech, and that requires a lot of data.

    Toloka helps collect this data. For example, one of the tasks is to listen to an audio recording and transcribe it. About 5 hours of transcribed audio can be obtained in roughly an hour of work by tolokers.

    If you ask one person to transcribe an audio recording, about 5-6% of the words will be recognized incorrectly. If the same task is given to several performers, the best option can be chosen, and the error in the aggregated data can be reduced to 1-2%.
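
    One simple way to "choose the best option" among several transcriptions is to pick the one closest to all the others. The sketch below is an illustration only, not a description of how Toloka or the Alice team actually aggregates answers: the function names are hypothetical, and word-level edit distance is an assumed similarity measure.

```python
from typing import List


def word_edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance computed over words instead of characters."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            substitution = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1,              # delete a word
                            curr[j - 1] + 1,          # insert a word
                            prev[j - 1] + substitution))
        prev = curr
    return prev[-1]


def pick_consensus(transcripts: List[str]) -> str:
    """Return the transcript with the smallest total distance to the others.

    Assumption: performers make independent mistakes, so the transcript
    that agrees most with the rest is the most likely to be correct.
    """
    if not transcripts:
        raise ValueError("at least one transcript is required")
    tokenized = [t.lower().split() for t in transcripts]

    def total_distance(i: int) -> int:
        return sum(word_edit_distance(tokenized[i], other)
                   for k, other in enumerate(tokenized) if k != i)

    best = min(range(len(transcripts)), key=total_distance)
    return transcripts[best]


answers = ["turn on the light", "turn on the light", "turn of the night"]
print(pick_consensus(answers))
```

    Here the third transcription differs from the other two by two words, so one of the two matching answers is selected.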

    Understanding what the user said is not enough; Alice also has to answer correctly. The quality of her answers has several aspects: she must respond appropriately, not address the user with the informal "you", not be rude, and not refer to herself in the masculine gender. Each of these metrics is set up as a task on Toloka, where tolokers determine whether a response has a given property.

    But quality aspects cannot always be formalized. Speech synthesis, for example, should sound natural, with correct intonation and without technical defects. These are subjective parameters that are hard to express as an evaluation model. So in Toloka, the performer is asked to listen to two versions of the same phrase and choose the better one.
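
    Side-by-side choices like these can be turned into a single number per synthesis version. The following is a minimal sketch under stated assumptions: the labels "A" and "B", the vote format, and the per-phrase majority rule are all hypothetical and are not taken from the talk.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def side_by_side_win_rate(votes: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Share of phrases won by version "A", by version "B", or tied.

    votes: (phrase_id, choice) pairs, where choice is "A" or "B".
    A phrase is won by the version that most performers preferred.
    """
    per_phrase = defaultdict(Counter)
    for phrase_id, choice in votes:
        if choice not in ("A", "B"):
            raise ValueError(f"unexpected choice: {choice!r}")
        per_phrase[phrase_id][choice] += 1
    if not per_phrase:
        raise ValueError("no votes given")

    wins = Counter()
    for counts in per_phrase.values():
        if counts["A"] > counts["B"]:
            wins["A"] += 1
        elif counts["B"] > counts["A"]:
            wins["B"] += 1
        else:
            wins["tie"] += 1

    total = len(per_phrase)
    return {key: wins[key] / total for key in ("A", "B", "tie")}


votes = [
    ("p1", "A"), ("p1", "B"), ("p1", "B"),
    ("p2", "A"), ("p2", "A"), ("p2", "B"),
    ("p3", "B"), ("p3", "B"), ("p3", "B"),
    ("p4", "A"), ("p4", "B"),
]
print(side_by_side_win_rate(votes))
```

    Comparing win rates between an old and a new synthesis model gives a relative quality signal without having to define "naturalness" explicitly.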

    How to make everybody play by the rules on Yandex.Buses

    Yandex.Buses is a service for both passengers and carriers. Sometimes unscrupulous drivers pick up passengers at bus stops, do not issue them tickets, and pocket the fare. As a result, the carrier loses revenue, which is especially noticeable on long routes.

    Staffing inspectors along an entire route, for example from Ufa to Moscow, is quite expensive. Calling passengers to ask how many people were on the bus and whether the driver picked anyone up along the way proved ineffective. Another option is to install a people counter at the bus entrance. But on long routes with many stops, people constantly get on and off, which produces a noticeable error. Each "lost" person is a potential loss of 2.5–10% of the trip's revenue. Besides, the driver can still easily deceive the carrier by covering the sensor.

    The Yandex.Buses team decided to connect a wide-angle IP camera to the router on the bus, periodically take a photo of the cabin, and send it to the control room. This way, photos accumulate for each trip, showing how many passengers were in the cabin at each point in time. All passenger faces are blurred algorithmically beforehand. What remained was to learn how to process the photos, that is, to count the passengers. Here a problem arose: the pictures are not always of good quality, since they are taken in motion and often in the dark. In addition, there is only one camera on the bus, so not everyone is always visible in the photo. No ready-made model could count people in such images, and building one from scratch would have taken too long.

    The developers turned to tolokers. Photos of the cabin are sent to Toloka with the task of counting the people in them. The cost of the solution is less than 150 dollars. Processing one trip costs 7 rubles.

    The experiment was carried out on four buses over 300 trips. It turned out that 9% of revenue was bypassing the carrier. Now more and more Yandex.Buses carriers are connecting to this system.
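
    Because the photos are blurry and dark, individual counts will vary. The source does not say how the counts are combined, so the sketch below rests on an assumption: each photo is shown to several tolokers and the median of their answers is taken, which is robust to a single outlier. The function name and data layout are hypothetical.

```python
from statistics import median
from typing import Dict, List


def aggregate_counts(counts_per_photo: Dict[str, List[int]]) -> Dict[str, float]:
    """Combine several passenger counts per photo into one robust estimate.

    counts_per_photo maps a photo id to the counts reported by different
    performers. Photos without any answers are skipped.
    """
    return {
        photo_id: median(counts)
        for photo_id, counts in counts_per_photo.items()
        if counts
    }


reported = {
    "trip42_photo1": [12, 13, 12],
    "trip42_photo2": [15, 15, 40],   # one performer badly miscounted
    "trip42_photo3": [],             # not yet labeled
}
print(aggregate_counts(reported))
```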

    Hire 100,500 moderators and save: the Rambler Group experience

    Rambler Group develops more than 20 projects, including news feeds and thematic sites, and users leave comments on each of them. Commenting increases the time spent on the site and the number of pages viewed per visit, which benefits the resource.

    But there is a flip side: the publication is responsible for the content of the comments. Checking them requires a staff of moderators. Since comments appear constantly, moderators must work around the clock, which is expensive and quite difficult.

    In search of a solution, Rambler Group turned to Toloka. It began with an experiment: 24,717 comments already processed by staff moderators were selected, and the real flow of these comments was re-created in Toloka. One task contained 10 comments, with 3 minutes allotted for processing them. To control moderation quality, each task was offered to three performers. The price was set at the minimum: 1 cent.
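
    With three performers per task, the natural way to reach a verdict is a majority vote. The sketch below illustrates that idea only; the verdict labels "ok" and "remove" and the function name are hypothetical, and the source does not describe the exact aggregation rule.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_verdict(verdicts: Sequence[str]) -> Optional[str]:
    """Return the verdict chosen by a strict majority, or None if there is none."""
    if not verdicts:
        raise ValueError("at least one verdict is required")
    top, top_count = Counter(verdicts).most_common(1)[0]
    # A strict majority means more than half of all answers.
    if top_count * 2 <= len(verdicts):
        return None
    return top


print(majority_verdict(["remove", "ok", "remove"]))
```

    With an odd number of performers and two possible verdicts, a strict majority always exists, which is one reason to use three performers rather than two or four.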


    Rambler Group resources use post-moderation: every comment appears on the site immediately, so inappropriate ones must be removed as quickly as possible. As it turned out, tolokers process 10 comments per minute, and staff moderators 12. The experiment also showed that using tolokers is 60% more cost-effective than keeping a staff of moderators for each publication.

    The experiment was considered a success, but the conditions were changed slightly. Each task is now offered to two performers; if their opinions diverge, a third is brought in. The number of comments per task increased from 10 to 15. This reduced costs by an additional 35%.
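
    The "two performers, then a third on disagreement" scheme can be sketched as follows. This is an illustration under assumptions: `ask_performer` is a hypothetical callable standing in for whatever actually delivers a task to a performer, and verdicts are assumed to be binary ("ok" or "remove").

```python
from typing import Callable, Tuple


def moderate_with_tiebreak(comment: str,
                           ask_performer: Callable[[str], str]) -> Tuple[str, int]:
    """Return (verdict, number_of_performers_used) for one comment."""
    first = ask_performer(comment)
    second = ask_performer(comment)
    if first == second:
        # Agreement: no need to pay for a third opinion.
        return first, 2
    # Disagreement: with binary verdicts, the third answer always matches
    # one of the first two and therefore forms the majority.
    third = ask_performer(comment)
    return third, 3


# Simulated performers for demonstration purposes.
answers = iter(["ok", "remove", "remove"])
print(moderate_with_tiebreak("some comment", lambda _: next(answers)))
```

    The saving comes from the fact that the third performer is paid only for the comments on which the first two disagree.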

    Using the API, comments are automatically sent to Toloka, moderated, and returned with a verdict. Now comments on all Rambler Group projects are moderated via Toloka.
