How Yandex.Taxi searches for cars when they are not
A good taxi booking service should be safe, reliable and fast. The user will not go into details: it is important for him that he clicks the Order button and receives a car as soon as possible, which will deliver him from point A to point B. If there are no cars nearby, the service should immediately inform about this so that the client doesn’t evolved false expectations. But if the “No cars” plate is displayed too often, it is logical that a person simply stops using this service and leaves for a competitor.
In this article I want to talk about how, with the help of machine learning, we solved the problem of finding cars in a territory with low density (in other words, where, at first glance, there are no cars). And what came of it.
Background
To call a taxi, the user takes a few simple steps, and what happens in the guts of the service?
User | Stage | Backend Yandex.Taxi |
---|---|---|
Selects a departure point | Pin | We launch a simplified search for candidates - search on the pin. Based on the drivers found, the arrival time is predicted - ETA in the pin. The increment factor at a given point is calculated. |
Selects destination, tariff, requirements | Offer | We build a route and calculate prices for all tariffs, taking into account the increasing coefficient. |
Presses the button “Call taxi” | Order | We launch a full-fledged search for the car. We select the most suitable driver and offer him an order. |
About ETA in the pin , we have already written the calculation of the price and the choice of the most suitable driver . And this is a story about finding drivers. When an order is created, the search occurs two times: on the pin and on the order. Search on the order takes place in two stages: recruitment of candidates and ranking. First, there are free candidate drivers coming along the road graph. Then bonuses and filtering are applied. The remaining candidates are ranked, and the winner receives an offer of the order. If he agrees, then assigned to the order and goes to the point of delivery. If he refuses, then the offer comes to the next. If there are no more candidates, the search starts again. This lasts no more than three minutes, after which the order is canceled - burned out.
The search on the pin is similar to the search on the order, only the order is not created and the search itself is performed only once. Also, simplified settings for the number of candidates and the radius of the search are used. Such simplifications are needed, because there are an order of magnitude more pins than orders, and the search is a rather difficult operation. The key moment for our story: if during the preliminary search on the pin there were no suitable candidates, then we do not allow making an order. At least it used to be.
Here is what the user saw in the application:
Search for cars without cars
Once we had a hypothesis: perhaps, in some cases, the order can still be completed, even if there were no cars on the pin. Indeed, some time passes between the pin and the order, and the search on the order is more complete and is sometimes repeated several times: during this time free drivers may appear. We also knew the opposite: if drivers were found on a pin, then it is not a fact that they will be found when ordering. Sometimes they disappear or all refuse the order.
To test this hypothesis, we launched an experiment: we stopped checking for the presence of machines during a pin search for a test group of users, that is, they had the opportunity to make an “order without cars”. The result was quite unexpected: if the car was not on the pin, then in 29% of cases it was later - when searching for an order!Moreover, orders without cars did not differ much from the usual ones in terms of cancellation rates, ratings and other quality indicators. The number of orders without cars was 5% of all orders, but just over 1% of all successful trips.
To understand where the executors of these orders come from, let's look at their statuses during the search on the pin:
- Free: was available, but for some reason did not get into the candidates, for example, was too far away;
- On order: he was busy, but managed to free himself or become available for ordering along the chain ;
- Busy: the ability to take orders was disabled, but then the driver returned to the line;
- Not available: the driver was not online, but he appeared.
Add reliability
Additional orders are great, but 29% of successful searches mean that in 71% of cases the user has been waiting a long time and as a result has not left anywhere. Although from the point of view of system efficiency this is not terrible, but in fact, the user receives false hope and spends time, after which he is upset and (possibly) stops using the service. To solve this problem, we learned to predict the likelihood that a machine will be found on the order.
The scheme is as follows:
- The user puts a pin.
- Searching on the pin.
- If there are no cars, we predict: maybe they will appear.
- And depending on the probability, we give or do not give an order, but we warn that the density of cars in this area is small at this time.
In the application, it looked like this:
Using the model allows you to more accurately create new orders, not to reassure a person in vain. That is, to adjust the ratio of reliability and the number of orders without machines using the precision-recall model. The reliability of the service affects the desire to continue to use the product, i.e., in the end, it all comes down to the number of trips.
A bit about precision-recall
One of the basic tasks in machine learning is the classification problem: assign an object to one of two classes. In this case, the result of the operation of the machine learning algorithm often becomes a numerical estimate of belonging to one of the classes, for example, a probability estimate. However, the actions that are carried out are usually binary: if we have a car, then we give it to order, and if not, then no. For definiteness, we call the model an algorithm that gives a numerical rating, and the classifier - a rule that refers to one of two classes (1 or –1). In order to make a classifier based on the model’s assessment, you need to select an assessment threshold. How exactly - is highly dependent on the task.
Suppose we do a test (classifier) for some rare and dangerous disease. Based on the test results, we either send the patient for a more detailed examination, or say: “Healthy, go home.” For us, sending a sick person home is much worse than examining a healthy person in vain. That is, we want the test to work for as many really sick people as possible. This value is called recall =. The ideal classifier recall is 100%. A degenerate situation is to send everyone for examination, then recall will also be 100%.
It happens and vice versa. For example, we make a testing system for students, and it has a cheating detector. If suddenly a check does not work for some cases of cheating, then this is unpleasant, but not critical. On the other hand, it is extremely bad to unfairly blame students for what they did not do. That is, it is important for us that among the positive answers of the classifier there should be as many correct ones as possible, possibly to the detriment of their number. So you need to maximize precision =. If operations begin to occur on all objects, then precision will be equal to the frequency of the determined class in the sample.
If the algorithm gives a numerical value of probability, then, choosing different thresholds, you can achieve different values of precision-recall.
In our task, the situation is as follows. Recall is the number of orders we can offer, precision is the reliability of these orders. This is how the precision-recall curve of our model looks:
There are two extreme cases: not allowing anyone to order and allowing everyone to order. If you do not allow anyone, then the recall will be 0: we do not create orders, but none of them will fail. If you allow everyone, then recall will be 100% (we will receive all possible orders), and precision - 29%, i.e. 71% of orders will turn out to be bad.
Suppose we do a test (classifier) for some rare and dangerous disease. Based on the test results, we either send the patient for a more detailed examination, or say: “Healthy, go home.” For us, sending a sick person home is much worse than examining a healthy person in vain. That is, we want the test to work for as many really sick people as possible. This value is called recall =. The ideal classifier recall is 100%. A degenerate situation is to send everyone for examination, then recall will also be 100%.
It happens and vice versa. For example, we make a testing system for students, and it has a cheating detector. If suddenly a check does not work for some cases of cheating, then this is unpleasant, but not critical. On the other hand, it is extremely bad to unfairly blame students for what they did not do. That is, it is important for us that among the positive answers of the classifier there should be as many correct ones as possible, possibly to the detriment of their number. So you need to maximize precision =. If operations begin to occur on all objects, then precision will be equal to the frequency of the determined class in the sample.
If the algorithm gives a numerical value of probability, then, choosing different thresholds, you can achieve different values of precision-recall.
In our task, the situation is as follows. Recall is the number of orders we can offer, precision is the reliability of these orders. This is how the precision-recall curve of our model looks:
There are two extreme cases: not allowing anyone to order and allowing everyone to order. If you do not allow anyone, then the recall will be 0: we do not create orders, but none of them will fail. If you allow everyone, then recall will be 100% (we will receive all possible orders), and precision - 29%, i.e. 71% of orders will turn out to be bad.
As signs, we used various parameters of the departure point:
- Time / place.
- System status (number of occupied cars of all tariffs and pins in the vicinity).
- Search parameters (radius, number of candidates, restrictions).
Details on symptoms
Conceptually, we want to distinguish between two situations:
- “Dead Forest” - there are no cars here at this time.
- “Unlucky” - there are cars, but there were no suitable ones when searching.
One example of “Unlucky” is if there is high demand in the center on Friday evening. There are many orders, there are many who want a lot, there are not enough drivers at all. It may happen this way: there are no suitable drivers in the pin. But literally in seconds they appear, because at this time in this place there are a lot of drivers and their status is constantly changing.
Therefore, various features of the system in the vicinity of point A turned out to be good features:
- The total number of cars.
- The number of cars on order.
- The number of machines unavailable for order in the "Busy" status.
- The number of users.
After all, the more cars are around, the more likely it is that any of them will become available.
In fact, it’s important for us not only to have cars, but also make successful trips. Therefore, it was possible to predict the likelihood of a successful trip. But we decided not to do this, because this value is highly dependent on the user and driver. CatBoost
was used as the model learning algorithm . For training we used data obtained from the experiment. After the implementation, it was necessary to collect training data, sometimes allowing a small number of users to place an order contrary to the decision of the model.
Summary
The results of the experiment turned out to be expected: the use of the model can significantly increase the number of successful trips due to orders without cars, but at the same time not to draw down reliability.
At the moment, the mechanism is launched in all cities and countries, and with it about 1% of successful trips occur. Moreover, in some cities with a low density of cars, the share of such trips reaches 15%.