Dynamic pricing, or How Yandex.Taxi predicts high demand
Previously, to call a taxi, you had to call to different numbers of dispatching services and wait for the delivery of the car for half an hour or even more. Taxi services are now well automated, and the average car delivery time for Yandex.Taxi in Moscow is about 3-4 minutes. But it is necessary to go to the rain or end a mass event, and again we may face a shortage of free cars.
My name is Anton Skokurev, I lead a platform efficiency development team at Yandex.Taxi. Today I will tell Habr's readers how we have learned how to predict high demand and additionally attract drivers so that users can find a free car at any time. You will learn how to form a coefficient that affects the cost of the order. Everything is not as simple as it might seem at first glance.
Dynamic pricing task
The most important task of dynamic pricing is to provide the ability to book a taxi at all times. It is achieved by using the surge pricing coefficient by which the calculated price is multiplied. We call him simply "Surge." It is important to say that Surge not only regulates the demand for taxis, but also helps attract new drivers to increase supply.
If you set Surge too large - we will reduce the demand too much, there will be an excess of free cars. If you set it too low, users will see "no free machines." We must be able to choose the coefficient at which we will walk on thin ice between the lack of free cars and low demand.
What should this coefficient depend on? Immediately comes to mind the dependence on the number of machines and orders around the user. Now you can simply divide the number of orders by the number of drivers, get the coefficient and turn it into our Suj by some formula (possibly linear).
But there is a small problem in this task - it may be too late to count orders around the user. After all, an order is almost always a busy car, which means that the increase in our ratio will always be delayed. Therefore, we consider not created orders, but intentions to order a car - pins. Pin - this is the label "A" on the map, which puts the user launching our application.
We formulate the problem: we need to consider instant values of machines and pins at some point of the user.
We count the number of pins and cars around
When the position of the pin changes (user selects point “A”), the user application sends new coordinates to the backend and a small sheet of additional information that helps to estimate the pin more accurately (for example, the selected rate).
We try to adhere to microservice architecture, where each microservice is engaged in separate tasks. Surge is engaged in counting Surja. It registers pins, saves them to the database, and also updates the cast of pins in the RAM, in which they fit quite well. Cache lag in such work is only a few seconds, which is acceptable in our case.
A few words about the database
При регистрации каждый пин асинхронно складывается в MongoDb с TTL Index, где TTL – «время жизни» пина, при котором мы считаем его активным для подсчета повышающего коэффициента. Пользователь не ждет, пока мы совершаем эти действия. Даже если что-то пойдет не так, потерять пин не такая большая трагедия.
Hot cache is built with an index on geohashu . We group all the pins by geohash, and then collect pins for the desired radius around the order point.
We do the same with cars, but in another service called Tracker, which Surger just goes with the question “how many drivers are in this radius”.
So we consider the instantaneous values of the coefficient.
Case : you are standing in Moscow on the Garden Ring and want to order a car. In this case, the price jumps often enough and it is annoying.
Already knowing the mechanics, it can be understood that this may be due to the fact that drivers accumulate at the conditional traffic lights at the time of the request of Surge and also leave from there quickly. Because of this, the Surge and price can jump.
To avoid this, we cache the value of Surge by users. When a user comes for a Surge, we look at whether there is a saved value for the user in the allowed radius (linear walk through all the saved Surges of the user). If there is, we give it away, otherwise we count the new one and also save it.
It worked well, but there are other situations.
Case: 2 users are requesting surj. One orders 30 seconds later than the other when the cars from the traffic light from the past of the case have already left. We get a picture where 2 users ordering almost simultaneously can have a markedly different Surge.
And here we go from the cache by user to cache by position. Now, instead of caching the value of surdzh only by the user, we begin to cache it on the already familiar geohash. So we almost fix the problem. Why almost? Because there may be differences on the boundaries of geocaches. But the problem is not so significant, because we have anti-aliasing.
Perhaps, reading a case about a traffic light, the thought occurred to you that it was somehow unfair to count an instant Surge depending on the traffic light. We also think so, so we figured out how to remedy the situation.
We decided to borrow from the machine learning method of the nearest neighbors for the regression problem in order to determine how much the value of the instant Surge differs from what is happening around.
The learning stage, as in the formal description of the method, consists in memorizing all the objects - in our case, the calculated values of the Surge in the pin, we are already doing all this at the moment of loading all the pins into the cache. Things are going to be small - to calculate the instantaneous value, compare it with the value in the zone and agree that we cannot deviate from the value in the zone too much.
So we get a system with a quick response to events and allowing you to quickly read the value of the multiplying factor.
Surdz driver's card
In order to communicate with the driver, we need to be able to display a map of Surge in the driver's application - a taximeter. This gives the driver feedback on whether there is demand in the area where he is now and where he should go to get the most expensive orders. For us, this means that more drivers will arrive in a zone with high demand and settle it.
We live with the paradigm that a driver's device is a rather weak device. Therefore, the rendering of the hexagonal grid of Surge lies on the side of the backend. The client comes to the backend for the tiles. These are cut raster images for direct display on the map.
We have a separate service, which periodically takes the impressions of the pins from the Surger microservice and calculates all the meta information needed to render the hexagonal grid: where is the hex and what is the surj in each.
Dynamic pricing is a constant search for a balance between supply and demand, so that free cars are always available to users, including through the mechanism of attracting additional drivers to areas with high demand. For example, we are currently working on a deeper use of machine learning for calculating Surge. As part of one of the tasks in this area, we are learning how to determine the likelihood of pin conversion to an order and take this information into account. There is plenty of work here, so we are always glad to see new specialists in the team.
If you are interested in learning about some part of this large topic in more detail, please write in the comments. Feedback and ideas are welcome too!
PS In the next publication My colleague will talk about the use of machine learning to predict the expected time of arrival of a taxi.