How we added public transport schedules in 2GIS



    2GIS helps people find their way around the city. You open the app, type the name of a street or an organization into the search box, find it, and rejoice. Once the organization is found, a reasonable question arises: how do I get there? While we have recently paid considerable attention to driving scenarios, public transport directions were somewhat neglected. In this article I will describe how schedule-aware directions search was built and share the intricacies of collecting and processing the data.

    Where the task came from


    We love talking to our users. At the end of 2016 we ran a survey to find out how they use public transport. The results were interesting, so we are sharing them.



    “How often do you use public transport?”

    Across all cities where 2GIS is present, more than half of the respondents use public transport every day. The larger the city, the more people use it daily. On weekdays it is most popular with residents of Moscow and St. Petersburg, while in other cities people actively use public transport on weekends.

    With frequency of use sorted out, it was time to see which types of public transport city residents prefer.



    “What modes of transport do you prefer?”

    The top positions are fairly predictable. In large cities the preferred mode of transport is the metro, with buses in second place. More than a third of Muscovites use trains. In Novosibirsk people ride minibuses, and in St. Petersburg they are also fond of trams.

    An interesting discovery was that half of those surveyed could reach their destination on foot.

    The next step was to identify the weak spots of 2GIS. We asked users: what are we missing?



    “What is missing in 2GIS?”

    We solved the problem of choosing specific modes of transport with the recently released filters, where users can specify which type of transport they want to use. But the question “When will the bus arrive?” remained relevant for 64% of the users surveyed.

    That was when we started thinking about adding public transport timetables to 2GIS, and about how to make it happen.

    Where to get the data?


    This was the first question we ran into. In most cases we collect the data for our product ourselves. When a new residential district is built in a city, a data collection specialist travels to the site and verifies all the necessary details in the field. For the new feature, the familiar approach would not work. Sending specialists to the stops is a waste of time and effort, because the schedule changes constantly: new carriers appear, routes are optimized, winter gives way to spring. By the time collection finished, the planned schedules and intervals would already be out of date.

    Data going stale was the second problem we faced. The logical and fairly obvious solution was to turn to those who compile and control the schedule: the agencies responsible for it. Often a constructive dialogue began only after a thorny bureaucratic path and advice such as: "Write to the Ministry of Transport / Deptrans, and then we'll talk."

    Finding the responsible person and starting a dialogue was only half the battle.

    Then came a marathon of:

    1. Explaining what we want and why we need it.
    2. Convincing people that our idea is useful for residents and visitors of the city.
    3. Proving that 2GIS does not monetize route planning that takes service frequency into account.
    4. Giving assurances that it is safe.

    Victory? Not quite.

    The most curious part is the technical side, in particular the data transfer format. Yes, some cities have automated schedule management systems and APIs for accessing the data (GTFS, or JSON in a custom format), but this was far from universal. In some places we were simply told to parse the website, again for security reasons, without being given access to the databases. In others, they were ready to send us files (.xls, .doc, .pdf), but only once, with no way to update the information in our directory in a timely manner.

    First prize for originality went to photographs of a sheet of paper with a public transport schedule.

    And at first the task had seemed trivial: get publicly available data from the source!

    Uploading data to the internal system


    Having obtained access and downloaded the data, we ran into yet another problem. One does not simply load someone else's data into an internal system.

    Why?


    It is time to explain how source data is stored inside 2GIS.
    We develop all internal products for collecting and storing information ourselves. The software for cartographers (who are responsible not only for the map, but also for transport) is called Fiji (a detailed story is here). In short, cartographers draw the transport graph in Fiji, enter public transport data, and store the schedule there. All collected routes are already in the system.

    The first analysis showed that the routes in our system and the suppliers' routes differ, in places dramatically. We had to map our own routes to the suppliers' routes somehow. This could, of course, be done manually, but we decided to write our own matcher.



    As the intermediate storage format we chose GTFS, since it is a generally accepted standard and some suppliers can already provide schedules in it. For the intermediate database that the matcher runs on we chose PostgreSQL, and the matcher itself was written in Python for simplicity.

    Matching simply by route type and name did not work, because our route names differ greatly from the suppliers'. Matching by stop names failed for the same reason. As a result, the matcher follows a rather elaborate scheme: it takes into account the route geometry and the type of transport first, and then the stop names and route numbers.

    Even so, matching errors still occur, because suppliers define a very large number of directions: a separate one for each weekday and for each weekend day. There are also errors when matching circular routes if they are set up differently by the supplier and in our internal system (Fiji).

    Therefore, the final decision is still up to a person: the cartographer can manually cancel a schedule match if they see that the algorithm got it wrong.
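
    The matching order described above (transport type and geometry first, then stop names and route numbers) can be illustrated with a minimal sketch. This is not the actual 2GIS matcher: the `Route` structure, the 150 m radius, the weights and the threshold are all assumptions made up for illustration, and geometry is approximated by stop coordinates only.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from math import asin, cos, radians, sin, sqrt


@dataclass
class Route:
    number: str
    transport_type: str   # e.g. "bus", "tram" (hypothetical labels)
    stops: list           # [(name, lat, lon), ...] in travel order


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6_371_000 * asin(sqrt(a))


def geometry_score(ours, theirs, max_dist_m=150.0):
    """Share of our stops that have a supplier stop within max_dist_m."""
    if not ours.stops or not theirs.stops:
        return 0.0
    near = sum(
        1
        for _, lat, lon in ours.stops
        if min(haversine_m(lat, lon, t_lat, t_lon) for _, t_lat, t_lon in theirs.stops) <= max_dist_m
    )
    return near / len(ours.stops)


def name_score(ours, theirs):
    """Fuzzy similarity (0..1) of the concatenated stop names."""
    a = " ".join(name.lower() for name, _, _ in ours.stops)
    b = " ".join(name.lower() for name, _, _ in theirs.stops)
    return SequenceMatcher(None, a, b).ratio()


def match_score(ours, theirs):
    """Type and geometry act as gates; names and numbers only refine the score."""
    if ours.transport_type != theirs.transport_type:
        return 0.0
    geo = geometry_score(ours, theirs)
    if geo < 0.5:  # assumed cut-off: routes must mostly overlap
        return 0.0
    same_number = ours.number.strip().lower() == theirs.number.strip().lower()
    return 0.6 * geo + 0.25 * name_score(ours, theirs) + 0.15 * (1.0 if same_number else 0.0)


def best_match(ours, candidates, threshold=0.6):
    """Return the best supplier route, or None if nothing is convincing enough."""
    scored = [(match_score(ours, c), c) for c in candidates]
    score, route = max(scored, key=lambda pair: pair[0], default=(0.0, None))
    return route if score >= threshold else None
```

    Making geometry a gate rather than just another weighted term reflects the point of the paragraph above: names and numbers alone are too unreliable to decide a match.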

    Algorithm


    The core of the search algorithm is written in C++. In fact, public transport directions search is not one algorithm but several. The walk to the nearest stops is computed by our pedestrian routing algorithm, which in turn consists of two algorithms: the “pixel” one (which builds a path across territory that has no road graph) and the regular one (which follows the pedestrian graph to the stop).
    To search for travel between stops we use a heavily modified A*, to which we added schedule support. Previously the waiting time at a stop was a kind of “average” for each project and each type of transport; now either the exact or the interval schedule is taken into account.
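
    To show what “A* with schedule support” can mean in its simplest form, here is a sketch of an earliest-arrival search in which the cost of an edge depends on when you reach the stop. It is an illustration only, not the 2GIS implementation (which is in C++ and heavily modified). The graph layout, the planar coordinates and the `max_speed` bound are assumptions; the heuristic stays admissible only if no vehicle covers straight-line distance faster than `max_speed`.

```python
import heapq
from bisect import bisect_left


def earliest_arrival(graph, coords, source, target, start, max_speed=25.0):
    """Earliest arrival time at `target` in seconds, or None if unreachable.

    graph:  {stop: [(next_stop, sorted_departures_s, ride_time_s), ...]}
    coords: {stop: (x, y)} in metres (assumed planar projection)
    start:  time of arrival at `source`, in seconds
    """

    def heuristic(stop):
        # Optimistic remaining time: straight line at the maximum speed.
        (x1, y1), (x2, y2) = coords[stop], coords[target]
        return ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5 / max_speed

    best = {source: start}
    queue = [(start + heuristic(source), start, source)]
    while queue:
        _, now, stop = heapq.heappop(queue)
        if stop == target:
            return now
        if now > best.get(stop, float("inf")):
            continue  # stale queue entry, a better arrival is already known
        for nxt, departures, ride in graph.get(stop, []):
            i = bisect_left(departures, now)  # first departure at or after `now`
            if i == len(departures):
                continue  # no service left on this edge
            arrival = departures[i] + ride    # waiting is part of the edge cost
            if arrival < best.get(nxt, float("inf")):
                best[nxt] = arrival
                heapq.heappush(queue, (arrival + heuristic(nxt), arrival, nxt))
    return None
```

    The only difference from a textbook A* is the `bisect_left` lookup: instead of a fixed edge weight, the cost includes the wait until the next departure.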

    At the same time, the algorithm had to handle many quirky nuances in the data. For example, a route may have a departure time from a stop of 25 or even 47 o'clock. In data terms, this means it is the same trip that started on a previous day and simply has not finished its run yet. Keep in mind, too, that a trip may only start running “tomorrow”: if the user searches for a route at the end of the current day, you also need to look into the next day (this matters if you store the schedule by day).
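
    The over-24-hour times come from the GTFS convention that a time is counted from the start of the service day, so `25:10:00` means 01:10 on the next calendar day. A minimal sketch of handling this when the schedule is stored by day follows; the function names, the `schedule` layout and the look-back/look-ahead windows are assumptions chosen for illustration.

```python
from datetime import date, datetime, time, timedelta


def parse_gtfs_time(value):
    """Seconds since the start of the service day; hours may exceed 23."""
    hours, minutes, seconds = (int(part) for part in value.strip().split(":"))
    if hours < 0 or not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError(f"bad GTFS time: {value!r}")
    return hours * 3600 + minutes * 60 + seconds


def to_calendar(service_day, gtfs_time):
    """Convert (service day, GTFS time) into a real calendar datetime."""
    midnight = datetime.combine(service_day, time.min)
    return midnight + timedelta(seconds=parse_gtfs_time(gtfs_time))


def departures_after(query, schedule, lookback_days=2, lookahead_days=1):
    """Departures at or after `query`, looking into neighbouring service days.

    schedule: {service_day (date): [GTFS time strings]}
    A look-back of two days covers times up to 47:59:59; the look-ahead
    covers trips that only start running "tomorrow".
    """
    found = []
    for offset in range(-lookback_days, lookahead_days + 1):
        day = query.date() + timedelta(days=offset)
        for gtfs_time in schedule.get(day, []):
            moment = to_calendar(day, gtfs_time)
            if moment >= query:
                found.append(moment)
    return sorted(found)
```

    For a query late in the evening, the result mixes late runs of the current service day with early runs of the next one, which is the behaviour the paragraph above asks for.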

    A separate problem was how to combine data with a schedule and data without one. In the end we decided that routes without a schedule still take part in the search; they simply get a lower weight. Moreover, if a route without a schedule serves the same stop platforms as a route with a schedule, we simply merge it into the same result; if it goes a different way, it becomes a separate travel option with a lower weight, since we know nothing about the waiting time at the stop.
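
    One way to picture “less weight” for routes without a schedule is to charge them a pessimistic waiting time. The sketch below is an assumption-laden illustration, not the 2GIS weighting: the 15-minute penalty is invented, and half the interval is used as the expected wait for interval schedules (a common simplification for passengers arriving at random).

```python
from bisect import bisect_left
from math import inf

UNKNOWN_WAIT_PENALTY_S = 15 * 60  # assumed flat penalty when nothing is known


def expected_wait(now_s, departures=None, interval_s=None):
    """Estimated wait at a stop, in seconds.

    departures: sorted exact departure times in seconds, if known
    interval_s: headway in seconds for interval schedules, if known
    """
    if departures:
        i = bisect_left(departures, now_s)
        return departures[i] - now_s if i < len(departures) else inf
    if interval_s:
        return interval_s / 2
    return UNKNOWN_WAIT_PENALTY_S


def rank(options, now_s):
    """Sort travel options by estimated wait plus ride time.

    options: list of dicts with 'ride_s' and optional 'departures' / 'interval_s'
    """
    def total(option):
        wait = expected_wait(now_s, option.get("departures"), option.get("interval_s"))
        return wait + option["ride_s"]

    return sorted(options, key=total)
```

    With a penalty like this, a route without a schedule still appears in the results but loses to a comparable route whose waiting time is actually known.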

    Since 2GIS works both online and offline, the algorithm runs both inside the app and on the server. Although schedule data is more or less static, server-side search is used here too: on slow devices with an Internet connection, a request to the server completes much faster than a local search. For server-side search we use 8 search backends located in three data centers, in Novosibirsk, Moscow and Dronten (the Netherlands).

    Release date


    You can judge the final result of adding public transport schedules to 2GIS in our mobile app on Google Play and the App Store. The web version will appear a little later.

    After the release, we received quite a lot of feedback. After analyzing the negative responses, we identified two main causes of complaints:

    1. We did not properly tell users that the timetable is now used in directions search, and so we broke the familiar scenarios of working with the app.
      When searching for a route in the evening or at night, users no longer saw their usual routes in the results. The control for choosing the trip date and time went unnoticed.

      Most of the support conversations went something like this:

      - Hello, your public transport directions search has stopped working properly, because ... (description of a specific problem).
      - We have added schedules to public transport directions search. Here is the control where you can set the time the trip is planned for.
      - I see, thanks a lot!
    2. The algorithm tried not to offer routes without a schedule (or dropped them from the results) if there was an alternative with a schedule. Because of this, the results became less relevant in some cases.

      Our technologists had to urgently verify and manually enter interval schedules for all remaining modes of transport to bring them back into the results, and we had to tune the search algorithms further.

    Captain Obvious conclusions


    What conclusions can be drawn from the launch?

    • If you plan to use someone else's data in your application, be sure to think through how you will actually obtain it. There is no guarantee that expectations and reality will coincide. Consider the risks.
    • If you significantly change the current logic of the application, be sure to tell users about it and teach them how to use the new features: only a handful of people read the “What's new” notes.
    • Prepare tech support for a surge of requests, no matter how well you explained the new feature and how to use it.

    P.S. On data completeness


    Since we could not reach agreements with every Deptrans / Ministry of Transport, the schedule is so far available only in Moscow, St. Petersburg, Novosibirsk, Yekaterinburg, Krasnoyarsk, Omsk, Chelyabinsk, Krasnodar and Rostov-on-Don. We will extend schedule coverage in these cities, and add new cities, as we receive data.
