rumyash November 2, 2017 at 07:47

How we make a map for those who make the map

30 million citizens use 2GIS products. To get a huge set of data to the end user, we use many internal products, which we rarely talk about.

There has already been an article on Habr about one of our internal products, the vector geometry editor. The wonders of naming led us to three dots: Fiji. Before that, the project was called “New map” → “New new map” → “New new new map”. Three years ago we started rolling out Fiji and talked about UI prototyping; today we will dive into the technical details and talk about how to build a fast and reliable GIS editor.

Cartographers and their requests

Fiji is a product in which our cartographers create a map. Want to find out what a typical cartographer’s day looks like? We developers see it something like this:

Most of the time, the cartographer interacts directly with the map being created. A responsive, fast map that shows changes live is what the 500 cartographers working in 2GIS offices from Novosibirsk and Moscow to Prague and Santiago ask of us. Of course, we have an SLA for these operations: map navigation takes at most 3 seconds, and a map data update at most 5 seconds.

How do we solve this problem?

Obviously, we have a database in which all geo objects are stored. The first thing that comes to mind is simply to pull from it all the objects the cartographer wants to see. This approach was used in the previous generation of our mapping system, when each 2GIS city had its own database and the number of cartographers did not exceed a couple of dozen.

One of the main requirements for the new system was the ability to map the whole world rather than separate pieces within the boundaries of large cities. The previous approach was ruled out, because spatial intersection queries in the database are very expensive. For example, fetching all the buildings in Moscow takes about two minutes, and since a cartographer usually looks at 10 to 20 layers rather than one, he would have to drink quite a lot of coffee while waiting for the map to load :)

Another drawback of this approach is the large amount of data the client pulls from the server. For example, the buildings of Moscow weigh more than 20 megabytes. The database is located in our data center in Novosibirsk, while the client may be in Chile, and the ping between Novosibirsk and Chile is 300 ms. With numbers like these, the map immediately stops being responsive.

Raster tiles

The next option we considered was raster tiles. Nothing new here: it is a very popular approach for loading a specific map extent. The whole world is split into several scale levels (zoom levels), each of which is divided into equal squares. The result is a pyramid of tiles covering the world.

Pyramid of tiles.

This way we get rid of spatial intersection queries on every client request. In addition, raster images are much lighter than raw binary geometries. Tiles can be prepared once, spread across distributed servers and updated periodically.

This option is viable, but it did not suit us, because at any moment each cartographer can change:

The set of displayed layers or their order. This means we would need separate tiles for buildings, rivers and roads, plus logic on the client for overlaying the tiles in the right order.
The styling of any layer. For instance, a cartographer may decide that city blocks should be not brown but green with a red outline. Then we would have to regenerate all the city block tiles, and styling settings are individual for each cartographer.
The styling of individual objects by any condition. For example, make all houses taller than five floors red. With rasters this will not work.

In addition, when creating objects, cartographers use a tool called “draw”, which automatically joins the boundaries of objects drawn next to each other. For that, the client needs the real geometry of the displayed objects, and we would only have a picture.

The origin of vector tiles

We thought: since the whole world uses tiles, and we mostly need vector data, why not combine the two and make the tiles vector? We split all our geodata into tiles in the same way, but store in them not pictures but the geometries and identifiers of the objects that fall into each tile. Moreover, a tile does not have to store the whole geometry, only the part it needs, clipped along the tile border.

The advantages are obvious and address all the drawbacks of the previous approaches. The idea is cool, but to implement it we had to go a long way and face a number of problems.

I would like to note right away that although some people consider our Earth to be flat, this is still not so :) Despite this, in the world of cartographers it is much more convenient to see a flat projection and work with flat coordinates.

As the projection we use EPSG:3395 (WGS 84 / World Mercator). On this projection we build a tile grid with several levels. At the first level there is a single square cell that contains the whole world, that is, it covers an area of approximately 40,000 by 40,000 km.

Tile grid of the first level

At the second level, we divide this cell into four. At the next level, we divide each of the resulting cells into four more, and so on.

Tile grid of the second level

In total we have 16 levels. Thus, at the last level, we get cells covering an area of approximately 1,200 by 1,200 meters. Further splitting will not give any tangible gains in tile sizes, but will lead to a significant increase in the number of tiles.
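The arithmetic behind these numbers can be checked with a short Python sketch. It assumes only what the text states: a level-1 cell with a side of about 40,000 km and a fourfold split at every next level. The function names are made up for illustration.

```python
# Assumptions taken from the text: the level-1 cell is a square with a side
# of about 40,000 km, and each next level splits every cell into four.
WORLD_SIDE_KM = 40_000
LEVELS = 16


def cell_side_km(level):
    """Side of one grid cell at the given 1-based level, in kilometres."""
    return WORLD_SIDE_KM / 2 ** (level - 1)


def cells_at_level(level):
    """Total number of cells in the grid at the given 1-based level."""
    return 4 ** (level - 1)


if __name__ == "__main__":
    for level in (1, 2, LEVELS, LEVELS + 1):
        side_m = cell_side_km(level) * 1000
        print(f"level {level}: side {side_m:.0f} m, {cells_at_level(level)} cells")
```

Here cell_side_km(16) comes out at roughly 1.22 km, which matches the 1,200 meters quoted above. A hypothetical seventeenth level would shrink the cells to about 610 m at the cost of four times as many cells.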

We use separate sets of tiles for roads, buildings, rivers and neighborhoods. Thanks to this, only the types of tiles needed for display at the moment are sent to the client.

Each tile has its own unique address of the form: object_type/scale_level/row/column/

The address makes it very quick to generate requests for the tiles needed for display: the visible extent and scale are translated into a zoom level, row and column of the tile grid. As mentioned above, this is much simpler than intersecting arbitrary geometries.

Another advantage of vector data is that we can display it at any scale the user wants, even one to one. This cannot be done with rasters, which have a rigidly fixed set of scales corresponding to the levels chosen for the tile pyramid.

How does Fiji work with tiles?

Schematically, our work with tiles looks like this:

Central database - all our objects created by cartographers are stored here. We use MSSQL 2016. At the moment, it has about 75 million geo objects and it weighs 450 gigabytes.

The map server is the “brain” of the system, through which all business operations pass - creating, updating and deleting objects.

Tile servers are lightweight Java applications that can be deployed on almost any machine. Their logic is extremely simple: on a client's request, return the requested tile if it already exists; if not, create it, return it to the client and save it for the future. In addition, the existing tiles have to be updated periodically based on information about changed objects received from the map server.
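The request logic described above is essentially a get-or-create cache. The real tile servers are Java applications; the sketch below uses Python only for brevity, keeps tiles in an in-memory dict instead of persistent storage, and takes a hypothetical build_tile callback in place of the code that actually assembles a tile. Dropping stale tiles in invalidate is just one simple way to react to changed objects.

```python
class TileServer:
    """Minimal get-or-create tile cache (illustrative, not the Fiji code)."""

    def __init__(self, build_tile):
        # Hypothetical callback: builds tile contents for a given address.
        self._build_tile = build_tile
        # Stand-in for persistent tile storage.
        self._store = {}

    def get_tile(self, address):
        """Return the stored tile, creating and saving it on first request."""
        tile = self._store.get(address)
        if tile is None:
            tile = self._build_tile(address)
            self._store[address] = tile
        return tile

    def invalidate(self, addresses):
        """Drop tiles affected by changed objects; they are rebuilt on demand."""
        for address in addresses:
            self._store.pop(address, None)
```

With this shape, the first request for an address pays the cost of building the tile, and every later request is a plain lookup until the tile is invalidated.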

We use PostgreSQL as the tile storage, with a separate database for each server.

We place tile servers close to large groups of users - the European part of Russia, Novosibirsk, Vladivostok. Because these servers are independent of each other, we can take any of them out of rotation or add a new one at any time.

Clients are desktop applications, each of which automatically selects the best tile server for it. Selection criteria: response speed and network bandwidth.
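One possible way to combine the two criteria is to estimate how long a typical tile would take to arrive from each server. The sketch below is an assumption about how such a choice could be made, not a description of the Fiji client: the Probe structure, the 50,000-byte typical tile size and the scoring formula are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """Measurements for one tile server (all fields are illustrative)."""
    server: str            # hypothetical server name
    latency_s: float       # measured response time, seconds
    bandwidth_bps: float   # measured throughput, bytes per second


def estimated_fetch_time(probe, tile_bytes=50_000):
    """Rough time to receive one tile: latency plus transfer time."""
    return probe.latency_s + tile_bytes / probe.bandwidth_bps


def pick_server(probes, tile_bytes=50_000):
    """Choose the server with the smallest estimated fetch time."""
    best = min(probes, key=lambda p: estimated_fetch_time(p, tile_bytes))
    return best.server
```

A server with a slightly narrower channel can still win if its response time is much lower, which is the usual situation for a client far from the data center.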

Tiles in the client are used only for display and geocoding. Geometries from tiles are not suitable for editing, since they can be heavily simplified or clipped at tile borders. Therefore, for editing, we simply fetch the entire object from the database by its identifier.

To display tiles we use our own renderer. For a long time we relied on a paid third-party one and tried various free options, but none of them met our needs. In the end we wrote our own, which supports rendering through DirectX and GDI+.

Tile Optimization

The less the tile weighs, the faster it reaches the client. We used several optimizations to reduce the weight of tiles:

Our projection works in meters, and we limit accuracy to one centimeter, so coordinates can be handled as integers. Since the geometry of an object inside a tile consists of closely spaced points, it is more efficient to store the coordinates of these points not in absolute form but as offsets from the previous point. In each tile, the first point of the first object is stored in absolute coordinates, and all the others are stored as offsets from the previous point. This reduces the size of a tile by a factor of 8!
It makes no sense to display many types of objects at small scales; for example, there is no point in showing every building when a whole country is on the screen. For each type of object, we defined the lowest level at which its tiles are visible, so that the client does not request them and, accordingly, the server does not create them.
At all visible levels except the last (sixteenth), simple generalization is used. Imagine that a tile at its maximum scale is an image of 256 by 256 pixels. Of all the points of an object that fall into the same pixel, we keep one. The result can distort the original geometry considerably: a square house can turn into a point. This is why the last level is left untouched: a cartographer who zooms in to one to one would hardly be pleased not to see the honest, non-generalized geometry.
We use a bit flag for the case when the geometry of an object completely covers the tile. This applies to large objects that cover many tiles - districts, settlements and, of course, countries.
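The first and third optimizations from this list can be sketched in a few lines of Python. This is an illustration under assumptions, not the actual Fiji tile format: it encodes a single sequence of points, all function names are hypothetical, and the generalization keeps the first point seen in each pixel.

```python
def to_centimetres(points_m):
    """Convert (x, y) in metres to integer centimetres."""
    return [(round(x * 100), round(y * 100)) for x, y in points_m]


def delta_encode(points):
    """First point stays absolute, every next one is an offset from the previous."""
    encoded, prev_x, prev_y = [], 0, 0
    for x, y in points:
        encoded.append((x - prev_x, y - prev_y))
        prev_x, prev_y = x, y
    return encoded


def delta_decode(encoded):
    """Inverse of delta_encode: accumulate offsets back into coordinates."""
    points, x, y = [], 0, 0
    for dx, dy in encoded:
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


def generalize(points, tile_min_x, tile_min_y, tile_side, pixels=256):
    """Keep one point per pixel of a pixels-by-pixels grid laid over the tile."""
    pixel = tile_side / pixels
    seen, kept = set(), []
    for x, y in points:
        key = (int((x - tile_min_x) // pixel), int((y - tile_min_y) // pixel))
        if key not in seen:
            seen.add(key)
            kept.append((x, y))
    return kept
```

Offsets between neighboring points are small numbers, so they need far fewer bytes than full coordinates when written with a compact integer encoding. The generalize function also shows the distortion mentioned above: if all four corners of a house fall into one pixel, only a single point survives.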

We coped with the task: geometry now reaches the client quickly.

Does it always work?

In an ideal world, always. In reality, geometry is not always enough to fully display an object. For example, the cartographer wants to see all sections of the road blocked on May 9, or just the names of the streets.

One way to handle this is to store all attribute information in the tile along with the geometry. Most often this is highly redundant: buildings alone can have up to twenty attributes.

We could store only what is needed for labels, but the problem is that the set of required attributes changes unpredictably.

So, in addition to geometry tiles, we decided to make attribute tiles, with a separate set of tiles for each attribute. The client itself determines which attribute tiles it needs and requests them along with the geometry tiles.
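A minimal Python sketch of this idea from the client's side follows. The address scheme for attribute tiles ("type.attribute/...") and the dict-based tile contents are hypothetical; the only thing taken from the text is that tiles carry object identifiers, which makes it possible to join attribute values to geometries.

```python
def tiles_to_request(object_type, level, row, col, attributes):
    """Addresses of the geometry tile and the attribute tiles for one cell.

    Assumption: attribute tiles are addressed as "<type>.<attribute>/...".
    """
    cell = f"{level}/{row}/{col}/"
    geometry = f"{object_type}/{cell}"
    return [geometry] + [f"{object_type}.{name}/{cell}" for name in attributes]


def join_by_id(geometry_tile, attribute_tiles):
    """Attach attribute values to geometries using object identifiers.

    geometry_tile:   {object_id: geometry}
    attribute_tiles: {attribute_name: {object_id: value}}
    """
    features = {
        object_id: {"geometry": geometry}
        for object_id, geometry in geometry_tile.items()
    }
    for name, tile in attribute_tiles.items():
        for object_id, value in tile.items():
            if object_id in features:
                features[object_id][name] = value
    return features
```

A client that only draws street names would request the road geometry tile plus a single attribute tile, instead of downloading every attribute of every road.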

What's next?

We have solved many non-trivial problems, but not all. Now all efforts are focused on the following problems:

The time it takes to update tiles for cities and regions leaves much to be desired. Right now we simply delete the old tiles and create new ones when a cartographer requests them. At these moments, the map slows down.
The tile server databases differ from each other. This is because groups of cartographers work with different parts of the map - Chileans do not edit the Far East. However, if they are switched from their nearest tile server to the one in Vladivostok, which does not have the tiles they need, the map will again slow down while the missing tiles are generated.
Because the databases differ, we cannot simply restore from a backup of a neighboring server when something goes wrong.

To speed up Fiji, we are developing a separate server application for creating and updating tiles. It will be located next to the map server or a group of tile servers and will help to distribute tiles to the necessary tile servers.

So, if you want to make your own GIS editor, here are some tips:

Use raster tiles where only a static picture is needed and data is rarely changed. For example, building plans.
Wherever dynamic data display and real geometry may be needed, use vector tiles.
No matter how powerful your SQL server is, you should not assign all the work with geodata to it. If there is little data, then in the beginning everything may be fine. Do not be fooled - the load and growth in the volume of data will never stop.
Do not forget about optimizing the volume of data transmitted over the network. Try to find places where you can painlessly show not the original geometry, but its simplification.
Do not forget to relax - travel, walk, use maps so as not to get lost :)
