How to create a map of Moscow parking with Kepler.gl
A few months ago, the first version of Kepler.gl was released - a new open source tool for visualizing and analyzing large geo-data sets.
In this article, I will walk through the application's main features and use it to build two map visualizations that reveal a few interesting facts about paid parking lots in Moscow.
But first, a few words about who created Kepler.gl and why.
Kepler.gl was originally created by the Uber Engineering team for company analysts who wanted to better understand “how the city is moving”, using the huge amount of geo-information on traffic that is collected daily by thousands of Uber cars in cities around the world.
In May of this year, however, the company opened up the application and published all of the Kepler.gl source code on GitHub.
Main features of Kepler.gl
Whatever data analysis tools, mapping services, frameworks or visualization libraries you choose, the work comes down to 4 main steps:
- collection of information
- data processing
- research and analysis of prepared data (to identify dependencies, search for anomalies, etc.)
- creating visualizations
Figure 1. Basic steps of creating a visualization
Kepler.gl partially automates 3 of the 4 steps listed above, which considerably simplifies the analysis and visualization of large data sets and makes it possible to build an informative and, importantly, colorful interactive map from your own geo-data in about half an hour.
No programming or design experience is required: filtering and aggregating data, choosing how to display it depending on various attributes of the objects under study, overlaying information from different sources, switching between 2D and 3D modes and much more are all configured in the UI panel.
How to use Kepler.gl for data analysis
The easiest way to get started with Kepler.gl is its online version, available at kepler.gl. If you do not trust third-party servers, you can deploy a local version by following the instructions on GitHub.
From here on, I will use the “Paid Parkings of Moscow” dataset provided by the “Open Data Portal” of the Moscow Government. It describes more than 9 thousand parking lots located on the street network, including their cost and number of parking spaces.
Stage 1. Loading data
Kepler.gl currently supports 3 source data formats: GeoJSON, JSON and CSV. After saving the data in one of these formats (in this example I use .csv), we simply load the file into the application. Incidentally, the same upload dialog also offers about a dozen preloaded sample datasets that you can use to get to know the application.
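As a minimal sketch of preparing such a file, the snippet below writes a few records to CSV text with Python's standard csv module. The column names are the fields referred to later in this article; the sample rows are invented for illustration and are not taken from the real dataset.

```python
import csv
import io

# Column names follow the fields mentioned in this article; the rows are made-up sample values.
FIELDS = ["Latitude_WGS84", "Longitude_WGS84", "Price", "CarCapacity"]

SAMPLE_ROWS = [
    {"Latitude_WGS84": 55.7652, "Longitude_WGS84": 37.6051, "Price": 200, "CarCapacity": 12},
    {"Latitude_WGS84": 55.7301, "Longitude_WGS84": 37.6402, "Price": 60, "CarCapacity": 40},
]

def rows_to_csv(rows, fields=FIELDS):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

csv_text = rows_to_csv(SAMPLE_ROWS)
print(csv_text)
```

Saving `csv_text` to a file with a .csv extension gives something that can be dragged into the upload dialog.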
Note. In Chrome, the uploaded file should not exceed 250 MB. The creators of Kepler.gl suggest using Safari if you need to load a larger file. In any case, remember that the application's performance depends on the device it runs on, because all aggregation, filtering and rendering of data happens on the client.
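If you export data with a script, a quick size check before uploading can save a failed attempt. The sketch below assumes the 250 MB limit means 250 × 1024 × 1024 bytes; the helper name `fits_chrome_limit` is hypothetical.

```python
import os
import tempfile

# Limit for Chrome as stated above; interpreting "MB" as 1024 * 1024 bytes is an assumption.
CHROME_LIMIT_BYTES = 250 * 1024 * 1024

def fits_chrome_limit(path, limit=CHROME_LIMIT_BYTES):
    """Return True if the file at `path` is no larger than `limit` bytes."""
    return os.path.getsize(path) <= limit

# Usage with a small temporary file standing in for a real export.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("Latitude_WGS84,Longitude_WGS84\n55.75,37.62\n")
    tmp_path = tmp.name

small_file_ok = fits_chrome_limit(tmp_path)
os.remove(tmp_path)
print(small_file_ok)
```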
Stage 2. Displaying data on the map
The application supports 9 types of visualization layers, which differ from each other in their sets of customizable parameters:
- point layer (Point)
- arc layer (Arc)
- line layer (Line)
- grid (Grid)
- hexagonal grid (Hexbin)
- heat map (Heatmap)
- polygon layer (Polygon)
- cluster layer (Cluster)
- icon layer (Icon)
At the same time, even layers of the same type, displaying the same data set, can dramatically differ depending on the selected configuration.
Figure 2. Maps created in kepler.gl using various types of layers
Kepler.gl does not limit the number of layers used when displaying a dataset of interest. Layers are drawn on the map in the same order in which they are located in the list of layers in the sidebar. This sequence is easily changed by simply dragging the respective layers relative to each other on the Layers tab.
When using multiple layers, pay attention to the “Layer Blending” parameter, which controls how overlapping layers are combined. It is set once for the whole visualization, so different layers cannot use different blending types.
Three options are currently available:
- normal. The lower layers do not affect the color of the points (or other elements) of the upper layers.
- additive. The color values of overlapping elements are added together. This is useful for identifying areas of high density, which appear brighter.
- subtractive. Unlike additive blending, the color values in intersecting areas are subtracted rather than added. This is convenient when using a light base map instead of a dark one.
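The arithmetic behind the three behaviours can be illustrated with a simplified model of opaque RGB colors. This is only a sketch: the mode labels are the ones used in this example, and the real rendering happens on the GPU and also takes opacity into account, so it should not be read as Kepler.gl's implementation.

```python
def clamp(value):
    """Keep a color channel inside the 0-255 range."""
    return max(0, min(255, value))

def blend(lower, upper, mode):
    """Combine two RGB tuples according to a simplified blending mode."""
    if mode == "normal":
        # The upper layer simply covers the lower one.
        return upper
    if mode == "additive":
        # Channel values are summed, so overlaps get brighter.
        return tuple(clamp(l + u) for l, u in zip(lower, upper))
    if mode == "subtractive":
        # Upper channel values are subtracted from the lower ones, so overlaps get darker.
        return tuple(clamp(l - u) for l, u in zip(lower, upper))
    raise ValueError(f"unknown blending mode: {mode}")

lower = (120, 60, 200)
upper = (100, 80, 90)
for mode in ("normal", "additive", "subtractive"):
    print(mode, blend(lower, upper, mode))
```

In this model, overlapping points move toward white under additive blending and toward black under subtractive blending, which is why the former suits a dark base map and the latter a light one.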
Thus, to see our data on the map, we need to create at least one layer that uses it. Note that after a file is loaded, Kepler.gl tries to identify the fields containing geolocation information and immediately displays them by automatically creating layers of the appropriate types (usually point or polygon).
In our case, however, the data format differs from what the application expects, so we have to specify the source of coordinates ourselves. To do this, first remove the polygon layers created by Kepler.gl, and then manually add a new layer of type Point. As the source of coordinates, we use the fields "Latitude_WGS84" and "Longitude_WGS84" instead of the "Coordinates" field that the application selected automatically for rendering data on the map.
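Before pointing a layer at these two fields, it can be worth checking that every row actually contains usable coordinates. The following is a minimal sketch of such a check; the helper `parse_point` and the sample rows are hypothetical, and only the two field names come from the dataset.

```python
def parse_point(row, lat_field="Latitude_WGS84", lon_field="Longitude_WGS84"):
    """Return (lat, lon) as floats, or None if the row has no usable coordinates."""
    try:
        lat = float(row[lat_field])
        lon = float(row[lon_field])
    except (KeyError, TypeError, ValueError):
        return None
    # WGS84 latitude spans -90..90 degrees and longitude -180..180 degrees.
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return None
    return lat, lon

rows = [
    {"Latitude_WGS84": "55.7652", "Longitude_WGS84": "37.6051"},  # valid
    {"Latitude_WGS84": "", "Longitude_WGS84": "37.6051"},          # missing latitude
    {"Latitude_WGS84": "137.6", "Longitude_WGS84": "55.7"},        # latitude out of range
]
points = [parse_point(row) for row in rows]
print(points)
```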
Figure 3. Using a Kepler.gl point layer to display parking lots in Moscow
In this form, the map is not very informative. The only thing it tells us is that there are more parking lots in the center than on the outskirts.
So it is time to bring in other information about the objects for a more detailed analysis and a search for interesting facts and/or patterns.
Stage 3. Changing the appearance of the map based on the attributes of the displayed objects
The dataset downloaded from the Open Data Portal contains quite a lot of information about each parking lot. Two parameters seemed the most interesting to me: the cost of an hour of parking and the number of available spaces.
Where is the most expensive parking in Moscow? Is there a relationship between the size of a parking lot and its distance from the center? How much does the cost of an hour of parking differ inside and outside the Garden Ring? To answer these questions, we only need to slightly change the display settings of the point layer created earlier and look at the map again.
To begin, let's make the color of each point depend on the cost of an hour of parking at that location. To do this, select the "Price" field of the source dataset in the "Color based on" drop-down list.
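To see what coloring by a numeric field amounts to, here is a minimal sketch of a quantile color scale: prices are split into equally sized groups and each group gets one palette color. The choice of a quantile scale, the four-color palette and the sample prices are all assumptions made for illustration, not settings read from Kepler.gl.

```python
from bisect import bisect_right

# Hypothetical four-step palette from cheap (light) to expensive (dark).
PALETTE = ["#ffffb2", "#fecc5c", "#fd8d3c", "#e31a1c"]

def quantile_thresholds(values, buckets):
    """Return buckets-1 cut points that split the sorted values into roughly equal groups."""
    ordered = sorted(values)
    return [ordered[len(ordered) * i // buckets] for i in range(1, buckets)]

def color_for(value, thresholds, palette=PALETTE):
    """Pick the palette color of the bucket the value falls into."""
    return palette[bisect_right(thresholds, value)]

prices = [40, 40, 60, 60, 80, 80, 200, 380]  # made-up hourly prices
thresholds = quantile_thresholds(prices, len(PALETTE))
colors = {price: color_for(price, thresholds) for price in sorted(set(prices))}
print(thresholds, colors)
```

With a quantile scale, a few very expensive lots do not wash out the differences between the many cheaper ones, which is the kind of contrast the map in this stage relies on.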
Figure 4. Using color to display information about the cost of an hour of parking
Even at this stage, you can make some interesting observations. For example, not all of the center is equally expensive for motorists, but on Tverskaya it is better to be a pedestrian.
Now let's look at parking capacity. To do this, we will use the “CarCapacity” field as the basis for the point radius (the “Radius Based On” attribute of the point layer) and set the radius range to 0 to 30px.
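The idea of mapping capacity onto a radius range can be sketched as a simple linear scale. The linear form is an assumption made for clarity (the application may use a different scale), and the capacities are made up.

```python
def radius_for(capacity, min_capacity, max_capacity, radius_range=(0.0, 30.0)):
    """Linearly map a capacity onto the radius range (in pixels)."""
    low, high = radius_range
    if max_capacity == min_capacity:
        # All lots have the same size, so draw them all at the maximum radius.
        return high
    share = (capacity - min_capacity) / (max_capacity - min_capacity)
    return low + share * (high - low)

capacities = [5, 20, 50, 105]  # made-up numbers of parking spaces
smallest, largest = min(capacities), max(capacities)
radii = [round(radius_for(c, smallest, largest), 1) for c in capacities]
print(radii)
```

Note that with a lower bound of 0 the smallest lots shrink to nothing, so a small positive minimum may be preferable if every lot should stay visible.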
Figure 5. Customization of the size of points depending on the number of parking spaces
In just a few minutes, our parking map has become noticeably more informative. Even a cursory glance now lets us not only compare the pricing policies of different areas of the city, but also estimate the chances of finding an empty space there, taking into account both the number of parking lots in the neighborhood and their capacity.
Stage 4. Data aggregation with Kepler.gl
Using a point layer to display each of the more than 9000 parking lots has already allowed us to make some interesting observations. However, the resulting map does not make it easy to answer questions such as “Where are the most parking spaces per unit area?”. To answer it, we need to use one of the aggregating layers.
Currently, Kepler.gl supports 4 types of such layers: grid (Grid), hexagonal grid (Hexbin), heat map (Heatmap) and cluster (Cluster). The last two types (Cluster and Heatmap) are useful when you need to aggregate data by only one parameter. The grid and the hexagonal grid allow you to analyze the aggregated values of several parameters simultaneously.
To answer the question posed earlier, we will change the type of the point layer we created to a grid (Grid). This lets us estimate the total number of parking spaces per unit area while keeping information about the average cost of an hour of parking in that place.
Let's set the grid cell size to 1 km² (the minimum available in Kepler.gl). We also reduce the Coverage parameter from 1 to 0.7 so that a small gap appears between the cells, improving the readability of the final map.
Note. The list of parameters available for customization varies depending on the selected layer type. The attributes supported by each layer type are described in more detail in the official Kepler.gl documentation.
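Conceptually, a grid layer assigns every point to a square cell and then works with the cells instead of the points. The sketch below shows one way to do that assignment for cells with a 1 km side, using a flat-earth approximation around Moscow's latitude. It is not Kepler.gl's own binning code; the function name, the reference latitude and the sample points are assumptions.

```python
import math
from collections import Counter

KM_PER_DEGREE_LAT = 111.32  # approximate length of one degree of latitude

def grid_cell(lat, lon, cell_km=1.0, ref_lat=55.75):
    """Return the (row, column) index of the square cell containing the point.

    Uses a flat-earth approximation around ref_lat, which is adequate at city scale.
    """
    km_per_degree_lon = KM_PER_DEGREE_LAT * math.cos(math.radians(ref_lat))
    row = math.floor(lat * KM_PER_DEGREE_LAT / cell_km)
    col = math.floor(lon * km_per_degree_lon / cell_km)
    return row, col

points = [
    (55.7600, 37.6100),
    (55.7601, 37.6101),  # a few metres from the first point, same cell
    (55.8000, 37.7000),  # several kilometres away, different cell
]
counts = Counter(grid_cell(lat, lon) for lat, lon in points)
print(counts.most_common())
```

Once points are grouped this way, any per-cell statistic (a count, a sum of capacities, an average price) can be computed from the members of each cell.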
As before, the color of each cell in the new visualization will depend on the cost of an hour of parking. Now, however, besides the name of the field in the dataset, we also need to specify how Kepler.gl should aggregate its values. The available aggregation methods depend on the type of the selected field. In our case, "Price" is numeric (int) and the application offers 5 options, including:
- lowest value (minimum)
- highest value (maximum)
- total (sum)
- mean (average)
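The effect of each method listed above is easy to see on a toy example. The sketch below applies the four named aggregations to made-up prices grouped by grid cell; the cell labels and values are hypothetical.

```python
from statistics import mean

# Aggregation methods named in the list above, expressed as plain Python functions.
AGGREGATIONS = {
    "minimum": min,
    "maximum": max,
    "sum": sum,
    "average": mean,
}

# Made-up hourly prices of the parking lots that fall into two grid cells.
prices_by_cell = {
    "cell A": [80, 200, 380],
    "cell B": [40, 60],
}

def aggregate(values_by_cell, method):
    """Collapse the values of every cell into one number using the chosen method."""
    func = AGGREGATIONS[method]
    return {cell: func(values) for cell, values in values_by_cell.items()}

for method in AGGREGATIONS:
    print(method, aggregate(prices_by_cell, method))
```

For a price field, the average is usually the meaningful choice, since summing prices mostly reflects how many lots a cell contains; for a capacity field, the sum is what gives the total number of spaces.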
The height of each grid column will reflect the total number of parking spaces in that area. To set this up, switch to the 3D map view. Then, on the Layers tab of the sidebar, select “Enable height” for our aggregation layer and choose the “CarCapacity” field as the base parameter.
Figure 6. Generalized information about the cost and capacity of parking
Thus, after spending a few more minutes on setting up the aggregation layer, we can confidently say that inside the Garden Ring not only the number of parking lots but also the actual number of parking spaces is much greater than outside it.
This article used one specific example to cover only part of what Kepler.gl can do as a modern tool for visualization and basic analysis of geo-data. If the application interests you, I recommend reading the articles and tutorials below, and experimenting on your own with filtering data, setting up tooltips, changing map styles and its other features.
In the next article I will cover ways to share the visualizations and maps you have created, as well as using Kepler.gl as a React component in your web application.
- Kepler.gl repository on GitHub
- Detailed information about Kepler.gl from its creator on the Uber website: “From Beautiful Maps to Actionable Insights: Introducing kepler.gl, Uber’s Open Source Geospatial Toolbox”
- Interview with the Kepler.gl developer Shan He