Learning the climate of Russian cities with Python

    In Python, you can work with data and visualize it. Not only programmers use this, but also scientists: biologists, physicists, and sociologists. Today, together with shwars , the curator of our Python jumpstart for AI course , we will briefly turn into meteorologists and study the climate of Russian cities. From libraries to visualize and work with data we use Pandas, Matplotlib and Bokeh.



    We conduct the research in Azure Notebooks- cloud version of Jupyther Notebook. Thus, to start working with Python, we will not need to install anything on our computer and it will be possible to work directly from the browser. It is only necessary to log in with your Microsoft Account, create a library and in it - a new Python 3 laptop. Then you can take code snippets from this article and experiment!

    Get the data


    First of all, we import the main libraries that we need. Pandas is a library for working with tabular data, or so-called data frames, and pyplot will allow us to build graphs.

    import pandas as pd
    import matplotlib.pyplot as plt

    Source data is easy to find on the Internet, but we have already prepared data for you in a convenient CSV format. CSV is a text format in which all columns are separated by commas. Hence the name - Comma Separated Values.

    Pandas can open CSV files from the local disk or immediately from the Internet. The data itself is in our repository on GitHub , so we just need to specify the correct URL.

    data = pd.read_csv("https://raw.githubusercontent.com/shwars/PythonJump/master/Data/climat_russia_cities.csv")
    data



    Rename the table columns to make it easier to refer to them by name. We also need to convert strings to numeric values ​​in order to operate on them. When we try to do this with a function pd.to_numeric, we find that a strange error occurs. This is due to the fact that instead of a minus in the text is used the sign of a long dash.

    data.columns=["City","Lat","Long","TempMin","TempColdest","AvgAnnual","TempWarmest","AbsMax","Precipitation"]
    data["TempMin"] = pd.to_numeric(data["TempMin"])

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    pandas/_libs/src/inference.pyx in pandas._libs.lib.maybe_convert_numeric()
    ValueError: Unable to parse string "−38.0"
    ...
    ...
    ...
    ValueError: Unable to parse string "−38.0" at position 0

    An important morality follows from this problem: the data usually comes in a “dirty” form that is inconvenient to use, and the data scientist’s task is to bring this data to a good view.
    You can see that some of the columns in our table are of type objectrather than numeric type float64. In such columns, we replace the dash with a minus and then convert the entire table into a numeric format. Columns that can not be converted (city names) will remain unchanged (for this we used the key errors=’ignore’).

    print(data.dtypes)
    for x in ["TempMin","TempColdest","AvgAnnual"]:
      data[x] = data[x].str.replace('−','-')
    data = data.apply(pd.to_numeric,errors='ignore')
    print(data.dtypes)

    City              object
    Lat              float64
    Long             float64
    TempMin           object
    TempColdest       object
    AvgAnnual         object
    TempWarmest      float64
    AbsMax           float64
    Precipitation      int64
    dtype: object
    City              object
    Lat              float64
    Long             float64
    TempMin          float64
    TempColdest      float64
    AvgAnnual        float64
    TempWarmest      float64
    AbsMax           float64
    Precipitation      int64
    dtype: object

    Examine the data


    Now that we have clean data, we can try to build interesting graphs.

    Average annual temperature


    For example, let's see how the average temperature depends on latitude.

    ax = data.plot(x="Lat",y="AvgAnnual",kind="Scatter")
    ax.set_xlabel("Широта")
    ax.set_ylabel("Среднегодовая температура")



    From the graph it can be seen that the closer to the equator - the warmer.

    Record cities


    Now take a look at the cities that are temperature record holders, and see if there is a correlation between the minimum and maximum temperature in the city.

    ax=data.plot(x="TempMin",y="AbsMax",kind="scatter")
    ax.set_xlabel("Рекорд отрицательной температуры")
    ax.set_ylabel("Рекорд положительной температуры")
    ax.invert_xaxis()



    As you can see, in this case there is no such correlation. There are cities with a sharply continental climate, and just warm and cold cities. We will find cities with a maximum temperature range, that is, just cities with a sharply continental climate.

    data['spread'] = data['TempWarmest'] - data['TempColdest']
    data.nlargest(3,'spread')



    This time we took not the record figures, but the average of the warmest and coldest month. As expected, the largest variation is in cities from the Republic of Sakha (Yakutia).

    Winter and summer


    For further research we will consider cities within a radius of 300 km from Moscow. To calculate the distance between points in latitude and longitude, we use the geopy library , which must first be installed with help pip install.

    !pip install geopy
    import geopy.distance

    Let's add one more column to the table - the distance to Moscow.

    msk_coords = tuple(data.loc[data["City"]=="Москва"][["Lat","Long"]].iloc[0])
    data["DistMsk"] = data.apply(lambda row : geopy.distance.distance(msk_coords,(row["Lat"],row["Long"])).km,axis=1)
    data.head()



    Use the expression to select only the lines of interest.

    msk = data.loc[data['DistMsk']<300]
    msk



    For these cities we will build a schedule of minimum, average annual and maximum temperatures.

    ax=msk.plot(x="City",y=["TempColdest","AvgAnnual","TempWarmest"],kind="bar",stacked="true")
    ax.legend(["Мин","Сред","Макс"],loc='lower right')



    In general, there are no anomalies within 300 kilometers around Moscow. Ivanovo is located north of other cities, so the temperature there is a few degrees lower.

    Work with geo-data


    Now let's try to map the average annual rainfall with reference to cities and see how precipitation depends on the geographical location. To do this, use another visualization library - Bokeh . It also needs to be installed.

    Then we calculate another column - the size of the circle, which will show the amount of precipitation. The coefficient is chosen empirically.

    !pip install bokeh
    from bokeh.io import output_file, output_notebook, show
    from bokeh.models import (
      GMapPlot, GMapOptions, ColumnDataSource, Circle, LogColorMapper, BasicTicker, ColorBar,
        DataRange1d, PanTool, WheelZoomTool, BoxSelectTool, HoverTool
    )
    from bokeh.models.mappers import ColorMapper, LinearColorMapper
    from bokeh.palettes import Viridis5
    from bokeh.plotting import gmap

    To work with the map, you need the Google Maps API key. It must be independently obtained on the site .

    More detailed instructions on using Bokeh to plot graphs on maps can be found here and here .

    google_key = "<INSERT YOUR KEY HERE>"
    data["PrecipSize"] = data["Precipitation"] / 50.0
    map_options = GMapOptions(lat=msk_coords[0], lng=msk_coords[1], map_type="roadmap", zoom=4)
    plot = gmap(google_key,map_options=map_options)
    plot.title.text = "Уровень осадков в городах России"
    source = ColumnDataSource(data=data)
    my_hover = HoverTool()
    my_hover.tooltips = [('Город', '@City'),('Осадки','@Precipitation')]
    plot.circle(x="Long", y="Lat", size="PrecipSize", fill_color="blue", fill_alpha=0.8, source=source)
    plot.add_tools(PanTool(), WheelZoomTool(), BoxSelectTool(), my_hover)
    output_notebook()
    show(plot)



    As you can see, the most precipitation falls in coastal cities. Although there are quite a large number of such cities, where the average precipitation is even lower than the national one.

    All the code with comments, written by Dmitry Soshnikov, you can independently see and run here .

    Results


    We showed the possibilities of the language without using complex algorithms, specific libraries or writing hundreds of lines of code. However, even with standard tools, you can analyze your data and draw some conclusions.

    Datasets are not always perfectly composed, so before you start working with visualization, you need to put them in order. The quality of visualization will largely depend on the quality of the data used.

    There are a huge number of various types of charts and graphs, and not necessarily limited to standard libraries alone.

    There is Geoplotlib , Plotly , Minimalistic Leather and others.

    If you want to learn more about working with data in Python, as well as get acquainted with artificial intelligence, then we invite you to a one-day intensive from the Binary District - Python jumpstart for AI .

    Also popular now: