ipgeobase in nginx

    When you need to determine a visitor's city and the tax (vehicle) code of their region from their IP address, it seems easy: the internet is full of such tools!
    Then you look closer: some are paid, some cannot be deployed on your own servers, some can be but are resource-hungry, and the rest know nothing about the regions of the Russian Federation...
    And here the programmer's sick brain hurries to the rescue with its obsession: "Don't rely on others, build it yourself."



    As soon as you start thinking that way, you remember that nginx has an excellent geo module, which is "not only fast, but optimized to the limit". The catch is that it does not understand any of the well-known database formats (MaxMind, Sypex, ipgeobase).

    A couple of hours in an embrace with Python, and there is a decent converter that extracts everything we need from the ipgeobase.ru site.
    (Yes, there were rumors that everyone there was laid off about half a year ago, but the databases are still updated regularly, which is good news.)

    To dispel any doubts, I will comment on the code below (if you are not interested, you can scroll straight to the nginx setup).

    The code



    1. Download the database
    Nothing complicated here, requests + zipfile:
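    from io import BytesIO
    from zipfile import ZipFile

    import requests

    # error() is assumed to be a helper defined elsewhere in the script
    archive = requests.get("http://ipgeobase.ru/files/db/Main/geo_files.zip")
    if archive.status_code != 200:
        error("IPGeobase no answer: %s" % archive.status_code)
    # wrap the downloaded bytes in a file-like object so ZipFile can read them
    extracteddata = ZipFile(BytesIO(archive.content))
    filelist = extracteddata.namelist()
    if "cities.txt" not in filelist:
        error("cities.txt not downloaded")
    if "cidr_optim.txt" not in filelist:
        error("cidr_optim.txt not downloaded")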
    



    2. Load the dictionary of regions
    REGIONS = dict(l.decode("utf8").rstrip().split("\t")[::-1]
                   for l in open("regions.tsv").readlines())
    

    where regions.tsv is a tab-separated list of vehicle / tax region codes and region names, of the form:
    66 Свердловская область
    77 Москва
    78 Санкт-Петербург
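
    The `[::-1]` slice in the snippet above flips each `code<TAB>name` row, so the dictionary ends up keyed by region name, which is what cities.txt provides. A minimal Python 3 sketch of the same idea, using an in-memory sample instead of the real file (the helper name `load_regions` and the sample constant are hypothetical, not part of the original script):

```python
import io

# In-memory stand-in for regions.tsv: "code<TAB>name" per line.
SAMPLE_TSV = "66\tСвердловская область\n77\tМосква\n78\tСанкт-Петербург\n"


def load_regions(lines):
    """Map region name -> region code from tab-separated 'code<TAB>name' rows."""
    regions = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        code, name = line.split("\t", 1)
        regions[name] = code
    return regions


REGIONS = load_regions(io.StringIO(SAMPLE_TSV))
print(REGIONS["Москва"])
```

    Keying by name means the later lookup by the region name from cities.txt is a plain dictionary access.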



    3. Get the dictionary of cities
    For each city, we need to know its id, name and region code:
    CITIES = {}
    for line in extracteddata.open("cities.txt").readlines():
        cid, city, region_name, _, _, _ = line.decode("cp1251").split("\t")
        if region_name in REGIONS:
            CITIES[cid] = {'city': b64encode(city.encode("utf8")),
                           'reg_id': REGIONS[region_name]}
            if cid == "1199":  # Zelenograd fix
                CITIES[cid]['reg_id'] = "77"
    


    Note that, with an eye to the future, the UTF-8 city name is encoded in base64 right away. This widens the ways it can be used (for example, in nginx logs) without having to deal with transliteration.
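
    To illustrate why base64 helps here, the following Python 3 sketch round-trips a city name through an ASCII-only string. The function names `encode_city` and `decode_city` are hypothetical; the original script simply calls `b64encode` inline.

```python
from base64 import b64decode, b64encode


def encode_city(name):
    """Return an ASCII-only base64 string for a Unicode city name."""
    return b64encode(name.encode("utf-8")).decode("ascii")


def decode_city(value):
    """Reverse encode_city(): base64 text back to the Unicode city name."""
    return b64decode(value).decode("utf-8")


encoded = encode_city("Москва")
print(encoded.isascii())     # the encoded form is safe for logs and headers
print(decode_city(encoded))  # the consumer can restore the original name
```

    Whoever reads the log or the header only needs a base64 decoder to get the original name back.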


    4. Join address ranges and cities
    database = {}  # "first_ip-last_ip" -> city info
    for line in extracteddata.open("cidr_optim.txt").readlines():
        _, _, ip_range, country, cid = line.decode("cp1251").rstrip().split("\t")
        if country == "RU" and cid in CITIES:
            # drop the spaces around the dash in the range
            database["".join(ip_range.split())] = CITIES[cid]
    

    Obviously, if the country is not Russia, ipgeobase has neither a region nor a city for the range, so we do not need such ranges for our task.
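
    The `"".join(ip_range.split())` expression only removes the whitespace around the dash, so the range reads as `first-last`. A small Python 3 sketch on a made-up row (the helper `parse_cidr_row` is hypothetical, and the five-field layout is assumed from the unpacking in the snippet above):

```python
def parse_cidr_row(row):
    """Split one tab-separated row into (range_without_spaces, country, city_id)."""
    # The two leading fields are ignored, as in the original loop.
    _, _, ip_range, country, cid = row.rstrip("\n").split("\t")
    # "192.0.2.0 - 192.0.2.255" -> "192.0.2.0-192.0.2.255"
    return "".join(ip_range.split()), country, cid


# Made-up sample row; the addresses come from a documentation-only range.
sample = "3221225984\t3221226239\t192.0.2.0 - 192.0.2.255\tRU\t2097\n"
nginx_range, country, cid = parse_cidr_row(sample)
print(nginx_range, country, cid)
```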


    5. Generate files for the geo module
    with open("region.txt", "w") as reg, open("city.txt", "w") as city:
        for ip_range in sorted(database):
            info = database[ip_range]
            city.write("%s %s;\n" % (ip_range, info['city']))
            reg.write("%s %s;\n" % (ip_range, info['reg_id']))
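
    To show what the generated files look like, here is a Python 3 sketch that renders the same `range value;` lines into a string instead of a file. The sample data is made up and the `render` helper is hypothetical; only the line format follows the loop above.

```python
import io
from base64 import b64encode


def b64(name):
    """Base64-encode a city name the same way the converter does."""
    return b64encode(name.encode("utf-8")).decode("ascii")


# Made-up sample shaped like the `database` dict built in step 4.
database = {
    "192.0.2.0-192.0.2.255": {"city": b64("Москва"), "reg_id": "77"},
    "198.51.100.0-198.51.100.255": {"city": b64("Санкт-Петербург"), "reg_id": "78"},
}


def render(database, field):
    """Return the text of a geo include file for one field of each entry."""
    out = io.StringIO()
    for ip_range in sorted(database):
        out.write("%s %s;\n" % (ip_range, database[ip_range][field]))
    return out.getvalue()


region_txt = render(database, "reg_id")
city_txt = render(database, "city")
print(region_txt, end="")
```

    Each line pairs one address range with one value and ends with a semicolon, which is the form the `include` directive inside a `geo` block with `ranges` expects.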
    



    Nginx setup



    For everything to work, nginx must be built with the geo module (nginx.org/ru/docs/http/ngx_http_geo_module.html).
    Put the generated files in a known place and add the following config to the http section:
    geo $region {
        ranges;
        include geo/region.txt;
    }
    geo $city {
        ranges;
        include geo/city.txt;
    }
    

    After that, two variables, $city and $region, become available in nginx and can be used anywhere:

    • in the log:
      log_format long '$time_iso8601\t$msec\t$host\t$request_method\t$uri\t$args\t$http_referer\t$remote_addr\t$http_user_agent\t$status\t$request_time\t$request_length\t$upstream_addr\t$bytes_sent\t$upstream_response_time\t$city\t$region';
      
    • in headers sent to the application:
      location / {
          proxy_set_header  X-City     $city;
          proxy_set_header  X-Region $region;
          proxy_pass   http://backend;
      }
      

      By default, the geo module returns an empty string for every address it cannot match, and in that case the header is simply not set.
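
    On the application side, this means the code has to handle both a base64-encoded city and a missing header. A minimal Python 3 sketch, assuming the backend exposes request headers as a plain dict with exactly the names used in the config above (the function `visitor_location` is hypothetical):

```python
from base64 import b64decode


def visitor_location(headers):
    """Return (city, region) from proxied headers; None where nothing was set."""
    raw_city = headers.get("X-City")
    region = headers.get("X-Region") or None
    # The city arrives base64-encoded; decode it only when the header is present.
    city = b64decode(raw_city).decode("utf-8") if raw_city else None
    return city, region


print(visitor_location({"X-City": "0JzQvtGB0LrQstCw", "X-Region": "77"}))
print(visitor_location({}))  # address not found in the geo files
```

    Real frameworks usually provide case-insensitive header access, so the dict lookup here is only a stand-in.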


    In practice, this setup works instantly, puts no noticeable load on nginx, and, because updating the databases is easy to automate, is fairly accurate (it all depends on how much you trust the ipgeobase.ru databases). That made me think it might be useful to someone else, so feel free to use it and perhaps write converters for other data providers.

    code on GitHub (ipgeobase-importer branch)

    P.S. Some time after writing this article, I rewrote everything in Go and added support for MaxMind.
