Geolocation saga: how to build a geo web service on NGINX without a database engine and without programming

    Today we will revisit the rather old topic of geolocation by IP address, and a newer one: fast web services without “programming languages”. We will also publish a ready-made container image so that you can deploy such a web service in 5 minutes.

    Our company builds online auto parts stores on its own SaaS platform (ABCP.RU), and we also run several related projects, for example the 4MyCar.ru spare parts search service.
    Like many other web projects, we eventually came to need geolocation by IP address. For example, it is now used on 4MyCar.ru to determine the visitor's region (on your first visit, the region is set automatically this way).



    Similarly, the websites of ABCP platform clients use it to select the store branch closest to the customer.



    When the problem of geolocation first came up, we were only beginning to study the subject. At that time there were essentially no alternatives to the MaxMind databases. We tried them, played with them, and set them aside. In real work we used MaxMind GeoLite a few times to filter out particularly annoying bots trying to take down our customers' sites (filtering by country in nginx was enough: a primitive check in an if block, see the ngx_http_geoip_module documentation). The free databases were not accurate enough for RU and stored city names in the Latin alphabet, so they were of little use for anything else.
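    For reference, such a filter takes only a few lines of nginx config (the database path is the conventional one, and XX stands in for whichever country code the bots came from):

        # http context: attach the MaxMind country database (ngx_http_geoip_module)
        geoip_country /usr/share/GeoIP/GeoIP.dat;

        server {
            listen 80;
            # primitive check in if: refuse requests from one country
            if ($geoip_country_code = XX) {
                return 403;
            }
        }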

    After some time, one of our employees discovered the excellent ipgeobase.ru site, which lets you download geolocation databases for Russia and Ukraine, as well as query its XML web service with a simple HTTP request. For example, arriving at the 4mycar.ru website with the search phrase “buy an oil filter in Uryupinsk” from the corresponding city produced roughly this request to the web service: http://ipgeobase.ru:7020/geo?ip=217.149.183.4. The response included the city and region names in Russian, which was very convenient. In no time the web service was wired into the code that picks the nearest store branch. However, after the launch in production, several problems surfaced:
    1) a request to the web service usually took very little time (hundredths of a second under normal conditions from a data center in Moscow), but from our development office in the regions the delays were already higher (about half a second);
    2) occasionally (by our observations, at “peak hours”) this time grew noticeably, causing genuinely unpleasant delays in responses to our customers;
    3) it turned out that the same client often had to be geolocated several times, which raised the question of caching the geodata;
    4) with our non-optimal requests we created load on the ipgeobase web service, which was bad for its owners;
    5) geolocation did not work for countries other than RU and UA.

    To solve these problems, we quickly “called a meeting” and came away with two main options: take the databases and write our own web service (periodically download the ipgeobase databases, import them into our own database engine, serve them over HTTP with caching, e.g. in memcached), or cache the geodata in memcached or redis (query ipgeobase and cache the responses). At first glance, both options required quite a lot of those scarce developer man-hours, so a third option was found: sacrifice a little accuracy (replace the last octet of the IP address with 0, assuming providers rarely split a /24 subnet across different cities) and run, on our own hardware, a caching proxy on nginx with a long cache lifetime and short timeouts for requests to ipgeobase. This option proved very effective: it cut both the load on ipgeobase and the geolocation time severalfold.
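    A minimal sketch of such a caching proxy (the cache path, zone name, timeouts and the $ip24 map are our illustration, not the exact production config):

        # map the ?ip= argument to its /24 network: one cache entry per subnet
        map $arg_ip $ip24 {
            ~^(?<net>\d+\.\d+\.\d+)\.\d+$  $net.0;
            default                        $arg_ip;
        }

        proxy_cache_path /var/cache/nginx/geo keys_zone=geocache:16m inactive=7d;

        upstream ipgeobase {
            server ipgeobase.ru:7020;
        }

        server {
            listen 8080;
            location /geo {
                proxy_pass http://ipgeobase/geo?ip=$ip24;
                proxy_cache geocache;
                proxy_cache_key $ip24;        # cache by subnet, not by exact IP
                proxy_cache_valid 200 7d;     # long caching time
                proxy_connect_timeout 300ms;  # short timeouts towards ipgeobase
                proxy_read_timeout 500ms;
            }
        }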

    After some time we again needed geolocation in nginx (yes, those bots again, but this time in large numbers from RU), so filtering by country against the MaxMind databases was no longer enough.

    It was needed urgently, so we used the other geo module (ngx_http_geo_module) and loaded the region number from the ipgeobase database into a variable. That was enough to “plug the holes”.
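    Schematically it looks like this (the range and the region code 77 are invented for illustration):

        # http context: map IP ranges from ipgeobase to a region number
        geo $ipgeobase_region {
            ranges;                               # range mode: fast lookups
            default                         0;
            217.149.176.0-217.149.191.255   77;
        }

        server {
            listen 80;
            # “plugging the holes”: special-case traffic from one region
            if ($ipgeobase_region = 77) {
                return 403;
            }
        }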
    Soon we came across the ipgeobase2nginx.php script, which builds such a base for nginx, and as a result we got human-readable city information in a variable. This data, along with the MaxMind data, could now be written to the logs or passed to the backend in headers, which, in principle, suited everyone.

    All this time we periodically thought about taking it further. Plans to create our own web service gathered dust in TODO lists, occasionally resurfacing as “how about studying python / erlang / haskell / etc tonight; what shall I write after 'Hello world'?”, but never moved forward.
    Suddenly, at first as a joke over tea (just for fun), the idea came up: drawing on our accumulated nginx experience, build a web service similar to ipgeobase, but without a database engine and without scripting languages.
    A quick inventory of what we had yielded the following:
    1) the GeoLite databases in csv and the ipgeobase databases in text form are freely available;
    2) the ngx_http_geo_module module can set variable values by IP address, and does so blazingly fast (it even uses a binary geo range base to speed things up);
    3) for RU and UA we trust ipgeobase but, where possible, we also want to see the MaxMind data;
    4) nginx implements SSI perfectly (ngx_http_ssi_module), not only for text/html but for other file types as well;
    5) nginx can take the IP address from a request header and treat it as the client's address (ngx_http_realip_module), which means it can be passed to the geo module.
    It remains to add a few quick-and-dirty scripts that turn the csv and ipgeobase files into the required pieces of nginx config.
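    Schematically, a generated piece is just an includable geo block (the file name and contents below are invented for illustration; the real files are produced by the scripts):

        # out/nginx_ipgeobase_city.conf: a generated fragment (excerpt)
        geo $ipgeobase_city {
            ranges;
            default                          -;
            217.149.176.0-217.149.191.255    "Урюпинск";
        }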

    Here's what we got:
    https://yadi.sk/d/QsNN87nMesXo8 (configs and scripts).

    To show the web service in action, we have temporarily deployed it on a VDS, available at http://muxgeo-demo.4mycar.ru:6280/muxgeo/ .

    To quickly launch such a service yourself, you can download the ready-made LXC image: https://yadi.sk/d/1WrvV2RyesYFM (login:password is ubuntu:ubuntu).

    Here is a short description of how the scripts work; in the LXC image we place them in /opt/scripts.

    In the /opt/scripts/in subdirectory you need to put the files obtained from MaxMind and ipgeobase and preprocess them a little (something like this):
    iconv -f latin1 GeoLiteCity-Location.csv | iconv -t ascii//TRANSLIT > GeoLiteCity-Location-translit.csv

    An additional MaxMind file with the names of the regions is also required:
    dev.maxmind.com/static/csv/codes/maxmind/region.csv

    Now the scripts themselves:
    GeoLite2nginx.pl generates the out/nginx_geoip_* files;
    ipgeobase2nginx.pl generates the out/nginx_ipgeobase_* files.

    We need to overlay the IP address ranges from geoip and ipgeobase onto each other. To that end, the first two scripts, when run, also create files with integer representations of the IP addresses (out/nginx_geoip_num.txt and out/nginx_ipgeobase_num.txt). We made the in/nginx_localip_num.txt file by hand and put a list of reserved ranges in it (local networks, etc.). We additionally exclude the multicast address range from the resulting lists.

    How we do it: the make-dup-ranges.pl script walks the list and, for each range start, adds the preceding address (the end of the previous range) to the list, and for each range end, the following address. Then we sort the list and remove duplicates, which leaves a list of boundary points at which any of the source ranges begins or ends.
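    With toy numbers instead of 32-bit addresses, the effect is this (our illustration of the idea, not the script's exact output format):

        source ranges:         10-19 and 15-24
        boundary list:         10, 19, 15, 24
        neighbours added:      9, 20, 14, 25
        sorted, deduplicated:  9, 10, 14, 15, 19, 20, 24, 25
        resulting pieces:      10-14, 15-19, 20-24 (each piece lies entirely
                               inside or outside every source range)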

    The make-ranges.pl script then builds an nginx config with these combined ranges.

    Now that we have the configs for nginx, we need to wire them in.

    Our scheme consists of a frontend and a backend (the frontend forwards requests to the backend, converting headers and caching responses). We do all of this on ubuntu 14.04 in an LXC container, with nginx taken from the official site.

    We put the contents of out here:
    /etc/nginx/muxgeo/data/

    Make “bindings” that set the necessary variables:
    /etc/nginx/muxgeo/muxgeo.conf
    /etc/nginx/muxgeo/muxgeo-geoip.conf
    /etc/nginx/muxgeo/muxgeo-ipgeobase.conf

    As well as the primitive logic for the backend:
    /etc/nginx/muxgeo/muxgeo_site.conf

    The configs for the frontend and backend are here:
    /etc/nginx/conf.d/muxgeo-frontend.conf (listens on port 6280)
    /etc/nginx/conf.d/muxgeo-backend.conf (port 6299)
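    To give an idea of the wiring, here is a heavily trimmed sketch of the two servers (the ports are the ones above; the cache zone, directive choices and the $muxgeo_client_ip map are our illustration, not the exact contents of those files):

        # muxgeo-frontend.conf (sketch): takes ?ip=, caches, asks the backend
        map $arg_ip $muxgeo_client_ip {
            ""      $remote_addr;        # no ?ip= argument: use the real client
            default $arg_ip;
        }

        proxy_cache_path /var/cache/nginx/muxgeo keys_zone=muxgeo:16m;

        server {
            listen 6280;
            location /muxgeo/ {
                proxy_pass http://127.0.0.1:6299;
                proxy_set_header X-Real-IP $muxgeo_client_ip;
                proxy_cache muxgeo;
                proxy_cache_valid 200 1d;
            }
        }

        # muxgeo-backend.conf (sketch): trusts the frontend's header, runs SSI
        server {
            listen 6299;
            set_real_ip_from 127.0.0.1;  # ngx_http_realip_module: only the
            real_ip_header  X-Real-IP;   # frontend may rewrite the client IP
            location /muxgeo/ {
                root /opt/muxgeo/muxgeo-backend;
                ssi on;                  # ngx_http_ssi_module
                ssi_types text/xml application/json text/plain;
            }
        }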

    We also need a file, say index.html, in which we will output the data in the format we need using SSI in nginx.
    We place it in the /opt/muxgeo/muxgeo-backend/muxgeo directory.
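    For illustration, such a template is just HTML with SSI substitutions (the variable names follow the geoip/ipgeobase pattern used above; the names in the real template may differ):

        <!-- /opt/muxgeo/muxgeo-backend/muxgeo/index.html (sketch) -->
        <html><body>
          <p>IP: <!--# echo var="remote_addr" --></p>
          <p>Country: <!--# echo var="geoip_country_code" default="--" --></p>
          <p>City: <!--# echo var="ipgeobase_city" default="--" --></p>
        </body></html>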

    Thus, a request to
    http://muxgeo-demo.4mycar.ru:6280/muxgeo/?ip=217.149.183.4
    is forwarded to the backend with the client IP address replaced by 217.149.183.4, and the backend inserts the information into the right places in the html text.

    But an html page is not quite what we wanted; we need xml, the kind ipgeobase produces. Just fill in a template that outputs the corresponding fields; see the example in the muxgeo.xml file. Using the link
    http://muxgeo-demo.4mycar.ru:6280/muxgeo/muxgeo.xml?ip=217.149.183.4
    we get “the same, but better” than ipgeobase's xml output, and in utf-8 at that.

    Need JSON? No problem, the template is made by analogy:
    http://muxgeo-demo.4mycar.ru:6280/muxgeo/muxgeo.json?ip=217.149.183.4
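    Schematically, muxgeo.xml is the same idea in an ipgeobase-like envelope (the tag set here is approximate):

        <?xml version="1.0" encoding="utf-8"?>
        <ip-answer>
          <ip>
            <country><!--# echo var="geoip_country_code" default="" --></country>
            <city><!--# echo var="ipgeobase_city" default="" --></city>
            <region><!--# echo var="ipgeobase_region" default="" --></region>
          </ip>
        </ip-answer>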

    Want something exotic? Let's output an ini file:
    http://muxgeo-demo.4mycar.ru:6280/muxgeo/muxgeo.ini?ip=217.149.183.4

    To test the whole thing, you can, for example, build a geo-database over the addresses of all countries in a format similar to the output of the script mentioned above (ipgeobase2nginx.php). We create a text file with a template (muxgeo_fullstr.txt) and a simple script that fetches the data for all available ranges; a sketch of such a template follows below.
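    The template itself can be a single line of the same SSI substitutions, for example (the field set and separator are our guess at what muxgeo_fullstr.txt contains):

        <!--# echo var="geoip_country_code" default="-" -->|<!--# echo var="ipgeobase_city" default="-" -->|<!--# echo var="ipgeobase_region" default="-" -->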

    A small remark: in these examples the frontend and backend run in the same nginx. Under heavy load it makes sense to split them into separate nginx instances, since a backend working with geodata consumes noticeably more memory than a minimal nginx with proxy_cache.

    What about further development of this project? You could, for example, add other data sources at the cost of slightly more complex configuration, or plug in your own geo-bases containing “refinements obtained from reliable sources :)”.
