Making a simple web service using the Yandex.Metrica API

    Hello!

    Not long ago, Yandex opened the Yandex.Metrica API to the public. In this article I’ll explain what it is for and how to use it, and briefly describe how it differs from the Google Analytics API.

    In addition, I will show how to use this API to build a web service that lets you compare a site's current performance with the past and see how the popularity of its pages has changed over time.



    Briefly about the Metrica API


    The main difference between the Metrica API and the Google Analytics API is that it focuses on reports, not metrics. A programmer using GA has to tell the service “I want to see visits from advertising sources, broken down by goal 1, with the number of visits and the bounce rate”. A Metrica user simply says “I want to see the content report”.

    The choice of reports over individual indicators fits the concept of Metrica as a tool for ordinary users rather than professionals. Using the Metrica API really is much easier.

    However, the current approach has its drawbacks. Firstly, you can only request reports predefined by Yandex programmers from the service. Secondly, since the report structure cannot be changed, each time you will receive an excess amount of information, which may affect the response time of the service.

    Metrica is developing very fast (the API even changed a little while I was writing this article), so I’m sure that soon it will be possible to build reports from just the indicators you need, as in GA, and the problems described above will disappear.

    Why is it needed?


    So why do you need the Metrica API? With it, you can do many interesting things, for example:
    1. Show real-time statistics on the site
    2. Integrate site statistics into your CRM
    3. Automate and simplify the work of employees

    Point one is a fun bell-and-whistle which may nevertheless interest the site's advertisers. Right on the “advertising on the site” page you can display the most popular search queries that bring visitors to the site, the traffic chart, and the geographic regions that users come from.

    For example, here is how this is done on Habr ( http://habrahabr.ru/info/stats/ ):



    Point 2 (integrating statistics into a CRM) needs little explanation. Adding the source, the buyer's region and sometimes even the specific advertising creative to the internal record of an order is the cherished dream of any advertiser or analyst. Once this is done, it immediately becomes clear which advertising works and which does not, and the call center no longer has to ask the user at least a few extra questions.

    Automating the work of employees (point 3) matters for those who place a lot of advertising, spend a lot of money on SEO and constantly monitor how effective it all is. Suppose that every week your employees process 40 reports from Yandex.Metrica and spend 10 minutes on each. That is 6 hours 40 minutes. If you give them documents that are already processed, those nearly 7 hours can be spent on something really useful.

    The main advantage for the programmer


    After working with the GA API and its bulky XML format, I would like to emphasize another important point: Metrica lets you receive data in JSON! In my opinion, this is one of its most important competitive advantages over GA. Most modern languages can work with JSON out of the box, so no additional libraries are needed. Unlike with Google, with Metrica you can sit down and get going immediately.

    This is very easy to verify: open a new browser tab and go to the following URL (you must be logged in to Yandex): http://api-metrika.yandex.ru/counters.json?pretty=1 .

    Congratulations, you have just used the Metrica API. And you do not even need any additional programs to parse the server response.



    We make our own service based on the Metrica API



    So, to get a deeper understanding of the API, let's try to create a web service that extends the standard capabilities of Metrica. By default it lacks one very important thing: a comparison with the previous period. This is a very convenient feature that makes website analysis much easier. In GA, a period comparison looks like this:



    Let's try to do something similar for Metrica.

    Before building the reports themselves, we must let the user select the counter whose statistics they want to see. To do this, we use the link that we already saw above ( http://api-metrika.yandex.ru/counters.json ). From the information that the server sends, we need two parameters: id and site. id is the counter number, without which it is impossible to get any statistics, and site is the name of the site specified during registration.

    Note that you need to be authorized to use the API. This can be done in several ways, which I will not describe in this article. For my service I chose OAuth, because I had already used it with Google services. As it turned out, Yandex's OAuth implementation is much simpler to use than that of its overseas rival.

    So, we will create an interface in which the user selects the counter and the period whose data we will compare. In Python (Python 2 on Google App Engine), the code for requesting counters looks like this:
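
    To make the two fields concrete, here is a minimal sketch of pulling id and site out of a counters response. It uses Python 3 and the standard library only. The sample payload is made up for illustration: the article only tells us that the response has a "counters" list whose entries carry "id" and "site", so the values and the helper name extract_counters are assumptions.

```python
import json

# Hypothetical sample payload. Only the "counters" key and the "id"/"site"
# fields come from the article; the values are invented for illustration.
SAMPLE_RESPONSE = """
{
  "counters": [
    {"id": "101", "site": "example.com"},
    {"id": "102", "site": "shop.example.com"}
  ]
}
"""


def extract_counters(raw_json):
    """Return a list of (id, site) pairs from a counters.json response body."""
    payload = json.loads(raw_json)
    # Fall back to an empty list if the response has no "counters" key.
    return [(c["id"], c["site"]) for c in payload.get("counters", [])]


counter_choices = extract_counters(SAMPLE_RESPONSE)
for counter_id, site in counter_choices:
    print(counter_id, site)
```

    The resulting pairs are what a counter drop-down in the interface would be filled from.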

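    import cgi
    import json
    import urllib

    from google.appengine.api import memcache, urlfetch
    from google.appengine.ext import webapp

    class FetchCounters(webapp.RequestHandler):
        def post(self):
            token = cgi.escape(self.request.get('token'))
            # Reuse the cached list for this token if there is one.
            counters = memcache.get(token)
            if counters is None:
                # urlfetch needs an absolute URL; the token is URL-encoded.
                fetch_url = 'http://api-metrika.yandex.ru/counters.json?' + urllib.urlencode({'oauth_token': token})
                # deadline is the request timeout in seconds, not the cache TTL.
                result = urlfetch.fetch(url=fetch_url, deadline=10)
                if result.status_code == 200:
                    counters = json.loads(result.content)["counters"]
                    memcache.add(token, counters, 3600)  # cache TTL: 3600 seconds
                else:
                    counters = "Oops, looks like you don't have permission to access counters"
            self.response.out.write(json.dumps(counters))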


    Since this service is not intended for production, we cache the list of counters in memcache under the user's token, so as not to hit the API server again. In a real application on Google App Engine this is probably not worth doing, because memcache is relatively small.
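
    The caching pattern described here (look in the cache first, call the API only on a miss, store the result with a time limit) can be sketched without App Engine. The version below is a simplified stand-in written in Python 3: TTLCache is a hypothetical in-memory class, not the memcache API, and the fetch function is passed in so the example runs without network access.

```python
import time


class TTLCache:
    """Tiny in-memory cache with per-key expiry (illustrative stand-in only)."""

    def __init__(self, clock=time.time):
        self._clock = clock   # injectable clock makes expiry easy to test
        self._items = {}      # key -> (value, expires_at)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # The entry is stale: drop it and report a miss.
            del self._items[key]
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._items[key] = (value, self._clock() + ttl_seconds)


def get_counters(cache, token, fetch, ttl_seconds=3600):
    """Return the counters for a token, calling fetch(token) only on a cache miss."""
    counters = cache.get(token)
    if counters is None:
        counters = fetch(token)
        cache.set(token, counters, ttl_seconds)
    return counters
```

    With this shape, repeated requests with the same token within an hour reach the API only once.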

    For the user, the interface will look like this:



    Next, we need to select the appropriate report from the available list. The Metrica API has the following report groups:
    • Traffic
    • Sources
    • Content
    • Geography
    • Demography
    • Computers

    To build a graph, we need the number of visits on each day of the period under consideration, so we need the “Traffic” report group and, within it, the traffic report (summary in the URL below). To get the necessary data, we form a request like this:
    http://api-metrika.yandex.ru/stat/traffic/summary.json?id=XXXXXX&date1=YYYYMMDD&date2=YYYYMMDD&oauth_token=XXXXXX

    Here id is the number of the counter selected by the user, and date1 and date2 are dates in the specified format. Any request to the Metrica API can be checked directly in the browser, so you can simply take the id of your counter and substitute it into this link. If you are logged in to Yandex services, you can omit the OAuth token.

    In response, Metrica returns a report that contains a lot of information we do not need. We only need the date ("date") and the number of visits ("visits"):
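
    As a rough illustration of how such a request URL can be assembled, here is a small Python 3 sketch. The function name build_report_url and its arguments are hypothetical; only the host, the /stat/... path shape and the id, date1, date2 and oauth_token parameters come from the URL shown above.

```python
from datetime import date
from urllib.parse import urlencode

API_ROOT = "http://api-metrika.yandex.ru"  # host used throughout the article


def build_report_url(report_path, counter_id, date1, date2, token=None):
    """Build a report URL such as .../stat/traffic/summary.json?id=...

    report_path is e.g. "traffic/summary"; date1 and date2 are datetime.date
    objects and are sent in the YYYYMMDD format the article describes.
    """
    params = [
        ("id", counter_id),
        ("date1", date1.strftime("%Y%m%d")),
        ("date2", date2.strftime("%Y%m%d")),
    ]
    if token:
        # The token can be left out when testing in a logged-in browser.
        params.append(("oauth_token", token))
    return "%s/stat/%s.json?%s" % (API_ROOT, report_path, urlencode(params))


url = build_report_url("traffic/summary", 123456,
                       date(2011, 6, 1), date(2011, 6, 7), token="abc")
print(url)
```

    Using urlencode instead of string concatenation keeps the query string valid even if a value contains characters that need escaping.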

    # Keep only the date and the visit count from each record.
    data1 = [{"date": self.format_date(x["date"]), "visits": x["visits"]} for x in json.loads(res1.content)["data"]]
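
    The line above relies on a format_date helper that the article does not show. The Python 3 sketch below is one possible version, under two assumptions that are not confirmed by the source: that the report returns dates as "YYYYMMDD" strings, and that the chart wants them as "YYYY-MM-DD". The sample records are invented.

```python
from datetime import datetime


def format_date(raw):
    """Convert an assumed 'YYYYMMDD' string to 'YYYY-MM-DD' (hypothetical helper)."""
    return datetime.strptime(raw, "%Y%m%d").strftime("%Y-%m-%d")


def trim_records(records):
    """Keep only the date and visit count from each report record."""
    return [{"date": format_date(r["date"]), "visits": r["visits"]}
            for r in records]


# Invented sample records; real responses carry more fields than these.
sample = [
    {"date": "20110601", "visits": 120, "page_views": 340},
    {"date": "20110602", "visits": 95, "page_views": 280},
]
trimmed = trim_records(sample)
print(trimmed)
```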


    Next, we compare the period selected by the user with the previous period of the same number of days (for example, 01.06-07.06 will be compared with 25.05-31.05). To do this, we first calculate the length of the selected period:

    # Requires: from datetime import datetime, timedelta
    period = [datetime.strptime(cgi.escape(self.request.get(name)), "%Y-%m-%d") for name in ('date_1', 'date_2')]
    rng = period[1] - period[0] + timedelta(1)  # inclusive length of the period


    And then subtract the length from the endpoints of our period:

    # Shift both endpoints back by the period length to get the previous period.
    res2 = self.fetch_data([x - rng for x in period])
    if not res2:
        return
    data2 = [{"visits": x["visits"]} for x in json.loads(res2.content)["data"]]
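
    The date arithmetic is easy to get off by one day, so here is a self-contained Python 3 sketch of the same calculation. The function name previous_period is hypothetical, and the year in the example is assumed, since the article gives only days and months.

```python
from datetime import date, timedelta


def previous_period(start, end):
    """Return the period of equal length that ends the day before `start`.

    Both endpoints are inclusive, hence the extra day in the length.
    """
    length = end - start + timedelta(days=1)
    return start - length, end - length


# A 7-day period, 1 June to 7 June (the year is assumed for illustration).
prev_start, prev_end = previous_period(date(2011, 6, 1), date(2011, 6, 7))
print(prev_start, prev_end)  # 2011-05-25 2011-05-31
```

    Adding one day to the difference is what makes the two periods adjacent and equal in length: without it, the previous period would overlap the selected one by a day.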


    As a result, for each date in the user's period we have the number of visits on that day and N days earlier, and from this data we can already build a chart. I used Google Charts, because it is easy to work with and the result looks pretty nice. The comparison chart looks like this:



    Now that we have visits per day, why not calculate each day's deviation from the average? Presented as a bar chart, this information is easier to take in than by peering at the chart above.

    To do this, calculate the average by dividing the total number of visits by the number of days, and then compare it with the value for each individual day. As a result, we get the following diagram:
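
    The calculation described above fits in a few lines. This is a minimal Python 3 sketch with made-up visit counts; the function name deviations_from_mean is an assumption, not something taken from the service's source code.

```python
def deviations_from_mean(visits):
    """Return each day's visit count minus the average over the period."""
    if not visits:
        return []
    mean = sum(visits) / len(visits)
    return [v - mean for v in visits]


# Invented daily visit counts for a three-day period (average is 100).
deviations = deviations_from_mean([100, 120, 80])
print(deviations)  # [0.0, 20.0, -20.0]
```

    Positive values are days above the average and negative values are days below it, which is exactly what the bars in the diagram show.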



    So, now we have diagrams for comparing traffic with the past, but to make our service really useful we need to add something else.

    When evaluating a site, it’s often necessary to compare how the popularity of its pages has changed. For example, the fact that Dobsonian-mount telescopes are selling well this month does not mean that they sold just as well before. Let's try to add a report in which you can quickly and easily see changes in page traffic.

    To do this, we will use the “Content” report group and, within it, the “Popular” report. This report contains the number of entries, exits and views for each page. You can get the report data from the link http://api-metrika.yandex.ru/stat/content/popular.json?id=XXXXXX&date1=YYYYMMDD&date2=YYYYMMDD&oauth_token=XXXXXX&.

    Please note that this time the request takes a new parameter, “per_page”. This is an optional parameter that tells the Metrica API how many records the server response should contain. By default the server returns 100 records, which is more than we need here.

    Otherwise, the mechanisms for obtaining data are very similar.

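    # Current period: request only the top 20 pages.
    res1 = self.fetch_data(period, 20)
    if not res1:
        return
    data1 = json.loads(res1.content)["data"]

    # Previous period of the same length (no per_page argument is passed here).
    res2 = self.fetch_data([x - rng for x in period])
    if not res2:
        return
    data2 = make_url_tuple(json.loads(res2.content)["data"])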


    As a result, our service will look like this:



    Because the right column shows how each page's position has changed compared with the previous period, the dynamics of page popularity are easy to see.
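
    One way to compute that position change is sketched below in Python 3. It is not necessarily how the service or its make_url_tuple helper does it: the function name rank_changes and the input shape (two lists of page URLs, each already ordered from most to least popular) are assumptions made for the example.

```python
def rank_changes(current_urls, previous_urls):
    """Compare page rankings between two periods.

    Returns (url, current_rank, change) tuples, where change is the number of
    positions gained (positive) or lost (negative), or None if the page was
    not in the previous ranking at all.
    """
    previous_rank = {url: i for i, url in enumerate(previous_urls, start=1)}
    changes = []
    for rank, url in enumerate(current_urls, start=1):
        old_rank = previous_rank.get(url)
        change = None if old_rank is None else old_rank - rank
        changes.append((url, rank, change))
    return changes


# Invented page lists, ordered by popularity in each period.
result = rank_changes(["/a", "/b", "/c"], ["/b", "/a"])
print(result)  # [('/a', 1, 1), ('/b', 2, -1), ('/c', 3, None)]
```

    Looking up previous positions in a dictionary keeps the comparison linear in the number of pages, and a page that is new to the ranking is marked with None rather than given an arbitrary number.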

    Appendix


    You can try the finished service at http://metrika-api.appspot.com
    The source code is available here: https://github.com/sakudayo/Hello-Metrics
    Yandex's documentation for the Metrica API: http://api.yandex.ru/metrika/doc/ref/concepts/About.xml
    Google Charts documentation: http://code.google.com/intl/en-US/apis/chart/
