Developing an analytics system

    This post opens a series of articles on building an analytics system for monitoring user actions. In this first article we will talk about how to collect the necessary data from mobile applications on Android and iOS.

    package Birdy::Stat::Stalin;
    #
    # This is Stalin, he knows everything about everyone
    # Who does what, and who sleeps with whom
    #
    # ########################################################
    # ########################################################
    #
    #                 !#########       #
    #               !########!          ##!
    #            !########!               ###
    #         !##########                  ####
    #       ######### #####                ######
    #        !###!      !####!              ######
    #          !           #####            ######!
    #                        !####!         #######
    #                           #####       #######
    #                             !####!   #######!
    #                                ####!########
    #             ##                   ##########
    #           ,######!          !#############
    #         ,#### ########################!####!
    #       ,####'     ##################!'    #####
    #     ,####'            #######              !####!
    #    ####'                                      #####
    #    ~##                                          ##~
    #
    # ########################################################
    # ########################################################
    


    We are Surfingbird, a recommendation service. The better we understand the user, the more relevant the recommendations we generate.

    Google Analytics, Flurry, AppsFlyer: you can plaster your product with existing analytics from head to toe. You can build a magnificent dashboard showing DAU, MAU, DNU, ARPU, K-Factor and a dozen more indicators, but all of this will only be shadows on the walls of the cave. None of these systems answers the question of WHY the user left the application or what exactly provoked the departure; they only record the fact that the user left. You can't even write them a farewell email :) So we decided that to answer this and similar questions, we need to know everything about our users: in what sequence and at what intervals, on which screens and which buttons they pressed; how many seconds they spent on which article before turning around, spitting and leaving; what the reading histogram of an article looks like; how much time was spent on each pixel, and in which variant of the A/B test. At some point we realized that we needed Stalin.

    First of all, we agreed on a data structure for transmitting tracked events. This structure is the same for the web, for mobile and, looking ahead, for the databases on the backend (yes, there are quite a few of them).

    Events consist of the following basic components:
    1. Action - the user's action; answers the question "what did they do?"
    2. Screen - the screen; answers the question "on which screen?"
    3. ContentType - the type of content; answers the question "what type of content did they interact with?"

    And:

    • userToken - who
    • time - when
    • clientVersion - in which version
    • contentID - content identifier
    • deviceID - device unique identifier
    • deviceType - device type
    • other dimensions - A/B tests, descriptions (for errors) and so on


    The measure is count, the number of events. By default it equals one, but it can be used to pre-aggregate events of the same type that do not need to be analyzed over time.

    This is the basic set from which we build.
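
    To make the role of count concrete, here is a minimal sketch of client-side pre-aggregation. The class EventPreAggregator and its methods are hypothetical and are not part of the Surfingbird code; the sketch only assumes that events identical in action, screen, content type and content ID may be merged into one record with a larger count.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (an assumption for illustration, not the original code).
final class EventPreAggregator {
    // Key: "action|screen|contentType|contentID", value: accumulated count.
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    private static String key(String action, String screen, String contentType, String contentId) {
        return action + "|" + screen + "|" + contentType + "|" + contentId;
    }

    // Records one occurrence; identical events are merged instead of stored separately.
    void add(String action, String screen, String contentType, String contentId) {
        counts.merge(key(action, screen, contentType, contentId), 1, Integer::sum);
    }

    // Value to send in the "count" field for this combination (0 if never seen).
    int countFor(String action, String screen, String contentType, String contentId) {
        return counts.getOrDefault(key(action, screen, contentType, contentId), 0);
    }

    // Number of distinct records that would actually be sent.
    int distinctEvents() {
        return counts.size();
    }
}
```

    With this approach the timestamps of the individual occurrences are lost, which is why it only suits events that do not need to be analyzed over time.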

    The dimensions, in turn, can each be represented by a fixed set of values, for example:

        // Actions
        public enum Action {
            none,
            // hits, installs, clicks
            install,// when the app is launched by an unregistered user
            hit,// when the app is opened
            clickon_surfbutton,// click on the surf button
            clickon_volumebutton,// click on the volume button
            // opening feeds
            open_surf,// opening the recommendations feed
            open_feed,// opening the subscriptions feed
            open_popular,// opening the popular feed
            open_dayDigest,// opening the day digest feed
            open_profile,// opening the profile
            open_settings,// opening settings
            open_comment,// opening comments
            // user registration/authorization block (begin/end)
            registrationBegin_vk,//done
            registrationSignIn_vk,//done
            registrationSignUp_vk,//done
            registrationBegin_fb,//done
            registrationSignIn_fb,//done
            registrationSignUp_fb,//done
            registrationBegin_email,//done
            registrationComplete_email,//done
            // page
            page_seen,// not used on Android yet
            page_click,// click from one of the feeds (8 of them)
            page_open,// page opened (from anywhere)
            page_read,// page reading time in seconds
            // shares
            share_fb,//done
            share_vk,//done
            share_sms,//done
            share_email,//done
            share_pocket,//done
            share_copyLink,//done
            share_saveImage,//done
            share_twitter,//done
            share_other,//done
            // page actions
            like,//done
            dislike,//done
            favorite,//done
            addToCollection,//done
            // push notification actions
            openPush,//done
            deliveredPush,//done
            //and so on
        }
    


    You may notice that the naming of the dimension values also builds in the ability to pre-aggregate related values, to make later analysis in OLAP easier. That is, while remaining flat at the data collection level, a dimension can be expanded into a two-level hierarchy at the cube level.
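
    As an illustration of that two-level hierarchy, the sketch below splits a flat action name at its first underscore into a group and an item. The class ActionHierarchy is hypothetical, and the split-on-first-underscore rule is an assumption read off the enum names above; the article does not say how the cube actually derives its levels.

```java
// Hypothetical helper (an assumption for illustration, not the original code).
final class ActionHierarchy {
    // Upper level: everything before the first underscore, or the whole name.
    static String group(String action) {
        int i = action.indexOf('_');
        return i < 0 ? action : action.substring(0, i);
    }

    // Lower level: everything after the first underscore, or "" if there is none.
    static String item(String action) {
        int i = action.indexOf('_');
        return i < 0 ? "" : action.substring(i + 1);
    }
}
```

    Under this rule share_fb and share_vk both roll up into the group share, while an action such as like forms a group of its own.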

    If you look at the data model on Android, for example, any event can be represented by the following class:

        public ClassEvent (Action action, Screen screen, ContentType contentType, String contentID, String abTest1, String abTest2, String description, int count) {
            this.abTest1 = abTest1;
            this.abTest2 = abTest2;
            //and so on
            this.contentType = contentType;
            this.contentID = contentID;
            this.time = System.currentTimeMillis()/1000; // Unix time in seconds, taken from the device clock
            this.deviceID = SurfingbirdApplication.getInstance().getDeviceId();
            this.deviceType = "ANDROID";
            String loginToken = SurfingbirdApplication.getInstance().getSettings().getLoginToken();
            this.userToken = loginToken==null?"":loginToken;
            this.clientVersion = SurfingbirdApplication.getInstance().getAppVersion();
        }
        @Override
        public String toString() {
            JSONObject jsonObject= new JSONObject();
            try {
                jsonObject.put("clientVersion", clientVersion);
                jsonObject.put("action", action.toString());
                jsonObject.put("screen", screen);
                jsonObject.put("contentType", contentType);
                jsonObject.put("contentID", contentID);
                jsonObject.put("time", time);
                jsonObject.put("deviceID", deviceID);
                jsonObject.put("deviceType", deviceType);
                jsonObject.put("userToken", userToken);
                jsonObject.put("abTest1_id", abTest1);
                jsonObject.put("abTest1_value", abTest2);
                jsonObject.put("description", description);
                jsonObject.put("count", count);
            } catch (JSONException e) {
                AQUtility.debug("EVENTERROR",e.toString());
            }
            return jsonObject.toString();
        }
    


    What does this look like in the application itself?

    Any action on any screen is recorded as an event.
    A fragment of my own session is easiest to look at in tabular form.


    I started a session, clicked the surf button, loaded a page about 5 text editors, read it for a few seconds, then switched to the popular tab and started reading about why the iPhone is three times more expensive than Android. Yes, that was last night and, by the way, I never did understand why :)

    The same data looks something like this after processing in OLAP:
    image

    But that's beside the point. The next task to solve is integration with other analytics systems (along the way we found out who lies and by how much, but more on that another time) and packing events into batches.

    On Android we pack events into batches of 50 and, at the moment an event is generated, also send it to Google Analytics for cross-checking:

        public void newEvent(ClassEvent.Action action,ClassEvent.Screen screen,ClassEvent.ContentType contentType,String contentId) {
            registerEvent(new ClassEvent(action,screen,contentType,contentId));
        }
        public void newEvent(ClassEvent.Action action,ClassEvent.Screen screen,ClassEvent.ContentType contentType,String contentId,String abTest1,String abTest2,String description,  int count) {
            registerEvent(new ClassEvent(action,screen,contentType,contentId,abTest1,abTest2,description,count));
        }
        public void registerEvent(ClassEvent event) {
            Tracker t = getTracker(
                    SurfingbirdApplication.TrackerName.GLOBAL_TRACKER);
            t.setScreenName(event.screen.toString());
            Map<String, String> hits = new HitBuilders.EventBuilder()
                    .setCategory("event")
                    .setAction(event.action.toString())
                    .setLabel(event.action.toString())
                    .build();
            t.send(hits);
            if (TextUtils.equals("",event.userToken) || TextUtils.equals("null",event.userToken)) {
                String eventsString = "[";
                eventsString+=event.toString();
                eventsString+="]";
                events.clear();
                aq.ajax(UtilsApi.eventsCallBackBasic(this, "some_method", eventsString));
            }
            else {
                events.add(event);
                if (events.size()>50) {
                    sendEvents();
                }
            }
        }
        public void sendEvents() {
            if (events.size()>0) {
                // Join the accumulated events into a single JSON array string
                StringBuilder eventsString = new StringBuilder("[");
                for (ClassEvent event: events) {
                    if (eventsString.length()>1) eventsString.append(",");
                    eventsString.append(event.toString());
                }
                eventsString.append("]");
                events.clear();
                aq.ajax(UtilsApi.eventsCallBack(this, "nop", eventsString.toString()));
            }
        }
    


    A very small share of events (those generated while the user has no login token) go out with basic authorization and are sent immediately. All the rest are packed into batches and sent either as they accumulate or, forcibly, when the application exits.
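
    The "send as they accumulate, or forcibly on exit" rule can be sketched in isolation as follows. EventBatcher is a hypothetical class, the sender callback stands in for the real network call, and events are plain JSON strings here; none of these names come from the original code. Note that this sketch flushes when the batch size is reached, whereas the listing above flushes once the list grows past 50.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical batching buffer (an assumption for illustration, not the original code).
final class EventBatcher {
    private final int batchSize;
    private final Consumer<List<String>> sender; // stands in for the HTTP call
    private final List<String> pending = new ArrayList<>();

    EventBatcher(int batchSize, Consumer<List<String>> sender) {
        this.batchSize = batchSize;
        this.sender = sender;
    }

    // Buffers one serialized event and flushes once the batch is full.
    synchronized void add(String eventJson) {
        pending.add(eventJson);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Sends whatever has accumulated; meant to be called on application exit as well.
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<String> batch = new ArrayList<>(pending);
        pending.clear();
        sender.accept(batch);
    }
}
```

    Passing the sender in as a callback keeps the buffer independent of the HTTP library and makes the flush rule easy to check without a network.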

    This is what "throwing events" looks like on Android:
           SurfingbirdApplication.getInstance().newEvent(ClassEvent.Action.install, ClassEvent.Screen.none, ClassEvent.ContentType.none, "");
           SurfingbirdApplication.getInstance().newEvent(ClassEvent.Action.openPush, ClassEvent.Screen.page_parsed, ClassEvent.ContentType.siteShort,shortUrl);
           SurfingbirdApplication.getInstance().newEvent(ClassEvent.Action.registrationBegin_email, ClassEvent.Screen.start, ClassEvent.ContentType.none, "");
    


    On iOS, we tried a slightly different logic:

    Events are likewise accumulated in a buffer, and the array of accumulated events is piggybacked onto whatever request goes out next. If more than 50 events accumulate, we force a request with the nop system method. Likewise, if a tracked event needs to be sent as soon as possible, you can force a nop request.

    // Override the method in a subclass of AFHTTPRequestOperationManager
    - (void)    POST:(NSString *)path
          parameters:(NSMutableDictionary *)parameters
             success:(void (^__strong)(AFHTTPRequestOperation *__strong, __strong id))success
             failure:(void (^__strong)(AFHTTPRequestOperation *__strong, NSError *__strong))failure {
        SBEvents *events = [SBEventTracker sharedTracker].events;
        if (events.count > 0) {
            parameters[@"_events"] = [events jsonString];
            [[SBEventTracker sharedTracker] clearEvents];
        }
        [super POST:path parameters:parameters success:^(AFHTTPRequestOperation *operation, id json) {
            //
        } failure:^(AFHTTPRequestOperation *operation, NSError *error) {
            //
        }];
    }
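
    The Objective-C override covers the piggybacking itself but not the rule that forces a nop request after more than 50 events. A rough sketch of both together, written in Java for consistency with the Android examples, might look like this. PiggybackQueue and its method names are hypothetical; only the _events parameter name and the threshold of 50 come from the text above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical piggyback buffer (an assumption for illustration, not the original code).
final class PiggybackQueue {
    static final int FORCE_THRESHOLD = 50;
    private final List<String> pending = new ArrayList<>();

    // Buffers one serialized event.
    void track(String eventJson) {
        pending.add(eventJson);
    }

    // True once more than FORCE_THRESHOLD events are waiting, i.e. a nop request is due.
    boolean shouldForceNop() {
        return pending.size() > FORCE_THRESHOLD;
    }

    // Returns a copy of the request parameters with the pending events attached
    // as a JSON array under "_events", and empties the buffer.
    Map<String, String> attachTo(Map<String, String> params) {
        Map<String, String> out = new HashMap<>(params);
        if (!pending.isEmpty()) {
            out.put("_events", "[" + String.join(",", pending) + "]");
            pending.clear();
        }
        return out;
    }
}
```

    A caller would run every outgoing request through attachTo and, after each track call, check shouldForceNop to decide whether to issue an otherwise empty request.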
    


    On the backend, events arrive at a module written in Perl, which sorts the records into tables. But that is not its only job: it also guards the integrity of the data. If an event arrives from a client with a value unknown to Stalin, he puts it into a separate table, which is processed later, once the inconsistency has been resolved (for example, after a new value has been added to the corresponding enum).

    Warning: Perl code ahead. In unprepared readers it may cause bleeding eyes and premature pregnancy.
    package Birdy::Stat::Stalin;
    use feature qw(state switch);  # "state" variables and given/when are used below
    use constant {
        SUCCESS     => 'success',
        FAILURE     => 'failure',
        UNKNOWN     => 'unknown',
        CONTENT_TYPE_NONE => 'none',
    };
    sub track_events {
        my $params = shift;
        return unless ref $params eq 'ARRAY';
        return unless @$params;
        my ($s_events, $f_events, $u_events) = ([],[],[]);
        foreach (@$params) {
            my $event = __PACKAGE__->new($_);
            $event->parse;
            # sort the events into separate lists
            given ($event->status) {
                when (SUCCESS) {
                    push @$s_events, $event;
                }
                when (FAILURE) {
                    push @$f_events, $event;
                }
                when (UNKNOWN) {
                    push @$u_events, $event;
                }
            }
        }
        __PACKAGE__->_track_success_events($s_events);
        __PACKAGE__->_track_failure_events('failure', $f_events);
        __PACKAGE__->_track_failure_events('unknown', $u_events);
    }
    state $enums = {
        'action' => [qw/
                        install hit
                        open_surf open_feed open_popular open_dayDigest open_profile open_settings open_comment
                        registrationBegin_email registrationComplete_email
                        page_seen page_click page_open
                        share_fb share_vk share_sms share_email share_pocket share_copyLink share_saveImage share_twitter share_other
                        like dislike favorite addToCollection
                        openPush deliveredPush openDayDigestFromLocalPush
                        error page_read none
        /],
        'screen' => [qw/
                        none start similar
                        surf feed popular dayDigest profile settings
                        page_parsed page_image siteTag actionBar
                        actionBar_profile actionBar_page actionBar_channel
                        profile_channel profile_add profile_like profile_favorite profile_collection
        /],
        'deviceType'  => ['IPAD', 'IPHONE', 'ANDROID'],
        'contentType' => [CONTENT_TYPE_NONE, 'siteShort', 'userShort', 'siteTag'],
    };
    state $fields = [ sort (keys %$enums, qw/time deviceID clientVersion userId userLogin contentID shortUrl count description/) ];
    sub parse {
        my ($self) = @_;
        my $event_param = {};
        {
            my $required = [keys %$enums];
            my $optional = [];
            # make sure the required parameters are present
            # if any of them is missing, the event cannot be tracked
            unless ( $self->_check_params($required) ) {
                $self->status(FAILURE);
                return;
            }
            # These parameters are enums, so they may only take certain values
            # If the value of even one parameter is unknown to us,
            # store the event's parameters in another table to process later, once we have learned it
            unless ( $self->_check_enum_params($required) ) {
                $self->status(UNKNOWN);
                return;
            }
            my $params = $self->_parse_params([@$required, @$optional]);
            $event_param = {
                %$params,
                %$event_param
            };
        }
        {
            my $required = ['time', 'deviceID', 'clientVersion'];
            my $optional = ['userToken', 'count', 'description'];
            # contentID is optional if contentType eq 'none'
            push @{
                $event_param->{'contentType'} eq CONTENT_TYPE_NONE ? $optional : $required
            }, 'contentID';
            # make sure the required parameters are present
            # if any of them is missing, the event cannot be tracked
            unless ( $self->_check_params($required) ) {
                $self->status(FAILURE);
                return;
            }
            my $params = $self->_parse_params([@$required, @$optional]);
            $event_param = {
                # if optional parameters are missing, they still have to be added
                (map { $_ => undef } @$optional),
                %$params,
                %$event_param,
            };
        }
        $event_param->{'time'} = Birdy::TimeUtils::unix2date(
            $event_param->{'time'}
        );
        $self->status(SUCCESS);
        $self->params($event_param);
        return;
    }
    # returns a hashref with the requested parameters
    sub _parse_params {
        my ($self, $params) = @_;
        $params = [] if ref $params ne 'ARRAY';
        my $result = {};
        foreach my $key (@$params) {
            my $value = $self->params->{$key};
            next unless $value;
            $result->{$key} = $value;
        }
        return $result;
    }
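
    For readers who would rather skip the Perl, the three-way classification it performs can be condensed into a short Java sketch. EventClassifier is hypothetical, the value sets are abbreviated, and only two of the enum fields are checked; the exact behaviour of _check_params and _check_enum_params is not shown in the listing, so "missing means FAILURE, present but unrecognized means UNKNOWN" is an assumption based on the comments.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical condensed classifier (an assumption for illustration, not the original code).
final class EventClassifier {
    enum Status { SUCCESS, FAILURE, UNKNOWN }

    // Abbreviated value sets; the full lists live in the Perl $enums structure.
    private static final Set<String> ACTIONS =
            new HashSet<>(Arrays.asList("install", "hit", "page_open", "like"));
    private static final Set<String> DEVICE_TYPES =
            new HashSet<>(Arrays.asList("IPAD", "IPHONE", "ANDROID"));

    static Status classify(Map<String, String> event) {
        String action = event.get("action");
        String deviceType = event.get("deviceType");
        // A required parameter is missing: the event cannot be tracked at all.
        if (action == null || deviceType == null) {
            return Status.FAILURE;
        }
        // A value we do not recognize yet: park the event for later reprocessing.
        if (!ACTIONS.contains(action) || !DEVICE_TYPES.contains(deviceType)) {
            return Status.UNKNOWN;
        }
        return Status.SUCCESS;
    }
}
```

    The point of the separate UNKNOWN status is that a client released with a new enum value does not lose data: its events wait in their own table until the backend learns the value.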
    


    During the rollout we discovered some oddities. For example, some events arrived from the distant future and some from the past. It is easy to guess that all of these users were happy owners of Android smartphones. But overall, everything worked out. The system dutifully collects statistics, and we barely have time to make sense of them.
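
    The article does not say how events with implausible timestamps were handled. One possible server-side check, shown purely as an assumption, compares the client time with the server clock and flags anything outside a tolerance window. TimestampCheck and both tolerance parameters are hypothetical.

```java
// Hypothetical sanity check (an assumption for illustration, not the original code).
final class TimestampCheck {
    enum Verdict { OK, FUTURE, STALE }

    // All values are Unix times or durations in seconds.
    static Verdict check(long eventTimeSec, long serverNowSec,
                         long maxFutureSkewSec, long maxAgeSec) {
        if (eventTimeSec > serverNowSec + maxFutureSkewSec) {
            return Verdict.FUTURE; // device clock is ahead of the server
        }
        if (eventTimeSec < serverNowSec - maxAgeSec) {
            return Verdict.STALE; // device clock is behind, or the event is very old
        }
        return Verdict.OK;
    }
}
```

    Because batched events can legitimately arrive some time after they happened, the allowed age would normally be much larger than the allowed skew into the future.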

    In the following articles we plan to go into more detail on our methodology for analyzing content consumption, on how to build a DWH / OLAP system out of crap and sticks, and on farewell emails and the ridiculous results they lead to.
