
Building an analytics system
This post opens a series of articles on building an analytics system for monitoring user actions. In this first article we talk about how to collect the necessary data from mobile applications on Android and iOS.
package Birdy::Stat::Stalin;
#
# This is Stalin, he knows everything about everyone
# Who does what, and who sleeps with whom
#
# ########################################################
# ########################################################
#
# !######### #
# !########! ##!
# !########! ###
# !########## ####
# ######### ##### ######
# !###! !####! ######
# ! ##### ######!
# !####! #######
# ##### #######
# !####! #######!
# ####!########
# ## ##########
# ,######! !#############
# ,#### ########################!####!
# ,####' ##################!' #####
# ,####' ####### !####!
# ####' #####
# ~## ##~
#
# ########################################################
# ########################################################
We are Surfingbird, a recommendation service. The better we understand the user, the more relevant the recommendations we generate.
Google Analytics, Flurry, AppsFlyer: you can smear yourself with existing analytics from head to toe. You can build a magnificent dashboard showing DAU, MAU, DNU, ARPU, K-factor and a dozen more indicators, but all of it will only be shadows on the wall of the cave. No analytics system answers the question of WHY the user left the application or what exactly provoked the departure; it only records the fact that the user left. You can't even write them a farewell email :) So we decided that, to answer this and similar questions, we should know everything about our users: in what sequence and at what intervals, on which screens, which buttons they pressed; how many seconds they spent on which article before turning around, spitting and leaving; what the reading histogram of an article looks like; how much time was spent on each pixel, and in which variant of an A/B test. At some point we realized that we needed Stalin.
First of all, we agreed on the data structure in which tracked events are transmitted. This structure is the same for the web, for mobile and, looking ahead, for the databases on the backend (yes, there are a lot of them).
Events consist of the following basic components:
- Action - the user's action; answers the question of what was done
- Screen - the screen; answers the question of where it was done
- ContentType - the type of content; answers the question of what kind of content was interacted with
And also:
- userToken - who
- time - when
- clientVersion - in which version
- contentID - content identifier
- deviceID - device unique identifier
- deviceType - device type
- other dimensions - A/B tests, descriptions (for errors) and so on
The measure is count, the number of events. By default it is equal to one, but it can be used to pre-aggregate events of the same type that do not need to be analyzed over time.
This is the basic set from which we build.
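To make the role of count more concrete, here is a minimal sketch of client-side pre-aggregation. The class name, the key layout and the choice of dimensions in the key are assumptions made for illustration; the article does not show how pre-aggregation is actually implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not taken from the Surfingbird code base.
class EventPreAggregator {
    // Key: "action|screen|contentType|contentID"; value: accumulated count.
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    // Record one occurrence; events with identical dimensions share one row.
    void add(String action, String screen, String contentType, String contentId) {
        String key = String.join("|", action, screen, contentType, contentId);
        counts.merge(key, 1, Integer::sum);
    }

    // Copy of the aggregated rows, in first-seen order.
    Map<String, Integer> rows() {
        return new LinkedHashMap<>(counts);
    }
}
```

Each row could then be sent as a single event whose count is the accumulated value. The price is that the individual timestamps are lost, which is why this only suits events that need no analysis over time.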
Each dimension, in turn, takes its values from a fixed set, for example:
// Actions
public enum Action {
    none,
    // hits, installs, clicks
    install, // when an unregistered user launches the app
    hit, // when the app is opened
    clickon_surfbutton, // tap on the Surf button
    clickon_volumebutton, // tap on the volume button
    // opening feeds
    open_surf, // recommendations feed opened
    open_feed, // subscriptions feed opened
    open_popular, // popular feed opened
    open_dayDigest, // daily digest feed opened
    open_profile, // profile opened
    open_settings, // settings opened
    open_comment, // comments opened
    // user registration/authorization (begin/end)
    registrationBegin_vk, // done
    registrationSignIn_vk, // done
    registrationSignUp_vk, // done
    registrationBegin_fb, // done
    registrationSignIn_fb, // done
    registrationSignUp_fb, // done
    registrationBegin_email, // done
    registrationComplete_email, // done
    // page
    page_seen, // not used on Android yet
    page_click, // click from one of the feeds (8 of them)
    page_open, // page opened (from anywhere)
    page_read, // page reading time in seconds
    // shares
    share_fb, // done
    share_vk, // done
    share_sms, // done
    share_email, // done
    share_pocket, // done
    share_copyLink, // done
    share_saveImage, // done
    share_twitter, // done
    share_other, // done
    // page actions
    like, // done
    dislike, // done
    favorite, // done
    addToCollection, // done
    // push notification actions
    openPush, // done
    deliveredPush, // done
    // and so on
}
You may notice that the naming of dimension values also leaves room for grouping related values, which simplifies later analysis in OLAP. That is, the list stays flat at the data-collection level but can be expanded into a two-level hierarchy at the cube level.
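As an illustration of that naming convention, a flat action name such as share_fb can be split into a group and a member. This is only a sketch: the rule of splitting on the first underscore is an assumption, and the article does not say how the cube actually derives its hierarchy.

```java
// Hypothetical helper: splits a flat action name into {group, member}.
class ActionHierarchy {
    // "share_fb" -> {"share", "fb"}; a name without "_" forms its own group.
    static String[] split(String action) {
        int i = action.indexOf('_');
        if (i < 0) {
            return new String[] { action, action };
        }
        return new String[] { action.substring(0, i), action.substring(i + 1) };
    }
}
```

With this rule all share_* actions roll up into one share group in the cube, while the client keeps sending a single flat value.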
If you look at the data model on Android, for example, any event can be represented by the following class:
public ClassEvent(Action action, Screen screen, ContentType contentType, String contentID,
                  String abTest1, String abTest2, String description, int count) {
    this.abTest1 = abTest1;
    this.abTest2 = abTest2;
    //and so on
    this.contentType = contentType;
    this.contentID = contentID;
    // Unix time in seconds, taken from the device clock
    this.time = System.currentTimeMillis() / 1000;
    this.deviceID = SurfingbirdApplication.getInstance().getDeviceId();
    this.deviceType = "ANDROID";
    String loginToken = SurfingbirdApplication.getInstance().getSettings().getLoginToken();
    this.userToken = loginToken == null ? "" : loginToken;
    this.clientVersion = SurfingbirdApplication.getInstance().getAppVersion();
}

@Override
public String toString() {
    JSONObject jsonObject = new JSONObject();
    try {
        jsonObject.put("clientVersion", clientVersion);
        jsonObject.put("action", action.toString());
        jsonObject.put("screen", screen);
        jsonObject.put("contentType", contentType);
        jsonObject.put("contentID", contentID);
        jsonObject.put("time", time);
        jsonObject.put("deviceID", deviceID);
        jsonObject.put("deviceType", deviceType);
        jsonObject.put("userToken", userToken);
        jsonObject.put("abTest1_id", abTest1);
        jsonObject.put("abTest1_value", abTest2);
        jsonObject.put("description", description);
        jsonObject.put("count", count);
    } catch (JSONException e) {
        AQUtility.debug("EVENTERROR", e.toString());
    }
    return jsonObject.toString();
}
What does this look like in the application itself?
Any action on any screen is recorded as an event.
The easiest way to look at a fragment of my session is in tabular form.

I started the session, tapped Surf, loaded a page about some five text editors, read it for a few seconds, then switched to the Popular tab and started reading about why the iPhone is three times more expensive than an Android phone. Damn, yes, that was last night. By the way, I never did figure out why :)
The same data will look something like this after processing in OLAP:

But that's not the point. The next task is integration with other analytics systems (by the way, we found out who lies and by how much, but more on that another time) and packing events into batches.
On Android we pack events into batches of 50 and, at the moment an event is generated, also send it to Google Analytics for cross-checking:
public void newEvent(ClassEvent.Action action, ClassEvent.Screen screen,
                     ClassEvent.ContentType contentType, String contentId) {
    registerEvent(new ClassEvent(action, screen, contentType, contentId));
}

public void newEvent(ClassEvent.Action action, ClassEvent.Screen screen,
                     ClassEvent.ContentType contentType, String contentId,
                     String abTest1, String abTest2, String description, int count) {
    registerEvent(new ClassEvent(action, screen, contentType, contentId,
            abTest1, abTest2, description, count));
}

public void registerEvent(ClassEvent event) {
    // Mirror every event to Google Analytics for cross-checking
    Tracker t = getTracker(SurfingbirdApplication.TrackerName.GLOBAL_TRACKER);
    t.setScreenName(event.screen.toString());
    Map<String, String> hits = new HitBuilders.EventBuilder()
            .setCategory("event")
            .setAction(event.action.toString())
            .setLabel(event.action.toString())
            .build();
    t.send(hits);

    if (TextUtils.equals("", event.userToken) || TextUtils.equals("null", event.userToken)) {
        // No login token: send this single event right away with basic authorization
        String eventsString = "[" + event.toString() + "]";
        events.clear(); // note: this also drops anything still queued
        aq.ajax(UtilsApi.eventsCallBackBasic(this, "some_method", eventsString));
    } else {
        // Otherwise queue the event and flush once more than 50 have accumulated
        events.add(event);
        if (events.size() > 50) {
            sendEvents();
        }
    }
}

public void sendEvents() {
    if (events.size() > 0) {
        // Serialize the queue as a JSON array
        StringBuilder eventsString = new StringBuilder("[");
        for (ClassEvent event : events) {
            if (eventsString.length() > 1) eventsString.append(",");
            eventsString.append(event.toString());
        }
        eventsString.append("]");
        events.clear();
        aq.ajax(UtilsApi.eventsCallBack(this, "nop", eventsString.toString()));
    }
}
A very small share of events is sent immediately, using basic authorization; all the rest are packed into batches and sent either as they accumulate or, forcibly, when the application exits.
This is what firing events looks like on Android:
SurfingbirdApplication.getInstance().newEvent(ClassEvent.Action.install, ClassEvent.Screen.none, ClassEvent.ContentType.none, "");
SurfingbirdApplication.getInstance().newEvent(ClassEvent.Action.openPush, ClassEvent.Screen.page_parsed, ClassEvent.ContentType.siteShort,shortUrl);
SurfingbirdApplication.getInstance().newEvent(ClassEvent.Action.registrationBegin_email, ClassEvent.Screen.start, ClassEvent.ContentType.none, "");
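The accumulate-then-flush behaviour, including the forced flush on exit that the code above does not show, can be sketched in plain Java roughly as follows. EventBuffer and its sender callback are hypothetical names: the callback stands in for the real HTTP call, and where exactly the application triggers the final flush is not stated in the article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical buffer; `sender` stands in for the real network call.
class EventBuffer {
    private final List<String> pending = new ArrayList<>();
    private final int threshold;
    private final Consumer<String> sender;

    EventBuffer(int threshold, Consumer<String> sender) {
        this.threshold = threshold;
        this.sender = sender;
    }

    // Queue one serialized event; flush once more than `threshold` are queued.
    synchronized void add(String eventJson) {
        pending.add(eventJson);
        if (pending.size() > threshold) {
            flush();
        }
    }

    // Send everything queued as one JSON array. Does nothing when the queue is empty.
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        String payload = "[" + String.join(",", pending) + "]";
        pending.clear();
        sender.accept(payload);
    }
}
```

Calling flush() from whatever hook the application uses on shutdown would give the forced send; with a threshold of 50 the automatic flush fires on the 51st event, matching the size() > 50 check in the original code.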
On iOS, we tried a slightly different logic:
Events also accumulate in a queue, and on any subsequent request the array of accumulated events is attached to it and rides along. If more than 50 events accumulate, we force a request with the system method nop. Also, if a tracked event needs to be sent as soon as possible, you can force a nop request.
// Override the method in our AFHTTPRequestOperationManager subclass
- (void)POST:(NSString *)path
  parameters:(NSMutableDictionary *)parameters
     success:(void (^__strong)(AFHTTPRequestOperation *__strong, __strong id))success
     failure:(void (^__strong)(AFHTTPRequestOperation *__strong, NSError *__strong))failure {
    SBEvents *events = [SBEventTracker sharedTracker].events;
    // Attach any accumulated events to the outgoing request
    if (events.count > 0) {
        parameters[@"_events"] = [events jsonString];
        [[SBEventTracker sharedTracker] clearEvents];
    }
    [super POST:path parameters:parameters success:^(AFHTTPRequestOperation *operation, id json) {
        //
    } failure:^(AFHTTPRequestOperation *operation, NSError *error) {
        //
    }];
}
On the backend, events arrive at a module written in Perl, which does the actual work of writing the records into tables. But that is not its only function: it also checks the integrity of the data. If an event arrives from a client with a value unknown to Stalin, he puts it into a separate table, which is processed later, once the inconsistency has been resolved (for example, after a new value is added to the corresponding enum).
Warning: Perl code ahead. In the unprepared it causes bleeding from the eyes and premature pregnancy.
package Birdy::Stat::Stalin;

# given/when and state are optional language features and have to be enabled
use feature qw(state switch);

use constant {
    SUCCESS => 'success',
    FAILURE => 'failure',
    UNKNOWN => 'unknown',
    CONTENT_TYPE_NONE => 'none',
};

sub track_events {
    my $params = shift;
    return unless ref $params eq 'ARRAY';
    return unless @$params;
    my ($s_events, $f_events, $u_events) = ([],[],[]);
    foreach (@$params) {
        my $event = __PACKAGE__->new($_);
        $event->parse;
        # sort the events into separate lists
        given ($event->status) {
            when (SUCCESS) {
                push @$s_events, $event;
            }
            when (FAILURE) {
                push @$f_events, $event;
            }
            when (UNKNOWN) {
                push @$u_events, $event;
            }
        }
    }
    __PACKAGE__->_track_success_events($s_events);
    __PACKAGE__->_track_failure_events('failure', $f_events);
    __PACKAGE__->_track_failure_events('unknown', $u_events);
}

state $enums = {
    'action' => [qw/
        install hit
        open_surf open_feed open_popular open_dayDigest open_profile open_settings open_comment
        registrationBegin_email registrationComplete_email
        page_seen page_click page_open
        share_fb share_vk share_sms share_email share_pocket share_copyLink share_saveImage share_twitter share_other
        like dislike favorite addToCollection
        openPush deliveredPush openDayDigestFromLocalPush
        error page_read none
    /],
    'screen' => [qw/
        none start similar
        surf feed popular dayDigest profile settings
        page_parsed page_image siteTag actionBar
        actionBar_profile actionBar_page actionBar_channel
        profile_channel profile_add profile_like profile_favorite profile_collection
    /],
    'deviceType' => ['IPAD', 'IPHONE', 'ANDROID'],
    'contentType' => [CONTENT_TYPE_NONE, 'siteShort', 'userShort', 'siteTag'],
};
state $fields = [ sort (keys %$enums, qw/time deviceID clientVersion userId userLogin contentID shortUrl count description/) ];

sub parse {
    my ($self) = @_;
    my $event_param = {};
    {
        my $required = [keys %$enums];
        my $optional = [];
        # check that the required parameters are present;
        # if any is missing, the event cannot be tracked
        unless ( $self->_check_params($required) ) {
            $self->status(FAILURE);
            return;
        }
        # These parameters are enums, so only certain values are allowed.
        # If the value of even one parameter is unknown to us,
        # store the event's parameters in another table to process later, once we learn how
        unless ( $self->_check_enum_params($required) ) {
            $self->status(UNKNOWN);
            return;
        }
        my $params = $self->_parse_params([@$required, @$optional]);
        $event_param = {
            %$params,
            %$event_param
        };
    }
    {
        my $required = ['time', 'deviceID', 'clientVersion'];
        my $optional = ['userToken', 'count', 'description'];
        # contentID is optional when contentType eq 'none'
        push @{
            $event_param->{'contentType'} eq CONTENT_TYPE_NONE ? $optional : $required
        }, 'contentID';
        # check that the required parameters are present;
        # if any is missing, the event cannot be tracked
        unless ( $self->_check_params($required) ) {
            $self->status(FAILURE);
            return;
        }
        my $params = $self->_parse_params([@$required, @$optional]);
        $event_param = {
            # optional parameters must be added even when they are absent
            (map { $_ => undef } @$optional),
            %$params,
            %$event_param,
        };
    }
    $event_param->{'time'} = Birdy::TimeUtils::unix2date(
        $event_param->{'time'}
    );
    $self->status(SUCCESS);
    $self->params($event_param);
    return;
}

# returns a hashref with the requested parameters
sub _parse_params {
    my ($self, $params) = @_;
    $params = [] if ref $params ne 'ARRAY';
    my $result = {};
    foreach my $key (@$params) {
        my $value = $self->params->{$key};
        next unless $value;
        $result->{$key} = $value;
    }
    return $result;
}
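For those who would rather keep their eyes, the first stage of that three-way routing can be sketched in Java. This is an illustrative mirror, not the production code: EventClassifier is a hypothetical name, the value sets are abbreviated, and the second stage (time, deviceID, clientVersion and the conditional contentID) is left out.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical Java mirror of the enum check; value sets are abbreviated.
class EventClassifier {
    enum Status { SUCCESS, FAILURE, UNKNOWN }

    private static final Map<String, Set<String>> ENUMS = new HashMap<>();
    static {
        ENUMS.put("action", new HashSet<>(Arrays.asList("install", "hit", "page_open", "like", "none")));
        ENUMS.put("screen", new HashSet<>(Arrays.asList("none", "start", "surf", "feed", "popular")));
        ENUMS.put("deviceType", new HashSet<>(Arrays.asList("IPAD", "IPHONE", "ANDROID")));
        ENUMS.put("contentType", new HashSet<>(Arrays.asList("none", "siteShort", "userShort", "siteTag")));
    }

    static Status classify(Map<String, String> event) {
        // A missing enum field means the event cannot be tracked at all.
        for (String field : ENUMS.keySet()) {
            String value = event.get(field);
            if (value == null || value.isEmpty()) {
                return Status.FAILURE;
            }
        }
        // A present but unrecognized value is parked for later reprocessing.
        for (Map.Entry<String, Set<String>> e : ENUMS.entrySet()) {
            if (!e.getValue().contains(event.get(e.getKey()))) {
                return Status.UNKNOWN;
            }
        }
        return Status.SUCCESS;
    }
}
```

As in the Perl module, the presence check runs over all fields before any value is validated, so an event that is both incomplete and unrecognized is reported as a failure rather than as unknown.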
During rollout we discovered some oddities. For example, some events arrived from the distant future and some from the past. It is easy to guess that all of these users were happy owners of Android smartphones. But overall everything went well. The system steadily collects statistics, and we barely have time to make sense of them.
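Since the event time comes from the device clock, one possible way to catch such events is to compare it with the server's receive time. The sketch below is only an assumption about how that could be done: the class name and both tolerance values are made up, and the article does not describe how these events were actually handled.

```java
// Hypothetical server-side check; both tolerance values are assumptions.
class TimestampCheck {
    static final long MAX_FUTURE_SKEW_SEC = 5 * 60;           // 5 minutes ahead
    static final long MAX_AGE_SEC = 30L * 24 * 60 * 60;       // 30 days back

    // True if the client-reported time lies inside the accepted window
    // around the server's receive time (both in Unix seconds).
    static boolean isPlausible(long eventTimeSec, long receivedAtSec) {
        long delta = eventTimeSec - receivedAtSec;
        return delta <= MAX_FUTURE_SKEW_SEC && -delta <= MAX_AGE_SEC;
    }
}
```

Events that fail the check could be routed to the same kind of side table that holds events with unknown enum values, rather than being dropped.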
In the following articles we plan to cover in more detail the methodology for analyzing how content is consumed, how to build a DWH/OLAP system out of shit and sticks, and more about farewell emails and the ridiculous results they lead to.