
Organization and use of segmentation in large mobile applications
Once your mobile application grows large enough that ten thousand, a hundred thousand, or a million people use it every day (it does not really matter which), you are, in general, dealing with a lot of different, live people. What does this mean for you as a developer?
For one thing, pressing the “Submit” button becomes much scarier: if you have overlooked something, you cannot, as with a web application, spend one night on Red Bull and pizza and fix it. Review on mobile platforms takes time, and on iOS it can take as much as a whole week. A week is more than enough for a previously loyal user to stop opening your application.
And, just as importantly, it means the time has come when “I like how this screen looks” is no longer a good enough reason for that screen to actually be in the application.

In this article I will try to describe what we do so that a huge production application stays that way.
A note: the material in this article is of little use for applications that can work without an Internet connection. But in our mobile age there are fewer and fewer of those.
Part 1. Segmentation über alles
First story: we integrated a large and very beautiful third-party service into our application. It has its own team, its own backend, even its own office in some sunny country that you can call to make an appointment. But at some point the whole service goes down for three days after a solid refactoring that, by an amusing coincidence, broke backward compatibility. Yes, “write it yourself,” “never deal with them again,” “ask them to fix it as soon as possible” are all interesting thoughts to entertain, but you need to do something, and quickly, so that users tapping their usual buttons do not see a permanent “Sorry, we’re at lunch,” or worse, information that is simply not true.
Second story: you roll out a large and important feature and, of course, you tested it properly, yet something in it still turned out to be wrong! Whom to blame, and how, we will decide later.
Third story: to learn more about the users of your application, you send a lot of useful information to your dedicated log server. But who knew there would suddenly be several million users, so your log server now cheerfully crashes every 10-15 minutes, and a new one will only arrive in 2-3 weeks.
These are the scary stories; there are many others that are less scary.
For all of them there is one convenient and useful tool: segmentation.
In short, the flow of our application can be described in a few simple steps (a sketch of this flow follows the list):
- Fetching the common config from the server
- Logging in to the server
- Fetching segmentation data
- Working in the application
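A minimal Swift sketch of this order, assuming each step is an asynchronous call; the types, functions and URL here are placeholders, not a real SDK, and each step is described in detail in the sections below:

```swift
import Foundation

// Placeholder types: the real payloads are described in the rest of the article.
struct CommonConfig: Decodable { let productionServerURL: URL }
struct Session { let userID: String }
typealias SegmentationConfig = [String: [String: Any]]   // feature_id -> parameter dictionary

// Stubs standing in for real networking code (see the following sections).
func fetchCommonConfig() async throws -> CommonConfig {
    CommonConfig(productionServerURL: URL(string: "https://api.example.com")!)
}
func logIn(to server: URL) async throws -> Session { Session(userID: "42") }
func fetchSegmentation(for userID: String) async throws -> SegmentationConfig { [:] }

func startApplication() async {
    do {
        let config = try await fetchCommonConfig()                        // 1. common config
        let session = try await logIn(to: config.productionServerURL)     // 2. login
        let segments = try await fetchSegmentation(for: session.userID)   // 3. segmentation data
        print("ready to work with \(segments.count) segmented features")  // 4. normal work
    } catch {
        // Any step may fail; fall back to cached or default values so the app still opens.
        print("startup step failed: \(error)")
    }
}
```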
Now let us dwell in detail on the first and third steps and on which problems they have solved at various moments of the application's life.
What is the common config? It is a set of application settings that is the same for all users.
What it contains:
Production Server Address
This is the entry point of your application. If it turns out that you need to move to another domain or, say, switch users to a backup server, you can do it through the config. Unlike web applications, on mobile (and desktop) platforms quite a few versions can be alive at the same time, which means a forced change of the production server can be carried out (we have had to do it, so it can happen to you too) without breaking clients that are already out in the wild.
Thus, the application stores exactly one static URL, the one pointing to the config file, and everything else can be changed behind it.
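A sketch of what this looks like on the client, assuming a JSON config behind a single hard-coded URL (the address and field names are made up for illustration); if the config cannot be fetched, we fall back to the last cached copy:

```swift
import Foundation

// The only URL compiled into the application: everything else comes from the config.
let configURL = URL(string: "https://static.example.com/mobile/config.json")!

struct CommonConfig: Codable {
    let productionServerURL: URL
    let minimalVersion: String         // "it would be nice to update"
    let forcedMinimalVersion: String   // "impossible to continue without an update"
}

func loadCommonConfig() async -> CommonConfig? {
    let cacheFile = FileManager.default.urls(for: .cachesDirectory, in: .userDomainMask)[0]
        .appendingPathComponent("common_config.json")
    do {
        let (data, _) = try await URLSession.shared.data(from: configURL)
        let config = try JSONDecoder().decode(CommonConfig.self, from: data)
        try? data.write(to: cacheFile)   // keep the last good copy for offline/failure cases
        return config
    } catch {
        // Network or parsing failed: reuse the previously cached config, if we have one.
        guard let cached = try? Data(contentsOf: cacheFile) else { return nil }
        return try? JSONDecoder().decode(CommonConfig.self, from: cached)
    }
}
```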
Minimum application version (optional / forced)
In general, people do not really like to update. Even with the advent of auto-updates on mobile platforms the situation has improved, but not completely. For example, with our two-week release cycle, 10-15 versions are typically in use at the same time. But sometimes so-called breaking changes happen: fundamental changes that make working on older client versions impossible or uncomfortable. In that case this parameter signals “it would be nice to update” in the soft scenario and “it is impossible to continue working without an update” in the hard one, which we reflect in the UI for the user.
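A sketch of how the client might interpret the two thresholds, assuming the config carries a soft and a hard minimum version as dotted strings (like the minimalVersion / forcedMinimalVersion fields in the config sketch above; all names are illustrative):

```swift
import Foundation

enum UpdatePolicy {
    case none      // current version is fine
    case suggest   // "it would be nice to update" - show a dismissible prompt
    case force     // "impossible to continue without an update" - block the UI
}

/// Compares dotted version strings numerically, so "2.10.0" is newer than "2.9.3".
func isVersion(_ version: String, olderThan minimum: String) -> Bool {
    version.compare(minimum, options: .numeric) == .orderedAscending
}

func updatePolicy(currentVersion: String, softMinimum: String, hardMinimum: String) -> UpdatePolicy {
    if isVersion(currentVersion, olderThan: hardMinimum) { return .force }
    if isVersion(currentVersion, olderThan: softMinimum) { return .suggest }
    return .none
}

// Example: an installed 3.4.1 against soft minimum 3.6.0 and hard minimum 3.0.0 -> .suggest
let policy = updatePolicy(currentVersion: "3.4.1", softMinimum: "3.6.0", hardMinimum: "3.0.0")
```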
Log Settings
For logging we use a separate config file, which allows us to:
- Add or exclude fields sent in a particular log message.
- Set what percentage of users should actually send a given log message to the server (and, moreover, to which server: some messages go to third-party analytics services like Google Analytics, some to our internal service, and so on).
- Set the log level for the system log.
The first lets us keep log messages fairly compact and, when necessary, add the information we are missing.
The second helps us balance the load on the log server. Since you already have a lot of users, even if only 1 percent of them send a given message to the server, that is enough for a fairly representative picture.
The third has a slightly different benefit: sometimes our crash analytics shows crashes whose circumstances are hard to reconstruct from the stack trace alone. By dropping the log level to maximum verbosity we simply attach the last few lines of the system log to the crash report, and the next day we can replay the user's actions and localize the problem (if you are curious, we achieve this with the CocoaLumberjack logger and HockeyApp analytics).
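A sketch of how the client could apply the sampling and level settings, assuming the log section of the config carries a per-message sampling percentage and a verbosity level (field names and the simple bucketing are illustrative, not our real format):

```swift
import Foundation

// Illustrative shape of the log section of the common config.
struct LogSettings: Codable {
    let samplingPercent: [String: Int]   // message name -> % of users that should send it
    let systemLogLevel: String           // e.g. "error", "info", "verbose"
}

/// A stable 0..99 bucket derived from the device identifier, so the same
/// device always falls into the same sampling group between launches.
func deviceBucket(deviceID: String) -> Int {
    let hash = deviceID.unicodeScalars.reduce(0) { $0 &* 31 &+ Int($1.value) }
    return abs(hash) % 100
}

func shouldSend(message name: String, settings: LogSettings, deviceID: String) -> Bool {
    let percent = settings.samplingPercent[name] ?? 100   // unknown messages: send by default
    return deviceBucket(deviceID: deviceID) < percent
}

// Example: only ~1% of devices will ship this high-volume message to the log server.
let settings = LogSettings(samplingPercent: ["screen_rendered": 1], systemLogLevel: "error")
if shouldSend(message: "screen_rendered", settings: settings, deviceID: "DEV-123") {
    // send the event to the log server
}
```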
Keys and other settings for third-party libraries
Yes, I understand this is frowned upon, but nevertheless, in various ambiguous situations with these libraries (we have had to re-create our third-party accounts several times) this, again, did not break old clients. Besides, these keys can be “salted” and made reasonably safe. (We still remember that if an attacker sets themselves the goal, they can also extract statically compiled keys from the application binary, so this way of storing them is no worse, especially since extracting an encryption algorithm is harder than extracting a single key.)
In addition, in the case of particularly unpleasant breaking changes on the third-party side, you can sometimes afford to proxy the interaction with it through your own server, preserving the old format for old clients.
Global A/B Options
Some decisions have to be made for a user who is not yet registered in the system (for example, how the registration screen looks, which path the user takes through registration, and much, much more). In this case the parameter can simply store the percentage of users for whom some functionality should be enabled or disabled. A unique device identifier (as a rule, every operating system provides one), reduced to a two-digit hash, works well enough for deciding whether a device falls into the test group.
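A sketch of that idea, assuming the config stores a percentage per pre-registration experiment; identifierForVendor stands in for “a unique device identifier”, and the two-digit hash is the same 0..99 bucketing as in the log-sampling sketch above:

```swift
import UIKit

/// Reduce the device identifier to two digits (0..99); any stable hash will do.
func twoDigitHash(of deviceID: String) -> Int {
    let hash = deviceID.unicodeScalars.reduce(0) { $0 &* 31 &+ Int($1.value) }
    return abs(hash) % 100
}

/// Decides whether this (possibly not yet registered) user sees the experimental
/// variant, given the percentage stored in the global A/B section of the config.
func isInTestGroup(experimentPercent: Int) -> Bool {
    let deviceID = UIDevice.current.identifierForVendor?.uuidString ?? "unknown-device"
    return twoDigitHash(of: deviceID) < experimentPercent
}

// Example: show the new registration screen to 20% of fresh installs.
let showsNewRegistrationFlow = isInTestGroup(experimentPercent: 20)
```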
In addition, the following concept turned out to be a very useful addition for us:
Devices and users are divided into “real” and “test” ones. To identify the latter we tried different approaches (by UDID, and when its use was prohibited, by the advertising identifier, but restrictions were imposed on that too, and in any case such identifiers do not rule out collisions where one day a real user sees something they should not), but in the end we settled on a simple scheme: a small utility, when launched on the device, writes a key into the device's encrypted storage, which every mobile system has; on iOS, for example, it is the Keychain. The main application checks for this key at startup and, if it is present, treats the user as a test one.
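A sketch of the check on the main application's side, assuming the small utility has written a generic-password item under an agreed service name (the service and account strings here are made up; on iOS both applications would also need to share a keychain access group for this to work):

```swift
import Foundation
import Security

/// Returns true if the marker written by the internal utility is present in the Keychain.
/// The service/account names are illustrative and must match whatever the utility writes.
func isTestDevice() -> Bool {
    let query: [String: Any] = [
        kSecClass as String: kSecClassGenericPassword,
        kSecAttrService as String: "com.example.app.test-marker",
        kSecAttrAccount as String: "is_test_device",
        kSecReturnData as String: false   // we only care whether the item exists
    ]
    let status = SecItemCopyMatching(query as CFDictionary, nil)
    return status == errSecSuccess
}

if isTestDevice() {
    // enable the debug panel, allow choosing a stage server, and so on
}
```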
IMPORTANT: if you introduce this kind of separation, try to ensure two things:
- A real user must not “accidentally” become a test one. What they see could confuse them.
- Working as a test user must not grant any privileges or the ability to perform dangerous actions in the application (this is especially true in game development, because attackers who set out to bypass the protection will succeed sooner or later, and will share the method besides).
And now that we have test users, what do we use them for?
- A debug panel that makes the tester's job easier (substitutes server responses, changes the log level, shows the frame rate and UI lags).
- “Stages”, i.e. test servers: a test user can choose which config the application loads at startup, from production or from one of the test servers, which allows testing the application with various server and segmentation settings.
Now let's move on to the main segmentation config.
It focuses on more pragmatic tasks: what the community generally accepts as classic A/B testing.
The config is structured as a dictionary (for example, JSON) in the format feature_id: {parameter dictionary}; a sketch appears right after the list below.
Here we no longer need percentages or other client-side branch points: as we recall, by this moment the user is already registered in the system, so, based on the guaranteed-unique user_id, the segmentation module on the server returns user-specific parameters. In it we can rely on:
- How long the user has been using the application (newcomer, loyal user, ...).
- Whether this is a paying user.
- Which device, operating system version and application version the user is on.
- Which time zone and country the user is in, and which localization they use.
And much more.
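A sketch of what such a per-user answer might look like and how the client reads it; the feature names and parameters are invented for illustration:

```swift
import Foundation

// An illustrative segmentation response: feature_id -> parameter dictionary.
let segmentationJSON = """
{
  "rate_app_popup":    { "enabled": true,  "text": "Enjoying the app?", "hours_between_shows": 48 },
  "new_checkout_flow": { "enabled": false }
}
"""

// The parameter values are heterogeneous, so plain JSONSerialization is the simplest way to read them.
let segments = (try? JSONSerialization.jsonObject(with: Data(segmentationJSON.utf8)))
    as? [String: [String: Any]] ?? [:]

let popup = segments["rate_app_popup"] ?? [:]
let popupEnabled = popup["enabled"] as? Bool ?? false            // default when the key is absent
let popupText = popup["text"] as? String ?? "Do you like the app?"
```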
What is it used for?
Enable / Disable Segmentation
As described at the beginning of the article, rolling out a big piece of functionality is quite scary, so this mechanism lets us, firstly, disable a feature that does not behave as expected, and secondly, open the functionality up to users gradually (open it for 5 percent, observe, then 10, 20, 50 and, finally, everyone).
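In our case this decision is made by the segmentation module on the server, so treat the following Swift sketch as pseudocode for that logic: a kill switch plus a staged rollout keyed by the already-known user_id and a per-feature percentage (all names are illustrative):

```swift
/// Stable 0..99 bucket per (user, feature), so widening the rollout from 5% to 10%
/// keeps the users who already had the feature and only adds new ones.
func rolloutBucket(userID: String, featureID: String) -> Int {
    let combined = userID + ":" + featureID
    let hash = combined.unicodeScalars.reduce(0) { $0 &* 31 &+ Int($1.value) }
    return abs(hash) % 100
}

func isFeatureEnabled(userID: String, featureID: String, rolloutPercent: Int, killSwitch: Bool) -> Bool {
    if killSwitch { return false }   // misbehaving feature: turn it off for everyone
    return rolloutBucket(userID: userID, featureID: featureID) < rolloutPercent
}

// Example: 5% today, then 10, 20, 50, 100 as confidence grows.
let enabled = isFeatureEnabled(userID: "42", featureID: "new_checkout_flow",
                               rolloutPercent: 5, killSwitch: false)
```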
Parameter Segmentation
Into each feature designed for A/B testing you can build a number of variables: the text of a pop-up window, animation duration, time between displays, button color, one of several possible behaviors. The more such parameters there are, the more experiments you can run in search of the best solution; you are limited only by your imagination. On the other hand, the more testing the functionality requires. (True, this volume can be spread out over time: having tested the core behavior of the application, you can later just run a short test, on the test server, of the specific set of parameters you are about to roll out.)
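A sketch of how a feature can expose its tunable parameters with compiled-in defaults, so that the server only needs to send what it overrides (the feature and its parameters are invented):

```swift
import Foundation

/// Every parameter has a compiled-in default; the segmentation config only overrides some of them.
struct RateAppPopupConfig {
    var text = "Do you like the app?"
    var animationDuration: TimeInterval = 0.3
    var hoursBetweenShows = 72
    var buttonColorHex = "#007AFF"

    init(overrides: [String: Any]) {
        text = overrides["text"] as? String ?? text
        animationDuration = overrides["animation_duration"] as? TimeInterval ?? animationDuration
        hoursBetweenShows = overrides["hours_between_shows"] as? Int ?? hoursBetweenShows
        buttonColorHex = overrides["button_color"] as? String ?? buttonColorHex
    }
}

// The server sent only two overridden values; everything else keeps its default.
let popupConfig = RateAppPopupConfig(overrides: ["text": "Enjoying the app?", "hours_between_shows": 48])
```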
So, summarizing this part, I would like to note the following points:
- A/B testing should be built into the architecture: every feature you design, except the most monolithic ones, should allow for being completely disabled. The extra branching is best removed during refactoring, once the functionality has proven itself in production. In addition, the feature should have “extension points” planned in advance, which can take any of the valid values we receive from the server.
- Following on from the previous point, each feature must have a default config: firstly, so that the config sent over the wire is as small as possible (only the overridden parameters), and secondly because segmentation, like anything else, can simply go down.
- Maintaining this kind of logic is genuinely expensive: it complicates the code, sometimes significantly.
- The volume of acceptance testing grows: with design mistakes, or simply with closely related features, you have to check not just one group of parameter values but the possible interplay of several.
- For all that, it makes the application significantly more resistant to the mistakes of developers and of the people who conceive new functionality.
Part 2. Hey, are you still alive out there?

Ask yourself whether what you have built is actually any good.
In the first part I described what we do; in this one I will try to describe how we actually understand whether we are doing it right.
Crash analytics.
The easiest way to learn that something is wrong is to discover that after the introduction of new functionality the application simply stopped working and started crashing merrily.
We use HockeyApp for this, because it has quite convenient tools for working with known crashes and, in addition, integrates well with various deployment systems, so the information in it can be kept up to date automatically. In fact, there is a decent number of such tools today, for every taste; choose for yourself. As I wrote a little earlier, working with it became even more pleasant once we added the ability to attach a piece of the device's log to the crash report.
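A sketch of the “attach the tail of the log to the crash report” trick, assuming CocoaLumberjack's file logger is installed; the string this function returns is what you hand to your crash SDK's application-log hook (the exact hook depends on the SDK, in HockeyApp it lives on the crash manager delegate):

```swift
import CocoaLumberjack

/// Returns the last `maxLines` lines of the most recent CocoaLumberjack log file,
/// suitable for attaching to a crash report.
func tailOfApplicationLog(from fileLogger: DDFileLogger, maxLines: Int = 100) -> String {
    guard let newestLog = fileLogger.logFileManager.sortedLogFileInfos.first,
          let contents = try? String(contentsOfFile: newestLog.filePath, encoding: .utf8)
    else { return "" }
    return contents.split(separator: "\n").suffix(maxLines).joined(separator: "\n")
}

// Typical setup: keep a file logger around (its verbosity is driven by the log config),
// and return tailOfApplicationLog(from:) from the crash reporter's delegate callback.
let fileLogger = DDFileLogger()
fileLogger.maximumFileSize = 512 * 1024
DDLog.add(fileLogger)
```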
Monitoring sessions and payments.
This is perhaps the main tool of interest to the business. There are plenty of instruments for it, but we use a mixture of self-written and off-the-shelf ones, because not every business is willing to share payment information with third-party services. And rightly so. Existing systems let us track application quality quite comfortably over both the shortest and the longest trends. When rolling out significant functionality, the following metrics deserve observation:
- The number of sessions: if it changes dramatically, something is wrong.
- Revenue, or the number of payments, which lets us understand in the short term what effect the new functionality has on users, and strategically whether we are moving in the right direction at all.
- Session length is also a very important metric. It cannot be said that longer is always better (that holds for game projects); in business applications it should rather stay close to some predicted value. (If sessions are too long, perhaps you should ask what users are spending all that time on.)
- The number of sessions per day: here, in general, everything is clear.
Modern services make it possible to segment this analytics well (by device, application version, geolocation and a bunch of other attributes), which lets you be more precise in your forecasts.
Monitoring reviews.
Unfortunately, this process is poorly formalized and does not lend itself to automation, but I think its importance needs no explanation for any application, not only a large one. It is also good form to give users a way to send feedback and suggestions directly to the company, because in a large stream of store reviews a suggestion can easily be missed.
Monitoring the life of features and stories.
For this, unfortunately, we could not find a sufficiently convenient, functional and simple public tool, which is why most of the application's log messages and a dedicated log server are devoted to it.
We use Hadoop (and several other services) for this, for the following reasons:
- Being NoSQL, it makes it easy to add or remove fields in log messages without changing the database structure on the log server, which gives the necessary flexibility.
- It is possible to build full-fledged statistical samples over the set of parameters we are interested in, to get the most accurate slice of information.
- We use an entity called a “story”, also known as an internal session or funnel: roughly speaking, a unique identifier of one session of using the application. Based on it, we can get a sample showing the entire sequence of a user's actions within the session of interest. (For example, we select the users whose payment failed for some reason, then for any of them pull up their funnel and get the context needed to understand why it could have happened.) A client-side sketch of this follows the list.
- It is possible to set up “notifications” hung on scheduled scripts: for example, once an hour we run a query that computes the percentage of sessions with errors out of the total number of sessions, and if that percentage goes outside a certain range, the interested parties receive a message and move on to a more detailed analysis of the problem.
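A sketch of the client-side half of that “story”: generate an identifier once per session and attach it to every log message, so the server can later pull out the full funnel for one session (the envelope field names are illustrative):

```swift
import Foundation

/// One "story" / internal session / funnel: a unique identifier generated when
/// the session starts and attached to every log message sent during it.
final class StoryLogger {
    private let storyID = UUID().uuidString
    private let userID: String

    init(userID: String) {
        self.userID = userID
    }

    /// Wraps an event in the common envelope expected by the log server.
    func log(event name: String, fields: [String: Any] = [:]) {
        var message: [String: Any] = fields
        message["event"] = name
        message["story_id"] = storyID   // lets the server group a whole funnel together
        message["user_id"] = userID
        message["timestamp"] = Date().timeIntervalSince1970
        // send(message) to the dedicated log server (transport omitted here)
    }
}

// Example: every event in this session shares the same story_id, so a failed
// payment can be inspected together with everything that led up to it.
let storyLogger = StoryLogger(userID: "42")
storyLogger.log(event: "payment_started", fields: ["sku": "premium_month"])
storyLogger.log(event: "payment_failed", fields: ["error_code": 502])
```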
I consider all these requirements quite important, and I believe there is more than one tool that satisfies them.
And perhaps the most important consequence of having such tools is the ability to build formal (and fairly accurate) metrics for assessing the quality and relevance of functionality. If a button is not pressed, it must be removed, even if it is very beautiful. If, after the introduction of a super-convenient feature, users start complaining about the application more, using it less and paying less, then the feature is not super-convenient. And so on.
Summarizing all of the above, I would like to note a few final points:
- Perhaps most of the material in this article seems obvious, but we did not arrive at all of it on the first attempt, and this set of approaches works and solves real problems.
- All this infrastructure is very expensive to develop and to keep healthy, so you should not implement ALL of it if you are starting a small application that you believe will interest millions. If only because below a certain audience size statistical methods simply do not apply.
- At the same time, some of these recipes are cheap to maintain and save a decent amount of nerves.
- It is not worth logging absolutely everything; it quickly turns into indigestible porridge. I suggest logging based on hypotheses: when you start designing new functionality, you write down, as a pessimist, what might go wrong, and from that you derive a minimal set of log messages with a minimal set of fields that lets analytics cover those hypotheses. And from time to time you review the log messages and remove those that have lost their relevance.
- At the same time, remember that you can log very diverse metrics and events: from errors in a session and how often a certain screen is opened, to application launch time or the time spent on a particular task.
- Try different tools for organizing segmentation. Ideally, changing segments should not be the programmers' job but that of the marketing and product teams; give them more options for building segments.
- There are a number of ready-made solutions on the market for organizing A/B segmentation in mobile applications if you want to save time on infrastructure, for example Leanplum.
- And you are very lucky: you are working on a project that people find interesting :-) Thank you all.