The specifics of hypothesis testing for mobile applications
How long does a hypothesis test take for a mobile application? Let's count:
- Developing a version of the application that works in different modes for different user groups.
- Testing the result.
- Submitting the application to the app stores and waiting for approval.
- Waiting for users to update the application. As of 2019 most users have auto-update enabled, but not all of them.
- Collecting and analyzing statistics.
- Bringing the application to the state of the winning hypothesis, in parallel with developing the next one...
If your developers work in Scrum with two-week iterations, this usually means that testing a single hypothesis takes a full month. Other methodologies can shorten this period, but not significantly.
This state of affairs makes the rhythm of “5 hypotheses per week” that many product teams strive for impossible to achieve.
Below I will show how to speed up and improve this process and point out a number of ready-made solutions you can use.
Go.
Turning on and off
Before diving into the details, we need to introduce an additional term: the Feature flag (Feature toggle) pattern.
Readers without a technical background may need a short explanation:
When developing a new feature, the programmer adds a “switch” to the application code that activates this feature. This technique is typically used to keep unfinished features turned off in the shared codebase, but it can, of course, also be used to test hypotheses.
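A minimal sketch of the pattern (the names here are illustrative, not taken from any particular library):

```kotlin
// A feature flag: off by default, so unfinished code ships "dark".
object FeatureFlags {
    var newCheckoutEnabled: Boolean = false
}

fun showNewCheckout() = println("new checkout")       // feature under development
fun showLegacyCheckout() = println("legacy checkout") // existing behaviour

fun openCheckout() {
    // The switch decides which code path the user sees.
    if (FeatureFlags.newCheckoutEnabled) showNewCheckout() else showLegacyCheckout()
}
```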
To use the Feature flag pattern in an experiment, you will need:
- Fully developed functionality to experiment on.
- A switch for it that defaults to “Off”.
- Remote control of the switch from the server (see the sketch after this list).
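The last point is what makes experiments manageable: the server, not the shipped binary, decides the flag state. A minimal sketch, assuming a hypothetical JSON endpoint and reusing the FeatureFlags object from above:

```kotlin
import java.net.URL
import org.json.JSONObject

// Pull the current flag state from the server, e.g. on app startup.
// The endpoint and the JSON shape are invented for illustration.
fun refreshFlagsFromServer() {
    Thread {
        // Network I/O must stay off the main thread on Android.
        val body = URL("https://example.com/api/feature-flags").readText()
        val flags = JSONObject(body)
        FeatureFlags.newCheckoutEnabled = flags.optBoolean("new_checkout_enabled", false)
    }.start()
}
```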
The question is: where is the time saved, if the functionality still has to be developed before running A/B tests? Let's break down the stages of the experiment.
Two things stand out.
First, with a Feature flag we can submit the application to the app stores before it has been fully tested for errors. We only need to make sure that with the new functionality turned off the application behaves exactly as before, and this can be verified with previously written automated tests. Everything else can be tested while the application is being distributed to users.
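A sketch of what such a safety net can look like, reusing the hypothetical FeatureFlags object from above:

```kotlin
import org.junit.Before
import org.junit.Test

// The existing regression suite runs with the flag forced off,
// confirming the release behaves exactly as before while the
// unfinished path stays dark.
class LegacyBehaviourTest {

    @Before
    fun forceFlagOff() {
        FeatureFlags.newCheckoutEnabled = false
    }

    @Test
    fun checkoutBehavesAsBefore() {
        openCheckout() // exercises only the legacy path
        // ...the existing assertions on legacy behaviour stay unchanged
    }
}
```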
Second, once the experiment is complete, the Feature flag can be used to turn the functionality on or off for all users until the next version is ready, in which the flag is no longer used.
This is exactly the principle behind the Apptimize service, which provides a ready-made system for A/B testing.
Analyzing
To conduct an experiment, you need to do several things:
- Select a user segment, if the experiment is not aimed at everyone.
- Choose the audience size (see the assignment sketch after this list).
- Collect data, and not only the metrics the experiment verifies: the remaining business metrics are needed to make sure the experiment does not break anything else.
- Collect and analyze the results.
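Ready-made tools such as Apptimize or Firebase A/B Testing handle segment and audience assignment for you. For intuition, here is a minimal hand-rolled sketch (all names invented) of how a user can be deterministically assigned to a variant:

```kotlin
import kotlin.math.absoluteValue

// Hashing the user id together with the experiment name gives every user
// a stable bucket, so the same user always sees the same variant.
fun variantFor(userId: String, experiment: String, audiencePercent: Int): String? {
    val bucket = (experiment + userId).hashCode().absoluteValue % 100
    return when {
        bucket >= audiencePercent -> null // outside the experiment audience
        bucket % 2 == 0 -> "control"
        else -> "treatment"
    }
}
```

Hashing instead of random sampling keeps the assignment reproducible without storing any per-user state.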
If you do not use the ready-made solution from Apptimize, the simplest approach is a combination of Google Analytics for Firebase for analytics and Firebase Remote Config for delivering individual configurations (segments and tests). These tools are designed to work together.
Accordingly, you need to:
- Use Google Analytics for Firebase to track business metrics.
- Use Firebase Remote Config to manage Feature flags.
- Use Firebase Remote Config to specify segments and experiment parameters.
- Analyze the data from Google Analytics, using the keys from Firebase Remote Config in the analysis; these tools support this “out of the box”. A sketch of this wiring follows the list.
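A minimal sketch of this wiring on Android; the flag, user-property, and event names are placeholders, while the Remote Config and Analytics calls are the actual APIs:

```kotlin
import android.os.Bundle
import com.google.firebase.analytics.FirebaseAnalytics
import com.google.firebase.remoteconfig.FirebaseRemoteConfig

fun setUpExperiment(analytics: FirebaseAnalytics) {
    val remoteConfig = FirebaseRemoteConfig.getInstance()
    // Local defaults keep the feature off until a config is fetched.
    remoteConfig.setDefaultsAsync(mapOf<String, Any>("new_checkout_enabled" to false))

    remoteConfig.fetchAndActivate().addOnCompleteListener {
        val enabled = remoteConfig.getBoolean("new_checkout_enabled")
        // Expose the variant to Analytics so reports can be segmented by it.
        analytics.setUserProperty("checkout_variant", if (enabled) "on" else "off")
    }
}

fun trackCheckoutCompleted(analytics: FirebaseAnalytics) {
    // A business metric the experiment must not degrade.
    analytics.logEvent("checkout_completed", Bundle())
}
```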
Optimizing further
We have looked at how to shorten the hypothesis-testing cycle for mobile applications by reducing the time spent on testing and on rolling out the results of an experiment. But this approach does not eliminate the time spent on store approval and application distribution, so the goal of “5 hypotheses per week” is still not very realistic.
To speed up experiments further, you need to be able to develop and deliver new functionality without updating the application. This can be achieved with a dynamic user interface. However, this approach has its problems:
On the one hand, there are technical limitations on building the interface from settings received from outside: most mobile development frameworks use a declarative approach, which makes this impossible or very difficult.
On the other hand, app store policies prohibit downloading and executing arbitrary code, since it could be used to ship functionality that violates store rules.
Another limitation is the amount of data that can be delivered through Firebase Remote Config: it cannot carry the entire interface. It is best to store in it only a “key” identifying a specific version of the interface, and when this key changes, load the interface itself from a third-party service (sketched below). This does not in itself restrict the choice of mobile development framework, but it does require extra implementation effort.
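A minimal sketch of this scheme; the "ui_version" key and the CDN URL are placeholders:

```kotlin
import java.net.URL
import com.google.firebase.remoteconfig.FirebaseRemoteConfig

// Remote Config carries only a small version key; the full interface
// description is downloaded from a separate host when the key changes.
fun loadInterfaceDescription(onLoaded: (String) -> Unit) {
    val remoteConfig = FirebaseRemoteConfig.getInstance()
    remoteConfig.fetchAndActivate().addOnCompleteListener {
        val version = remoteConfig.getString("ui_version")
        Thread {
            // Network I/O off the main thread; caching by version is omitted.
            val layoutJson = URL("https://cdn.example.com/layouts/$version.json").readText()
            onLoaded(layoutJson)
        }.start()
    }
}
```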
The optimal solution is an approach in which only the user interface is built dynamically, while the business logic stays fixed. Since the vast majority of product experiments concern the interface specifically, this lets you maintain a high pace of work. Experiments that do require changes to the business logic can run in parallel, following the flag-based process described above. A sketch of this split follows.
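A language-agnostic sketch of the idea (here in Kotlin, with an invented JSON schema): the interface arrives as data, while the actions it can trigger are limited to logic already compiled into the app, which is also what keeps the approach within store rules:

```kotlin
import org.json.JSONObject

// The interface is plain data...
sealed class UiNode
data class Label(val text: String) : UiNode()
data class Button(val text: String, val action: String) : UiNode()

fun parseNode(json: JSONObject): UiNode = when (json.getString("type")) {
    "label" -> Label(json.getString("text"))
    "button" -> Button(json.getString("text"), json.getString("action"))
    else -> error("unknown node type: ${json.getString("type")}")
}

// ...while actions map only onto fixed, pre-built business logic;
// no downloaded code is ever executed.
val actions: Map<String, () -> Unit> = mapOf(
    "open_checkout" to { openCheckout() } // from the earlier flag sketch
)
```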
Technically, this approach is most easily implemented in a framework that has the following characteristics:
- A reactive, high-performance user interface that does not use a declarative approach by default.
- Support for Google Analytics for Firebase and Firebase Remote Config.
- A cross-platform solution is desirable to speed up development overall.
The Flutter framework meets these criteria best. As a proof of concept of this approach, there is a library for it that allows you to create a dynamic interface.
Using a dynamic interface built in Flutter together with Google Analytics for Firebase and Firebase Remote Config, you can develop applications that rival websites in the ease of hypothesis testing.