Using Characters and Scripts in Calendar Testing
Hello! My name is Evgeny Emelyanov, and I am the head of the Mail.Ru Calendar project. Today I will tell you how we improved the testing of the calendar’s mobile applications using characters and scripts. This kind of testing is widely used in usability research and in studies of how users interact with an interface. We decided to apply similar techniques to classic manual testing of mobile applications. At first the team was skeptical, but the results were very positive, so we would like to share our experience with you.
Nota bene: in marketing and user-centered design, characters (also known as personas) are fictional people who represent different groups of users, divided by geography, demographics, behavior and habits. Marketers use characters to describe different market segments.
Characters help to determine the goals, wishes and constraints of the consumers of a brand or product. They can inform decisions about new features, changes to current functionality and design. User characters represent the goals and behavior of hypothetical groups of a product's users. In most cases they are built from survey and user interview data, which describes behavioral patterns, goals, skills, capabilities and the surrounding environment. To make the portrait more realistic, a few fictional personal details are added. Usually more than one character is created for a product, but there should always be a main character who represents the main target group.
The problem comes to a head
Before we started using characters, the testing process was broadly structured as follows:
- The developer produces a build.
- Smoke testing is performed.
- The tasks implemented in this build are checked.
- The entire block of functionality affected by the changes is tested.
- If the build is a candidate for a beta version, all user cases are tested in full.
A note on terminology. For us, a “user case” is a small scenario of using the application, for example adding an event with certain parameters. Problems arose on two fronts. First, the user cases were mostly “synthetic”: testers, for obvious reasons, entered data divorced from reality and more often deliberately tried to "break" the application. That needs to be done, but breaking something is a separate user case. Second, complex bugs occurred at the junction of user cases or when several user cases were performed in a particular combination. As a result, such bugs were more often found by users than by testers, which saddened us greatly.
Fictional friends
The Calendar project is small, cozy and almost family-run. We try to communicate with users directly, bypassing technical support. As a result, we know our users and their main tasks well and are aware of most of their problems. It was not difficult to draw portraits of the main groups.
Regular user: adds calendars with events (holidays, sports, movies) and birthdays, and uses various reminders for them. Occasionally adds personal events or new birthdays.
Mobile: uses the calendar only on a smartphone, iOS or Android. Very rarely opens the web interface. Events are mostly personal, both one-off and recurring.
Active: uses the web interface and one of the mobile clients, iOS or Android, and often switches between the two clients. Creates and edits many events and tasks, often invites participants or takes part in events as a participant.
Technocrat: uses the web interface and clients on both mobile platforms. Creates many events and tasks. Uses the tools in non-standard ways and builds personal schemes for working with events and tasks.
Drunken master: a special kind, rarely found in nature, capable of wreaking havoc and destruction. Constantly confuses buttons, types a stream of consciousness into forms, presses Submit ten times, sends out spam and tries in every possible way to break whatever is within reach.
The user cases of these five groups overlap, but the scenarios (the set of user cases and the order in which they are executed) differ considerably. For example, fast and stable data synchronization between clients is critical for Active and Technocrat and not particularly important for Mobile, who uses only a mobile client. With this in mind, we compiled usage scenarios for each character from the existing user cases.
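One way to picture the difference between a user case and a scenario is a small data model. The sketch below is illustrative only: the user-case names, their order and the client sets are hypothetical and do not reproduce the team's real lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Character:
    name: str
    clients: frozenset  # clients this character uses
    scenario: tuple     # ordered user cases (hypothetical names)


# Illustrative data, not the real scenarios.
CHARACTERS = [
    Character("Mobile", frozenset({"android"}),
              ("create_event", "edit_event", "view_reminder")),
    Character("Active", frozenset({"web", "android"}),
              ("view_reminder", "edit_event", "invite_participant",
               "create_event", "sync_clients")),
    Character("Technocrat", frozenset({"web", "ios", "android"}),
              ("create_task", "create_event", "sync_clients", "edit_event")),
]


def shared_user_cases(characters):
    """Return the user cases that appear in every character's scenario."""
    return set.intersection(*(set(c.scenario) for c in characters))


def needs_sync(character):
    """Synchronization only matters when more than one client is in use."""
    return len(character.clients) > 1
```

With this data, `shared_user_cases(CHARACTERS)` contains `create_event` and `edit_event`, while the scenarios themselves differ in both content and order, and `needs_sync` is false only for Mobile.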
Next, we compiled several scenarios of interaction between characters for group work in the calendar. An important point: all the scenarios, including the group ones, were tried out on representatives of these groups within the company. This let us check both that we had chosen the right characters and that the scenarios were close to real life. We used “corridor testing” on colleagues, but for the purity of the experiment we also tried surveying outside users by email and instant messengers. “Corridor testing” turned out to be more effective, because loyal users showed a distorting “help at any cost” effect: they kept silent about some points and adjusted their answers to the result they thought we expected.
The process has begun
Now the application testing process looks like this:
- The developer produces a build.
- Smoke testing is performed.
- The specific tasks in this build are checked.
- The character scripts that use the affected functional blocks are tested.
- If the build is a candidate for a beta version, all scenarios of all characters are tested in order of priority.
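The script selection step in this process can be sketched as a simple lookup. This is a minimal sketch under assumptions: the mapping from scripts to functional blocks and the block names are hypothetical, invented here for illustration.

```python
# Hypothetical mapping: which functional blocks each character script touches.
SCRIPT_BLOCKS = {
    "Regular user": {"calendars", "reminders", "birthdays"},
    "Mobile": {"events", "reminders"},
    "Active": {"events", "tasks", "participants", "sync"},
    "Technocrat": {"events", "tasks", "sync"},
    "Drunken master": {"events", "forms"},
}


def scripts_for_build(affected_blocks, is_beta_candidate=False):
    """Return the names of the character scripts to run for a build."""
    if is_beta_candidate:
        # A beta candidate gets every script of every character.
        return list(SCRIPT_BLOCKS)
    affected = set(affected_blocks)
    # Otherwise run only the scripts that touch a changed block.
    return [name for name, blocks in SCRIPT_BLOCKS.items()
            if blocks & affected]
```

For a build that only changes synchronization, `scripts_for_build({"sync"})` selects the Active and Technocrat scripts; passing `is_beta_candidate=True` returns all five.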
As an example, here is one of our test scripts for the character “Active”. The script is designed to be completed within an hour:
- I check the mobile client for event notifications at the beginning of the working day (for some I open the detailed description);
- I move 1-2 events to another time;
- I add participants to one of the events in the web client;
- I create 1-2 events for the current day and 2-3 events for the current week;
- I create 2-3 tasks for any date in the mobile client;
- I mark arbitrary tasks from the current list as "completed" in the web client;
- offline in the mobile client, I view events, create a couple of new tasks and move one of the events, then go online and check synchronization (we call this simulating an elevator ride);
- I mark several tasks as completed in the web client, move 1-2 tasks to another day, create 2-3 events on an arbitrary date and share an event from the Day in History calendar to a social network.
To avoid boring you with details, the script has been slightly shortened and simplified. As for the order in which characters and their scenarios are tested, we have statistics on the different user groups and prioritize accordingly: the largest group is tested first. This lets us stabilize the most popular functionality first and release alpha and beta versions more often.
A spoonful of nuances
Naturally, characters are not a cure-all. There are problems that come with using the technique, and problems that characters do not solve.
The main one is occasional redundancy: user cases overlap between characters and between scenarios. We deal with this by splitting user cases by their execution parameters. For example, event creation is done separately for each character, each with its own parameter set that is as close as possible to reality, so that together the sets cover all possible conditions for creating an event.
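Splitting one user case by execution parameters might look like the following. This is a sketch under assumptions: the parameter names and values are hypothetical and only show the idea of one user case with a separate, realistic parameter set per character.

```python
from datetime import datetime, timedelta

# Hypothetical parameter sets for the "create event" user case.
CREATE_EVENT_PARAMS = {
    "Mobile": {"title": "Dentist", "minutes": 30,
               "repeat": None, "participants": []},
    "Active": {"title": "Weekly planning", "minutes": 60,
               "repeat": "weekly",
               "participants": ["colleague@example.com"]},
}


def build_event(character, start):
    """Build test data for the 'create event' user case of one character."""
    params = CREATE_EVENT_PARAMS[character]
    return {
        "title": params["title"],
        "start": start,
        "end": start + timedelta(minutes=params["minutes"]),
        "repeat": params["repeat"],
        "participants": list(params["participants"]),
    }


event = build_event("Active", datetime(2024, 1, 15, 10, 0))
```

The same user case then runs once per character with different data, so the repetition adds coverage instead of duplicating it.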
Another problem is that some user cases cannot be included in scripts, because they are either very rare in real life or synthetic in nature. We keep these user cases separate from the scripts and run them during full testing of all functionality.
Profit
We have released 3 updates of our mobile clients that were tested with characters and scripts. We cannot point to improved statistics on annoying bugs found by users themselves, because the released versions differ a lot and comparing them directly would not be quite fair. But there are other, equally important benefits.
First, testers find working with characters much more fun. The testing process has become more varied, and their understanding of the product and of the users has grown. Instead of hunting for abstract bugs that are hard to reproduce in real conditions, they catch the problems that are critical for the end user first.
Second, testers now take part not only in functional testing but also in usability testing. Interface discussions have gained more voices: active users of the product with their own arguments and vision.
Third, the entry threshold for new testers has dropped. Recently, as an experiment, we hired a person with no testing experience as a tester. Not counting the “boot camp” on general matters, effective testing began about three days later.
We must not forget that developers also need to test their applications, and there are scripts for them too. Niche products like our Calendar face the problem that the people who build them do not necessarily use them. Scenarios and characters do a great job of answering the question "why should I use this at all, I don’t have such a need." In our case, over time developers start using the product in their own lives and gradually move away from the scenarios.
Finally, we use characters not only in functional testing. Thanks to the work done, we have structured and polished an important and useful tool that we can potentially use in marketing as well.
Yuri Vetrov (@jvetrau), Head of the Mail.Ru Group Interface Design Group:
Characters are a powerful tool for designers that lets you focus on the scenarios most in demand in real use of the product. It was great to see the tool find an application in checking the quality of implementation, and not just in usability work.
In an ideal world, all bugs are fixed at once. In real life there is always a pile of tasks that push a complete and final bugfix back: new functionality, urgent hotfixes and so on. So we need a good way to prioritize both fixing bugs and finding them. Using the key scenarios of the most important user categories is exactly that. It means the problems that get in users' way most often are found and fixed first.
Previously, characters were relied on for expert usability evaluation and user testing. Using them to check the quality of implementation is an interesting and fairly fresh approach. I have been reading a lot about modern methods of working on interfaces for a long time and had never come across anything like it. So this is a great addition to the product team's toolbox.