
How to measure an interface. A quantitative criterion
Every so often I get a natural urge to find out which piece of software is more convenient, or to prove a product’s advantage with adequate, numerical arguments (rather than the usual way).
Having become seriously interested in this topic, I spent a long time looking for solutions, and a year ago I even wrote and defended a thesis, “Determining a Quantitative Assessment of the Quality of Human-Computer Interaction”. This article is about it.
There are several ways to determine the usability of an interface.
Working methods
User surveys

There are many internationally recognized questionnaires (SUMI, SUS, SEQ, etc.): lists of anywhere from one to an endless number of questions along the lines of “Did you find this system unnecessarily complex?”. An overview of several of the most popular ones is in the text of the thesis.
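To make the scoring concrete, here is a minimal sketch of how one of these questionnaires, the System Usability Scale (SUS), is conventionally scored; the example answers below are invented.

```python
# Minimal SUS scoring sketch: ten statements answered on a 1-5 scale,
# odd-numbered items are positively worded, even-numbered items negatively,
# and the total is rescaled to 0-100.
def sus_score(answers: list[int]) -> float:
    assert len(answers) == 10 and all(1 <= a <= 5 for a in answers)
    contributions = [
        (a - 1) if i % 2 == 0 else (5 - a)  # indices 0,2,4,... are items 1,3,5,...
        for i, a in enumerate(answers)
    ]
    return sum(contributions) * 2.5

# Hypothetical responses from one participant.
print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))  # -> 85.0
```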
Live user experiments

You can run a series of experiments and measure numerical parameters (for example, task completion time, the number of errors, or the average task completion rate).
Expert evaluation

You can invite some authoritative expert and hope that over his 5/10/20 years of experience he has learned a thing or two of that sort.
But these methods require hiring an expensive usability specialist (or even several), and catching users on the street.
Therefore, many developers, terrified of the prospect of interacting with end users, came up with ways to formally evaluate the complexity of the system.
Techniques that do not require the participation of a specialist or users
* If you want to learn more about any of these methodologies, a review with references to the literature is in the text of the thesis.
Estimating the average time required by the user with the GOMS / KLM techniques
Based on average operator times, you simply calculate how long an average user would spend on the basic tasks. The catch is that it is unclear who decides which user scenarios to count.
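As an illustration of the KLM idea, a minimal sketch that sums standard operator times for a sequence of actions; the operator values are the commonly cited averages from the literature, and the “save the document” breakdown is a hypothetical example, not a scenario from the thesis.

```python
# Keystroke-Level Model sketch: a task is a string of operators, and its
# predicted duration is the sum of the average operator times.
KLM_OPERATORS = {
    "K": 0.28,  # keystroke or button press (average typist)
    "P": 1.10,  # point at a target with the mouse
    "H": 0.40,  # move hand between keyboard and mouse
    "M": 1.35,  # mental preparation
    "B": 0.10,  # mouse button press or release
}

def klm_time(sequence: str) -> float:
    """Sum the average operator times for a sequence like 'MPBB'."""
    return sum(KLM_OPERATORS[op] for op in sequence)

# Example: "save the document" via Ctrl+S -> M (decide) + K (Ctrl) + K (S).
print(round(klm_time("MKK"), 2))  # ~1.91 s
```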
RGB entropy profile
The visual complexity of the screen is evaluated. Obviously, a program with ten panels and twenty buttons is more complicated than the Google start page. The method is, of course, very approximate, but it is simple and fast.
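Sketched below is one possible reading of this metric: the Shannon entropy of the colors in a screenshot. Pillow and NumPy are assumed to be installed, and screenshot.png is a placeholder file name; the full XAOS metric mentioned later also accounts for actions and organizational elements, not just RGB entropy.

```python
# Rough "visual complexity" proxy: entropy of the distinct-color distribution
# of a screenshot. A busy, many-colored screen yields a higher value.
import numpy as np
from PIL import Image

def rgb_entropy(path: str) -> float:
    pixels = np.asarray(Image.open(path).convert("RGB")).reshape(-1, 3)
    # Count how often each distinct color occurs.
    _, counts = np.unique(pixels, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

print(rgb_entropy("screenshot.png"))  # placeholder file name
```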
Information performance
The ratio of the minimum amount of information needed to complete a task to the amount of information the user actually has to enter. For example, if an informational modal window with a single “OK” button is shown, the information entered by the user (the click on the button) is completely useless, which lowers the information performance indicator.
For more complex cases it is not very clear how to decide whether a given piece of information is necessary; most likely, an expert cannot be avoided.
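A toy sketch of the ratio itself. How many bits a given interaction “costs” is exactly the hard part mentioned above; the counts in the example are my own illustrative assumptions.

```python
# Information performance: information the task requires from the user,
# divided by the information the user is actually forced to supply.
import math

def information_performance(required_bits: float, supplied_bits: float) -> float:
    if supplied_bits == 0:
        return 1.0  # nothing is asked of the user at all
    return required_bits / supplied_bits

# Informational dialog with a single "OK" button: 0 bits needed,
# yet one binary choice (1 bit) is demanded -> performance 0.
print(information_performance(required_bits=0, supplied_bits=1))

# Picking 1 of 8 equally likely options from a list of 8: 3 bits needed,
# 3 bits supplied -> performance 1.0.
print(information_performance(math.log2(8), math.log2(8)))
```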
XML tree parsing
The more complex the code (markup) describing the interface, the more likely it is that the program is difficult for the user.
A very debatable claim; most likely it works only for a very rough assessment (Photoshop is more complicated than Notepad, since there is more code).
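A crude sketch of this kind of estimate, counting the widgets and nesting depth in an XML UI description; ui_layout.xml is a placeholder name, and equating “more markup” with “harder to use” is precisely the rough assumption discussed above.

```python
# Count elements and nesting depth of a UI description file as a rough proxy
# for interface complexity. Any XML-based layout format would do here.
import xml.etree.ElementTree as ET

def markup_complexity(path: str) -> tuple[int, int]:
    root = ET.parse(path).getroot()

    def depth(node, level=1):
        return max([level] + [depth(child, level + 1) for child in node])

    element_count = sum(1 for _ in root.iter())
    return element_count, depth(root)

print(markup_complexity("ui_layout.xml"))  # (number of widgets, nesting depth)
```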
The number of classes into which interface objects can be divided
I settled on this last, rather clumsily named, method.
Before describing the essence of the method, I should say right away that its scope is quite limited. When I first learned about it, it seemed to shine with a magical light, but in the course of working on the thesis it faded and deflated somewhat. Most likely, this method is usable for small application widgets, where you can more or less identify the main tasks and fixed sequences of actions. It is not suitable for creative applications with rich controls and unformalized results (Photoshop, AutoCAD, etc.).
I based my work on the system complexity assessment method of Tim Comber and John Maltby, described in Comber T., Maltby J.R. Investigating Layout Complexity. Proc. CADUI, 1996.
Let us denote the complexity of the system by C. In accordance with Claude Shannon’s theory of information entropy, as adapted by G. Bonsiepe, the complexity is determined by the formula

C = -N · Σ p_i · log2(p_i), summed over i = 1 … n,

where N is the total number of objects,
p_i is the proportion of objects in the i-th class among all objects (p_i = n_i / N),
n is the number of object classes,
n_i is the number of objects in the i-th class.
For use on video display terminals, the complexity is attributed to the placement and sizes of objects:
C = C_S + C_D,
where C_S is computed over classes of object sizes and C_D over classes of relative placement.
Complexity per individual object:
C_O = C / N
Of course, this is a primitive approach. But it is one that at least tries to group objects by type. If the buttons are all the same size and stand side by side, everything is simple. If every control is different, then the functions they perform are most likely also diverse and specific.
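Here is a minimal sketch of the calculation above. How controls are grouped into classes is the method’s key question; for illustration I group them by identical size (for C_S) and by identical horizontal alignment (for C_D), which is a simplification of the original scheme.

```python
# Layout complexity via the entropy formula above:
# C = -N * sum(p_i * log2(p_i)), with p_i = n_i / N.
import math
from collections import Counter

def entropy_complexity(class_counts: list[int]) -> float:
    N = sum(class_counts)
    return -N * sum((n / N) * math.log2(n / N) for n in class_counts)

def layout_complexity(controls: list[dict]) -> float:
    # C_S: classes by size; C_D: classes by position (here: left alignment).
    size_classes = Counter((c["w"], c["h"]) for c in controls)
    pos_classes = Counter(c["x"] for c in controls)
    c_s = entropy_complexity(list(size_classes.values()))
    c_d = entropy_complexity(list(pos_classes.values()))
    return c_s + c_d

# Three identical, left-aligned buttons: one size class and one position
# class, so C = 0 (entropy of a single class is zero) and C_O = C / N = 0.
buttons = [{"x": 10, "y": 10 + 30 * i, "w": 80, "h": 24} for i in range(3)]
print(layout_complexity(buttons))                  # C
print(layout_complexity(buttons) / len(buttons))   # C_O, per-object complexity
```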
The original algorithm evaluates only one screen, the main one. I evaluate the complexity of the entire sequence of screens a user has to go through to complete a task. A complexity indicator is calculated for each screen, and if some interface elements stay constant throughout the task, a reduction factor is applied to them.
To tie the assessment closer to reality, I proposed introducing significance factors for users and tasks.
Users are divided into several groups depending on their needs and the tasks they perform, and a significance factor is assigned to each type of user.
It depends on:
- the number of users of a given type,
- the frequency of their use of the product,
- the cost of their time or the marketing significance of this type.

C_uk = Σ C_tn · K_tn, summed over the tasks of that user type,
where C_uk is the complexity of the system for the k-th type of user,
C_tn is the complexity of the n-th task,
K_tn is the importance coefficient of the n-th task for that type of user.
Once we know how difficult the interface is for each group of users, we can calculate the total complexity: sum the complexities for the user types, each multiplied by its user-significance weighting factor.
The main difference between this approach and the original methodology is that we rely on real people with real needs. A program cannot be abstractly complex; it can be complex for people whose tasks it fails to do, or does slowly, while overloading them with unnecessary information. In effect, the concept of complexity is redefined relative to the tasks.
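Below is a sketch of the weighting scheme just described, assuming the per-task complexities have already been computed screen by screen; the data structure and the numbers are hypothetical.

```python
# Weighted aggregation: task complexities -> user-type complexity -> total.
def user_type_complexity(tasks: list[tuple[float, float]]) -> float:
    """C_uk = sum of C_tn * K_tn over that user type's tasks."""
    return sum(c_tn * k_tn for c_tn, k_tn in tasks)

def total_complexity(user_types) -> float:
    """Sum of per-user-type complexities weighted by user significance K_uk."""
    return sum(user_type_complexity(tasks) * k_uk for tasks, k_uk in user_types)

# Two hypothetical user types: frequent operators (weight 0.7) and
# occasional administrators (weight 0.3); each task is (C_tn, K_tn).
operators = [(12.4, 0.6), (30.1, 0.4)]
admins = [(55.0, 1.0)]
print(total_complexity([(operators, 0.7), (admins, 0.3)]))
```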
Briefly
To sum up: I took a technique that measures interface complexity by counting how many classes all the controls can be split into and how many controls fall into each class.
First, I proposed identifying several types of users and assigning each a coefficient based on their number, frequency of use, and the cost of their time.
For each type of user, identify tasks, each with its own significance factor.
For each task, build the sequence of screens and score them according to the original method.
The result is a score that reflects how easy the interface makes the specific tasks of the relevant users.
Weak point: you still have to identify the users and tasks yourself. But once you have figured out what the user types are and what their tasks are, you can quickly and cheaply work out how much better or worse alternative versions are. It works only for fairly simple programs.
Usefulness: instead of a vague expert comment, you get numbers that can be compared.
References (abridged, no nonsense)
- Danilyak V.I. The Human Factor in Quality Management: An Innovative Approach to Ergonomic Management. Moscow: Logos, 2011. Written by someone who worked on an aircraft control-panel interface; a lot of filler, but there are interesting parts.
- Sviridov V.A. The Human Factor. nafanin.deda.ru/human-factor/human-factor-spreads.pdf. The author writes about the development of aircraft control systems. There is not much hard information, but as an autobiography it is interesting and even somewhat touching.
- Randolph G. Bias, Deborah J. Mayhew. Cost-Justifying Usability: An Update for the Internet Age, Second Edition. Morgan Kaufmann, 2005. About how much a bad interface costs; interesting statistics.
- Stickel S., Ebner M., Holzinger A. The XAOS Metric: Understanding Visual Complexity as a Measure of Usability. Work & Learning, Life & Leisure, Springer, 2010, pp. 278-290. About automatically determining complexity from the number and variety of controls.
- Bevan N. International Standards for HCI and Usability. International Journal of Human-Computer Studies, 2001, 55(4). The works and ideas are a bit vague, but Bevan’s name turns up in a great many references.
- Bevan N. Measuring Usability as Quality of Use. Software Quality Journal, 1995, 4, pp. 115-140.
- Sauro J. 10 Benchmarks For User Experience Metrics. www.measuringusability.com/blog/ux-benchmarks.php. The whole site is interesting: the company has long been trying to measure usability. It does not always succeed either, but Sauro publishes scientific articles and is clearly deep in the subject.
In addition to the technical part, the thesis contains a fairly interesting section with stories about all sorts of disasters caused by usability failures, a review of statistics on the financial damage done by bad interfaces, and an overview of the various ways to evaluate usability (carefully, if a little clumsily, collected and translated from a dozen sources).
You can look through the table of contents below, and if you find something interesting, download the thesis itself.
Diploma Content
1. Introduction
1.1. The place of ergonomics in the science of quality
1.2. Conceptual apparatus
1.3. The economic effect of increasing usability
1.4. Economic effect
1.5. Safety Impact
1.6. Examples
1.6.1. Economic effect
1.6.2. Safety Impact
Crash of Flight 965
Disabling of a warship
Crash of a remotely controlled aircraft
Plane crash, 1972
Accident at the Three Mile Island nuclear power plant
1.7. The need to evaluate user interface properties. Conclusion
2. Literature review. Existing solutions for evaluating software ergonomics
2.1. History of the development of interface ergonomics standards
GOST 28195-89 Software Quality Assessment. General provisions
ISO Standards
Conclusion
2.2. Existing criteria for general software certification
Feedback from certification laboratories
Conclusion
2.3. Existing methods for evaluating the ergonomics of interfaces
2.3.1. Assessments based on comparison and suitability of the environment
Standardization and suitability of the work environment
Benchmarking
Examples
2.3.2. Expert judgment
Heuristic assessment
2.3.3. Questioning users according to the results of interaction with the system
General requirements. Respondents
How many respondents are needed
Questionnaire according to
User loyalty (Net Promoter Score, NPS)
Software Usability Measurement Inventory, SUMI
System Usability Scale, SUS
Task complexity assessment based on a single question
2.3.4. Quantitative estimates based on experimental data
Number of errors
Average level of task performance
Unified metric (Single Usability Metric, SUM)
Examples
2.3.5. Formal Evaluation Methods
Information Search
Information Performance
KLM-GOMS Models
Tim Comber and John Maltby
System Complexity Assessment XAOS - Actions, Organizational elements, Summed entropy of RGB values
LOC-CC complexity measurement model
Examples
2.4. Quality management system as a basis for developing an ergonomic interface
Identification of a stakeholder
Interviews with stakeholders
Case studies
3. Development of a quantitative assessment
3.1. Limitations of the use of formal numerical evaluation
3.2. Preliminary research. Participation of experts
3.2.1. Data source. Requirements for respondents
3.2.2. Definition of types of users
3.2.3. Determining the proportions of the number of users of each type
3.2.4. Estimation of frequency of use
3.2.5. Determining the value of time users, or their importance from a marketing point of view
3.2.6. Calculation of the coefficient of significance of the user from the received data, or assignment of the coefficient by an expert
3.2.7. Ranking tasks for each type of user
3.2.8. Highlighting the sequence of screens needed to solve the problem
3.3. Formal calculation
3.4. Classification
3.5. Example of calculation of the assessment.
User research.
Verification
4. Economic part
4.1. Building a calendar schedule
4.2. Construction of an algorithm for obtaining an assessment of human-computer interaction
4.3. Costing to obtain estimates of human-computer interaction
Conclusion
6. References