How to test development platforms adequately and avoid holy wars



    Of course, the title of this post sounds a bit rhetorical. Because:
    1) There are no assessment methods and metrics recognized by all market players (of the kind that exist in the automotive business, in computer hardware, or in sports). This is the main obstacle to reliable, adequate test results. But the industry resolves it as it matures.

    2) The problem of holy wars keeps getting harder. CMS are platforms, and platforms (even purely technical ones) are among the most fundamentalist, conservative, and religious constructs of human culture, because behind every technological or spiritual platform stand living people, its adherents. And people perceive the world a little differently, through the prism of their platform's values (Orthodoxy vs. Protestantism, liberals vs. patriots, iOS vs. Android, procedural programming vs. OOP, spaghetti code in templates vs. MVC, and so on).

    Therefore, wherever an attempt is made to compare platforms, a holy war automatically breaks out. Many people simply refuse to accept arguments that contradict the dogmas, values, and ideas of their platform. No reasonable researcher today would dare to publish a comparative test of the Bible and the Koran based on the opinions of theology students.

    Nevertheless, it is possible to compare CMS correctly; it is just much more expensive and complicated than handing a lab assignment to a few students. Reliability is a question of methodology, metrics, testing conditions, judges, and recognition by most market professionals. Moreover, with adequate research methods the flame-war potential drops sharply: even hard-core fanatics find it harder to argue with clear, verified facts.

    After everything that has been written here and here, I cannot help but bring to the community's attention my vision of the methods for comparing and selecting development platforms, a vision supported by many colleagues in the market.



    Ratings, tests or studies



    Ratings reflect the popularity of various products on the market. We already have two or three ratings with completely different results. They speak to the popularity of products, but not to their qualities (although there is a correlation between quality and popularity).

    Tests compare the consumer properties of products. Popularity plays no role here, although it is usually the most popular products that get compared.

    There are no CMS tests on the market. Every developer or site customer tests CMS on their own, using whatever methodology they understand, and chooses based on their own test results.

    Comprehensive studies examine the impact of different CMS on the economics of web development or on the economics of website ownership. This is what developers and their customers really need, and such studies would carry the most useful information. Unfortunately, they are also the most difficult to organize.

    The product as a whole or its individual properties



    It is almost impossible to compare two or three of the best universal CMS by the sum of their technical qualities and pick the single best one among them. The leading products are leaders precisely because they compensate for their individual shortcomings with individual advantages and, in the sum of their qualities, are comparable to the other leaders. And comparing leaders with outsiders is pointless.

    It is simpler and more realistic to narrow the task down by scope (the best CMS for a blog, for a store, for a quick site) or to split it into tests of individual properties (performance, development speed, ...).

    Sympathies or metrics



    This is the key point of my scientific debate with Mr. Ovchinnikov. I (and not I alone) believe that it is impossible to evaluate the technical CHARACTERISTICS of products based on people's OPINIONS about them. Otherwise, instead of comparing the CHARACTERISTICS of the products, we get a comparison of OPINIONS about the CHARACTERISTICS of the products.
    Opinions are studied in opinion polls, but not in technical tests.

    Take, for example, the assessment of athletes' performance (essentially the same kind of comparative test).
    If one runner ran a hundred meters in 9.9 seconds and the other in 9.8 seconds, then the opinions of people (even judges) do not affect the result.
    In figure skating the metrics are less precise: people give the marks. But the inaccuracy is reduced by a large number of judges, by their coming from different countries, and most importantly by their professionalism. Nobody surveys the opinions and sympathies of pensioners and housewives to pick the winning skater, even though pensioners and housewives are the main consumers of the skaters' product. It is just as incorrect to survey students' opinions when testing CMS on the grounds that students are consumers of CMS.


    Not to mention that people's opinions are easy to influence simply by selecting the right people, the right wording of the tasks, and the right test conditions. Instrument readings taken under a clearly stated test procedure are much harder to dispute.

    A CMS, like different sports, has properties that are easy to measure and properties that are quite abstract (for example, the design of the installer or of the icons).

    How do you evaluate the measurable qualities of a CMS? The same way as in a running race:
    simply measure them with instruments and metrics. The instruments are eye trackers, a stopwatch, and video cameras. The metrics are the time and other costs of completing a task, all else being equal.

    How do you evaluate the unmeasurable qualities of a CMS? The same way as in figure skating:
    collect the opinions of several recognized, independent experts. Where to find such experts is another question; more on that below.

    Convenience is measurable



    What is familiar seems convenient to us, and what is unfamiliar seems inconvenient. This greatly distorts our real idea of convenience. To Pushkin, a quill pen and ink probably seemed a convenient tool for working with text. Show him a laptop with Word and he would hardly recognize its advantages. But had Pushkin spent the time and effort to master the laptop, the quill would no longer seem such a convenient tool to him.

    I am absolutely convinced that convenience is not an abstraction. The convenience of a CMS is not a relative but an absolute (and therefore measurable) parameter, provided you factor out people's personal preferences and experience.
    Therefore, convenience should be evaluated not with an opinion survey ("is it convenient for you?") but with measurements.

    You can measure the convenience of a CMS by the time it takes to master it (the time needed to complete tasks on first acquaintance) and by the time and number of actions (clicks) required to complete individual use cases. The research methodology and the use cases themselves should be developed by usability specialists.
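    To make the idea of such measurements concrete, here is a minimal sketch of what a measurement harness might record; all names, use cases, and the file format are hypothetical illustrations, not part of any existing study:

```python
import csv
import time
from dataclasses import dataclass, field

@dataclass
class UseCaseRun:
    """One test subject performing one use case on one CMS."""
    cms: str
    subject_id: str          # anonymised participant code
    use_case: str            # e.g. "publish a news item"
    started_at: float = field(default_factory=time.monotonic)
    clicks: int = 0
    seconds: float = 0.0

    def click(self) -> None:
        # Called by the observer (or an instrumented browser) on every click.
        self.clicks += 1

    def finish(self) -> None:
        self.seconds = time.monotonic() - self.started_at

def save_runs(runs: list[UseCaseRun], path: str = "runs.csv") -> None:
    """Persist the raw measurements; judges analyse the file afterwards."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["cms", "subject", "use_case", "seconds", "clicks"])
        for r in runs:
            writer.writerow([r.cms, r.subject_id, r.use_case, f"{r.seconds:.1f}", r.clicks])

# Example: a beginner content manager publishes a news item in a hypothetical "CMS A".
run = UseCaseRun(cms="CMS A", subject_id="S01", use_case="publish a news item")
run.click(); run.click(); run.click()   # the observer records three clicks
run.finish()
save_runs([run])
```

    The point is simply that the raw output consists of numbers (seconds and clicks), not anybody's impressions.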

    Back in the 90s, when we did not yet know the words "usability", "interfaces", or "CMS", our university taught a course called "Scientific Organization of Labor", where all these principles were formulated for designing an accountant's workplace or the control panel of a hoisting crane.

    Since then, nothing fundamentally new has appeared in the methods of scientifically evaluating the convenience of interfaces. What did appear are devices such as eye trackers, which let us evaluate convenience even more accurately than professional experts do, let alone students. By the way, I talked about this experience at the latest User Experience conference. But if there are no eye trackers, you can get by with a stopwatch for the measurements and a video camera to record how the experiments go.

    Methodology



    An adequate test of the leading CMS is one whose methodology and experts are agreed upon by several market players.
    As I see it:

    1. The test should include 1-2 popular open-source systems and 2-3 popular commercial systems.

    2. Each tested characteristic of a CMS is tested separately.
    In each test, the influence of external factors on the measured parameter is neutralized. That is, the quality of the server configuration and the like must not affect the measurement of convenience or speed. All other factors should be equal and adequate to the requirements of each tested platform.

    3. The target audiences are divided into separate groups:
    • by role (site developer, content manager). Developers can be further divided into programmers and coders.
    • by degree of familiarity with the system (beginner, experienced user), so as to evaluate separately how quickly the system can be learned and, separately, how well an experienced user can do routine work with the CMS.
    • you can go even deeper, by socio-demographic characteristics (humanities people and techies, young and old, men and women).


    4. Each test subject performs a set of actions, and from them the tester obtains a MEASURABLE result. All tests are organized so that the measurements cannot depend on the opinion of either the tester or the subject. Judges (experts) then analyze the measurements to draw conclusions.

    5. The opinions of the test subjects are taken into account only when evaluating abstract characteristics, for example the style of the interface design. There should be few unmeasured criteria, no more than 20-30% of all tests.
    It is advisable to involve experts in the relevant fields to evaluate the abstract characteristics; a scoring sketch follows below.
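    As an illustration of how measured metrics and expert marks might be combined while keeping the subjective share within that 20-30% cap, here is a minimal sketch; the normalization scheme, the weights, and the criteria names are my own assumptions, not a prescribed formula:

```python
def normalize(value: float, best: float, worst: float) -> float:
    """Map a raw measurement onto a 0..1 scale where 1 is the best observed result."""
    if best == worst:
        return 1.0
    return max(0.0, min(1.0, (worst - value) / (worst - best)))

def total_score(measured: dict[str, float], expert: dict[str, float],
                expert_weight: float = 0.25) -> float:
    """Combine objective measurements with averaged expert marks.

    `measured` holds normalized 0..1 scores for the measurable criteria;
    `expert` holds averaged 0..1 judge marks for the abstract criteria.
    The subjective share is capped at 20-30% of the final score, as in the text.
    """
    assert 0.2 <= expert_weight <= 0.3, "subjective share should stay within 20-30%"
    measured_avg = sum(measured.values()) / len(measured)
    expert_avg = sum(expert.values()) / len(expert)
    return (1 - expert_weight) * measured_avg + expert_weight * expert_avg

# Hypothetical example: time (seconds) and clicks to publish a news item,
# normalized against the best and worst results across all tested CMS.
measured = {
    "publish_time": normalize(value=140, best=90, worst=300),
    "publish_clicks": normalize(value=12, best=8, worst=25),
}
expert = {"installer_design": 0.7, "icon_design": 0.6}   # averaged judge marks
print(f"overall score: {total_score(measured, expert):.2f}")
```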

    The economic effect of using the platform



    Any technological achievement is useless if it produces no economic effect. Any quality of a CMS is useless if it does not yield a productivity gain in developing or operating a site.

    That is, all improvements in the technical qualities of a CMS (performance, convenience, price, ...) manifest themselves either in the cost of owning a site or in the cost of developing one. And it is only from this economic point of view that these technical qualities have value in the eyes of business.

    Clearly, a methodology for studying the economic effect of different technical characteristics is even more complicated than one for the technical characteristics alone. But it is feasible.

    There are two main subjects for studying the economic effect (a rough calculation sketch follows the two lists below):
    1. The total cost of ownership of a site depending on the CMS (the industry term is TCO, Total Cost of Ownership), which takes into account:
    • hosting costs
    • site support and development costs
    • expenses for additional sites (mirrors, language versions, design templates, ...)
    • security costs
    • staff training and motivation costs
    • staff replacement costs
    • the cost of staff time spent on day-to-day work with the site
    • the quality of support and documentation
    • and much more


    2. The total cost of website development depending on the selected CMS, which includes:
    • training costs for the developer and content manager
    • developer qualification requirements
    • the availability and cost of developers with the required qualifications
    • the complexity of performing typical development tasks
    • code standardization and the possibility of painlessly switching developers in mid-project (without "delete everything and redo it from scratch")
    • the quality of support and documentation
    • training costs for customer employees
    • and much more
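    As a back-of-the-envelope illustration of how these lists turn into comparable figures, here is a minimal sketch; every cost item and number below is a hypothetical placeholder to be replaced by values actually measured for each CMS over a chosen period:

```python
from typing import Mapping

def total_cost(items: Mapping[str, float]) -> float:
    """Sum per-item costs into a single figure that can be compared across CMS."""
    return sum(items.values())

# Hypothetical yearly ownership costs for one CMS, in arbitrary currency units.
ownership = {
    "hosting": 600,
    "support_and_development": 4000,
    "additional_sites": 800,
    "security": 500,
    "staff_training": 700,
    "daily_content_work": 2400,   # cost of staff time on routine site work
}

# Hypothetical one-off development costs for the same CMS.
development = {
    "developer_training": 1000,
    "typical_tasks_implementation": 6000,
    "customer_staff_training": 500,
}

print(f"TCO per year, approx.:      {total_cost(ownership):.0f}")
print(f"Development cost, one-off:  {total_cost(development):.0f}")
```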


    Who are the judges?



    Test results depend on who organizes them. For the tests to be trusted, the organizers must be authoritative, neutral people who are not out to prove anything to themselves or to any of the tested vendors and who have no emotional stake in the results.

    This is the second important reason why a CMS test run by a web studio that builds on that CMS will be flawed, especially against the backdrop of an opaque or inadequate testing methodology. The same goes for a test from any other player on the web development market: they all have established CMS preferences and business relationships with vendors. Such testers will simply try to prove to themselves and to the world that their choice and their preferences were right.

    We have already seen how a leading partner of one CMS independently tested its performance, and a leading partner of another CMS compiled its own independent CMS popularity rating. If we were a bit more cynical, one of the UMI partners would also conduct their own independent research or compile their own rating.





    Independent researchers should also have a DEEP understanding of experimental methodology, of web development in general, and of CMS in particular.
    Where are such people and companies? Do we have any? Maybe they are here on Habr?
    We are ready to fully support their professional work, regardless of the results of their tests.

    Conclusions



    Adequate tests would allow customers and developers to make a more informed choice of CMS (weighing more than advertising promises and brand awareness), and would let CMS vendors understand their strengths and weaknesses more clearly.

    The article's design uses paintings by the artist Nikolai Kopeikin.
