Trivium of measurement theory

    In statistics and data analysis, it is usually assumed that all values are real numbers (or vectors of real numbers) or can easily be reduced to them. But in nonparametric and non-numerical statistics, for example, as well as in econometrics, it is very important to know on which scale the data were measured in order to understand which operations and methods can be applied to them.

    The problem with defining scales is that the definitions are written by mathematicians and strictly formalized, which makes them incomprehensible to most people. For example, this is how scales are defined in Pfanzagl's classic book:

    [Figure: Pfanzagl's formal definition of a scale]

    Here RS denotes a relational system and NRS a numerical relational system, the same kind of structures used in algebra and in the theory of normal forms of relational databases. If this is simple and clear to you, you can stop reading here; for everyone else, I will explain scales simply and clearly and show why understanding this material matters in practice.
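
    In representational measurement theory, a scale in this sense maps objects into numbers so that the empirical relations are preserved. A minimal Python sketch of that idea, assuming a hypothetical empirical relation `OUTRANKS` over military ranks and hypothetical numeric assignments (a toy illustration, not Pfanzagl's formal apparatus), checks whether an assignment preserves the relation:

```python
from itertools import product

# Hypothetical empirical system: objects and an observed binary relation.
# (a, b) in OUTRANKS means "a outranks b".
OBJECTS = ["private", "sergeant", "lieutenant", "major"]
OUTRANKS = {
    ("sergeant", "private"),
    ("lieutenant", "private"),
    ("lieutenant", "sergeant"),
    ("major", "private"),
    ("major", "sergeant"),
    ("major", "lieutenant"),
}


def is_homomorphism(f, objects, relation):
    """True if, for every pair (a, b), (a, b) is in `relation` exactly when f[a] > f[b]."""
    return all(
        ((a, b) in relation) == (f[a] > f[b])
        for a, b in product(objects, repeat=2)
    )


# Two different numeric assignments that both preserve the order...
rank_number = {"private": 1, "sergeant": 2, "lieutenant": 3, "major": 4}
other_number = {"private": 10, "sergeant": 11, "lieutenant": 50, "major": 1000}
# ...and one that swaps sergeant and lieutenant, so it is not a scale for this relation.
bad_number = {"private": 1, "sergeant": 3, "lieutenant": 2, "major": 4}

print(is_homomorphism(rank_number, OBJECTS, OUTRANKS))   # True
print(is_homomorphism(other_number, OBJECTS, OUTRANKS))  # True
print(is_homomorphism(bad_number, OBJECTS, OUTRANKS))    # False
```

    Both `rank_number` and `other_number` pass, which is why only the order of the numbers, not their particular values, carries information here.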

    Nominal scale (scale of names). It is used to describe features that can only be compared for equivalence (equal or not equal). Musical tastes, parts of speech, and political views, for example, are measured on such scales. It is important to know that no operation other than checking for a match is possible on such a scale: rap fans are simply not equal to Justin Bieber fans, and which of them are cooler cannot be determined on this scale. Numbers here can only be used as labels to classify objects.

    Grouping and classification operations are also allowed on this scale; moreover, most classifications are created specifically for such scales.
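
    A minimal sketch of what is admissible on a nominal scale, using hypothetical survey data: equality checks, counts, and the mode are meaningful, while a "mean code" depends entirely on the arbitrary choice of labels.

```python
from collections import Counter

# Hypothetical nominal data: musical tastes of six respondents.
tastes = ["rap", "rock", "rap", "pop", "rap", "rock"]

# Admissible: equality checks, counting, grouping, and the mode.
counts = Counter(tastes)
mode, mode_count = counts.most_common(1)[0]
print(mode, mode_count)          # rap 3
print(tastes[0] == tastes[2])    # True: both respondents like rap

# Numeric codes are mere labels; any one-to-one recoding is equally valid...
codes_a = {"pop": 1, "rap": 2, "rock": 3}
codes_b = {"pop": 30, "rap": 10, "rock": 20}
mean_a = sum(codes_a[t] for t in tastes) / len(tastes)
mean_b = sum(codes_b[t] for t in tastes) / len(tastes)
# ...so a "mean code" reflects the arbitrary coding, not the data.
print(mean_a, mean_b)

# The mode, by contrast, names the same category under either coding.
mode_a = Counter(codes_a[t] for t in tastes).most_common(1)[0][0]
mode_b = Counter(codes_b[t] for t in tastes).most_common(1)[0][0]
print(mode_a == codes_a["rap"], mode_b == codes_b["rap"])  # True True
```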

    Ordinal scale (rank scale). This scale has all the properties of the nominal scale, plus an order relation. For example, we cannot say who is cooler, a firefighter or a taxi driver (nominal scale), but we can say for sure that a major is cooler than a warrant officer (ordinal scale).

    For this scale, it is very important to understand that numbers can be used only in comparisons: they cannot be added, and their average cannot be computed (a general plus a private does not equal two lieutenants). Here is one more example. Everyone likes jokes like: “After Vasya moved from Russia to India, the average IQ of both countries increased,” meaning that the average IQ in Russia is higher than in India, and Vasya's IQ is below the Russian average but above the Indian one. However, the notion of “average IQ” is incorrect: IQ is measured on a rank scale and was designed so that the values are normally distributed across the population, so there is no way to claim that the difference between IQ 141 and 142 is the same as between IQ 120 and 121. The correct way to tell the joke is: “After Vasya moved from Russia to India, the median IQ of both countries increased.”
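
    Why the mean is not admissible on an ordinal scale can be checked directly: any order-preserving recoding of ranks is equally valid, so a meaningful statistic must lead to the same conclusion under every such recoding. A minimal sketch with hypothetical unit rosters and two hypothetical codings:

```python
from statistics import mean, median

# Hypothetical ordinal data: ranks of soldiers in two units.
unit_x = ["private", "private", "major"]
unit_y = ["sergeant", "sergeant", "sergeant"]

# Two codings that both respect the order private < sergeant < major.
coding_1 = {"private": 1, "sergeant": 2, "major": 5}
coding_2 = {"private": 1, "sergeant": 2, "major": 3}


def compare(stat, coding):
    """Return True if `stat` of unit X exceeds `stat` of unit Y under `coding`."""
    x = [coding[r] for r in unit_x]
    y = [coding[r] for r in unit_y]
    return stat(x) > stat(y)


# The mean comparison flips between two equally valid codings...
print(compare(mean, coding_1), compare(mean, coding_2))      # True False
# ...while the median comparison does not depend on the coding.
print(compare(median, coding_1), compare(median, coding_2))  # False False
```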

    Interval scale (scale of differences). Dates and temperatures in Celsius and Fahrenheit are measured on such scales. They have no natural origin, although some people will argue at length that counting from the Nativity of Christ or from January 1, 1970 is perfectly natural.

    Most Big Data presentations begin with the story about a pregnant schoolgirl. Testers have their own tale about airplanes. In short: an American plane crashed in Israel near the Dead Sea because its software divided by zero as soon as the aircraft's altitude above sea level became negative. I have heard many versions of this story: in some the plane flipped upside down, in others a glitching stealth aircraft went straight into the sea. The tale is very unlikely once you realize that it makes no sense to divide by a value taken from an interval scale, and altitude above sea level is exactly such a value. Indeed, try to find a formula with a Fahrenheit temperature or a latitude in the denominator.
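
    A minimal sketch of why division by an interval-scale value misleads, assuming the standard Celsius-to-Fahrenheit conversion: the same change in weather is a 100% increase in Celsius and a 36% increase in Fahrenheit, and the formula breaks at 0 °C, an origin chosen by convention. The function names are hypothetical.

```python
def c_to_f(celsius):
    """Convert Celsius to Fahrenheit: an affine map x -> 9/5 * x + 32."""
    return celsius * 9 / 5 + 32


def relative_change(old, new):
    """(new - old) / old: divides by a measured value, so it needs a natural zero."""
    return (new - old) / old


old_c, new_c = 10.0, 20.0
print(relative_change(old_c, new_c))                  # 1.0  ("100% warmer" in Celsius)
print(relative_change(c_to_f(old_c), c_to_f(new_c)))  # 0.36 (the same weather in Fahrenheit)

# 0 °C is a convention, yet the formula breaks there, like the legendary altimeter.
try:
    relative_change(0.0, 5.0)
except ZeroDivisionError:
    print("division by zero at an arbitrary origin")
```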

    For measurements on such scales, the arithmetic mean can be computed and correlation and regression analyses can be performed, but the harmonic and geometric means are meaningless.
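
    The difference between the arithmetic and geometric means can be demonstrated with a change of units. In this sketch (hypothetical temperature readings), the arithmetic-mean comparison survives the conversion to Fahrenheit, while the geometric-mean comparison flips. It assumes Python 3.8+ for `statistics.fmean` and `statistics.geometric_mean`.

```python
from statistics import fmean, geometric_mean


def c_to_f(c):
    """Celsius -> Fahrenheit: x -> 1.8 * x + 32, an admissible interval-scale transformation."""
    return 1.8 * c + 32


# Hypothetical temperatures (°C) measured in two cities.
city_a = [1.0, 49.0]
city_b = [10.0, 20.0]
city_a_f = [c_to_f(t) for t in city_a]
city_b_f = [c_to_f(t) for t in city_b]


def a_is_warmer(stat, a, b):
    """Compare two samples by the summary statistic `stat`."""
    return stat(a) > stat(b)


# Arithmetic mean: the conclusion does not depend on the choice of units.
print(a_is_warmer(fmean, city_a, city_b), a_is_warmer(fmean, city_a_f, city_b_f))  # True True
# Geometric mean: the conclusion flips when switching from Celsius to Fahrenheit.
print(a_is_warmer(geometric_mean, city_a, city_b),
      a_is_warmer(geometric_mean, city_a_f, city_b_f))  # False True
```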

    Ratio scale. On such a scale, the zero point is natural. Sorry for the pragmatism, but everything measured in money falls on this scale. If dates are on an interval scale, then ages are on a ratio scale. It is sometimes said that this scale has all the properties of the interval scale, with one small nuance: an interval scale admits positive linear transformations (multiplication by a positive constant plus a shift of the origin), whereas a ratio scale admits only similarity transformations (multiplication by a positive constant). Most statistical methods assume that values are on a ratio scale, so before feeding numbers into an analysis package, make sure they have a natural origin; otherwise many statistics will be uninformative.
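
    A minimal sketch of the difference between admissible transformations, using hypothetical ages: converting years to months (a similarity transformation) keeps the ratio "six times older", while shifting the origin, here counting age from 18 instead of from birth, turns the same ratio into nonsense. The helper names are hypothetical.

```python
from fractions import Fraction


def years_to_months(age_years):
    """Similarity transformation: multiplication by a positive constant."""
    return age_years * 12


def shift_origin(age_years, offset):
    """Shift of the zero point: admissible on an interval scale, not on a ratio scale."""
    return age_years - offset


parent, child = 30, 5
# "Six times older" holds in any unit of time...
print(Fraction(parent, child), Fraction(years_to_months(parent), years_to_months(child)))  # 6 6
# ...but not after moving the zero (counting age from 18 instead of from birth).
print(Fraction(shift_origin(parent, 18), shift_origin(child, 18)))  # -12/13
```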

    These four scales are generally accepted today; however, when the theory of non-numerical statistics was just emerging, many researchers introduced classifications of their own. Here, for example, is a page from Tyurin's unpublished book:

    [Figure: a page from Tyurin's unpublished book]

    “Inventing” your own scales can be productive in many projects. It is more important, however, to check which operations are admissible on the data and to write the corresponding tests before any values arrive. And remember that checking units alone (which some programming languages do) is not enough: time and age are measured in the same units.
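
    One way to encode admissible operations in advance is to give interval-scale and ratio-scale quantities different types. The sketch below is a hypothetical pair of classes, `Instant` (a point in time, interval scale) and `Duration` (an age or time span, ratio scale), not an API from any library; both store seconds, yet only meaningful operations are defined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Duration:
    """Ratio-scale quantity (e.g., an age): zero is natural, so ratios make sense."""
    seconds: float

    def __add__(self, other):
        if isinstance(other, Duration):
            return Duration(self.seconds + other.seconds)
        return NotImplemented

    def __truediv__(self, other):
        if isinstance(other, Duration):
            return self.seconds / other.seconds  # "twice as old" is meaningful
        return NotImplemented


@dataclass(frozen=True)
class Instant:
    """Interval-scale quantity (a point in time): the epoch is an arbitrary origin."""
    seconds_since_epoch: float

    def __sub__(self, other):
        if isinstance(other, Instant):
            # The difference of two instants is a duration (ratio scale).
            return Duration(self.seconds_since_epoch - other.seconds_since_epoch)
        if isinstance(other, Duration):
            return Instant(self.seconds_since_epoch - other.seconds)
        return NotImplemented

    def __add__(self, other):
        # Only instant + duration is defined; instant + instant stays unsupported.
        if isinstance(other, Duration):
            return Instant(self.seconds_since_epoch + other.seconds)
        return NotImplemented

    __radd__ = __add__  # duration + instant
    # No __truediv__: dividing by a point in time raises TypeError.


birth = Instant(0.0)
now = Instant(3.0e9)
age = now - birth                # a Duration: a ratio-scale quantity
print(age / Duration(1.5e9))     # 2.0

for bad in (lambda: now + birth, lambda: now / birth):
    try:
        bad()
    except TypeError as exc:
        print("rejected:", exc)
```

    Python's standard `datetime` module follows a similar design: subtracting two `datetime` objects yields a `timedelta`, dividing one `timedelta` by another yields a float, and adding two `datetime` objects raises `TypeError`.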
