
Type-safe identifiers and phantom types
Quite often, a program working with a database uses integer values (for example, long) as entity identifiers. But people make mistakes, and a programmer may accidentally use the identifier of one kind of entity to address another. Such a bug can go unnoticed for a long time if the identifiers of different entities overlap, which happens quite often. Fortunately, languages that let you manipulate types, such as C++, offer a fairly simple solution to this problem.
Problem statement
Suppose our program works with several kinds of entities, for example widgets (class Widget) and gadgets (class Gadget):
class Widget {
public:
    long id() const;
    // ...
};
class Gadget {
public:
    long id() const;
    // ...
};
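To make the problem concrete, here is a minimal sketch of the mix-up described above. The GadgetTable storage and the gadgetName and nameForWidget helpers are hypothetical, invented for illustration. The point is only that passing a widget id where a gadget id is expected compiles cleanly and quietly returns the wrong entity when the id ranges overlap.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical storage: gadget names keyed by raw long identifiers,
// mirroring Gadget::id() returning a plain long.
typedef std::map<long, std::string> GadgetTable;

// Hypothetical lookup that expects a *gadget* id.
std::string gadgetName(const GadgetTable &gadgets, long gadgetId) {
    GadgetTable::const_iterator it = gadgets.find(gadgetId);
    return it == gadgets.end() ? std::string() : it->second;
}

// The bug: the caller holds a *widget* id and passes it as a gadget id.
// Both are plain long values, so the compiler has nothing to complain about.
std::string nameForWidget(const GadgetTable &gadgets, long widgetId) {
    return gadgetName(gadgets, widgetId);
}

// Usage: a gadget with id 5 happens to exist, so looking up "widget 5"
// silently returns an unrelated gadget; no error at compile or run time.
void demonstrateMixup() {
    GadgetTable gadgets;
    gadgets[5] = "gadget #5";
    assert(nameForWidget(gadgets, 5) == "gadget #5");
}
```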
Besides making errors likely, using "raw" types as identifiers significantly hurts readability: code full of types like std::vector<long> or std::map<long, long> is not easy to understand. Type synonyms such as
typedef long WidgetId;
typedef long GadgetId;
let the programmer write more expressive code with types like std::map<WidgetId, GadgetId>. But this approach solves only the readability problem: the compiler still does not know that we consider values of types WidgetId and GadgetId incompatible.
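A quick compile-time check confirms that typedef only introduces another name for long. This sketch assumes C++11 for static_assert and constexpr. The function nextGadget is hypothetical and exists only to show that a WidgetId flows into a GadgetId parameter without any complaint.

```cpp
#include <type_traits>

// The synonyms from the text: both are just other names for long.
typedef long WidgetId;
typedef long GadgetId;

// For the compiler there is exactly one type here.
static_assert(std::is_same<WidgetId, GadgetId>::value,
              "typedef creates an alias, not a new type");

// Hypothetical function that is meant to accept only gadget ids.
constexpr GadgetId nextGadget(GadgetId id) { return id + 1; }

// A widget id is accepted without any diagnostic.
constexpr WidgetId someWidget = 41;
static_assert(nextGadget(someWidget) == 42, "WidgetId silently used as GadgetId");
```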
Telling the compiler our intentions
What would a person do if they had to juggle lots of abstract identifiers on paper without getting confused by all the numbers? I think a quite reasonable approach is to add a type label to each identifier: a prefix or suffix denoting the kind of entity it identifies. For example, K-12 could mean the computer at workstation 12, and P-12 the twelfth registered user.
Fortunately, C++ has a mechanism for attaching labels to types: templates. To solve our problem, it is enough to implement a class template that is parameterized by the model type and stores the identifier:
template <typename ModelType, typename ReprType = long>
class IdOf {
public:
    typedef ModelType model_type;
    typedef ReprType repr_type;
    IdOf() : value_() {}
    explicit IdOf(repr_type value) : value_(value) {}
    repr_type value() const { return value_; }
    bool operator==(const IdOf &rhs) const {
        return value() == rhs.value();
    }
    bool operator!=(const IdOf &rhs) const {
        return value() != rhs.value();
    }
    bool operator<(const IdOf &rhs) const {
        return value() < rhs.value();
    }
    bool operator>(const IdOf &rhs) const {
        return value() > rhs.value();
    }
private:
    repr_type value_;
};
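Because IdOf defines operator<, it can already serve as a key of std::map or std::set. Using it with std::unordered_map additionally requires a std::hash specialization, which the class above does not provide. The sketch below adds one. That specialization is our own assumption, not part of the original design. The sketch repeats a trimmed copy of IdOf so that it compiles on its own (C++11).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <unordered_map>

// Trimmed copy of IdOf from the text (only the members used below).
template <typename ModelType, typename ReprType = long>
class IdOf {
public:
    typedef ModelType model_type;
    typedef ReprType repr_type;
    IdOf() : value_() {}
    explicit IdOf(repr_type value) : value_(value) {}
    repr_type value() const { return value_; }
    bool operator==(const IdOf &rhs) const { return value() == rhs.value(); }
    bool operator<(const IdOf &rhs) const { return value() < rhs.value(); }
private:
    repr_type value_;
};

// Our addition (not in the article): hash an IdOf by hashing its raw value,
// so it can key unordered containers. Specializing std::hash for a
// user-defined type is permitted by the standard.
namespace std {
template <typename ModelType, typename ReprType>
struct hash<IdOf<ModelType, ReprType> > {
    size_t operator()(const IdOf<ModelType, ReprType> &id) const {
        return hash<ReprType>()(id.value());
    }
};
}  // namespace std

class Widget;  // an incomplete type is enough for the phantom parameter
typedef IdOf<Widget> WidgetId;

// Usage: the same id works as a key in both kinds of associative container.
void indexWidgets() {
    std::map<WidgetId, std::string> ordered;           // uses operator<
    std::unordered_map<WidgetId, std::string> hashed;  // uses == and std::hash
    ordered[WidgetId(1)] = "first widget";
    hashed[WidgetId(1)] = "first widget";
    assert(ordered.at(WidgetId(1)) == hashed.at(WidgetId(1)));
}
```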
Let's apply the new class to our widgets and gadgets:
class Gadget;
class Widget;
typedef IdOf<Gadget> GadgetId;
typedef IdOf<Widget> WidgetId;
class Widget {
public:
    WidgetId id() const;
    // ...
};
class Gadget {
public:
    GadgetId id() const;
    // ...
};
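The resulting compile-time guarantees can be spelled out with C++11 type traits. This standalone sketch repeats a trimmed IdOf. It checks that WidgetId and GadgetId are distinct types, that neither converts implicitly to the other, and that a raw long has to be wrapped explicitly.

```cpp
#include <type_traits>

// Trimmed copy of IdOf from the text.
template <typename ModelType, typename ReprType = long>
class IdOf {
public:
    IdOf() : value_() {}
    explicit IdOf(ReprType value) : value_(value) {}
    ReprType value() const { return value_; }
    bool operator==(const IdOf &rhs) const { return value() == rhs.value(); }
private:
    ReprType value_;
};

class Gadget;
class Widget;
typedef IdOf<Gadget> GadgetId;
typedef IdOf<Widget> WidgetId;

// Unlike the typedef-of-long version, these are now distinct types...
static_assert(!std::is_same<WidgetId, GadgetId>::value, "distinct types");
// ...with no implicit conversion between them in either direction...
static_assert(!std::is_convertible<WidgetId, GadgetId>::value, "no widget-to-gadget conversion");
static_assert(!std::is_convertible<GadgetId, WidgetId>::value, "no gadget-to-widget conversion");
// ...and, because the constructor is explicit, a raw long must be wrapped by hand.
static_assert(!std::is_convertible<long, GadgetId>::value, "no implicit wrapping");
static_assert(std::is_constructible<GadgetId, long>::value, "explicit wrapping works");
```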
Thanks to the way we defined the IdOf class, the following code containing logical errors will not compile:
// This won't compile: a WidgetId is not a GadgetId.
std::vector<GadgetId> gadgetIds;
gadgetIds.push_back(WidgetId(5));
// This won't compile either: IdOf<Gadget>::operator== expects another GadgetId.
if (someGadget.id() == someWidget.id()) {
    doSomething();
}
Operations on identifiers of the same type work as expected. The compiler now knows more about our intentions: it will not let us load a gadget by a widget identifier or put an identifier of the wrong type into the vector.
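Everyday operations on ids of a single model type keep working, because standard algorithms need only the operators IdOf provides. Mixing model types, on the other hand, requires an explicit unwrap through value(). In this sketch uniqueSorted and sameNumber are hypothetical helpers, and IdOf is a trimmed copy of the class above.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Trimmed copy of IdOf from the text.
template <typename ModelType, typename ReprType = long>
class IdOf {
public:
    IdOf() : value_() {}
    explicit IdOf(ReprType value) : value_(value) {}
    ReprType value() const { return value_; }
    bool operator==(const IdOf &rhs) const { return value() == rhs.value(); }
    bool operator!=(const IdOf &rhs) const { return value() != rhs.value(); }
    bool operator<(const IdOf &rhs) const { return value() < rhs.value(); }
private:
    ReprType value_;
};

class Widget;
class Gadget;
typedef IdOf<Widget> WidgetId;
typedef IdOf<Gadget> GadgetId;

// Sorting and de-duplicating gadget ids: std::sort uses operator<,
// std::unique uses operator==; both are defined for same-type ids.
std::vector<GadgetId> uniqueSorted(std::vector<GadgetId> ids) {
    std::sort(ids.begin(), ids.end());
    ids.erase(std::unique(ids.begin(), ids.end()), ids.end());
    return ids;
}

// Crossing model types is still possible, but only by unwrapping
// explicitly, which makes the intent visible at the call site.
bool sameNumber(const WidgetId &w, const GadgetId &g) {
    return w.value() == g.value();
}

void exampleSameTypeOps() {
    std::vector<GadgetId> ids;
    ids.push_back(GadgetId(3));
    ids.push_back(GadgetId(1));
    ids.push_back(GadgetId(3));
    std::vector<GadgetId> result = uniqueSorted(ids);
    assert(result.size() == 2);
    assert(result[0] == GadgetId(1) && result[1] == GadgetId(3));
    assert(sameNumber(WidgetId(5), GadgetId(5)));
}
```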
If we still need to compare identifiers of different types, or to compare an identifier with a raw value, we can always call the value() method explicitly.
Phantom types
It turns out that the trick we just pulled with identifiers has long been known in functional programming. Parameterized types whose type parameter is not used in their definition are called phantom types.
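In C++ terms, this means the tag type never even has to be defined: IdOf stores only a repr_type value and never uses ModelType. The sketch below makes that explicit with a bare struct template and two tags that are declared but never defined. The names Id, UserTag and ComputerTag are hypothetical, echoing the P-12 and K-12 labels from earlier, and C++11 is assumed.

```cpp
#include <type_traits>

// A minimal phantom-typed id: Tag appears only in the template parameter
// list, never in a data member, so it can remain an incomplete type.
template <typename Tag>
struct Id {
    long value;
};

struct UserTag;      // declared, never defined
struct ComputerTag;  // declared, never defined

typedef Id<UserTag> UserId;
typedef Id<ComputerTag> ComputerId;

// Same representation, different types: P-12 and K-12 cannot be confused.
constexpr UserId p12 = {12};
constexpr ComputerId k12 = {12};
static_assert(p12.value == k12.value, "same raw number");
static_assert(!std::is_same<UserId, ComputerId>::value, "but distinct types");
static_assert(!std::is_convertible<UserId, ComputerId>::value, "and no conversion");
```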
For example, in Haskell, a similar technique can be implemented as follows:
newtype IdOf a = IdOf { idValue :: Int }
  deriving (Ord, Eq, Show, Read)
Wow, just a couple of lines of code! Now let's add the definitions of our models:
data Widget = Widget { widgetId :: IdOf Widget }
  deriving (Show, Eq)
data Gadget = Gadget { gadgetId :: IdOf Gadget }
  deriving (Show, Eq)
Now let's check the desired behavior by creating values of both types and trying to compare their identifiers:
Prelude> let g = Gadget (IdOf 5)
Prelude> let w = Widget (IdOf 5)
Prelude> widgetId w == gadgetId g
<interactive>:1:15:
    Couldn't match type `Gadget' with `Widget'
    Expected type: IdOf Widget
      Actual type: IdOf Gadget
    In the return type of a call of `gadgetId'
    In the second argument of `(==)', namely `gadgetId g'
    In the expression: widgetId w == gadgetId g
As expected, the compiler (strictly speaking, I used the GHCi interpreter for these experiments) refused to compare identifiers of different types. This is exactly what we wanted.
The same technique can be used to attach currency labels, units of measure, and other information to numeric values. Such information is useful both to the reader of the program and to the compiler.
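As an illustration of the currency case, here is a Money template tagged with a currency. USD, EUR, Money and toEur are hypothetical names of our own, and the exchange rate is simply a parameter supplied by the caller. It is a sketch of the idea, not a production money type.

```cpp
#include <cassert>
#include <cmath>

// Currency tags: empty types used purely as labels.
struct USD {};
struct EUR {};

// An amount in minor units (cents), labelled with a phantom Currency parameter.
template <typename Currency>
class Money {
public:
    explicit Money(long long cents) : cents_(cents) {}
    long long cents() const { return cents_; }
    // Only amounts in the same currency can be added: Money<USD> + Money<EUR>
    // finds no matching operator+ and fails to compile.
    Money operator+(const Money &rhs) const { return Money(cents_ + rhs.cents_); }
private:
    long long cents_;
};

// Changing currency must be written out explicitly; the rate is supplied by
// the caller and the result is rounded to the nearest cent.
Money<EUR> toEur(const Money<USD> &amount, double eurPerUsd) {
    return Money<EUR>(std::llround(amount.cents() * eurPerUsd));
}

void exampleMoney() {
    Money<USD> price(1999), tax(160);
    assert((price + tax).cents() == 2159);
    assert(toEur(Money<USD>(1000), 0.5).cents() == 500);
}
```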
Summary
Just one small class can save a lot of time that would otherwise be spent hunting for bugs. Moreover, when compiling with optimizations enabled, the approach should not affect runtime performance or memory consumption: an IdOf object holds nothing but the wrapped value, and its member functions are trivially inlined. The Haskell version has no overhead either, because a newtype wrapper is erased at compile time.
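Part of this claim can be checked at compile time. The sketch below assumes a trimmed IdOf and C++11. It verifies that the wrapper is as large as the raw long and is trivially copyable. Whether calls are actually inlined still depends on the compiler and optimization level, so that part is not something a static_assert can prove.

```cpp
#include <type_traits>

// Trimmed copy of IdOf from the text.
template <typename ModelType, typename ReprType = long>
class IdOf {
public:
    IdOf() : value_() {}
    explicit IdOf(ReprType value) : value_(value) {}
    ReprType value() const { return value_; }
    bool operator==(const IdOf &rhs) const { return value() == rhs.value(); }
private:
    ReprType value_;
};

class Widget;
typedef IdOf<Widget> WidgetId;

// The phantom parameter adds no storage: a WidgetId occupies exactly as much
// memory as the long it wraps (true on mainstream compilers, although the
// standard does not strictly rule out padding).
static_assert(sizeof(WidgetId) == sizeof(long), "no space overhead");

// Copying a WidgetId is a plain bitwise copy, just like copying a long.
static_assert(std::is_trivially_copyable<WidgetId>::value, "cheap to copy");
```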
The downside is a few more characters to type (and read), and possibly the need to explain the idea to colleagues. Quite often, though, the benefits of stricter checking by the compiler outweigh these costs.
Phantom types are popular in applications that require high reliability, where every additional check performed automatically by the compiler reduces a company's losses. In particular, they are actively used in OCaml programming at Jane Street and in Standard Chartered's products written in Haskell (as Don Stewart described in a 2015 Google Tech Talk).
Finally, the powerful Boost.Units library deserves a mention. It performs type-safe operations on quantities of different dimensions and units, automatically deducing the type of the result.