
Enum-switch antipattern
Lately I often come across one interesting pattern in code. An enum is created to describe a small set of objects, and then, in various places in the code, the values of that enumeration are handled with switch statements.
What does an implementation of this pattern look like, and why is it dangerous? Let's find out.
Task description
Suppose a team is developing a text editor and plans to support several programming languages in it. Not all of them, of course: there would not be enough resources for that, and there would be little point in it.
An enumeration is created to hold the list of supported languages:
enum Language
public enum Language
{
    Java,
    CSharp,
    TSQL
}
In various places the editor needs parameters that depend on the specific language. For this, the following functions are created:
GetExtensions (Language lang)
List<string> GetExtensions(Language lang)
{
    switch (lang)
    {
        case Language.Java:
            return new List<string> { "java" };
        case Language.CSharp:
            return new List<string> { "cs" };
        case Language.TSQL:
            return new List<string> { "sql" };
        default:
            // Reached when lang is a value this switch does not know about.
            throw new InvalidOperationException("Language " + lang + " is not supported");
    }
}
IsCaseSensitive (Language lang)
bool IsCaseSensitive(Language lang)
{
    switch (lang)
    {
        case Language.Java:
        case Language.CSharp:
            return true;
        case Language.TSQL:
            return false;
        default:
            throw new InvalidOperationException("Language " + lang + " is not supported");
    }
}
GetIconFile (Language lang)
string GetIconFile(Language lang)
{
    switch (lang)
    {
        case Language.Java:
            return "bean.svg";
        case Language.CSharp:
            return "cs.svg";
        case Language.TSQL:
            return "tsql.svg";
        default:
            throw new InvalidOperationException("Language " + lang + " is not supported");
    }
}
The exception at the end is forced by the compiler: every function must produce a guaranteed result on every path, and the C# compiler cannot verify that a switch statement covers all values of an enum. A default icon for any new programming language could be chosen in advance, but there is no way to know in advance whether an unknown language is case-sensitive or what extension its source files have, so an exception is thrown.
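Why the compiler insists on the default branch can be seen in a small sketch. In C#, an enum variable may hold any value of its underlying integer type, not only the named constants, so a switch over the named constants is never treated as exhaustive. The class name EnumCoverageDemo, the helper Throws and the value 42 are made up for this illustration.

```csharp
using System;

public enum Language
{
    Java,
    CSharp,
    TSQL
}

// Hypothetical demo class, not part of the editor described in the article.
public static class EnumCoverageDemo
{
    // Same shape as IsCaseSensitive from the article.
    public static bool IsCaseSensitive(Language lang)
    {
        switch (lang)
        {
            case Language.Java:
            case Language.CSharp:
                return true;
            case Language.TSQL:
                return false;
            default:
                // Without this branch the method does not compile:
                // error CS0161, not all code paths return a value.
                throw new InvalidOperationException("Language " + lang + " is not supported");
        }
    }

    // Returns true when IsCaseSensitive throws for the given value.
    public static bool Throws(Language lang)
    {
        try
        {
            IsCaseSensitive(lang);
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }
}
```

The cast (Language)42 is legal even though no constant has that value: Enum.IsDefined(typeof(Language), (Language)42) is false, and EnumCoverageDemo.Throws((Language)42) is true.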
At first, the picture looks fairly simple. All supported languages are gathered in one place. Wherever something depends on the language, a switch is inserted. A single developer can easily implement support for 2-3 languages at once. But later, as a program built on this pattern is maintained and extended, serious problems arise.
Disadvantages of using enum-switch
The problem is that this approach creates god objects. The enum itself and every switch play the role of god objects. Any change related to one of the languages supported by the editor requires changes to all of these god objects. While working on Java support, you can break code related to C# or Transact-SQL.
It will not be possible to divide the languages among several developers so that each one implements support for a single language. Developers will have to change the same files, and merging their changes can easily break working code.
The difficulty of adding a new language will keep growing, because the switch statements will grow and become harder and harder to keep under control. Such a program cannot be called high-quality, because a good program should become better, simpler and cheaper to develop and maintain over time.
With the enum-switch approach, developers rigidly couple entities that are practically unrelated in reality. Transact-SQL and Java may have nothing in common except that someone wants to open both in the same text editor. Yet in the code, Transact-SQL and Java end up in the same enum type.
This is a manifestation of the god object antipattern.
Other antipatterns can also be found in this pattern. The developers of the text editor do not take part in developing the programming languages; they only implement the logic of their own product. For the editor, therefore, the characteristics of languages are external data that it must be able to process. Here, that data is part of the code. In other words, this is a form of hardcoding. If a version of Java comes out whose source files have the one-letter extension J, the editor will have to be reworked and the other languages rechecked for breakage.
So the parameters of individual members of a set described in a program are data, and they should be separated as much as possible from the code that implements the program's behavior.
A switch statement often defines a relationship between entities from different sets. In our example, it is the relationship between a programming language and an icon. But a relationship between entities is itself an entity, and it should be handled like any other data, for example stored in a table column. If external storage is not an option, at least put the mapping into a Dictionary.
Dictionary
Dictionary<Language, string> icons = new Dictionary<Language, string>();
icons[Language.Java] = "bean.svg";
icons[Language.CSharp] = "cs.svg";
icons[Language.TSQL] = "tsql.svg";
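As a rough sketch of how such a dictionary could stand in for the GetIconFile switch: the lookup below uses TryGetValue and falls back to a default icon for values missing from the mapping. The class name IconMap and the file name default.svg are assumptions made for this example only.

```csharp
using System.Collections.Generic;

public enum Language { Java, CSharp, TSQL }

// Hypothetical wrapper around the icon dictionary.
public static class IconMap
{
    // Assumed fallback icon name; the article does not name one.
    public const string DefaultIcon = "default.svg";

    // The language-to-icon relationship kept as data, not as control flow.
    private static readonly Dictionary<Language, string> Icons =
        new Dictionary<Language, string>
        {
            { Language.Java, "bean.svg" },
            { Language.CSharp, "cs.svg" },
            { Language.TSQL, "tsql.svg" }
        };

    public static string GetIconFile(Language lang)
    {
        string icon;
        // TryGetValue avoids a KeyNotFoundException for unmapped values.
        return Icons.TryGetValue(lang, out icon) ? icon : DefaultIcon;
    }
}
```

Changing an icon now means editing one entry of the mapping, and the same entries could just as well be loaded from a file or a table.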
The switch statement has one more unpleasant side effect. It not only defines relationships between objects but is itself a relationship. To make this clear, consider an example:
switch (lang)
{
    case Language.TSQL:
    case Language.PLSQL:
        return "sql.svg";
    ...
}
The sql.svg icon is mapped to two SQL dialects. Now a language not only has an icon but also carries an implicit property: Transact-SQL and PL/SQL must have the same icon. A developer who wants to change the icon for PL/SQL will have to decide whether to change the icon for Transact-SQL as well. In most cases, this is undesirable.
Finally, the enum-switch antipattern invites errors of the kind "this enum value is not handled", because it is hard to make sure every switch statement is updated when a new value is added to the enum.
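The coverage problem can be reproduced in a few lines. In the sketch below a new value, Python, has been added to the enum, but the switch in GetIconFile was not updated; a helper walks over Enum.GetValues and reports the values that make the function throw. The value Python, the class name CoverageCheck and the helper FindUnhandled are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Python is a hypothetical new value added after the switch was written.
public enum Language { Java, CSharp, TSQL, Python }

public static class CoverageCheck
{
    // The switch from the article, not yet updated for Python.
    public static string GetIconFile(Language lang)
    {
        switch (lang)
        {
            case Language.Java:
                return "bean.svg";
            case Language.CSharp:
                return "cs.svg";
            case Language.TSQL:
                return "tsql.svg";
            default:
                throw new InvalidOperationException("Language " + lang + " is not supported");
        }
    }

    // Returns the named enum values for which GetIconFile throws.
    public static List<Language> FindUnhandled()
    {
        var unhandled = new List<Language>();
        foreach (Language lang in Enum.GetValues(typeof(Language)))
        {
            try
            {
                GetIconFile(lang);
            }
            catch (InvalidOperationException)
            {
                unhandled.Add(lang);
            }
        }
        return unhandled;
    }
}
```

The code compiles without complaint, and the gap shows up only at run time. A check like FindUnhandled would have to be written and maintained for every function that contains such a switch.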
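There is a way out
What should be done to avoid this pattern? In any unclear situation, introduce an interface. Instead of an enum, create an interface that returns information about the properties of an object from the described set. Add to this interface the name that used to be stored in the enum constant.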
Interface
interface Language
{
    string GetName();
    bool IsCaseSensitive();
    string GetIconFile();
    List<string> GetExtensions();
}
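One possible implementation of this interface is a plain data holder whose values are supplied from outside. The class name SimpleLanguage and its constructor are assumptions made for this sketch; the interface is repeated so that the example compiles on its own.

```csharp
using System.Collections.Generic;

public interface Language
{
    string GetName();
    bool IsCaseSensitive();
    string GetIconFile();
    List<string> GetExtensions();
}

// Hypothetical implementation: it only stores what it was given.
public sealed class SimpleLanguage : Language
{
    private readonly string name;
    private readonly bool caseSensitive;
    private readonly string iconFile;
    private readonly List<string> extensions;

    public SimpleLanguage(string name, bool caseSensitive, string iconFile, params string[] extensions)
    {
        this.name = name;
        this.caseSensitive = caseSensitive;
        this.iconFile = iconFile;
        this.extensions = new List<string>(extensions);
    }

    public string GetName() { return name; }
    public bool IsCaseSensitive() { return caseSensitive; }
    public string GetIconFile() { return iconFile; }

    // Return a copy so that callers cannot modify the stored list.
    public List<string> GetExtensions() { return new List<string>(extensions); }
}
```

With such a class, a language is described by one line, for example new SimpleLanguage("TSQL", false, "tsql.svg", "sql"), and no switch has to be touched.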
Put the creation of the concrete objects that implement this interface into a separate provider class.
Provider
class LanguageProvider
{
    List<Language> GetSupportedLanguages() {
        ...
    }
    Language DetectLanguageByFile(string fileName) {
        ...
    }
    Language GetDefaultLanguage() {
        ....
    }
}
Any mechanism can be used to store the object descriptions. You can hardcode the parameters, read them from a database or from configuration files, download them from an external resource, or override an object's default configuration. The provider's implementation does not affect the classes that use the objects it creates.
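A minimal sketch of one such provider, with the descriptions hardcoded in a list, might look as follows. It is only one of many possible implementations: the SimpleLanguage class, the "Plain text" default language, the text.svg icon and the case-insensitive matching of file extensions are all assumptions of this example, not something the article prescribes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public interface Language
{
    string GetName();
    bool IsCaseSensitive();
    string GetIconFile();
    List<string> GetExtensions();
}

// Hypothetical data holder; any other implementation would do.
public sealed class SimpleLanguage : Language
{
    private readonly string name;
    private readonly bool caseSensitive;
    private readonly string iconFile;
    private readonly List<string> extensions;

    public SimpleLanguage(string name, bool caseSensitive, string iconFile, params string[] extensions)
    {
        this.name = name;
        this.caseSensitive = caseSensitive;
        this.iconFile = iconFile;
        this.extensions = new List<string>(extensions);
    }

    public string GetName() { return name; }
    public bool IsCaseSensitive() { return caseSensitive; }
    public string GetIconFile() { return iconFile; }
    public List<string> GetExtensions() { return new List<string>(extensions); }
}

public class LanguageProvider
{
    // Assumed fallback for files of unknown type.
    private readonly Language plainText =
        new SimpleLanguage("Plain text", false, "text.svg", "txt");

    // Hardcoded for brevity; the same list could come from a file or a database.
    private readonly List<Language> languages = new List<Language>
    {
        new SimpleLanguage("Java", true, "bean.svg", "java"),
        new SimpleLanguage("C#", true, "cs.svg", "cs"),
        new SimpleLanguage("TSQL", false, "tsql.svg", "sql")
    };

    public List<Language> GetSupportedLanguages()
    {
        return new List<Language>(languages);
    }

    public Language DetectLanguageByFile(string fileName)
    {
        // Path.GetExtension returns the extension with its leading dot, e.g. ".java".
        string extension = Path.GetExtension(fileName).TrimStart('.');
        foreach (Language language in languages)
        {
            foreach (string candidate in language.GetExtensions())
            {
                if (string.Equals(candidate, extension, StringComparison.OrdinalIgnoreCase))
                {
                    return language;
                }
            }
        }
        return GetDefaultLanguage();
    }

    public Language GetDefaultLanguage()
    {
        return plainText;
    }
}
```

Code that calls DetectLanguageByFile("Main.java") receives an object and reads its properties; it never needs to know which languages exist or where their descriptions came from.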
Now remove all functions containing switch, if you had any. They are no longer needed, because the code works not with specific objects but with their properties.
In the example above, once the text editor supports 10-15 programming languages, adding another one comes down to listing its settings, chosen from those already implemented. After all, although there are many programming languages, most of the nuances that affect source code editing are shared.
Why enum is needed
Why, then, do most programming languages have an enum type?
It is convenient when used with some care. First of all, an enum fits where the number of objects is small. Each developer sets the acceptable limit at their own discretion. I would not put more than 20 constants in one enum.
The described set should consist of objects whose differences can be parameterized. For example, the days of the week differ from one another only by ordinal number, so they are well described by an enum. But listing weather phenomena in an enum is probably not worthwhile, because they have very little in common.
The set of enumerated objects must be either fixed, so that no new values will ever appear, or internal, that is, fully defined and used only within one program.
Typical examples of enum use:
- enum Boolean {True, False}
- days of the week, months
- state machine
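The last item can be illustrated with a small state machine. The set of states is internal to the program and fully defined by it, so an enum with a switch is a reasonable fit here. The states, events and the class name DocumentStateMachine are invented for this sketch and are deliberately simplified (for example, an edit made while saving is ignored).

```csharp
using System;

// Hypothetical states of a document inside the editor.
public enum DocumentState { Clean, Modified, Saving }

// Hypothetical events that move the document between states.
public enum DocumentEvent { Edit, SaveRequested, SaveCompleted }

public static class DocumentStateMachine
{
    // Returns the next state; events that do not apply leave the state unchanged.
    public static DocumentState Next(DocumentState state, DocumentEvent evt)
    {
        switch (state)
        {
            case DocumentState.Clean:
                return evt == DocumentEvent.Edit ? DocumentState.Modified : state;
            case DocumentState.Modified:
                return evt == DocumentEvent.SaveRequested ? DocumentState.Saving : state;
            case DocumentState.Saving:
                return evt == DocumentEvent.SaveCompleted ? DocumentState.Clean : state;
            default:
                throw new InvalidOperationException("Unknown state " + state);
        }
    }
}
```

Unlike the list of programming languages, this set does not depend on anything outside the program, and all transitions live in one function.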