
The myth of the ideal number of lines in a method
There is a myth that if a function has more than n (or fewer than m) lines of code, then its design is flawed. For example, the author of the publication “Reflections on Design Principles” writes that “the number of lines of a method ... is a necessary but not sufficient condition for good design.” In this article I will explain why I consider the demand that a function be a certain size a myth, and give examples to prove it. But first, let's look at the reasons for this myth's popularity.
Reasons for the popularity of the myth
Take almost any real-life algorithm and it can easily be divided into a small number of operations, usually from 3 to 10, each serving its own unique micro-goal. For example, to sit down on a chair standing near a table, you need to: 1) see the chair; 2) walk to the chair; 3) pull out the chair; 4) sit down on the chair. Such a description of the actions is quite understandable; looking at each action, you can grasp what is hidden behind it and roughly which steps are needed to carry it out. This is an example of good design. If instead of the step “see the chair” there were several operations for straining the eye muscles, and instead of “walk to the chair” a loop of gradual movement with constant route corrections, such an algorithm would be hard to understand; even remembering all its details would already be a problem. This is an example of poor design, with no proper control of complexity. In this case, it is better to gather the operations that serve a single goal (reducing the distance between the person and the chair) into one function.
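The chair algorithm above can be sketched as code. This is a purely illustrative sketch; all function names and the room representation are my own invented placeholders, not anything from a real system.

```python
# A hypothetical sketch of the "sit down on the chair" algorithm, with each
# micro-goal extracted into its own small function (all names are illustrative).

def locate_chair(room):
    """Hides the 'straining the eye muscles' details behind one name."""
    return room["chair_position"]

def walk_to(position):
    """Hides the loop of gradual movement and route correction."""
    current = 0
    while current < position:
        current += 1   # one step closer; route corrections would live here
    return current

def pull_out_chair():
    return "chair moved"

def sit_down():
    return "seated"

def sit_at_table(room):
    """Top level: four operations, each serving its own micro-goal."""
    position = locate_chair(room)
    walk_to(position)
    pull_out_chair()
    return sit_down()
```

Reading only `sit_at_table`, you can understand the whole algorithm without descending into any of the four steps; that is the mark of good decomposition.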
If we take other algorithms, say frying scrambled eggs or going to the cinema, we can again distinguish up to 10 smaller operations, each serving a DIFFERENT micro-goal. Agree, it is quite difficult to come up with an example with many more than 10 operations, right? If you still end up with a great many operations, you can probably find a common goal for some of them; or perhaps you are too fixated on error handling, which is not actually the CORE part of the algorithm.
Rebutting the myth
Coming up with a real-life algorithm whose top level of abstraction contains a large number of operations that cannot be combined into larger ones (so that the algorithm's main function bloats into many lines of code) is rather difficult, but possible. For example, the ritual dance of an Aboriginal tribe, consisting of the following actions: 1) crouch; 2) cluck; 3) growl; 4) wave your hands; 5) stand up; 6) jump ... and another 100 or 500 chaotic, unsystematic actions.
What is the top level of abstraction?
The first level of abstraction is the global goal of the code, which, with good design, coincides with the name of the algorithm's main function; here let it be “Summon rain”. The first level is degenerate, there is always exactly one operation on it, so we exclude it from consideration, and by the top level we mean the second one. At the top (second) level of abstraction sit operations such as “Crouch” and “Cluck”. The goal of “Crouch” is to change the position of the body, while the goal of “Cluck” is to make sounds imitating a chicken. These goals are different, and if you look for a common goal between them, in this context they share only one: “Summon rain”. So it is impossible to extract them into a separate function that would be called inside “Summon rain”; “Crouch” and “Cluck” must sit in “Summon rain” directly.
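In code, the rain-calling ritual might look like the sketch below (all names are illustrative placeholders). The point is that the second-level operations serve unrelated micro-goals, so none of them can be grouped under any common name except the top-level goal itself.

```python
# A sketch of the ritual-dance algorithm (all names are illustrative).

def crouch():     return "body lowered"   # goal: change the body position
def cluck():      return "cluck-cluck"    # goal: imitate a chicken
def growl():      return "grrr"           # goal: make a menacing sound
def wave_hands(): return "hands waved"
# ... plus another 100 or 500 equally unrelated actions

def summon_rain():
    """The only function that can meaningfully contain all these steps:
    no subset of them shares a goal narrower than 'summon rain' itself."""
    return [crouch(), cluck(), growl(), wave_hands()]
```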
Do you still want your function to be fewer than n lines of code? Then you will have to break the method into 1) perform the first part of the ritual dance; 2) perform the second part, and so on. And this is an example of poor design, because it is not clear what each of those functions hides. Let me give an example of a similar situation, this time from programming. Suppose there is a module that downloads pages from a certain site and parses them to obtain information about a purchase, which contains protocols, lots, and much more. The main method of the module looks something like this:
1) Find out information about the supplier;
2) Find out information about the procurement lots;
3) Find out information about the procurement auction;
4) Find out information about procurement protocols;
5) Find out information about the last date the purchase status changed.
... and similar methods. Moreover, each method downloads different pages and applies different parsing algorithms. Every operation deserves to be extracted into its own method, and there is no way to combine some of them into a single function with a meaningful name (do not suggest “Find out information about the supplier, lots and auction”). The domain model and the number of pages on the source site can grow indefinitely (as can the number of the site's front-end developers, adding more and more specifics to its different pages). Another example that refutes the myth is the family of cryptographic algorithms: whatever maximum allowable method size n you name, a longer cryptographic algorithm can be invented.
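A sketch of that main method, assuming hypothetical helper names (in the real module each helper would download its own pages and apply its own parsing algorithm, so the stubs here only stand in for that work):

```python
# Hypothetical stubs; each one would hide a download + parse of different pages.
def fetch_supplier_info(pid):           return {"supplier": f"supplier of {pid}"}
def fetch_lot_info(pid):                return {"lots": []}
def fetch_auction_info(pid):            return {"auction": None}
def fetch_protocol_info(pid):           return {"protocols": []}
def fetch_last_status_change_date(pid): return {"status_changed": None}

def parse_purchase(pid):
    """Five operations with five different micro-goals. None of them can be
    merged under a meaningful common name, so the method keeps five calls,
    and gains one more call with every new kind of page on the source site."""
    info = {}
    info.update(fetch_supplier_info(pid))
    info.update(fetch_lot_info(pid))
    info.update(fetch_auction_info(pid))
    info.update(fetch_protocol_info(pid))
    info.update(fetch_last_status_change_date(pid))
    return info
```

Each line of `parse_purchase` is already at the right level of abstraction; forcing the method under some line limit would only produce opaque groupings like “fetch the first half of the data”.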
The flip side of the myth
There is another interpretation of this myth: if a method has fewer than m lines of code (say, 50), then something is wrong with it. How could such a point of view arise? Imagine code with no coherent architecture, in which the names of classes and methods either do not reflect the purpose of the entities or are outright misleading. Perhaps the code was originally good, but then someone changed the function “Find out information about the last date the purchase status changed” so that it now also saves information to the database and sends users notifications by e-mail, yet the name of the function stayed the same. Or someone changed the algorithm for finding the date of change, but did it not in this function but somewhere else; the function should then have been renamed to “Find out information about the last date of change of PART of the purchase status” or to “Find out information about the event log” (it is now only PART of the operation that finds the date of change, and the method should be named accordingly), but alas, it was not renamed. As a result, the method names in such code cannot be trusted, and to find out WHAT REALLY HAPPENS you have to descend into each of them. And if the code is fragmented into a large number of methods and the nesting depth is great, you have to descend deeper and deeper ... It is easy to get lost in such code, as in a maze. But if all the code of the class were in one gigantic function, it would at least be visible, as if at a glance, and deliberately false function names would not mislead anyone.
Now imagine a fictional programmer named Marcus. Marcus is not particularly zealous about studying design and works every day with the awkward code described above. Gradually he begins to notice that the “big code” is easier to understand, while the “finely chopped” code becomes associated with a headache. Then someone briefly tells him about the principle “do not multiply entities beyond necessity”. Which entity is superfluous and which is not, Marcus cannot explain, but he adopts the principle. Then Marcus learns about the KISS principle from somewhere and decides that since “the fewer the entities, the easier it is to figure things out”, then “the fewer the entities, the better the code follows KISS”.
Here is an example article whose character is also called Marcus, and who wrote a class that can bake any kind of bakery product from any recipe in any oven with any fuel source, and this class has just one method. As I understand it, he has two classes in the entire program: Bread (which may in fact be a pie) and Manager (who at a meeting can say “I can do anything!” without lying). Our Marcus (the one from this article) agrees, believes this is BEST PRACTICE and true adherence to the KISS principle, and thinks that if you do not produce God objects of 1000 lines each, you have something to learn from him.
Personally, I believe there is no rule that a method must be at least m lines long, and that in 99.9% of cases it is quite possible to write neat little functions whose names tell you what goes on inside them, what their contract is and what purpose they serve. At the same time, finding the functionality you need will not take long, and you will not have to read through the entire code.
But how should it be done?
We already know what not to do: blindly trust the number of lines in a method. A logical question arises: how should it be done, then? How do you find out that something needs to be added to a function, or removed from it? The principle of “low coupling & high cohesion” and two code smells, “Shotgun Surgery” and “Divergent Change”, will help. If, when changing some kind of functionality, you have to fix a piece of this entity and a piece of another entity, the code smells of Shotgun Surgery and low cohesion, and it would be good to merge the two entities into one. If, when changing a kind of functionality, one part of an entity always changes while another part never does, the code smells of Divergent Change, and it may be worth splitting the entity in two. To illustrate, let's slightly modify the example from the beginning of the article: if, when changing the way a robot moves while approaching a chair, you constantly change the part of the algorithm concerned with route selection and movement (depending on whether the robot moves on the ground, underground or in the air), then you should extract a separate function, “approach the chair”. As a result, where there was one entity, two appear. It is also necessary to give entities understandable names, so that from the name alone one can understand what the entity does (and what it does not).
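The Divergent Change refactoring of the robot example might look like this sketch. The assumption (mine, for illustration) is that the route-selection part changes with every new movement mode while the rest of the algorithm stays stable, so the volatile part is extracted:

```python
# A sketch of splitting out the volatile part of the robot algorithm.

def approach_chair(position, mode):
    """The part that varies with the movement mode lives in one place;
    adding a new mode touches only this function."""
    if mode == "ground":
        return f"walk to {position}"
    if mode == "underground":
        return f"tunnel to {position}"
    return f"fly to {position}"

def sit_at_table(room, mode="ground"):
    """The stable part never changes when a new movement mode is added."""
    position = room["chair_position"]
    route = approach_chair(position, mode)
    return route + ", then sit down"
```

Note that the split is justified by the observed pattern of change, not by any line count: `approach_chair` earns its existence because it isolates what keeps changing.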
P.S. All of the above is just my personal opinion. All characters are fictional.
Here the article ends, and below is an example of another long algorithm that is difficult to decompose, added at the request of readers. To encrypt some text with one well-known cryptographic algorithm, you need to:
1) Break the source text into blocks of 64 bits
2) For each such block
{
2.1) Rearrange the bits in the block according to a specific algorithm
2.2) Break the block into two blocks 32 bits long (hereinafter left and right)
2.3) Repeat 16 times:
{
2.3.1) Compute Ki from the encryption key K and the iteration number, using a specific algorithm
2.3.2) Compute block E of length 48 from the right block of length 32, using a specific algorithm
2.3.3) F = bitwise sum of Ki and E
2.3.4) left block of length 32 = right block of length 32 from the previous iteration
2.3.5) right block of length 32 = left block of length 32 from the previous iteration, bitwise combined with F
}
2.4) Append the left block to the encryption result
2.5) Append the right block to the encryption result
}
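The loop structure above can be sketched in code. This is NOT real DES: the key schedule and the round function below are my own trivial placeholders standing in for the real permutations, expansion and S-boxes, so the sketch only illustrates the Feistel shape of steps 2.2–2.5 and must not be used for actual encryption.

```python
# A deliberately simplified Feistel-style round loop (placeholders, not DES).

def round_key(key, i):
    """Placeholder for the Ki key schedule (step 2.3.1)."""
    return (key * (i + 1)) & 0xFFFFFFFF

def round_function(right, k):
    """Placeholder for expansion E plus the combination with Ki (2.3.2-2.3.3)."""
    return (right ^ k) & 0xFFFFFFFF

def encrypt_block_64(block, key, rounds=16):
    left = (block >> 32) & 0xFFFFFFFF      # 2.2) split into left and right halves
    right = block & 0xFFFFFFFF
    for i in range(rounds):                # 2.3) repeat 16 times
        k = round_key(key, i)
        f = round_function(right, k)
        left, right = right, left ^ f      # 2.3.4-2.3.5) swap and combine
    return (left << 32) | right            # 2.4-2.5) emit the result once, after the rounds
```

Decryption runs the same rounds with the round keys in reverse order; that symmetry is exactly why the algorithm splits each block into left and right halves.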
I'll add that the description of the Ki computation algorithm on Wikipedia did not fit on my monitor, so I do not think it will be one line of code in a program. If you create a function “compute Ki”, that will be bad design, because it is unclear what this function does and unclear what Ki is. However, if Ki has some generally accepted name, and Ki is a generally accepted abstraction, then the function “compute Ki” has a right to exist. In order to decompose after all, the algorithm's developers created such abstractions themselves, and not very successfully: the names are of the sort “The first piece of the algorithm” and “The piece named after so-and-so”. Terrible design, you must agree. However, this design is generally accepted and belongs to the subject area, so if you use it, everything is fine. But now imagine that you need to modify the algorithm slightly in order to create your own unique algorithm with properties similar to the original (this can be useful, because it is harder for an attacker to crack an algorithm “unknown to science”). In the new algorithm, “The piece named after so-and-so” is modified and is no longer “The piece named after so-and-so”. In this case it is better not to break the algorithm into small functions but to leave it as it is; otherwise one day (a week after writing the code) you will get lost in this code as in a maze. The algorithm described here is DES. “The first piece” is the initial permutation, “the piece named after so-and-so” is the Feistel function, and E is the expansion function. In a modified algorithm all of this is different. A modified DES can be partially decomposed, for example by extracting the methods “XOR two blocks” and “Apply a permutation” (to a block, given a permutation matrix), and so on, but the “Encrypt a 64-bit block” method will still be indecently large.
Extracting a method “Encrypt a block of length 32” is a bad idea, because such a block cannot be decrypted; encryption and decryption apply to 64-bit blocks. I am sure DES is not the longest of algorithms; it is possible to find (or invent) longer ones, with a large number of operations at the top level of abstraction whose goals and purposes all differ from one another.