Unit Testing and Python



    My name is Vadim, and I am a lead developer at Mail.Ru Search. I am going to share our experience with unit testing. The article consists of three parts: in the first I explain what we generally try to achieve with unit testing; the second describes the principles we follow; and in the third you will see how those principles are implemented in Python.

    Goals


    It is very important to understand why you are doing unit testing, because your specific actions will depend on it. If you use unit tests incorrectly, or use them for something other than what you actually need, nothing good will come of it. So it is very important to understand in advance what goals you are pursuing.

    In our projects we pursue several goals.

    The first is the trivial one, regression: fix something in the code, run the tests and find out that nothing is broken. Although, in fact, it is not as easy as it sounds.

    The second goal is the effect on architecture. If you introduce mandatory unit testing into a project, or simply agree with the developers that unit tests will be used, this immediately affects the way code is written. You cannot write 300-line functions with 50 local variables and 15 parameters if those functions have to be unit tested. In addition, thanks to the tests, interfaces become clearer and problem areas surface. After all, if the code is poor, the test for it comes out crooked, and that is immediately obvious.

    The third goal is to make the code easier to understand. Say you join a new project and are handed 50 MB of source code. You may simply be unable to make sense of it. If there are no unit tests, then the only way to learn how the code works, apart from reading the sources, is trial and error. And if the system is fairly complicated, it may take a long time to reach the necessary pieces of code through the interface. With unit tests you can see how the code is executed starting from any point.

    The fourth goal is to simplify debugging. For example, you found some class and want to debug it. If instead of unit tests there are only system tests, or no tests at all, all you can do is reach the right place through the interface. I once worked on a project where, to test a certain feature, you had to spend half an hour creating a user, charging him money, changing his status, running a cron job so that the status propagated somewhere else, then clicking something else in the interface, running yet another cron job... Half an hour later the bonus program finally appeared for that user. With unit tests I could have gotten to the right place immediately.

    Finally, the most important and very abstract goal that unites all the previous ones is comfort. When I have unit tests, I experience less stress when working with code, because I understand what is going on. I can take unfamiliar source code, fix three lines, run the tests and make sure the code works as intended. And it is not even about the tests being green: they may be red, but red exactly where I expect. In other words, I understand how the code works.

    Principles


    If you understand your goals, you can work out what needs to be done to achieve them. And this is where the problems begin. Quite a lot of books and articles have been written about unit testing, but the theory is still very immature.

    If you have ever read articles on unit testing, tried to apply what they describe and it did not work out, then most likely the reason lies in the imperfection of the theory. This happens all the time. Like every developer, I used to think the problem was me. Then I realized: I cannot possibly have been wrong that many times. So I decided that in unit testing you have to proceed from your own reasoning and act more sensibly.

    The standard advice that you can find in every book and article is: "test the interface, not the implementation." After all, the implementation can change, while the interface cannot; let's test it so the tests do not break all the time for every little reason. The advice sounds good, and everything seems logical. But we know very well that to test anything you have to pick test values. Usually, when testing a function, you identify so-called equivalence classes: sets of values for which the function behaves uniformly. Roughly speaking, you test for each if. But to know which equivalence classes you have, you need the implementation. You are not testing it, but you need it: you have to look into it to know which test values to pick.

    Talk to any tester: he will tell you that in manual testing he always keeps the implementation in mind. From experience he understands perfectly well where programmers usually make mistakes. The tester does not check everything by entering 5, then 6, then 7. He checks 5, abc, -7 and a 100-character number, because he knows the implementation may behave differently on those values, while on 6 and 7 it hardly will.

    So it is not clear how to follow the principle "test the interface, not the implementation." You cannot just close your eyes and write a test. TDD tries to solve part of this problem: the theory suggests introducing equivalence classes one at a time and writing tests for them. I have read many books and articles on the subject, but somehow it does not stick. However, I do agree with the thesis that tests should be written first. We call this principle "test first." We do not practice TDD, and given all of the above, tests are written not before the code but in parallel with it.

    I definitely do not recommend writing tests after the fact. They affect the architecture, and once it has settled it is too late to influence it: everything would have to be rewritten. In other words, testability is a separate property the code has to be endowed with; it will not become testable on its own. That is why we try to write tests together with the code. Do not believe stories like "let's write the project in three months and then cover everything with tests in a week"; that never happens.

    The most important thing to understand is that unit testing is not just a way to check code, not a way to verify its correctness. It is part of your architecture, of your application design. When you work with unit tests, you change your habits. Tests that only verify correctness are closer to acceptance tests. It is a mistake to think that you can cover the code with unit tests later, or that it will not need checking later.

    Python implementation


    We use the standard unittest library from the xUnit family. The story goes like this: there was the Smalltalk language, and for it the SUnit library. Everybody liked it, and they started copying it. The library was ported to Java under the name JUnit, from there to C++ as CppUnit and to Ruby as RUnit (later renamed RSpec). Finally, from Java the library "moved" to Python under the name unittest. Moreover, the port was so literal that even CamelCase remained, although it does not comply with PEP 8.

    There is a wonderful book about xUnit, "xUnit Test Patterns." It explains how to work with frameworks of this family. The book's only drawback is its size: it is huge, but about two thirds of the content is a catalog of patterns. The first third of the book is just wonderful; it is one of the best books on IT that I have come across.

    A unit test is ordinary code that has a standard structure. Every unit test consists of three steps: setup, exercise and verify. You prepare the data, run the code under test and check whether everything has arrived at the expected state.
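
    Below is a minimal sketch of these three phases in a unittest-style test; the Wallet class and its fields are made up for illustration and are not from the original talk.

    import unittest

    class Wallet:
        def __init__(self, balance=0):
            self.balance = balance

        def deposit(self, amount):
            self.balance += amount

    class WalletTest(unittest.TestCase):
        def test_deposit(self):
            wallet = Wallet(balance=10)            # setup: bring the system to a known state
            wallet.deposit(5)                      # exercise: run the code under test
            self.assertEqual(wallet.balance, 15)   # verify: check the resulting state

    if __name__ == '__main__':
        unittest.main()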



    Setup


    This is the most difficult and interesting stage. Bringing the system to the initial state from which you want to test it can be very hard, and that state can be arbitrarily complex.

    By the time your function is called, a lot of events may have happened and a million objects may have been created in memory. Something already sits in every component your software touches: the file system, the database, the caches. The function can work only in that environment, and if the environment is not prepared, its actions are meaningless.

    Usually everyone insists that you must never use the file system, the database or any other separate components, because that turns your test from a unit test into an integration test. In my opinion this is wrong: an integration test is a test of integration. If you use some components not in order to test them, but simply so that the system can work, there is nothing wrong with that. Your code interacts with plenty of computer and operating system components anyway. The only real problem with using the file system or a database is speed.

    Directly in the code we use dependency injection. You can pass parameters to a function instead of the defaults it would use. You can even pass in references to libraries. Or you can slip in a stub instead of a real request object so that the code does not go to the network from the tests. You can store custom loggers in class attributes so that you do not write to disk and waste time.
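
    Here is a small sketch of that kind of dependency injection; fetch_status, FakeResponse and their arguments are hypothetical names, not code from our project. The HTTP opener and the logger are parameters with production defaults, so a test can pass stubs and nothing touches the network or the disk.

    import logging
    import urllib.request

    def fetch_status(url, opener=urllib.request.urlopen, logger=None):
        # Production code uses urlopen and the module logger by default.
        logger = logger or logging.getLogger(__name__)
        logger.info("fetching %s", url)
        with opener(url) as response:
            return response.status

    class FakeResponse:
        status = 200
        def __enter__(self):
            return self
        def __exit__(self, *exc):
            return False

    def test_fetch_status():
        # The test injects a fake opener and a throwaway logger.
        status = fetch_status("http://example.com",
                              opener=lambda url: FakeResponse(),
                              logger=logging.getLogger("null"))
        assert status == 200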

    For stubs we use the regular unittest mock. There is also the patch function which, instead of honestly injecting dependencies, lets you simply say: "in this package, that import is replaced by this other thing." It is convenient because you do not need to pass anything anywhere. True, it then becomes unclear who replaced what, so use it carefully.
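
    A sketch of both approaches, so the difference is visible; the target chosen for patch here is just an example. An explicit Mock is handed to the code under test, while patch swaps out a name for the duration of a block.

    from unittest import mock
    import urllib.request

    # Explicit stub: the fake object is passed to the code under test directly.
    fake_session = mock.Mock()
    fake_session.get.return_value.status_code = 200
    assert fake_session.get("http://example.com").status_code == 200

    # patch: while the block is active, urllib.request.urlopen is replaced,
    # so any code calling it gets the stub instead of going to the network.
    with mock.patch("urllib.request.urlopen") as fake_urlopen:
        fake_urlopen.return_value.status = 200
        assert urllib.request.urlopen("http://example.com").status == 200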

    As for the file system, it is quite easy to fake. There is the io module with io.StringIO and io.BytesIO: you can create file-like objects that never actually touch the disk. And if that is suddenly not enough, there is the excellent tempfile module with context managers for temporary files, directories, named files, anything. tempfile is a super module if io did not fit your case for some reason.
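
    A quick sketch of both options, assuming a function that simply counts lines in a file-like object (count_lines is invented for this example):

    import io
    import os
    import tempfile

    def count_lines(fileobj):
        return sum(1 for _ in fileobj)

    # io.StringIO: a file-like object that never touches the disk.
    assert count_lines(io.StringIO("a\nb\nc\n")) == 3

    # tempfile: a real file in a temporary directory that is removed automatically.
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "data.txt")
        with open(path, "w") as f:
            f.write("a\nb\nc\n")
        with open(path) as f:
            assert count_lines(f) == 3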

    With databases it is more complicated. The standard recommendation is: "use a fake database instead of the real one." I do not know about you, but I have never in my life seen a fake database that was functional enough. Every time I asked for advice on what exactly to take for Python or Perl, the answer was that nobody knew of anything ready-made, and I was offered to write something of my own. I have no idea how to write an emulator of, say, PostgreSQL. Another tip: "then take SQLite." But that breaks isolation, because SQLite works with the file system. Besides, if you use something like MySQL or PostgreSQL, then almost certainly nothing will work in SQLite. If it seems to you that you are not using any product-specific capabilities, you are most likely mistaken: even for trivial things, such as working with dates, the behavior will differ.

    As a result, a real database is usually used. This solution is not bad, you only need to exercise a certain care. Do not use a shared, centralized database, because tests may conflict with each other. Ideally, the database should start itself for the tests and shut itself down after testing.

    The situation is slightly worse when you are required to run a local database that will then be used. But the question is, how does the data get there? We have already said that there must be a certain initial state of the system, there must be some data in the database. Where it comes from is a difficult question.

    The most naive approach I have come across is using a copy of the real database. A copy was taken from it regularly, with sensitive data removed. The authors reasoned that real data suits tests best. In practice, though, writing tests against a copy of a real database is torment. You do not know what data is in there. You first have to find what you are going to test, and if that information is not there, it is unclear what to do. It ended with the project deciding to write tests against the maintenance department's account, which "will never change." Of course, after a while it changed.

    This is usually followed by the decision: "let's take a copy of the real database once and stop synchronizing it. Then we can tie tests to specific objects, look at what is in there and write tests." The question immediately arises: what happens when new tables are added to the database? Apparently, you have to enter fake data by hand.

    But since we are going to do that anyway, let's prepare the database snapshot by hand from the start. This option is very similar to what Django usually calls fixtures: people write huge JSON files with test data for all occasions, load them into the database at the start of testing, and hope everything will be fine. This approach also has plenty of flaws. The data is piled up in a heap, and it is unclear which test uses what. Nobody can tell whether some data may be deleted or not. And there are also incompatible database states: for example, one test needs the database to contain no users, and another needs users to exist. These two states cannot live in the same snapshot, so one of the tests will have to modify the database anyway.

    And since we have to deal with that regardless, the simplest thing is to start with an empty database, have each test put the data it needs there, and clean the database up at the end of testing. The only drawback of this approach is the difficulty of creating the data in each test. In one of the projects I worked on, to create a service you had to generate 8 entities in different tables: a service on a personal account, a personal account on a client, a client on a legal entity, a legal entity in a city, a client in a city, and so on. Until you create the whole chain, you do not satisfy the foreign keys and nothing works.

    For such situations there are special libraries that make life much easier. You can write auxiliary tools, usually called factories (not to be confused with the design pattern). For example, we used the factory_boy library, which works well with Django. It is a clone of the factory_girl library, which last year was renamed factory_bot for reasons of political correctness. Writing such a library for your own framework costs nothing. It is based on a very important idea: you create a factory for the objects you want to generate once, set up its relations, and from then on it knows what to do: when a user is created, generate a name for it and create its group through the group factory. Inside the factory everything is spelled out the same way: the name is generated like this, the related entities are such and such.

    As a result, the test contains only one last line: user = UserFactory(). The user has been created, and you can work with it, because under the hood everything necessary has been generated. If you want, you can customize something manually.
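
    A rough sketch of what such factories look like with factory_boy on top of Django models; the User and Group models and their fields are hypothetical here, not the ones from that project.

    import factory
    from myapp.models import Group, User  # hypothetical Django models

    class GroupFactory(factory.django.DjangoModelFactory):
        class Meta:
            model = Group

        name = factory.Sequence(lambda n: "group%d" % n)

    class UserFactory(factory.django.DjangoModelFactory):
        class Meta:
            model = User

        name = factory.Sequence(lambda n: "user%d" % n)
        group = factory.SubFactory(GroupFactory)  # related entities are created automatically

    # In a test a single line is enough; individual fields can still be overridden by hand.
    user = UserFactory()
    admin = UserFactory(name="admin")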

    To clean up data after testing we use plain transactions. At the beginning of each test a BEGIN is issued, the test does something with the database, and after the test a ROLLBACK is done. If a test needs transactions itself, for example because it commits something to the database on its own, it calls a method we named break_db, informing the framework that it has "broken" the database, and the framework rebuilds it. That is slow, but since there are usually very few tests that need transactions, it is fine.
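
    A minimal sketch of this scheme as a base test case over a DB-API connection; how the connection is opened and how the database is rebuilt are left out, and break_db is just our own convention, not a library method.

    import unittest

    class DatabaseTestCase(unittest.TestCase):
        connection = None  # assumed to be opened once for the whole test run

        def setUp(self):
            self._db_broken = False
            self.cursor = self.connection.cursor()  # DB-API starts the transaction here

        def break_db(self):
            # The test warns the framework that it commits on its own,
            # so a rollback will not be enough to restore the state.
            self._db_broken = True

        def tearDown(self):
            if self._db_broken:
                self._recreate_test_database()   # slow path, rarely needed
            else:
                self.connection.rollback()       # fast path: undo everything the test did

        def _recreate_test_database(self):
            ...  # drop and re-create the schema, load the initial state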

    Exercise


    There is not much to tell about this stage. The only thing that can go wrong here is a call to the outside world, for example to the Internet. For a while we fought this administratively: we told programmers either not to call functions that go somewhere, or to pass special flags so that those functions would not do it. If a test reaches out to the corporate etcd, that is not good. In the end we concluded it was all in vain: we kept forgetting that some function calls a function that calls a function that goes to etcd. So eventually mocks for all such calls were added to the setUp of the base class; that is, everywhere it was not explicitly allowed, the calls were blocked with stubs.

    It is easy to make such stubs with patches: put the patches into a separate dictionary and give all tests access to it. By default the tests cannot go anywhere, and if some test still needs access, it can be opened for it. Very convenient. Jenkins will no longer send SMS to your customers at night :)
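
    A sketch of such a base class; the set of patched targets and the mock behaviour are just an illustration, your own list will depend on where your code actually goes outside.

    import unittest
    from unittest import mock

    class BaseTestCase(unittest.TestCase):
        BLOCKED_CALLS = {
            "urlopen": "urllib.request.urlopen",
            "getaddrinfo": "socket.getaddrinfo",
        }

        def setUp(self):
            super().setUp()
            self.mocks = {}
            for name, target in self.BLOCKED_CALLS.items():
                patcher = mock.patch(
                    target,
                    side_effect=AssertionError("outside call is forbidden in unit tests: " + target),
                )
                self.mocks[name] = patcher.start()
                self.addCleanup(patcher.stop)

    class RedirectTest(BaseTestCase):
        def test_with_network_allowed(self):
            # A single test can open access again by reconfiguring its mock.
            self.mocks["urlopen"].side_effect = None
            self.mocks["urlopen"].return_value = mock.Mock(status=200)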

    Verify


    At this stage we actively use hand-written assertions, even one-line ones. If a test checks the existence of a file, then instead of self.assertTrue(file_exists(f)) I recommend writing a dedicated assertion along the lines of assert_file_exists(f). A related holy war: do you keep using CamelCase in names, as unittest does, or follow PEP 8? I have no answer. If you follow PEP 8, the test code turns into a mess of CamelCase and snake_case. And if you use CamelCase, it does not comply with PEP 8.
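
    A sketch of such a one-line hand-written assertion; assert_file_exists is a name chosen for illustration, not a standard unittest method.

    import os
    import unittest

    class FileTestCase(unittest.TestCase):
        def assert_file_exists(self, path):
            # Fails with a message about the file, not just "False is not true".
            if not os.path.exists(path):
                self.fail("expected file to exist: %s" % path)

    # In a test: self.assert_file_exists(config_path)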

    And one last thing. Suppose you have code that tests something, and many variants of data on which this code should run. If you use py.test, you can run the same test there with different inputs. If you do not have py.test, you can use a decorator like the one sketched below: a table is passed to the decorator, and one test turns into several, each checking one of the cases.
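
    A sketch of both options; with_cases is a made-up decorator name, and it uses subTest to run every row of the table rather than literally generating separate test methods.

    import functools
    import unittest

    # py.test variant (requires pytest to be installed):
    #
    # import pytest
    #
    # @pytest.mark.parametrize("word, expected", [("abc", 3), ("", 0), ("hello", 5)])
    # def test_length(word, expected):
    #     assert len(word) == expected

    def with_cases(cases):
        """Run the decorated test method once for every row of the table."""
        def decorator(func):
            @functools.wraps(func)
            def wrapper(self):
                for case in cases:
                    with self.subTest(case=case):
                        func(self, *case)
            return wrapper
        return decorator

    class LengthTest(unittest.TestCase):
        @with_cases([("abc", 3), ("", 0), ("hello", 5)])
        def test_length(self, word, expected):
            self.assertEqual(len(word), expected)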

    Conclusion


    Do not trust articles and books unconditionally. If you think they are wrong, it may well be true.

    Do not be afraid to use dependencies in tests. There is nothing wrong with that. If you started memcached because your code does not function normally without it, that is fine. But it is better to do without it when possible.

    Pay attention to the factories. This is a very interesting pattern.

    P.S. I invite you to my Telegram channel about Python programming: @pythonetc.
