Randll August 22, 2012 at 15:25

The practice of refactoring in large projects

Some time ago I started working in game development, where I ran into projects with 2 million lines of code written by dozens of programmers. At this scale of code base, problems of a kind I had never seen before arise. I want to tell you about one of them.

Imagine the following situation. You need to refactor a very large piece of code, a whole subsystem of something like 200K lines. Moreover, the refactoring is clearly a big one that affects the basic concepts on which the subsystem is built. In effect, you need to rewrite the entire architecture while preserving the business logic. This happens, for example, when you have finished one project, a new one is ahead, and you want to correct all the mistakes of the past in it. Suppose the first estimates say the refactoring will take about 2 months, no less. Everything has to keep working while you refactor, and you cannot stop other programmers from adding new features and fixing bugs in the subsystem. Often such a refactoring is so complicated that it is impossible to evolve the old code into the new code gradually, and equally impossible to roll out the result in parts.

Examples from practice, both mine and my colleagues:

Move all database access from plain JDBC to Hibernate.
Change the service architecture from sending and receiving messages to remote procedure calls (RPC).
Completely rewrite the subsystem for translating XML files into runtime objects.

What should you do? From which side do you approach the problem? Below is a set of tips and practices that help us deal with it: general advice first, then specific techniques. Nothing supernatural, but it may help someone.

Preparing for refactoring

Break the refactoring into pieces if you can. If that works, there is no problem. All the other tips are about what to do when it does not.
Try to choose a period when few new features are being added to the subsystem. For example, it is convenient to redo the backend while the whole team is focused on the front end.
Get to know the code well and understand why it exists at all. What architecture underlies the subsystem, and which standard approaches were used in it? Clearly define for yourself what the new concept is and what exactly you want to change. There must be a measurable final goal.
Define the area of code that the refactoring will touch. If possible, isolate it in a separate module or a separate directory. This will be very useful later.
Refactoring without tests is very dangerous. You must have tests. I don’t know how to live without them.
Run the tests with coverage measurement; this gives plenty of food for thought.
Repair the broken tests that relate to the subsystem in question.
By analyzing method coverage, you can find and remove unused code. Oddly enough, this often amounts to 10-15% of the code. The more code you remove, the less you have to refactor. Profit!
Use the coverage data to determine which parts of the code are not covered, and add the missing tests. If unit tests are long and tedious to write, write at least high-level smoke tests.
Try to bring coverage to 80-90% of the meaningful code. Do not try to cover everything: you will waste a lot of time for little gain. Cover at least the main execution path.
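
The coverage steps above can be sketched in a few lines of Java. This is only an illustration: it assumes your coverage tool can export a hit count per method, and the class name CoverageReport, the map format, and the method names are all hypothetical. A method that no test executed is either dead code or a missing test, so check for callers before deleting anything.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper. The "method name -> hit count" map is an assumed
// export format, not the output of any particular coverage tool.
class CoverageReport {

    // Methods that no test executed, sorted by name.
    // Each one is a candidate for deletion or for a new test.
    static List<String> neverExecuted(Map<String, Integer> hitsByMethod) {
        return hitsByMethod.entrySet().stream()
                .filter(e -> e.getValue() == 0)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    // Fraction of methods executed at least once (0.0 for an empty report).
    static double methodCoverage(Map<String, Integer> hitsByMethod) {
        if (hitsByMethod.isEmpty()) {
            return 0.0;
        }
        long covered = hitsByMethod.values().stream()
                .filter(hits -> hits > 0)
                .count();
        return (double) covered / hitsByMethod.size();
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration only.
        Map<String, Integer> hits = Map.of(
                "XmlLoader.parse", 12,
                "XmlLoader.parseLegacyFormat", 0,
                "XmlLoader.validate", 4);
        System.out.println("Never executed: " + neverExecuted(hits));
        System.out.println("Method coverage: " + methodCoverage(hits));
    }
}
```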

Refactoring

Wrap your subsystem in an interface. Switch all external code to use this interface. This not only enforces good programming practice, it also simplifies the refactoring.
Make sure your tests test the interface, not the implementation.
Make it possible to choose at startup which implementation of this interface to use. This option must be supported both in tests and in production.
Record the version control revision at which work on the new implementation began. From now on, every commit to your old subsystem is your enemy.
Write a new implementation of your subsystem's interface. Sometimes you can start from scratch, sometimes you can use your favorite method, copy & paste. It must be written in a separate module. Run the tests against the new implementation periodically. Your goal is for all tests to pass on both the new and the old implementation.
Do not wait until everything is written. Commit code to the repository as often as possible, leaving the old implementation enabled. If you keep code stashed away locally for a long time, you can run into trouble when other people refactor the modules you depend on. A mere method rename can cause you a lot of problems.
Do not forget to write tests specific to the new implementation, if any.
Once everything is written, look at the SVN history of the folder with the old code to find out what has changed there during your refactoring. These changes have to be carried over to the new code. In theory, if you forget to carry something over, the tests should catch it.
After that, all tests should pass with both the old and the new subsystem.
As soon as you are convinced that the new version of your module is stable, switch tests and production over to it. Block commits to the old subsystem. Do everything you can to minimize the time the two subsystems live side by side.
Wait a week or two, collect all the bugs, fix them, and boldly delete the old subsystem.
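
The core of these steps (an interface around the subsystem, an implementation chosen at startup, and tests that run against the interface) might look roughly like the sketch below. All names are hypothetical, including PlayerStore, the "playerstore.impl" system property, and the two in-memory classes that merely stand in for the old JDBC code and the new Hibernate code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// The interface that wraps the subsystem. External code sees only this.
interface PlayerStore {
    void save(String id, int score);
    Integer load(String id); // null when the id is unknown
}

// Stand-in for the old implementation (for example, hand-written JDBC).
class LegacyPlayerStore implements PlayerStore {
    private final Map<String, Integer> rows = new HashMap<>();
    public void save(String id, int score) { rows.put(id, score); }
    public Integer load(String id) { return rows.get(id); }
}

// Stand-in for the new implementation (for example, Hibernate-based).
class NewPlayerStore implements PlayerStore {
    private final Map<String, Integer> entities = new HashMap<>();
    public void save(String id, int score) { entities.put(id, score); }
    public Integer load(String id) { return entities.get(id); }
}

class PlayerStores {
    // Picks the implementation by name, so tests and production share one switch.
    static PlayerStore create(String impl) {
        switch (impl) {
            case "legacy": return new LegacyPlayerStore();
            case "new":    return new NewPlayerStore();
            default: throw new IllegalArgumentException("Unknown implementation: " + impl);
        }
    }

    // Reads the choice at startup; the old implementation stays the default.
    static PlayerStore fromSystemProperty() {
        return create(System.getProperty("playerstore.impl", "legacy"));
    }
}

class PlayerStoreContract {
    // Uses only the interface, so the same checks verify every implementation.
    static void check(PlayerStore store) {
        if (store.load("missing") != null) {
            throw new AssertionError("unknown id must load as null");
        }
        store.save("p1", 10);
        store.save("p1", 25);
        if (!Integer.valueOf(25).equals(store.load("p1"))) {
            throw new AssertionError("the latest saved score must be returned");
        }
    }

    public static void main(String[] args) {
        for (String impl : List.of("legacy", "new")) {
            check(PlayerStores.create(impl));
            System.out.println(impl + ": contract holds");
        }
    }
}
```

With a switch like this, the same test suite can be pointed at either implementation, for example by starting the JVM with -Dplayerstore.impl=new, while production keeps the default until the new code is trusted.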

Additional tips

All new features created in parallel with the refactoring should be 100% covered by tests. This is necessary so that, when you switch to the new implementation, the tests fail and signal that the new implementation is missing code from the old one.
Any bug fix should follow the same principle: first write a test that reproduces the problem and fails, then fix it. The reasons are the same.
If you use a continuous integration system such as TeamCity, set up a separate build for the duration of the refactoring that runs all tests against the new subsystem. An automated build makes your new, not yet used code “official”. All the same policies and rules begin to apply to it as to everything else.
It often happens that you do not know whether you have converted everything in the code to the new architecture. For example, you do not know whether some code still uses a direct JDBC connection instead of Hibernate, or whether a message still slips through somewhere instead of an RPC call. To find such places, come up with a way to make the old mechanism inoperative, that is, break it in tests. For example, break the message delivery system or slip a broken JDBC driver into the system. In practice, this usually turns up at least five forgotten, unconverted places.
Talk to the other programmers and keep them informed of your progress. If they know that you have a week left, they can sometimes postpone their tasks until the new version of your subsystem is released. Then there is nothing to merge.
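
The tip about breaking the old mechanism in tests can be sketched as follows. The names are hypothetical (MessageBus, RpcClient, InventoryService), and the service is deliberately written with one forgotten call so that the technique has something to find. The poisoned bus also records each use, in case calling code swallows the exception.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical old transport that the refactoring is supposed to eliminate.
interface MessageBus {
    void send(String topic, String payload);
}

// Hypothetical new transport.
interface RpcClient {
    String call(String method, String arg);
}

// Test-only replacement for the old transport: every use is recorded
// and fails loudly, so forgotten call sites cannot go unnoticed.
class PoisonedMessageBus implements MessageBus {
    final List<String> violations = new ArrayList<>();

    public void send(String topic, String payload) {
        violations.add(topic);
        throw new IllegalStateException("Legacy message bus used for topic: " + topic);
    }
}

// Code under migration: one path was converted, one was forgotten.
class InventoryService {
    private final MessageBus bus;
    private final RpcClient rpc;

    InventoryService(MessageBus bus, RpcClient rpc) {
        this.bus = bus;
        this.rpc = rpc;
    }

    // Already converted to RPC.
    String reserve(String item) {
        return rpc.call("reserve", item);
    }

    // Still goes through the old message bus.
    void release(String item) {
        bus.send("inventory.release", item);
    }
}

class PoisonDemo {
    public static void main(String[] args) {
        PoisonedMessageBus bus = new PoisonedMessageBus();
        InventoryService service =
                new InventoryService(bus, (method, arg) -> method + ":" + arg);

        service.reserve("sword"); // converted path, unaffected by the poison
        try {
            service.release("sword"); // forgotten path
        } catch (IllegalStateException e) {
            System.out.println("Found: " + e.getMessage());
        }
        System.out.println("Unconverted topics: " + bus.violations);
    }
}
```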

Experience suggests that even large and scary subsystems can be refactored with relatively little bloodshed. Your main helpers are tests and a systematic approach.
