Thurs Aug 18, 2022 2:42am PST
Refactoring Large Codebases
Over my career I've refactored a handful of gigantic codebases and I have a few tips.

First: Version control _everything_. I mean it: remove `.gitignore` and add everything to the git index before you start refactoring. The worst thing that can happen is that you accidentally make a change, everything breaks, and you have no way to get back to a working state.

Second: If you don't already have them, write compatibility tests for all public interfaces. It is easy to fall into the trap of refactoring code and running a simple program to check that it works, but if it is a large application, chances are that your simple program does not cover all the edge cases.
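To make this concrete, here is a minimal sketch of one way to write such a test: record what the current code returns for a set of edge cases, then check the refactored code against that recording. `legacy_slugify`, `refactored_slugify`, and the edge cases are all made up for illustration; in a real codebase you would point this at your own public functions.

```python
def legacy_slugify(title: str) -> str:
    """Hypothetical public function, as it exists before the refactor."""
    return "-".join(title.lower().split())


def refactored_slugify(title: str) -> str:
    """Hypothetical rewrite that must behave identically."""
    return "-".join(part.lower() for part in title.split())


# Inputs chosen to hit edge cases a quick manual check tends to miss.
EDGE_CASES = [
    "Hello World",
    "  leading and trailing  ",
    "",
    "MiXeD\tTabs\nNewlines",
    "already-slugged",
]


def record_golden(func, cases):
    """Capture the current behavior as the expected output for each input."""
    return {case: func(case) for case in cases}


def check_compat(func, golden):
    """Return (input, expected, actual) for every input whose output changed."""
    mismatches = []
    for case, expected in golden.items():
        actual = func(case)
        if actual != expected:
            mismatches.append((case, expected, actual))
    return mismatches


golden = record_golden(legacy_slugify, EDGE_CASES)
mismatches = check_compat(refactored_slugify, golden)
print("mismatches:", mismatches)
```

In practice I would save the recorded outputs to a file that is committed before the refactor starts, so the expected values cannot drift along with the code.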

Third: Use code coverage and eliminate all dead code paths before you start refactoring. When you refactor, chances are that you don't want to port everything: backwards compatibility, patches, etc. Code coverage reports show you exactly what is and isn't used by your tests on your target platforms.
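As a toy illustration of the idea, the sketch below uses Python's built-in `sys.settrace` hook to record which functions actually run during a test and lists the ones that never do. All function names here are hypothetical, and this only tracks function calls, not individual lines or branches; for real work you would use a proper coverage tool such as coverage.py.

```python
import sys


def used_helper(x):
    return x + 1


def unused_helper(x):
    # Nothing in the test below reaches this function.
    return x - 1


def public_api(x):
    return used_helper(x) * 2


def functions_called(run):
    """Run `run()` and return the names of all functions entered meanwhile."""
    called = set()

    def tracer(frame, event, arg):
        if event == "call":
            called.add(frame.f_code.co_name)
        return None  # skip line-level tracing inside the frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        run()
    finally:
        sys.settrace(previous)  # always restore whatever tracer was active
    return called


def run_tests():
    assert public_api(1) == 4


called = functions_called(run_tests)
all_functions = {"used_helper", "unused_helper", "public_api"}
dead_candidates = all_functions - called
print("never called by tests:", sorted(dead_candidates))
```

Anything that shows up as never called is only a candidate for removal: it may be dead code, or it may simply be something your tests do not exercise yet.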

Fourth: Make a list. I start refactors by scanning the codebase without making any changes and just writing a list of things that I would like to change. Planning lets you better understand the scope and feel like you are making progress in what may be a long endeavor.

Fifth: Make one change at a time. That's where the list comes into play. It is easy to start going file by file and making changes to everything, but you are not setting yourself up for success, especially if there are thousands of files. Tackling one problem at a time keeps you focused and efficient.

Sixth: Document breaking changes and design decisions as you refactor. Whatever you document becomes the guiding principles for yourself and serves as a memo for future maintainers of the codebase.

These are the things/ideas I follow when refactoring large codebases. What about you?
