One way to improve programming skills is to ReadGreatPrograms
. Techniques such as SelfDocumentingCode
have been developed toward the goal of writing programs that can be easily read.
However, most of us have to read lots of code that does not meet the above standards. What are the best ways to make sense out of those huge, unstructured, maintained-by-dozens-of-people, internally-inconsistent, undocumented code bases that we must understand and absorb?
Build and Run the Program
Being able to run the program and observe its external behavior will be useful when studying its internals. Program documentation can be useful too, but documentation may not accurately describe the program's behavior.
Being able to build the program yourself will help you find out which external libraries are being used, what compiler and linker options are in effect, etc. And if you can build a debug version of the program, you can step through the code with the debugger (StudyTheSourceWithaDebugger
Also add or extend the logging in the program. Let it tell you what it is doing.
Find the High-Level Logic
Starting with the program's entry point (e.g. main() in C/C++/Java), find out how the program initializes itself, does its thing, and then terminates.
Most programs have a main loop--find it. (But if the program uses a library framework, the main loop may be in the framework rather than in the application code.)
Find the conditions that terminate the program. This includes abnormal exits in addition to the normal exit conditions.
Many event-driven and OO designs have no "main".
They still have to have some entry point though.
Not always. Software for the GEOS operating systems for the C64/C128 platform are structured as a single initialization routine (which is expected to return to the OS after it's done configuring its runtime environment), and a morass of callbacks invoked under various circumstances. In a sense, a GEOS application is nothing more than
an overlay for the host OS itself, which doubles as the system's sole application. Indeed, any true event-driven style of programming closely, if not exactly, resembles this. What is the "main method" of a class? The constructor rarely is the most important method. By extension, an event driven program's initializer is rarely its most important callback.
Draw Some Flowcharts
Yes, we all know flowcharts are a horrible design tool, but they can be useful when analyzing program flow. You can't keep all that information in your head, so draw some flowcharts or state diagrams while you are reading. Make sure you account for both sides of a two-way branch, and all branches of multi-way branches.
Examine Library Calls
If the program uses external libraries, examine the library calls and read the documentation for those calls. (This may be the only "real documentation" you have, so take advantage of it.)
Search for Key Words
Use your editor's Find feature (or grep) to search the entire source tree for occurrences of words associated with what you want to know. For example, if you want to figure out how/where the program opens files, search for "open" or "file". You may get zillions of hits, but that's not a bad thing--a focused reading of the code will result.
Often-useful words for C/C++ programs are "main", "abort", "exit", "catch", "throw", "fopen", "creat", "signal", "alarm", "socket", "fork", "select". Other words are useful in other languages.
Leverage the Power of Code Comprehension Tools
Some techniques available with a good text search tool, or better yet a tool that can analyze the source and find references and generate diagrams, allow the following questions to be answered, for object-oriented code:
- Who calls this method?
- Who implements this interface or subclasses this class?
- What are this class' superclasses?
- Where are instances of this class created, held, and passed as arguments or returned?
- What superclass methods does this class override?
- Where might methods of this class be called polymorphically, i.e. through a base class or interface?
References: "Comprehension and Visualisation of Object-Oriented Code for Inspections" http://www.cis.strath.ac.uk/research/efocs/abstracts.html#EFoCS-33-98
Print the code
Monitors can rarely beat the sheer textual capacity of an empty table on which a printout of the source code has been laid. You can see, at a glance, kilobytes of code. This aids tremendously in developing a "big picture" view of the code.
If a page contains functions unimportant to your analysis, you can replace it with another.
You can physically annotate the printout, which can also speed up comprehension. Circle important functions or highlight key variable names.
This will help you prove to yourself that you understand what the code is supposed to do, what it actually does, and that you understand its limitations.
If there are no UnitTest
s, then you should definitely create a sufficient set of UnitTest
s before making any changes to the code.
Comment the Code
Throw the code under into a personal CVS or RCS repository and mark it up with your comments. As you work out your knowledge of the code, the comments will change. This can be an important step if you have to ReFactor
the code later.
One of the best ways of reverse-engineering code you're so unfamiliar with that you cannot even guess to comment is to derive HoareTriple
s for the code under scrutiny, and to compute a procedure's WeakestPrecondition?
. Knowing the invariants of a procedure often gives valuable clues as to its intended purpose, thus letting you derive understanding from the code in a way that merely guessing from reading a number of poorly named procedure names simply cannot. Using this technique, you might even find bugs in the code, despite your lack of understanding thereof (I know this from experience!).
Clean Up the Code
An old writing trick for refamiliarizing yourself with text you wrote a long time ago but forgot, or for analyzing someone else's text, is to edit it as you go along. This is active
reading. Rewrite it in different ways, or a more pleasing way. You may have noticed on Wiki that while refactoring or reworking a page, you come to understand the material much deeper than just by reading it. Code is not much different from writing.
when reading code, reformat it as you go along. Realign spaces. Comment the Code, as it says above. Fix out of date comments (vis ThePalimpsestEffect
). Fix spelling. Make the code conform to the coding standards. Usually code is written somewhat hastily, so having someone come along later to make the code look professional is also another benefit.
if you do a lot of changes, run the UnitTest
s! Breaking things isn't necessarily something to be afraid of. By finding the hairier parts of the system and the dependent parts of the system (often those you won't expect), you come to grok the system.
A good article on this technique is "Make bad code good", at JavaWorld http://www.javaworld.com/javaworld/jw-03-2001/jw-0323-badcode.html
See also RefactoringForGrokking
I find I have to go through a lot of this reading the sample code on java.sun.com. Surprisingly (or maybe not), I usually make no progress grokking it until I delete all
of the comments. Of course, the next step is to rename, by MassiveSearchAndReplace
, all the variable names such as p
, etc. (in case you're curious, processor
). The next thing that helps me is to split up the giant methods into various methods (i.e., pulling looped and if'd code into their own methods, etc.).
A MassiveSearchAndReplace might miss cases or change the wrong things. A real renaming with a refactoring tool, if there's one available for the language, may be more effective.
Note that simple recompilation suffices as UnitTest
s for most of this... if I missed a tc
, or it replaced CompletedEvent?
, or if I forgot to pass a variable into an extracted method, the compiler will throw an appropriate error.
Reading big random programs reminds me very much of exploring mazes in Angband.
I start at some random position (maybe found by grep) and work my way around by traversing callees and callers until I've mapped out enough of the immediate vicinity to see what's going on and do my hack. If I explore enough directions for long enough, things start to connect up and give me a more "global" sense, but often I don't explore so widely.
This is a great analogy. In a similar vein, debugging is like completing an 'ascension kit' in Nethack. There's no certain path to tracking down your item/bug, and even once you've found/fixed it, there are always going to be others you could go for next, of greater or lesser importance/severity. Oh, and you could inadvertently overenchant your GDSM / blow up the system by enchanting / blindly 'fixing' before you take a second to make absolutely sure what you're doing.
A suggestion: Use Aspect Browser.
Aspect Browser is a tool for viewing cross-cutting code aspects in large amounts of code. I find it indispensable for navigating unfamiliar code, as a kind of super-grep. Read more about it here:
Another suggestion: CScope has saved my bacon more than once, apparently it supports Java now!. It'll grep the codebase to do stuff like find references to nearly any symbol, determine which functions call a certain function, etc.
Apply the patterns described in the ObjectOrientedReengineeringPatterns
book. Seconded! A great book that deserves to be on every maintenance programmer's (and that's all of us) shelf.
A number of source code comprehension tools are reviewed at http://www.grok2.com/code_comprehension.html
. This page is a couple of years old though, so some of the links may be broken. Personally a good code editor helps when reading code. Some thing like jedit or emacs from the free world.
See also ReadableCode