Cyclomatic complexity is a measure of source code complexity that has been correlated to number of coding errors in several studies. It is calculated by producing a ControlFlowGraph
of the code, and then counting:
E = number of edges in the graph.
N = number of nodes in the graph.
P = number of nodes that are exit points
(last instruction, return, exit, etc.)
Cyclomatic complexity = E - N + P
The metric tries to capture the number of paths through the code, and thus the number of required test cases. It is widely used, but has been criticized for not capturing the additional complexity implied in nested control structures.
Adding "P" makes sense, because each return statement is in effect an arc to the end point of the method.
A set of Perl programs for analyzing cyclomatic complexity of Python code is available at http://journyx.com/curt/complexity.html
is a graphical means of evaluating the complexity of a function and also determining the completeness of coverage of tests for that function. This method is language independent and can be done manually with little difficulty.
You can get a postscript description of CyclomaticComplexityMetric
or you can get a PDF version of the same document from the vendor's site at http://web.archive.org/web/20011111204409/http://www.mccabe.com/assorted/nist_pub.htm
(I am not associated with the vendor, I only include the latter link because the PDF format is more generally accessible). -- WayneMack
The link to the PDF for "Structured Testing: A Testing Methodology Using the Cyclomatic Complexity Metric" on the nist site is broke. http://www.mccabe.com/iq_research_nist.htm
has a good link (as of 8 Dec 2010).
What the ExtremeFolk?
might be interested to know is that this metric is based on the computation of how many separate UnitTest
s it takes to fully exercise a given body of code. I've used the commercial tool version of this, including graphics. It gives an interesting picture of your code, perhaps something visual to add to the CodeSmell
s repertoir. You can "see" the complexity quite elegantly. -- WaldenMathews
this metric is based on the computation of how many separate UnitTests it takes to fully exercise a given body of code.
Isn't the point of TestFirst
that no code gets written but to satisfy a failing test? In which case the metric is redundant, since the number of tests needed to fully exercise my body of code is exactly the number I have. It might be interesting to know if the CC tells me a different number, indicating some excess code that I need to refactor out, but wouldn't this information be easier to find with a profiler?
I'm not sure about the answer to this. CC finds distinct paths through your code, some of which you may create inadvertently while trying to make some test pass. I suspect the redundancy of doing TestFirst and using CC would provide some interesting cross checks against each other, which in quality circles would be A Good Thing. -- WaldenMathews (speaking as if MarthaStewart)
Seem like there are three cases:
- #tests(CC) > #tests
- need to refactor your code?
- #tests(CC) = #tests
- no problem
- #tests(CC) < #tests
- need to refactor your tests?
One example is a function with a loop. CCM analysis says two tests are required for this function, one to cover the fall through path with no looping and one to cover the path including the loop. Even if you are fairly rigorous with UnitTest
s I think it would be quite easy to miss the first case.
Maybe. I write tests first. I already have a test that required the "fall through" case, no loop, to satisfy it. Then another test comes along that requires some iteration, so I add the loop. Voila, two tests.
Otherwise, if I have a new test that requires a new body of code, and that body happens to be a loop, then I'd expect the test cases to cover whatever inputs call for 1 iteration, 2, 3... The CCM analysis seems to call for another test looking at the 0 iterations case (i.e. don't do the thing that requires the looping). In the situation I've described, what would that add? Remember, all my other tests pass too, and one of them must consider the conditions under which the code decides to do or not do the thing that needs the loop. Again. voila, two tests. They just don't happen to be in the same place. (And anyway, probably my test for the new loop
would talk about the zero iterations case.)
What have I missed that makes CCM useful here?
How can you be so sure?
I can't be. If it turns out that there are some more tests needed, then I'll put them in. If I don't notice the absence of the tests I don't have, what have I lost?
This is an interesting point, and it maybe highlights a difference in attitude between hard-core testers and ExtremeFolk? (at least this one, I can't speak for anyone else). There has been some "discussion" of similar issues on news:comp.object recently.
[This highlights an issue with XP that nags at me. It seems to emphasize "just barely good enough" over craftsmanship to me. I know that over-indulging yourself in making "perfect" code can be a really bad thing (especially with budget, time, etc constraints) but the XP redefinition of "good" as "passes the tests" strikes a bad chord with me. It's like prefering to make cheap plywood & glue furniture - you can sit on it (right now), so it's okay. Isn't there some better way of keeping yourself from getting caught up in circles of beautification, without restricting yourself to "barely works" code?]
This is the entire point of RefactorMercilessly
. You can't just do one or two of the XP practices, you have to do them all.
How does the CyclomaticComplexityMetric
work with highly polymorphic code (lots of small methods, hook patterns etc?
Good question. We dabbled with the McCabe tool a few years ago. Then, we were working with very "static" languages: C/C++, their claims seemed to make some kind of sense. I'd suspect that CC is only really well defined for procedural code. IIRC the basic complexity measurement is the number of closed regions in the graph who's nodes are the decision points (if/else, case, I think the condition on a loop) and arcs are sections of strait-ahead procedural code. So, that's going to work well for PascalLanguage, you could probably bully JavaLanguage into that mould, but where would you start with SmalltalkLanguage? -- KeithBraithwaite
ps, we gave up on the tool 'cause it was too hard to set up
I don't know SmalltalkLanguage
, so please forgive my blind stab in the dark. It seems to me that anywhere you have language with sequence and selection, the CCM can be applied, otherwise not. Further, is there a way to look at Smalltalk code after it's written and determine all UnitTest
s needed to exercise it? If so, then there must be a way to apply CCM. If not, I am left speechless. -- WaldenMathews
(I don't know anything about the CyclomaticComplexityMetric
other than what's on this page, so it's possible that the following doesn't apply as it stands. CCM experts should feel free to refactor...)
The trouble is that it can be very
non-trivial to work out what the possible sequences of control flow are, in a language that makes heavy use of functional values. In order to work out where the code can branch to, you need to know what values your variables can hold. Objects are kind of like functional values, here, since an object (notionally, at least) contains methods.
Also, in an OO language like Smalltalk, and pretty much any functional language, you don't necessarily know which objects/functions are going to exist when the program is run (vs when you calculate the metric)
Maybe that's very abstract. Here's a little example in Python.
So, you want to work out where control can go next in a call to foo
; that means you need to work out what kinds of object it can be passed, and what kinds of object can be returned from asking those for their parent
s. Not straightforward at all.
Even more painful are languages like SchemeLanguage
in which the main mechanism for doing looping is tail recursive function call...
And more painful yet if your Scheme evaluator understands non-determinism
Some of the selection logic is provided for you by modern languages, which complicates thinking about CCM, and makes it important to evaluate the goal of using CCM. For the Python example above, you may want to run several UnitTest
s passing several different types. In a different respect, from a maintainability aspect, the code snippet above has, I believe, a complexity of 1. The complexity of code is not necessarily the complexity of system, so a clear objective becomes critical. -- WaldenMathews
Is there a way to look at Smalltalk code after it's written and determine all UnitTest
s needed to exercise it?
Probably. Might be very hard though, too hard to be worth while. On the other hand, there
is a way to determine all the UnitTests need to exercise code in any language
before it's written.
"Too hard" is a relative valuation. The PrincipleOfBeneficentDifficulty
says if it's too easy, it's probably of low value. So, "too hard" could truly signify a waste of time, or it could signify an opportunity to develop a useful skill. In this context, I'm too ignorant to even hazard a guess. -- WaldenMathews
OK, let's say, too much effort (or time) to be worth while for most projects.
I am confused about this. Some programs are so complicated that in order to test every path
(not function point - I'll get to that later), you could quickly (combinatorially) overwhelm the resources of the universe, let alone the resources of your corporation.
Most people give up on this approach after a certain point of complexity, but at least attempt to test every function point in the system. This is very similar (if not equivalent) to the "test every line of code" idea.
However, even this can become difficult as function points emerge from the system. For a very good example of how emergent behaviour can create new features, look at SimulatingQuoteBlocks
Additionally, it quickly becomes molasses to write a test for everything
(approaching "logic function tables") in the system because the system is in flux. Maintaining logical equivalence between refactorings is not so important as at least
maintaining functional equivalence (preferably improving function). It is merely that logical equivalence guarantees functional equivalence.
Anyway, at a certain level of complexity, the best you can hope for is probabilistic testing. That is, attempt to maximize the gains with limited testing resources by testing the most "telling" parts of the system. I understand this is used by telecom companies.
Now, all that being said, I don't understand CCM or the effects of using it. I imagine it is good at what it does, but I wonder how it gets integrated into a working team. Does it slow you down like molasses? Do you relegate testing to a build farm on a 27 hour cycle? Am I getting it completely wrong? -- SunirShah
Sunir, are you making the proper mental distinction between system testing and UnitTesting? The combinatorics of a single code unit is supposed to be tractable, if the system is adequately modular. CCM is done at the unit level only. There's no metric for all paths through a system.
As far as I know, CCM is about as useful as the other CodeSmells for guiding judgments in refactoring. A module of excess complexity is high on the priority list for refactoring. This is particularly useful when you inherit a poorly structured system, and want to incrementally improve it. CCM can provide a prioritized list of modules, worst first. But it doesn't do anything that good powers of observation and good taste in design can't do. Think of it as a refinement of your gut feel about module complexity. It ain't no SilverBullet. -- WaldenMathews
Oh, one additional use just occurred to me. Since gut feels are sometimes hard to sell, CCM can provide a more objective way for a team to communicate about the essential complexity of what they are building, even if their gut feels don't totally agree. -- WM
Ah, I was failing to see that CCM is only at the unit level. Thanks for the clarification. -- ss
CCM may not be perfect, but it can be a useful measure. If you have a nice automated tool that gives it to you cheap, it might be useful to look at it - as an "objective" CodeSmell
s guideline. -- JeffGrigg
See also GilbMeasurabilityPrinciple
- CCM is a measure.
Compare to AbcMetric