If one is doing procedural-style code, what are the rules of thumb you use to divide up functions? Some suggestions:
- OnceAndOnlyOnce - Refactor repeated code into a sub-function. Use the RuleOfThree for small items and a rule of two for larger items.
- YagNi - Don't split up a function unless there is a good current reason. Communicating intent is one possible reason.
- Confusion - Dividing to make the parts more mentally digestible (see WeinbergTestForLongFunctions).
- The "Headline" technique is an alternative way to divide up code into chunks. See LongFunctionsDiscussion for an example.
- Testing - Sometimes smaller functions simplify testing.
- Viewability - Functions can be hard to read if one has to scroll back and forth through multiple pages to understand them. But one has to weigh this against the trade-offs of finding functions, parameter and scope management, and larger function name-space.
- Commenting - Extract a function if doing so allows you to name the new function something that will clarify the code. In LongFunctions, these sections are often surrounded by some whitespace and preceded by a comment.
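As a minimal sketch of the OnceAndOnlyOnce item above: the names `format_money` and `invoice_text` and the invoice layout are hypothetical, invented only for illustration. The formatting expression would otherwise appear three times, so by the RuleOfThree it is pulled into one sub-function.

```python
# Hypothetical example: the function names and the invoice layout are
# illustrative assumptions, not taken from any real code base.

def format_money(amount):
    """Format a number as dollars with thousands separators and two decimals."""
    return "$" + format(amount, ",.2f")


def invoice_text(subtotal, tax):
    """Build a three-line invoice summary, reusing format_money for each line."""
    total = subtotal + tax
    lines = [
        "Subtotal: " + format_money(subtotal),
        "Tax:      " + format_money(tax),
        "Total:    " + format_money(total),
    ]
    return "\n".join(lines)
```

Whether a two-line helper like this earns its keep is exactly the YagNi judgment call in the list: with only one use, inlining it would be just as reasonable.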
Long functions are not inherently bad. Long functions are not a problem if well-designed and simple. Large length is only a smell, not necessarily bad practice, but it is a smell worth taking a close look at.
That doesn't pass the platitude test. Nobody would ever say "Long functions are a problem even if well-designed and simple." Somebody following your guideline is always entitled to write functions of any length, telling themselves "But this one is well-designed and simple".
True, but that criticism is true of all CodeSmells, yet they do exist. The difference between a smell and an outright bad practice is precisely that there is no absolute hard and fast rule; judgment must be used. -- AnonymousDonor
Meyer's platitude test is insightful, but platitudes are sometimes useful. When you state a platitude ("X", say) the information you're really communicating isn't that X is true, but that X is worth paying attention to. That can be valuable information even if X is both obvious and uncontroversial. -- GarethMcCaughan
My advice is to ignore the length of the function. Begin to refactor the code into an object-oriented style by combining code and data declarations. This will improve the modularity of the code, reduce coupling, and increase cohesion. After a while, the long functions will just melt away. Don't refactor code just to reduce the length of a function. Refactor the code to improve its structure and let function length take care of itself.
My advice is to pay attention to the length of the function. Try to imagine that you are seeing the function for the first time. Is there a sequence of statements that could be summarized with a single comment? If so, ExtractMethod on those statements. Try to make the function as painfully obvious to a naive reader as possible. -- EricHodges
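A rough before-and-after sketch of that advice, assuming a made-up task (averaging scores read as text lines). All function names here are hypothetical. Each comment in the long version becomes the name of an extracted function in the short one.

```python
# Hypothetical example: the task and all names are illustrative assumptions.

# Before: one function whose sections are marked only by comments.
def average_score_long(raw_lines):
    # parse the scores, skipping blank lines
    scores = []
    for line in raw_lines:
        line = line.strip()
        if line:
            scores.append(float(line))
    # compute the mean, treating an empty list as 0.0
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


# After: each comment has turned into a function name.
def parse_scores(raw_lines):
    """Convert non-blank lines to floats (float() ignores surrounding whitespace)."""
    return [float(line) for line in raw_lines if line.strip()]


def mean(values):
    """Arithmetic mean, with 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def average_score(raw_lines):
    return mean(parse_scores(raw_lines))
```

The two versions behave the same; the trade-off is the one noted under Viewability, since the reader now has three names to find instead of one body to scroll.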
But one major lesson I have learned over the years is that what bothers person X may not bother person Y, and vice versa. I can only speak for myself and guess about others. If there is a clean visual pattern to the code (from my perspective) and it does not majorly violate OnceAndOnlyOnce, I usually have little reason to complain about a longer named unit.
This was a long and heated discussion, but some people gathered empirical data. This is from CodeComplete:
- Basili and Perricone found that routine size was inversely correlated with errors: as the size of routines increased (up to 200 lines of code), the number of errors per line of code decreased (Basili and Perricone 1984).
- Another study found that routine size was not correlated with errors, even though structural complexity and amount of data were correlated with errors (Shen et al. 1985).
- A 1986 study found that small routines (32 lines of code or fewer) were not correlated with lower cost or fault rate (Card, Church, and Agresti 1986; Card and Glass 1990). The evidence suggested that larger routines (65 lines of code or more) were cheaper to develop per line of code.
- An empirical study of 450 routines found that small routines (those with fewer than 143 source statements, including comments) had 23 percent more errors per line of code than larger routines but were 2.4 times less expensive to fix than larger routines (Selby and Basili 1991).
- Another study found that code needed to be changed least when routines averaged 100 to 150 lines of code (Lind and Vairavan 1989).
- A study at IBM found that the most error-prone routines were those that were larger than 500 lines of code. Beyond 500 lines, the error rate tended to be proportional to the size of the routine (Jones 1986a).
Very interesting. Some of the better code I have written has had long functions (100 to 200 lines) with, say, twenty local variables. The functions often did this, followed by that, followed by the third thing, wrapped in a loop over the whole thing. Since I used some global variables (I refuse to allocate 16k buffers on the stack), and these functions were one level away from main, ExtractMethod wasn't that much work, and I did so whenever I saw the same code appearing twice.
See also LongFunctionsDiscussion; also TryLikePages