Perhaps this is already covered by CopyAndPasteProgramming and CodeSmells? DeletionCandidate
I don't think so.
CopyAndPasteProgramming is the process, that often leads to
DuplicatedCode, but
- it also leads to BadlyFactoredCode?
- there are other sources of DuplicatedCode (e.g. BadlyFactoredCode?)
which means that these two are often related, but none is a special case of the other.
What constitutes
DuplicatedCode?
In
SmalltalkLanguage, I think duplicated code is more apparent than in other languages (and certainly easier to refactor away). In Java, though, I find that duplicated code isn't always obvious and sometimes impossible to refactor out.
For example, suppose you have the following code in one method:
if (this.o.isSomeCondition()) {
doSomething1();
}
and in another method you have:
if (this.o.isSomeCondition()) {
doSomething2();
}
Does the if test itself constitute duplicated code? If so, it could be refactored away through some form of
DoubleDispatch. If I were doing
BigDesignUpFront, I might decide that a
DoubleDispatch mechanism might be useful some time in the future, but since
YouArentGonnaNeedIt, I don't do that now. Does
RefactorMercilessly dictate that it is, in fact, duplicated code and should be immediately replaced?
Balance the smell against the cost of reducing it
in your environment. Each environment (language, linkers, etc) has its own threshold for how much duplication you can remove without it being too painful or too obfuscating. These language features lend themselves to removing duplication without excessive pain (and you get to choose what's too painful):
- Objects
- Dynamic Typing
- Meta-classes
- Reflection
- Macros
- Interpretation
- Closures / Lambda expressions
- Inheritance
- Templates / Parameterized Types
- AspectOrientedProgramming
For example, here's some duplicated (pseudo-) code:
def findAManager:
for each employee in employees:
if employee.isManager:
return employee
return nil
def findASupervisor:
for each employee in employees:
if employee.isSupervisor:
return employee
return nil
def findAWorkerBee:
for each employee in employees:
if employee.isWorkerBee:
return employee
return nil
Here the loop is duplicated, but removing the loop will obfuscate the code without making it any clearer what you're doing. If, however, you have something like closures or lambda expressions, you can abstract the loop out:
def findFirstThat (condition):
for each employee in employees:
if condition (employee):
return employee
return nil
def findAManager:
return findFirstThat (lambda worker: worker.isManager)
def findASupervisor:
return findFirstThat (lambda worker: worker.isSupervisor)
def findAWorkerBee:
return findFirstThat (lambda worker: worker.isWorkerBee)
There are other ways to abstract out the loop, some easier, and some harder. You just have to balance how hard the abstraction is against how painful the duplicated code is. Without lambda expressions/closures/continuations/whatever, the little bit of duplication in this little example isn't painful enough to do something about.
So: Balance smell against pain.
Is it sheer coincidence that the pseudocode above is almost executable PythonLanguage code? :) BurkhardKloss
As a (kind of) counter-example, you can do something similar in C++:
struct FindSupervisor {
bool operator() (const Employee &e) { return e.isSupervisor; }
} f;
Employees::iterator i = std::find_first_of(employees.begin(),
employees.end(), f);
if (i != employees.end()) {
Employee e = *i;
}
Repeat as necessary for the other types of worker.
My point being that, because C++ doesn't have real closures, you'll have to find some other way to refactor the expression (or leave it alone).
Just reinforcing the point: Balance smell against pain. You might consider this to be more painful than necessary.
Check out
OnceAndOnlyOnce. Also check
PatternOverdose and
PatternAbuse. Your first example on this page has this drift. --
BerndGoetz
Failure to factor out duplicated code is one way to get the dreaded
MonsterSubroutine.
I absolutely agree. But, when is it time to factor out duplicated code? Particularly because YouArentGonnaNeedIt. Clearly these forces need to be balanced, but how? ExtremeProgrammingMasters can do it through intuition and experience, but I am but a grasshopper.
[My (outsider's) understanding is that
YouArentGonnaNeedIt doesn't apply to refactorings -- If a particular refactoring brings you closer to
TheSimplestCode, then you need it. --
AnAspirant]
I think that the modification I describe below should set off the alarm - the code is absolutely identical. If there is a chance that the code will at some time differ for the different invocations of the function being replaced, then duplicating the code may make sense.
Besides the scenario described in
MonsterSubroutine, another arises from code like:
modify_it(&X);
...
modify_it(&Y);
... more code with modify_it() calls
Then a change requires that modify_it()'s functionality be replaced by:
prep_modify_it(&X, some_other_variable);
new_modify_it(&X);
for ( I = 23; I < 999; ++I) {
some_complicated_stuff(&X, I, third_variable + I);
}
and then instead of factoring this into a function, it is edited in place for every modify_it() call, leading to code bloat and difficult maintenance. --
RobertField
Consider instead...
modify_it(&X, some_other_variable, third_variable);
...
modify_it(&Y, some_other_variable, third_variable);
... more code with modify_it() calls [And change these too, in the obvious way.]
void modify_it(typeX *pX, type2 some_other_variable, type3 third_variable) {
prep_modify_it(pX, some_other_variable);
new_modify_it(pX);
for ( I = 23; I < 999; ++I) {
some_complicated_stuff(pX, I, third_variable + I);
}
}
--
JeffGrigg
Precisely. It is the failure to do something like this refactoring that gives rise to the monsters. -- RobertField
If we could replace the original with:
if (isSomeOtherCondition())
doSomething1();
so that the expression is at least factored out:
bool isSomeOtherCondition() {
return this.o.isSomeCondition();
}
then the repetition is less painful. (I think it helps to omit the unnecessary braces, too.) See also the
LawOfDemeter for why having hardwiring long dotted-paths like this.o.isSomeCondition() is intrinsically dubious.
Beg to differ slightly... this.o.isSomeCondition() simply calls a method on an instance variable, although I would likely use an accessor instead of going directly to the var. Perhaps the answer is related to "why can't 'o' decide to doSomething1() or doSomething2() itself?"
Referring back to:
def findAManager:
for each employee in employees:
if employee.isManager:
return employee
return nil
def findASupervisor:
for each employee in employees:
if employee.isSupervisor:
return employee
return nil
def findAWorkerBee:
for each employee in employees:
if employee.isWorkerBee:
return employee
return nil
This could also be written as
def findType(type)
for each employee in employees:
if employee.isType(type)
return employee
return nil
It requires a different way of managing types. I personally prefer this way because you don't have a method per way, plus it allows for iteration over a series of types in a list. For those who want the isType() methods, have them call the generic one.
A
RelationalWeenie version:
getEmployees(criteria) {
return(query(stdConn, "select * from employees where %1", criteria));
}
Note that this example is not limited to just
EmployeeTypes. However, the downside is that some may consider it a security risk because it is more open-ended (
SqlStringsAndSecurity). Then again, so is the original because one can simply get a list of all employees by calling it 3 times.
An interesting paper on redundant code that shows some statistically significant positive correlation between seemingly harmless duplicated code and real errors is at
http://www.stanford.edu/~engler/p401-xie.pdf. "We expect that redundancies, even when harmless, strongly correlate with hard errors. Our relatively uncontroversial hypothesis is that confused or incompetent programmers tend to make mistakes." --
StevenNewton
For automated support finding duplicated code, see
SimScan,
DupTective and
SameTool (or
CategoryDuplicationFindingTool)
The
EclipseIde will have in its release 3 a simple(?) refactoring tool for creating methods out of
DuplicatedCode. --
AndreasSchweikardt
See also
CopyAndPasteProgramming,
OnceAndOnlyOnce
CategoryAntiPattern