Automated Refactoring

On WhyDidYouRefactorThat it's observed that "Either a refactoring reduces entropy, or it does not. Refactorings which increase entropy are evil. Refactorings which leave entropy constant are neutral (and therefore not cost effective)."

This being a relatively straightforward heuristic to check, perhaps an automated RefactoringBrowser, maybe building on WebGrid to elicit sensible names, might be possible? Okay, so why not?

There are some forms of duplication that I'd like to have removed automatically - e.g. a method overridden with no change. I think removing such methods must be good and not bad or neutral. The usefulness of this is another matter. There are other duplications that can be removed automatically - it may get to be a bad idea when taken to extremes, and doesn't solve the most important issues of refactoring, but any help that does not cost me my time is good, even if it's limited. IvanMoore.

"a relatively straightforward heuristic to check"

Don't you mean metric? I'm assuming so below ...

What are you thinking of? Some function of the number of classes, methods, and lines of code in a program? I think you'd also need to take into account the distribution of methods and code through the system.

I agree that such a metric would be straightforward to calculate. But I don't see how such a metric would help you at all in deciding *what to do next* in an AutomatedRefactoring browser. Isn't that the hard part?

Also - would a "dumb" metric such as this (meaning it doesn't look at what any of the code is doing, it just counts things) really get you where you want? Or do you need to "look at the code" to figure this out? Do you need to measure how much "repetition" occurs in the code? Perhaps it depends on whether you want an absolute measure, or a relative measure.

Counting classes, methods, and lines of code is never going to tell you anything about entropy. Entropy is related to the purpose of the software. You don't get that by counting lines of code.


Perhaps EntropyIsComplexity??

Soon I'll be working with my teammate to make something more performant. It's not that it's chaotic or evil in any way. It performs just fine except when you use it hundreds of times concurrently. We want it to be faster. So we'll be refactoring it to reduce the number of instructions that occur in an otherwise well-designed situation.

So I agree that entropy is not represented well by quantities or metrics, but also feel that refactoring is not represented well by simply reducing entropy and reducing the amount of "evil" in my design.


If performance wasn't an issue would you rather have the code the way it is now?

Begin text copied from OnceAndOnlyOnce:

This OnceAndOnlyOnce refactoring sounds fairly mechanistic. Given a browser that *understands* the code (i.e. a refactoring browser) it should be possible to build an expert system that can fiddle with the code overnight to find ways to refactor it. To put it another way, why have many people refactoring when you can have a single engine doing it (i.e. do refactoring OnceAndOnlyOnce). -- DaveWhipp

Indeed it is possible to build a tool for mechanically removing duplicated methods and code - I built such a thing for the language Self (see The tool did as much refactoring of duplicated code as it could, without manual intervention to tell the tool what made sense. A lot of the time the results were surprisingly good; hierarchies constructed by this tool were often just what you'd expect, showing that basing hierarchies simply on removal of duplication leads to 'good' hierarchies. The results of trying to automatically refactor methods was (even) more contentious though. For example, although '+ 1' is duplicated a lot, do you really want to replace it with a method called 'incremented' (or something similar) just so that the concept of incrementing is expressed only once? -- IvanMoore

End copied text

I think the copied text belongs here and on OnceAndOnlyOnce -- ShaeErisson

There's a tool called CloneDR that is essentially a OnceAndOnlyOnce refactoring tool. See

I'm only new to refactoring really, but the automated refactoring that I am doing right now is to remove headers that have been included inside other headers.. it got out of control over the last few years of this project, and now modifying any of the main header files causes most of the project to get re-compiled....

I started a week ago, by manually moving #includes into the implementation files, but after a week of work I have not even gotten half way... then on the weekend I realized I could write a script to un-wind the #includes... the script took 15 mins to do the rest of the files. The next script will go through and remove all the un-needed #includes (by brute force (remove one, compile..))

The project architecture has similar problems, in similar 'simple' ways as mentioned above (method overridden with no change) so I am thinking of writing a script to use a ctags style dba to tell me about headers that are included for little real benefit

-- SvenDowideit

I have been automating refactorings for about 10 years.

Some refactorings are easy to automate and should be performed any time you are able to perform them. Very few refactorings are like this. One example (given above) is to delete a method that duplicates the method it overrides. But usually you have to break a method into pieces, and also the method it overrides, and then you can see that some of those methods are duplicates. Deleting the duplicates is the easy part; the hard part is figuring out how to break the method into pieces.

Most refactorings come in pairs. You can move a method up or down the inheritance hierarchy, you can inline a method or decompose it, you can write accessors for variables or inline them. Sometimes one is best, sometimes the other. There is no metric for entropy in software that always works. If you think this is easy then you either haven't thought about it enough, or you have made a great breakthrough and should publish it.

The RefactoringBrowser carries out a lot of refactorings automatically, but it only performs them when told by a programmer. We have a code critic called Smalllint that provides hundreds of suggestions for improvement. When I use it on a previously unchecked program, I often find that three-quarters of its suggestions are good. But if you just let it run free on a program without checking it, it would make a lot of changes that would not make your program better.

The most important thing about a program is how it communicates to humans. Until a refactoring tool can truly understand programs, I would not want them rewriting my program without checking with me first. It is OK for code optimizers to transform a program without asking me because I do not look at the assembly language output. But a refactoring tool should not rewrite the source except under the control of a programmer. -- RalphJohnson

I don't think anyone wants a completely automatic refactoring tool. But it would be great to hook Smalllint's advice up to the refactorer, such that if Smalllint told you that an instance variable was only used and assigned in one method, you could just say "fix it."

If you want automated help finding duplication in your source code, see DupTective.


View edit of April 8, 2003 or FindPage with title or text search