Pattern Mining Thread

This thread on pattern mining broke off the PatternDefinitionThread.

(Dick gives his lecture on WhereDoPatternsComeFrom)


Thanks for the lecture! I liked it quite well. You ended it with:


However, I'm left, again, both agreeing and skeptical. Being the aforementioned "someone", it was never my intention to cast doubt on the value of pattern mining. I guess I can appreciate the design of virtual memory systems since I've used a few, but haven't ever gotten very deep into the source code of any of them. I can also appreciate the artistry and design of the Space Needle in Seattle, and the Hoover Dam, even though I have no idea what went into their construction either.

My point was that there's always something "lost" in the process of resolving some set of forces into a particular implementation that's just not going to be visible in the resulting implementation. Consider a violin created by Stradivarious (sic?). How long did it take before anybody was able to really figure out (they believe) exactly how the Master did it? Every pattern that could be observed by even the most intricate observation and "pattern mining" of every instance ever created by Mr. Stradivarious only results in a "good violin" being reconstructed by applying those patterns. Something was lost in the implementation that isn't easily rendered visible (if at all) simply by mining for patterns in that implementation. There's no disputing the obvious fact that this guy Stradivarious definitely, without a doubt, had a set of patterns he applied rigorously and methodically whenever he practiced his art. Since he never chose to disclose them to anyone, they died with him. As "pattern miners", we're left with an incomplete picture of his pattern language that we can NEVER totally reconstruct.

I'm not suggesting for a moment that we shouldn't bother! On the contrary, we need to be mindful of the fact that nobody knows what went "into the walls" than they guy(s) who designed and built them in the first place. The best folks to generate pattern languages are the same folks who generated the implementations with them! I think patterns are about as proprietary as Newton's laws of physics - if it were left to "modern" legal beagles, I think half of the extant corpus of math and science would remain shrouded in secrecy under the belief that there's something "proprietary" about it. And science would be about where it was a hundred years ago.

No, I think pattern mining is good, if it's the best we as an industry can muster. But, encouraging designers to become aware of and document the patterns they use is much, much better!

-- DavidSchwartz

RPG writes: >And once that point is clearly seen, the step of >mining patterns from existing software seems obvious (to me). Perhaps >we have grown accustomed to the false security presented by >mathematics and academic experts.

>Thank you all for attending my Monday lecture.

<raises hand>

This all seemed reasonable and good and fine. Until we got to the "And once that point ....." step.

Code is code. Often written in a rush, often less than truly elegant, often butt-ugly (to be blunt). And thus, there are things in code that tend to obscure the patterns. It's like Alexander walking in my apartment when ship-date is near. Sure, he might spot the Window Place. But, more likely he'll note the transient patterns ("Chairs are for holding clothes," "Multiple places for dirty dishes," that sort of thing).

The problem with the literature analogy is that we are all steeped in language from an early age. We begin to speak at 2, write by 5, and go through life using words in a fairly standard fashion. And this is a *shared* language. When I read or listen, I immediately notice deviations from standard usage. And I have long years of interpreting what those deviations mean.

By contrast, programming is something we pick up much later and never become fully fluent in. Coding styles, the way we break things out into methods, the boundaries between our objects, ... All these things are highly idiosyncratic. In natural language, when you deviate from the norm, there is communication in your deviation (or so say the Russian Formalists, anyway). In programming? There is no norm and hence, less meaning in your deviation.

It seems to me that pattern-mining is very difficult stuff. Perhaps best left to the "experts" in the code to be mined whenever possible.


Andy (WilliamGrosso) (who, once, for a brief moment, saw the TECO implementation of Emacs)

Interesting. Does this mean that you believe in some radical alternative, like thinking about the problems of software in abstracto and then declaring the solution as a pattern? Should these patterns be written by someone with no real software experience (such as the folks at SEI? [foo, I shouldn't say things like that in public - pretend I didn't])? Or should it be the authors of the code, authors of code that has fared well and been maintainable and demonstrated other good qualities, such as QWAN? Wouldn't that, then, be like mining patterns? And (oh, just write it!) isn't that what I suggested?

Let me be clear about this. Because there is no shared literature of code, we encourage pattern writers to write up patterns they have observed in their own and related code. When we see the same pattern show up a number of times (or other developers report they have seen the pattern and its virtues), then we can conclude with more certainty that it represents a (good) pattern. We don't throw out patterns that have not yet been reported in several systems, and we label them as provisional or in-progress in the meantime.

Sure there are problems with mining real code - and it sure is hard to do - but isn't that better than the only other alternative which is to invent a literature by making up its principles? Not everyone is a pattern writer just as not all of us are poets. Big deal? No. But you can contribute by vouching for proposed patterns with your experience of their qualities in code you have already written.

Code is code and is messy, but isn't the built world? Don't you think Alexander saw his share of deteriorated and heavily lived-in buildings and was able to see the underlying patterns? Sure. he's brilliant, but he's no smarter than you or me (actually, I don't really care about you in this matter). So why can't we see through the code mess as well?

And, it is entirely possible to gather valuable lessons from nearly every story or essay written. You aren't forced to wait for Faulkner or Yeats, you can gather principles and patterns from, why, even me. Or you.

Further, I have seen lots and lots of Lisp code, and I can see variations in style from decade to decade, but I can also see the same underlying principles and patterns in that code regardless of how messy some coding details may be. And ... there is beautiful code out there - I have seen some of it.

So, let me summarize: We don't want to accept patterns easily that are merely thought up out of the blue - we want patterns that are known from experience to have worked and demonstrate certain qualities. It is hard to mine patterns and perhaps only experts should do it. But isn't that like any other human endeavor? And because it is hard do we really wish to lower the standard? And aren't we more like experts than sometimes we feel?

Where is your faith in humanity and the insight of common people? Let's work hard and be the experts ourselves!

-- RichardGabriel

Well, I wrote:

	Perhaps best left to the "experts" in the code to be mined

The key thing here is "in the code to be mined." What I was suggesting is that the people who wrote the original code (the code that is to be mined) and have written similar things in the past, are the people to mine that piece of code.

What I was suggesting is, in fact, exactly the sentiment expressed in your rhetorical question:

	Or should it be the authors of the code, authors of code that has fared well
	and been maintainable and demonstrated other good qualities, such as QWAN?
And, perhaps, slightly in opposition to your examples from literature (where the mining is easier, for the reasons I outlined). I am not convinced that mining other people's work is necessarily a fruitful enterprise. And, to a large extent, architecture is more like literature than software (for the purposes of this discussion). In that, once again, we're talking about something we've inhabited for a long time. I've lived in rooms all my life. There is, I grant Alexander this easily, a standard sort of grammar underlying rooms. And we induct it from experience.

Interesting example: TheHauntingofHillHouse? by ShirleyJackson?. Classic literature, where the truly evil nature of Hill House is initially revealed through bad architecture (see the book; it's wonderfully well-written. And demonstrates my point that all of us have a long-standing acquaintainceship with the stuff of "architectural patterns" quite nicely).

And maybe there is the same sort of thing underlying code. But the barriers to seeing it are much more formidable (because code is so artificial, and grafted so late onto our minds).

	Where is your faith in humanity and the insight of common people?
	Let's work hard and be the experts ourselves!
Certainly, I am the expert in my code. Nobody knows my designs better and nobody knows the patterns inherent in them better (curiously enough, the users tend to know the flaws much better than I do :-) ). And the few patterns I have tried to write come directly from my code, my experiences, my attempts at formulating what I like about some of them.



This is counter to my some of my experience mining patterns. Let me explain.

We want to gather patterns from experts, right? So let's presume that Andy is an expert on, say, Objective C programming. He claims to know the patterns in his code better than anyone else.

Now let's say that I'm a C++ programmer who has just learned Objective C syntax. I look at a beautiful program that Andy has just written. I see that he has this block that he passes in message after message, and it seems to have something to do with error handling.

I ask Andy about the architecture of his program, and he mentions the classes and techniques and mechanisms he uses. He may never mention the recurring use of that block, because it's so second nature to him. That's all the more true if he's a really seasoned Objective C programmer. And all the more true if you spend more time reading code than writing it. Information theory tells us that frequency of use reduces the amount of information in a construct - i.e., it reduces its significance.

What's more, let's say that Andy has never programmed in Algol 68, so he hasn't used call-by-name and doesn't know what a thunk is. If you look at thunks and blocks side-by-side, they're instances of a more general pattern. And maybe he hasn't see people overload operator() on a C++ class and pass an object of that class as a parameter to a function, as kind of a closure - yet another variant of the same pattern.

So even if Andy were able to see himself using blocks as an architectural construct, he might not be able to really "appreciate what was going on" or to get the big picture of how blocks fit into the Von-Neumann-object-computational-model. This is an extreme example of what Dick was talking about when he mentioned we need to look at code in several programs.

Of course, it gets even more interesting if you move into Lisp or even further into SASL and see analogous constructs in the way things get curried, a higher order variant of the same pattern.

It is the nature of patterns to recur. If we don't find all these uses, then we're not finding patterns; we're just finding Solutions to Problems in Contexts. The meaning and significance of such constructs, as they contribute to wholeness, is easily overlooked if you're just looking at block passing in an Objective C program. You start getting underneath the syntax and looking at the principles if you see it in several programs.

I had this experience back in 1994 when I stumbled across Tim Budd's tutorial at OOPSLA. His tutorial included the code:

	class ordered [T : ordered] of equality[T];
	 . . . .

class integer of ordered[integer]; function asString ()->string begin . . . . end function equals (arg : integer) -> boolean; begin . . . . end; end;

class string of ordered[string]; function asString ()->string; begin return self; end; function equals (arg : integer) -> boolean; begin . . . . end; end;

or something like that. Hmmm, that looked familiar. Independently, Barton and Nackman had been honing that style for some time in their own work, using code like this:

	template<class DerivedType?>
	class EquivalentCategory? {
		friend Boolean operator==(const DerivedType? &lhs,
					const DerivedType? &rhs) {
		return lhs.equivalentTo(rhs);
		friend Boolean operator!=(const DerivedType? &lhs,
					const DerivedType? &rhs) {
		return ! lhs.equivalentTo(rhs);

class Apple: public EquivalentCategory?<Apple> { public: Apple(int n): a(n) { } virtual Boolean equivalentTo( const Apple &an_apple) const { return a == an_apple.a; } private: int a; };

Hmmm, and that brought back memories of a really clever architecture I'd seen Lorraine Juhl do for a finite state machine several years earlier:

	class myFSM: public BaseFSM<myFSM,
		/*State=*/  char,  /*Stimulus=*/  char> {
		void x1(char);
		void x2(char);
		void init() {
		. . . .

template <class M, class State, class Stimulus> class BaseFSM { public: virtual void addState(State) = 0; virtual void addTransition(Stimulus, State, State, void (M::*)(Stimulus)) = 0; virtual void fire(Stimulus) = 0; };

template <class UserMachine?, class State, class Stimulus> class FSM: public UserMachine? { public: FSM() { init(); } virtual void addState(State); virtual void addTransition(Stimulus, State from, State to, void (UserMachine?::*)(Stimulus)); virtual void fire(Stimulus); private: State nstates,*states,currentState; Map<Stimulus, void(UserMachine?::*)(Stimulus)> *transitionMap; };

I would never have thought of this as a pattern had I not seen these three isolated cases (and as best I can tell from talking to the authors, these three were really independently "invented"). I've seen it only once in each of these languages (except for the many Barton & Nackman uses in C++). Had I seen it in any single language, I wouldn't have thought of it as a pattern, just a quirk. Had it recurred as the dominant programming construct in each of these languages, I wouldn't have thought of it as a pattern either, just as a construct that is "part of the language" taught in the 101 class (like the hypothetical example of Andy and the blocks in an Objective C program).

I don't know if language designers foresee such "arcane" uses, or if they emerge autonomously. The question is important. Most architectural constructs are tied to culture. The most fundamental cultural constructs - anthropologists call them "universal patterns" - can be traced to a single common source of invention. Such constructs are common in our programming languages, many of which trace back to the culture of either FORTRAN or Lisp, and from there back to mathematics.

So if language designers actually foresee such uses as given above, then the patterns are really no surprise - they're just part of the mythology. I haven't talked to Stroustrup or Budd about these (how about it, guys?), but I'd be willing to bet that when the designed inheritance and parameterized types, they hadn't envisioned this combination. That says something very funky about how patterns arise in our discipline, and it may call into question the role of experience.

Don't you know that's why I said in my message that one of the difficulties of pattern mining is to help people articulate what they don't know that they know?

So, Andy, what else might you know that you don't know that you know? :-)

-- JimCoplien

The subject of today's lecture: WhereDoPatternsComeFrom?

You really discussed the question "How do pattern authors come up with the patterns they describe?" Other people read this question as "How did this pattern get into my system?" or "How come this pattern appears in so many systems?" These are different questions, and so it is not surprising that the answers to one question are bass-ackward to another.

-- RalphJohnson

Andy Grosso wrote: "It seems to me that pattern-mining is very difficult stuff. Perhaps best left to the "experts" in the code to be mined whenever possible.

I agree. Which leads me to something which has bothered me for some time. I write code. I debug code. I document code. I test code. When I'm lucky, I design. Right now, I'm writing snippets of code for one system while trying to design another completely new system from scratch.

I read patterns. I try to read a lot of them. I've even used them on occasion. There are many, many out there just like me - programmers who have designed a few programs and systems, and we know pretty much what's "good" and "bad." Patterns are just what we want and need: a way to codify this knowledge.

But pattern mining is hard! Let's say I'm faced with a difficult problem, I weigh the forces, I iterate over a problem until I resolve the forces in code. What do I do then? How do I mine the patterns out of that? What do I do with any patterns I do find? Are there any pattern-mining patterns?

(On another subject...) Current thought seems to say that I can't call it a "pattern" until I've done it 3 times, or seen it done 3 times (for "new" patterns that is.) However, I don't have time to research a bunch of other systems or code to try to find 2 more instances of my "proto-pattern". I think it's a good idea to encourage that patterns be proven like this to some degree. I shouldn't be able to whip up any old "solution to a problem" and stick the word "pattern" on it just because it is in "pattern form" and expect people to take is seriously. But I would hate to see a good solution thrown out just because it doesn't cite 3 references. What are would-be pattern-writers supposed to do about this?

(wiki note - see ProtoPattern ProvenPractice WhenToUsePatternForm)

-- DavidHooker

Pattern mining IS hard. It's similar in intent and function to what knowledge engineers were doing when everyone thought AI was cool. The important stuff you elicit in pattern mining is the stuff that experts don't know that they know.

It's hard for even more simple reasons. If you believe that the job of a pattern miner is to collect expertise, then you need experts to mine it form. Experts are usually too busy doing real work to sit down with someone who wants to interview them for what they know.

Values get in the way. Some experts take security in what they know and are distrustful of those who would spread their knowledge around. Others are afraid their insights won't appropriately be applied or understood.

See CppReport 7(8), October, 1995, "Pattern Mining."

-- JimCoplien
See InformationAndKnowledge

View edit of June 10, 2005 or FindPage with title or text search