Big Soup Of Classes

A quote from PeopleWhoDontGetOo

"OO seems to lack a grouping granularity between "class" and "application", making an OO app appear like "class soup" to me. Procedural has "task" modules, which seem pretty natural for me and relatively easy to find. I can navigate 200 task modules more easily than 1000 classes. (I would say the ratio is roughly between 1:2 and 1:10.) "

In other words, what is the large-scale structure of classes in OO code? Is there any consistency among methodologies? What are the mental steps one takes to navigate these? Suppose you were a contractor who walked into a shop with a fairly big OO application. How does one go about making changes and/or learning that application? For example, the manager says, "I want to add a 'full price' column to all reports and screens that currently have our discount price."

Do you mean something like packages, namespaces, or assemblies? Or something bigger?

It is not so much what you call them, but what is in them. Also, I am mostly interested in BusinessModeling, not so much GUI and infrastructure API's.

Between Classes and Libraries, one could PackageByFeature.

In other words, what is the large-scale structure of classes in OO code?

Some people use the term DesignPatterns.

I tend to do a large part of what GOF patterns do using relational techniques. It does not drag the code into having to deal with the NounModel on a structural level. GOF patterns are not very easy to find anyhow IMO, especially if named poorly and/or mixed in with many other patterns. The parts are spread all over, with no obvious boundaries.

What is the large-scale structure of classes in OO code?.... In ProceduralProgramming, usually there is some kind of interface module or table that dispatches the "tasks" based on user input or other "events".

Would it sound silly to put it this way:

What is the large-scale structure of an ant hive?.... In the military, usually there is some kind of commander or office that dispatches the "orders" based on threats or "attacks".

The concept of central command-and-control is not a prerequisite for understanding the organization of a system. Consider the possibility that in looking for that sort of structure it is possible to completely misunderstand how the system is actually working.


I don't necessarily expect a hierarchy; but, hopefully something "better" than a mere random-looking graph. A "segmented" graph would be an example, where there are fairly distinct clusters in the graph where the number of links between clusters is much smaller than within clusters. (The human brain is more or less organized this way.)

Using some kind of set theory for navigation is another approach. Unlike clustering above, sets may heavily overlap, but can be isolated using some kind of set query system. One can study specific aspects at a time.

I've seen relational designs that (in my opinion) had too many "thin tables", and were thus chaotic/random graphs. Thus, relational can be messy also. But at least I can usually find a way to clean them or re-represent them. (The "thin table" approach proposed by some vocal WikiZens *could* be grouped by entity usually, but existing tools don't readily provide such a view.)

Random-looking graphs are thus a smell.


OO seems to lack a grouping granularity between "class" and "application", making an OO app appear like "class soup" to me.

I don't understand what you mean. An object can represent a high-level concept or a low-level concept, just like a procedure can. When I want to group a bunch of objects together, I make a "module" object and put the other objects inside it. I can do this at any granularity I want.

When I want to find a particular object, I can start at the top and look at all the top-level modules, then choose one and look through it to find the sub-module that I want, and so on. Or I can search for the object by name, if I know part of the name. Or, if I'm looking at the object on the screen and I want to find out how it works, I can right-click on it and ask to see its code.

There are lots of ways to group things, and lots of ways to find things. Why is this any harder with objects than with procedures?


What are these "modules"? Are they grouped by entity, by task? What rules of thumb or descriptions do you use to describe them?

The problem here is inconsistency. I find procedural more consistent from practitioner to practitioner. I can walk into a procedural shop as a consultant, and find a pattern similar to that above. With OO it can be anything and takes months to grok, if ever. It is like dealing with a vastly different personality every time.

Answer: Subsystems and libraries. Just like with ProceduralTechnologies?.

The big difference I've seen is that OO lends itself to building "frameworks," while procedural approaches tend to be limited to libraries of procedural functions.

You can build frameworks with procedural languages, but it generally requires function pointers, which confuse many people.

You are going to start a fat fight if you suggest that OO is better at building frameworks without more evidence presented. This is probably not the place to resolve such claims anyhow.

I am not sure function pointers are the only alternative. Besides, what confuses one person may not confuse the other.

There are also other models useful for organizing code, like layers. You can do them using procedural or OO techniques, but I've found that OO generally gives you more powerful and flexible tools to accomplish such goals -- like interfaces, for example. -- JeffGrigg

I think nested "layers of abstraction" tends to run out of steam at a point. Abstraction is relative. What is low-level to one person is high-level to another, and visa versa.

...what is the large-scale structure of classes in OO code?

This reminds me of astronomy. We perceive stars and constellations first in the evolution of astronomy, then galaxies, clusters and only very recently "walls" and "spiderwebs". These latter were never obvious to the naked eye.

OO can operate conceptually at any level. The higher levels don't involve just "classes", they involve aggregations like packages, frameworks, subsystems, servers, tiers, etc. If you can only see (with your mind's eye) the classes, your perception of structure has yet to evolve. -- TomRossen

But this is a simple "distance" taxonomy. Is there something similar for OO designs?

   Galaxy Threads
     Galaxy Clusters
         Star Clusters
           Stars + Planets + Dust + Gas

(Actually, it is more complicated than this, but this is not an Astronomy course.)

Something similar to distance as a metric? How about level of abstraction? Not that this was my point in making the analogy. In fact size(which is what I think you intended here, rather than distance) is relevant to software: the higher level structures tend to correspond to larger and larger groupings of stars - I mean classes. -- tr

I think the nested (hierarchical) abstraction does not work past a certain point. It helps hide the hardware perhaps, but tend to believe that abstraction is relative, not hierarchical for the business software domain. An accountant's viewpoint or taxonomy of say car parts may greatly differ from an engineer's.

A hierarchy can be a network seen from the perspective of a single node, which becomes the root. We use relational data modelling to define the network (all information re car parts), and create particular views to satisfy accountants or engineers. A view itself may be structured as a network (e.g., a website) that can in turn be accessed "hierarchically" from well-known entry nodes (e.g., a homepage). I never meant to suggest that hierarchy was the primary organizational principle, in life or in OO. -- tr

Can't this be done without converting stuff to objects?

{What's described above is sometimes called "Aggregation" or "Composition", which is essentially a hierarchical layout. However, it tends not to scale far except in limited niches. Related: LimitsOfHierarchies}

I don't think anyone's answered the original question yet.

In other words, what is the large-scale structure of classes in OO code? Is there any consistency among methodologies? What are the mental steps one takes to navigate these?

There are two large-scale structures between classes and applications: packages (namespaces in some languages) and layers. An application may consist of multiple layers, each of which may consist of multiple packages/namespaces. Packages generally correlate to directories, just as classes generally correlate to files. Layers don't have an analogy in the file system.

Layers correspond to basic types of functionality. Examples: database layer; UI layer; business layer. OO programs generally have between one and five layers, depending on complexity. There's only a handful of different kinds of layers that most programs use, and the three I mentioned are the most common. You said you were interested in business modeling, which would correspond to the business layer. Sometimes it's called the domain layer.

Packages correspond to finer-grained chunks of functionality, but are still larger-grained than classes. The number of packages varies according to the size of the program and the tastes of the designers. I prefer packages to have no more than a few dozen classes. Examples (in the database layer): Oracle RDBMS package, accounting database package, HR database package.

Good designs try to minimize relationships between layers, and between packages.


Okay, I am familiar with "infrastructure" kinds of packages and GUI layers, but what about business modeling itself? Perhaps OO is sufficiently mature at infrastructure things like database and GUI API's/layers/wrappers, but there seems to be a big gap in consistency and documentation WRT business modeling itself.

I've never worked on a corporate business modelling project. So I can't speak from experience about really big business modelling. But, as you probably know, nouns in the business are modelled with classes, along with a smattering of supporting classes. When the number of classes gets too large, they're grouped into packages, with an eye towards minimizing relationships between packages. I can't think of any examples offhand because it's been several years since I worked on a project that needed more than one package in the business layer. However, MartinFowler's AnalysisPatterns has more to say about this subject. I would also say that business modelling is OO's strong point, but again, my experience in this area is a bit limited. --JimLittle

EricEvans is producing a book on how to structure the BigSoupOfClasses. I believe is where he is pre-publishing his book. --AlistairCockburn

But I don't see any consistent "big picture" patterns in there, just a lot of UML diagrams. I don't see the kind of consistency that I see in most (decent) procedural/relational arrangements. Is UML itself the "big picture" of the OO world?

Regarding "looking for nouns", these often become database entities so that one does not need a module per noun or any other large code artifact for nouns (entities). Relational allows one to decouple code structure from NounModel more or less. If a task is related to a noun by chance, then document that fact in a comment. I see no reason to make it formal so far, for many times there is no one-to-one relationship between verbs/actions/tasks and nouns.

Interesting question. I can't imagine what sort of "big picture" patterns you see in ER modeling - all there is is a Big Soup Of Entities - - there are no bigger organizing principles. Ditto classes (in this sense classes are isomorphic to entities). Here's what I take away from the story so far: in the world at large, there are tens of thousands of objects and no organizing principle at all besides adjacency and enveloping/surrounding. Everything else, e.g. classification, aggregation, belonging, are added by the human mind. That follows naturally into the classes/entities universe. What EricEvans has done it to construct three different organizing principles for aggregating the classes in a software system, of which layers is only one, and the others I can't recall at this instant. --AlistairCockburn

I generally find EntityRelationshipDiagrams (or schemas) relatively easy to absorb, especially if there is a good written description of the columns. UML is such an inconsistent mish mash of everything under the sun, that they are not very easily absorbed. IOW, data schemas plot data and only data. Further, the total number of entities for a well-factored project is probably going to be far smaller than the number of classes in the same system. (Although there is disagreement over what "well factored schema" means, as described earlier.)

Higher level organizing principles are necessarily abstract. Domain modelling is half of the story, though no doubt attractive to the traditional OO academic interpretation. I don't see any significant difference in OO versus procedural code in this respect. Both can be done badly. --RichardHenderson.

But what is the big picture of "good" OO? Procedural/relational appears to have a more consistent big-picture to me.

Don't confuse UML with OO principles. It is indeed a mishmash, and UseCaseDiagrams? are the most egregious example. -- TomRossen

Some OO proponents seem to think they are related or even the same thing. Perhaps this is another HolyWar between OO practioners?
Consider that OO supports visibility restriction (public, private, protected) which helps discriminate externally significant from internally available in a trustworthy manner. This simplifies the BigSoupOfClasses into a simpler soup of visible classes & methods. The enforcement mechanism in OO languages is what makes visibility trustworthy.

At a more macro level between application and class, you have various schemes of restricted visibility: ...etc.

All of the above are explicit means of scaling up "how to make sense of this XXXXX soup" problem.

-- BobLee

Fans of dynamic OO languages like Smalltalk or Python may likely disagree. Besides, the "taggers" often get those wrong, triggering complaints from package users. They often assume you can predict future uses. But assuming they do have some important large-scale communicative ability, could you provide an example perhaps?

For me, I think this response gets closest to the core of the matter. OOP requires reductionist thinking, it drives reuse by allowing you to know as little as possible about large scale structure. By seeking to understand the 'bigger picture' you loose the considerable advantage of not needing to know the bigger picture. -- RichardCordova

First off, "reuse" as the main reason behind OOP seems to have diminished of late (ReuseHasFailed). Second, there are other ways to compartmentalize things besides objects. The suggested claim here seems to be that OO allows one to keep thinking in the small, and the large-scale structure sort of organically emerges but nobody has to monitor or understand this large-scale structure. It is not a claim I hear very often. But it brings up the question of how things like OnceAndOnlyOnce and data integrity can exist in such an organic environment. Is this what ExtremeProgramming allegedly solves via incremental refinement? But any paradigm can be subject to such a process. --top

The idea that the superstructure needn't be limited to what someone can understand or oversee is quite appealing to me. The power of XP would lie in distributing the knowledge in such a system. As pointed out, this isn't specific to OOP, but perhaps it is more enouraged? -- RichardCordova

Me too. As I mentioned on OneMoreLevelOfIndirection this is a kind of complexity that cannot be reduced by introducing another layer of abstraction but only by a layer of process (XP) or persons (experts).

I find it odd that people would be promoting the idea that having a "big picture" is not useful. How do you find stuff when a change needs to be made? How do you identify OnceAndOnlyOnce if you have nowhere to look for potential copies? Is it productive to wonder around the city with no map? No MentalIndexability? I don't get it. I suppose you could provide a google-like search tool on top of the BigSoupOfClasses, but such would be hit and miss if you used the wrong wording, searching for "delay" when the code used the term "timing" instead, for example. --top

Sure, you can try to provide the necessary information in form of documentation or further layers of code. But this is not very efficient if a) your artifacts fasten then you can update your "index" documentation or b) your further layers and documentation impede working e.g. by making the mess even larger. Indexing is difficult, so you might need experts to even maintain it. Keeping the knowledge in the heads of the team may simply be more efficient.

Please clarify what you mean by "artifacts fasten". I don't necessarily mean external documentation. What I have in mind as examples are things like ER diagrams and a list of "primary tasks" typical of procedural+RDBMS-based design. I find these "big pictures" indispensable and relatively consistence from shop-to-shop (although the list of primary tasks may take some personal digging to uncover at first). The lack of something comparable in OO that is fairly consistent is disconcerting. Each new OO app is a random jungle with no map.

Related to granularity issues is the "tables are a larger-scale structure than classes". There is no "object math" comparable to (good) relational algebra. There is yet to be a DrCoddOfObjects?. See OoLacksMathArgument.

OO principles and DesignPatterns provide some of the basic tools and techniques you need to build scalable and durable systems. The trouble with design patterns is, that there are so many of them by now, that newcomers can be overwhelmed, and don't know where to start. However, there are certain design patterns that explicitly address scalability of object oriented designs. ComponentBasedDevelopment builds on these patterns and has led to language constructs such as interfaces and packages in Java, and equivalent constructs in other languages. These constructs can be used as building blocks for design structures larger than classes. For example, the KobrA approach to SoftwareProductLineEngineering? uses a component definition that ensures components have a fractal characteristic. Poor dependency management is a very widespread "phenomenon", in both OO and in non-OO software, and more than "just" OO is required to design large systems that are maintainable. Because I come across dependency management issues again and again in my consulting, I have compiled a white paper on this topic at, where I describe some general principles for components and framework design. -- JornBettin

Design patterns are discussed above a bit. The biggest problem I have with many design patterns is that I feel that "noun relationships" should be tracked and managed in a database, not in source code. This view tends to conflict with the GOF pattern view. Also, I tend to share the PaulGraham view that many OO patterns are a sign of weak languages. See PatternBacklash. I agree though that the GOF pattern movement is the first real attempt to bring some order to OO chaos. It is just that the GOF approach has too many holes.

I don't disagree that some patterns result from weaknesses in OO or in specific OO language environments. The question is: Does it matter? Many patterns can be expressed as aspects quite nicely. ModelDrivenSoftwareDevelopment (MDSD) allows you to precisely specify when and how to apply which pattern mechanically, and it allows you to raise the level of abstraction. If you are an expert in a particular domain, you can use MDSD to define and implement DomainSpecificLanguages. In this approach you typically leverage the concepts of OO frameworks and components. A lot of ingredients need to come together (DomainAnalysis, MetaModeling?, ModelDrivenGeneration?, TemplateLanguages?, DomainDrivenDesign) to raise the level of abstraction. The component concept can be used to effectively partition huge domains. In contrast, an uncomponentised relational model with several hundred or several thousand entities becomes incomprehensible, leading to monstrous entities. "Customer" would be one painful example in ERP systems. Different subsystems in large systems often deal with different facets of one concept. -- JornBettin

That is what queries are for. There is no One Right Viewpoint, so we have query ability to deliver a Just-In-Time local-use abstraction: the result set. Aspects are still too new. They have yet to be time-tested in multiple domains. Some say aspects are just meta programming for static languages.

[Queries miss the whole point of what he just said. The model exists to help understand the domain, to add structure to it, queries have to work on that big imcomprehensible monstrosity he was speaking about called the schema.]

I find schemas more comprehendable than giant code bases. A lot of the problem with schemas are bad normalizatioin and bad documentation (such as no field description dictionary). However, sloppy designers pop up in all paradigms. But, I'll take bad schemas over bad OO any day.

[The whole point of the model is to escape queries, which are too soft, to maleable, and impossible to enforce rules on. The model on the other hand, can enforce rules without requiring you know or understand the rules to use the model. Queries can't do that, queries force you to understand every rule for each query individually, making them severly error prone. A functioning model is far more useful and less error prone than a queryable schema in just about every scenario but ad-hoc reporting.]

Example? Relational data intregity is a pretty powerful tool. Plus it works regardless of the language accessing the data, something language-centric approaches have trouble with.

Different facets of one concept can be modeled with precision using components, OO principles and aspects. In short, objects and components are stepping stones on the path to raise the level of abstraction from software concepts to problem domain concepts.

The latest additions to this thread seem to steer into the classical OO versus Relational debate. This has been dicussed too often to be repeated again. I just wanted to make the point that by now the real discussion has moved well past the basics of OO. -- JornBettin

Hell, even the basics were poorly defined/analyzed and lacked consensus metrics. Going "beyond" that puts hair on the fuzz.

I would not mind merging this with OoLacksConsistencyDiscussion, but both are too long for such.
See also: LotsOfShortMethods, FearOfAddingClasses, RefactoringMercilesslyHidesTheForest, ReadingRavioli, ProceduralMethodologies, OoLacksConsistencyDiscussion, DontNameClassesObjectManagerHandlerOrData, EatingSoupWithaFork

View edit of October 27, 2010 or FindPage with title or text search