Fear Of Adding Tables

The naive belief that adding tables to a database makes it more complicated...

"No," the novice said, "we should use this design instead of that one because it's clearly simpler; it has fewer tables and relationships." (An extreme case leads to "one big massive 'do it all' table" a GodTable)

Tentative Conclusion:
NO; a good design is generally simplified by adding tables. You have to look at the database queries to see this -- not just the database diagram.
  • I look at both, and don't see this to be the case for reasons given. -t


Issues Raised


       TArchitecture = enumeration(simple, complex, NULL)

I have seen this happen frequently. Instead of creating a new table that is a better match for a particular entity concept, a field is added to a table and now two entity concepts live inside the same table. Sometimes it is a good idea -- if the two entity concepts are related in an object-oriented inheritance kind of way -- but other times, entities are mixed inside the same table without having a real conceptual IS-A relationship between them, and one ends up always having to filter by a particular field (or even creating wrapper views) to treat that table as 2 tables.

Are normalization rules always able to tell us what to do in such cases?

In general, yes. However, normalization rules are merely the automation of common sense. There should be one, and only one, entity(-concept) per table. You should know you've gone wrong when you try to shove two entities(entity-concepts) into the same table, without having to go through the process of normalization.

Hard boundaries between "entities" are sometimes non-existent, or not safe to assume permanent. ClassificationIsTough. For example, if we made two separate entities, "peons" and "managers", then over time features that managers have may shift over to peons and vice-versa. If the DBMS or our code makes it hard to shift columns between entities when things change, then it makes sense to just have an "employee" entity, and leave non-applicable features blank, null, zero, or false. In my head I'll generally do an informal change cost and probability calculation to find the least costly approach (approaches I learned in statistics, AI (game theory), and "management science" courses). -t

If determining a particular hard boundary is difficult, then where the boundary is drawn probably doesn't matter [and perhaps shouldn't be drawn at all]. Fortunately, in most cases it's straightforward.

There is a wide continuum between "wide" tables that don't explicitly duplicate values and key/value map-like tables. The documentation for determining where and why to split in an objective and road-tested way simply does not exist. Your gut feeling will give a different answer than my gut feeling.

Actually, it does exist and it's well-defined. It is the rules for achieving higher normal forms. The higher the level of normalisation, the fewer the update anomalies. The downsides (in terms of normalisation) are well-documented as well, in the same descriptions for achieving higher normal forms.

At best that provides fuel for debate and discussion, but doesn't tell us which is "right". The way the topic title is worded, that problem is solved.

[Nonetheless, it tells you "where and why to split in an objective and road-tested way", without any need for intestinal palpation. Not sure where 'right' comes into this.]


Discussion moved to FearOfAddingTablesDiscussion, this page and ThinVersusWideTableDefinition are attempts at dissolving the thread mess...


See also FearOfAddingClasses, CategoryRelationalAntiPattern, ImproveDatabasesOrElse, GodTable, RatioWar
SeptemberZeroSeven and again MarchZeroEight and JulyZeroEight

EditText of this page (last edited November 10, 2014) or FindPage with title or text search