The naive belief that adding tables to a database makes it more complicated...
"No," the novice said, "we should use this design instead of that one because it's clearly simpler; it has fewer tables and relationships." (An extreme case leads to "one big massive 'do it all' table" a GodTable
- NO; a good design is generally simplified by adding tables. You have to look at the database queries to see this -- not just the database diagram.
- I look at both, and don't see this to be the case for reasons given. -t
- Does more or "thin" tables result in a better design and maintenance effort?
- Does more or "thin" tables result in better data integrity?
- Is there value in having the base tables match the users' typical view of info (which may not be the "thin" table approach)?
- Do more joins result in performance loss? And if so, is it because of bias in existing technology rather than theoretical limits?
- Is it possible to make the wide/thin distinction be a matter of view or presentation such that one is not forced to choose one over the other?
- I protest "naive". I disagree, but do not consider myself naive, having many years of experience under my belt.
- The point of this page is that if you believe that whenever you add tables to a database you make it more complex, then you are naive, regardless of your experience... if you decide that you will not add a table as a result of careful analysis (or even UnconsciousCompetence) then you are not naive and this page does not apply for you... happy?
- To make it less "attacky", I suggest renaming it to "WhenToAddTables?" rather than label one side of the decision up front with the nasty-tag of "fear". I've also seen naive amateurs get table-happy and make a mess. -- top
- There is fear because of the way humans visualize. Multiple tables requires multitasking. Programmers are not good multitaskers. They are focused. Think about when you first design a program... there is no clear consensus as to how the program will be put together... it is up in the water. It is to do with our imperfect human nature -- we visualize once and only once initially as one single table because we cannot multitask (and when we do multitask across several tables initially (it is like doing laundry while doing dishes). It is hard... but it can be done later, when we realize that multiple tables, actually relvars... and foreign keys... leads to a simpler design. The problem is also related to viewing the relations as tables and cells versus seeing them as relvars with keys. A simple table is easy to visualize as a bunch of cells... but foreign keys requires visualizing an image of several tables (or relvars), which isn't as easy to see compared to a single table with cells. ImproveDatabasesOrElse
- I don't find it "simpler". You need to devise objective evidence of "simpler". The downsides of semi-sparse tables are less in my opinion than the headaches lots of tables and lots of joins. I've worked with both, and I vote the first. I smell PersonalChoiceElevatedToMoralImperative. And a lot of newbies seem table-happy in my experience. I suspect their OOP training does that to them. -- top
- Well, in my experience is the other way around must OOP newbies I know suffer from FearOfAddingClasses, and since most ObjectRelationalMapping generally map Classes to Tables... On the other hand, when they get more experience with the ObjectRelationalMapper the realize that from an ObjectRelationalMapping point of view tables are almost irrelevant, after all, in any good ObjectRelationalMapper a complete class hierarchy can be mapped to a single table... Enums can be mapped to fields... Embbeded classes can be created to group fields in a table... so the use of an ObjectRelationalMapper may in fact reduce the number of tables while increasing the number of classes... paradoxical ain't it?
- On the other hand, ProceduralWeenies? or, more precisely, ProceduralWeenies? that are also FileWeenies?, have the tendency (in my experience) to understand Tables as Files, and so tend to do crazy stuff like creating tables with names like CustomersObtainedOnJanuary1Of2007, CustomersObtainedOnJanuary2Of2007, etc, etc. They are true pathological TablePhiliacs?.
- I've never proposed that. I see no reason to bring up here. (Although, sometimes it makes performance better if old stuff is referenced far less often than new stuff. But I've never used that technique myself yet. ArchiveTable) At least let's divide the discussion up into "vertical" partitioning versus "horizontal" partitioning. -- top
- Personally, I've never done true OOP (whatever that is... Top, please define what OOP is then? See you throwing around terms too? ThereAreNoTerms? and ThereAreNoTypes and there are no dictionaries or points of references, right?)
TArchitecture = enumeration(simple, complex, NULL)
- Simple ultimate design versus simple initial visualization. The first visualization may be a simple single table with a few values in it, in my mind. But simple design for a complex problem needs to be more complicated than simple... a visualization of multiple tables with keys is not as simple (easy?) of a visualization, but it is better and is the simplest solution once we are further down the road. Take a look at this picture which isn't as simple to initially visualize, but once studied it leads to a simpler design in the long run: insert_image_here_I_have_to_find_it_on_google
I have seen this happen frequently. Instead of creating a new table that is a better match for a particular entity concept
, a field is added to a table and now two entity concepts
live inside the same table. Sometimes it is a good idea -- if the two entity concepts
are related in an object-oriented inheritance kind of way -- but other times, entities are mixed inside the same table without having a real conceptual IS-A
relationship between them, and one ends up always having to filter by a particular field (or even creating wrapper views) to treat that table as 2 tables.
Are normalization rules always able to tell us what to do in such cases?
In general, yes. However, normalization rules are merely the automation of common sense. There should be one, and only one, entity(-concept) per table. You should know you've gone wrong when you try to shove two entities(entity-concepts) into the same table, without having to go through the process of normalization.
Hard boundaries between "entities" are sometimes non-existent, or not safe to assume permanent. ClassificationIsTough
. For example, if we made two separate entities, "peons" and "managers", then over time features that managers have may shift over to peons and vice-versa. If the DBMS or our code makes it hard to shift columns between entities when things change, then it makes sense to just have an "employee" entity, and leave non-applicable features blank, null, zero, or false. In my head I'll generally do an informal change cost and probability calculation to find the least costly approach (approaches I learned in statistics, AI (game theory), and "management science" courses). -t
If determining a particular hard boundary is difficult, then where the boundary is drawn probably doesn't matter [and perhaps shouldn't be drawn at all]. Fortunately, in most cases it's straightforward.
There is a wide continuum between "wide" tables that don't explicitly duplicate values and key/value map-like tables. The documentation for determining where and why to split in an objective and road-tested way simply does not exist. Your gut feeling will give a different answer than my gut feeling.
Actually, it does exist and it's well-defined. It is the rules for achieving higher normal forms. The higher the level of normalisation, the fewer the update anomalies. The downsides (in terms of normalisation) are well-documented as well, in the same descriptions for achieving higher normal forms.
At best that provides fuel for debate and discussion, but doesn't tell us which is "right". The way the topic title is worded, that problem is solved.
[Nonetheless, it tells you "where and why to split in an objective and road-tested way"
, without any need for intestinal palpation. Not sure where 'right' comes into this.]
Discussion moved to FearOfAddingTablesDiscussion
, this page and ThinVersusWideTableDefinition
are attempts at dissolving the thread mess...
See also FearOfAddingClasses
and again MarchZeroEight