Lets Use An Object Oriented Database

OODBMSs can help solve a lot of technical problems that occur over and over again in OO projects that use some kind of ObjectRelationalMapping. Unfortunately, your client and your manager do not believe in OODBMSs. You are going to use a RelationalDatabase.

Too true. About three years ago I was consulting with a bank in Quebec City and they kept talking about the "new technology" database that they were going to use for this project. Turned out to be Oracle. Sad thing is...it was new technology compared to what they were using. -- AnthonyLander

For a similar view, see http://www.pipeline.com/~hbaker1/letters/CACM-RelationalDatabases.html ("Relational databases considered harmful", by HenryBaker Comm. of the ACM 35, 4 (April 1992), 16,18.).


Yesterday, I was in the audience for a presentation given by a ContentManagementSystem vendor. They were very pleased of their model for content, object this and meta that and inheritance the other. And it runs over Oracle. Up goes my paw: why not use an OODBMS? Chuckles all round and there came out the story of another content management outfit who did use an OODBMS, and have nearly gone bust since very few customers were prepared to have the thing in the building. Even for web content, not legacy data, people won't touch an OODBMS. Astounding. -- KeithBraithwaite

It is astounding. What the hell is everyone so afraid of? GemStone has saved me lots of development work. GemStone has saved me lots of response time. I intend to continue using it for competitive advantage over those who insist on doing it the hard way. -- RandyStafford

Smalltalkers have a get-out here, as GemStone is well known in their culture.

Java developers have a get-out here, too, with GemStonej. Even though GemStonej is marketed as an application server / EJB container, it still has all the good old OODBMS capabilities that Smalltalkers have come to know and love. And it's a pretty darn good application server / EJB container, too.

I just wrote a GemStonej application that can switch between object persistence and relational persistence at runtime, based on a property read at startup. At present I'm seeing an order of magnitude difference in response time between the two persistence modes (object persistence being faster, of course) - and this says nothing of the extra development effort you have to go through to do O/R mapping. But in fairness we have more optimization to do on the relational side. And for a simpler domain model than we have, relational persistence may be OK.

In some environments it may be the case that interfaces must be built to legacy systems or persistent stores of record that are relational. That's OK - one can still use GemStonej as a durable (in the ACID sense) cache for objects mapped out of RDBMSes (GemStonej provides pooled JDBC connectivity out the back end, and also provides an O/R mapping layer from a technology partner). The caching approach yields the response time of object persistence and the environmental compliance of relational persistence, at the cost in additional complexity of synchronizing the persistent stores (which cost could be mitigated by a GemStone-supplied caching framework).

The whole thing brings to mind several analogies. There is the old car-and-garage analogy (assembling your car in the driveway from parts stored in the garage, every time you drive it), and also the hardware analogy - the reason RAM caches exist is to provide faster access than can be had from slower forms of storage.

-- RandyStafford

Downsides to OODBMSs: strictly hierarchical structure. No/lousy searching/sorting functionality. No SQL. Scales poorly once you get past physical memory size. Scales excruciatingly once indexes get past physical memory size. RDBMSes are annoying to OO folk, but there are good ways of dealing with them - see CrossingChasms.

Also worth discussion here: MetaKit.

At least in GemStone, the number of pointers that have to be traversed to get from root (e.g., JNDI-bound) objects to other objects is an application design issue; there's nothing to prevent you from having a root collection for every type of object (a la relational tables). The searching and sorting functionality is a function of collections classes written in the language used by the OODBMS (e.g., Smalltalk or Java). And the Smalltalk version of GemStone provides SQL access to its persistent objects through an add-on product called GemAccess?.

Are you generalizing a bad experience with one OODBMS to all OODBMSes, Peter? Have you had firsthand bad experience with one of them in a production application (which one)? Note that one of the authors of CrossingChasms is a GemStone employee. The other author writes in UnderstandingDistributedSystems that he's "told many people that if they want to know what EnterpriseJavaBeans and distributed programming will look like in 5 years, look to see what Gemstone/S provided five years ago."

Actually, I dearly love an OODBMS. I just get shouted down on the basis of performance tests when I try to get one used. The ones I've had significant experience with were objectstore and versant. Loved the former, disliked the latter.

My empirical touchstone here was a comparison we conducted at AC Nielsen in 1996. We tried every OODB on the market then, about a dozen including, I seem to recall, Gemstone. We found similar performance constraints on all. Regular OS paging mechanisms just couldn't keep up with the specialized disk access techniques of the RDBMSes.

Eventually we went with Illustra as a compromise. Illustra got eaten by Informix, and then all this ThreeTier? stuff happened and I haven't gone near an OODB since. Everyone says such nice things about GemStone, though, so perhaps it's time I shut up and try it again. -- PeterMerel

Yes, I saw that on ObjectsVsRdbmsPerformance. It was interesting reading; thanks for sharing it. I hope you do try again ;-) -- RS

My experience is that while GemStonej is typically faster than an RDBMS and more flexible, the work required to "roll your own" indexing, searching & querying framework can be complex for some developers not accustomed to dealing with the performance problems of dealing with thousands to millions of objects.

For instance, in GemStonej 3.x I had to create my own indices to get decent performance from a typical 'select' query [the application was web-based & involved a fair amount of data entry & querying].

The basics of indices aren't terribly complicated to implement, but there wasn't much in the way of documentation to explain the hows and whys of doing this sort of thing. It was a very heuristic process (and required the help of some GemStone consultants).

I think this comes to the abstraction principle: an RDBMS is a black box with a big label on it that says: "This is REALITY. :thud of SQL manual falling on desk: You must think inside of this box." An OODBMS says: "Here's a set of minimal rules. Now you can create the rest of your own reality."

It's a sharp contrast from the much of the mainstream VB+SQL crowd who want to create the world with popsicle sticks & duct tape, "No experts required". At least Microsoft is trying to change this with COM+.

My opinion: In application areas where it pays to "think outside of the box", an OODBMS is the proper choice, assuming you work with people that can handle the power properly.

-- StuCharlton

I come at this from a unique perspective: I was one of the original designers of TopLink/Smalltalk, an object/relational mapping tool, and I've completed a largish project using GemStone/s.

In a nutshell, here's the pros and cons as I see them:

RDBMS: Highly structured, well known and well understood. Great for all kinds of things that involve grinding over reams and reams of data. Banking applications, report generators, usage analyzers, etc, etc. There is one class of problem that relational databases are simply terrible at solving efficiently. More on this in a second.

If you lots of relational data or lots of concurrent users and you think you can get good performance from your RDBMS without hiring an Oracle (Sybase, whatever) genius, then you're simply kidding yourself. These guys work magic; I watched an Oracle consultant for an afternoon once, and was totally amazed. It made me realize just how little I understand about databases (despite writing TopLink).

OODBMS: Unstructured. Are you sure it is unstructured? Objects are like cee structs (structures). It's probably still structured, but structured differently. Lots of corporate resistance and comparatively little market penetration. People tend to be pretty picky about where they store their company's lifeblood (i.e., data). Strangely, they jump head first onto whatever programming language de jour is being advertised in the weekly trade rags, but that's besides the point. Object databases can be used to grind reams of data just like an RDBMS, but you have to build lots of the infrastructure yourself. Object databases can solve the problem I hinted at earlier, which is this: efficiently traversing data relationships.

Just like with a relational database, if you're doing anything serious with an object database, you will need the expertise of a very qualified engineer in order to squeeze performance out of your server.

Efficiently traversing relationships: Relational databases don't do this well because it's hard to arrange locality of storage. If I read an employee, then his department, then his office, etc, etc... I'm going to have to return to the database several times. Object databases, on the other hand, let you cluster related data together, enabling you to traverse relationships more efficiently.

So what? Well, the GemStone application I worked on was a case management system. It's never going to be used to grind over tables and generate reports. The only thing anybody ever does with it is traverse object relationships. I seriously doubt that it would have worked even half as well with a relational database behind it.

Oh, and one more plus for object databases... well, GemStone in particular: it's much better to write server code in the same language that the client is using. This allows you to share (sometimes) code and (oftentimes) concepts between the two places. GS allows you to do this with Smalltalk and the Java. Oracle's starting to get a clue by letting you do some of it in Java. -- AnthonyLander

Does storing of tables (whether RDBMS or OODBMS) have anything to do with performance? Like, for example, you can store one field in an RDBMS on consecutive disks to increase write speed. And if at all there is a better way to store hierarchy of objects, we can expect better-performing OODBMS. -- SeshKumar

Databases were oriented towards data long before object-oriented programming languages started to appear (Simula 1967, SmallTalk, C++ and Java a lot later). An ObjectOrientedDatabase is actually not oriented towards objects, but towards the programming language, and since C++ and Java are poor at searching and indexing, so are the OODBs. From the view of object-orientation and data management, this is a step back from RelationalDatabase technology.

The last sentence there doesn't seem to have anything to do with the previous ones. Could you explain why "from the view of object-orientation and data management, this is a step back"?

I would say that, at least in today's world, the notion that OODBMS products provide poor query performance is absolutely false. I have empirical evidence to the contrary. Especially where the object model is extremely complex. I don't advocate object databases be used as large data warehouses, but I do advocate them being used an transactional persistence extensions for object-oriented languages. They far surpass anything that is possible with RDBMS products, in this particular usage context, with today's existing RDBMS products. -- JeffPanici

Personally, I don't care much for an ODBMS. RDBMS's were built on clear mathematical foundations and worked very well for a long time. ODBMS's grew out of a need to serialize objects and will not evolve to the level that RDBMS's have. I think the next generation will not be relational or object-oriented but associative. Things inching down that path include Lazy Software's Associative Data Model or Mirror World's LifeStreams/TupleSpace. -- RobertDiFalco

I disagree that OODBMSes grew out of a need to serialize objects. OODBMSes grew out of a desire to build DBMSes that could persist objects from ObjectOrientedProgrammingLanguages in their "native" form, with as little transformation as possible. -- RandyStafford

How is that different from serialization? "Serialization" seems to mean the practice of taking a graph of objects (and references between them), which lies in main memory, and converting them to a format appropriate for persistent storage. Among the swizzles needed for serialization are:

Unless an object is a leaf or tree object (ignoring any transient attributes), serialization almost applies some level of transaction support to be done properly. Serialization of a complex graph a node at a time, after all, is asking for trouble.

Q: Is is true that true (for lack of a better term) OODBMS are restricted to hierarchies? Most of the problem domains I am familiar with are inherently a set of relations, which may mean a network graph (many to many). If so this is a severe limitation. For ACID compliance you must not have multiple occurrences of information (such as 2 entries describing the same thing under different hierarchies) or you lose data integrity. This would be a major reason to not use them in my situation.

A: No. See NetworkDatabase.

Also as a side note, the databases I deal with are multi-gig and that is small in my problem domain (Environmental Sciences, Geolgists in particular are used to terabyte and larger data sets). RAM caching is not an option at this time, and if it caused severe performance issues this too would prevent us from using them.

Having worked in a NetworkDatabase environment, this is even worse. It looks like a step backwards to '70s commercial technology. Chaining records, AKA pointer chasing, is a nightmare I do not want to relive.

I think the next generation will not be relational or object-oriented but associative

Q: How is an association different from a relation? I have also heard the term 'graph database' but I am confused since in my sophomore year Computer Science classes I learned that: a) Association and relations are logically identical. b) Graphs are isomorphic to relations (you can always map one to other and define operations which preserve relationships and mappings). c) A tree or hierarchy is a subset of a relation. You can always derive a tree from a relation but not vice versa.

It almost looks like the wheel is being reinvented yet again.

Kind of an interesting subject to me, how widespread is the notion that relational data is a failed experiment, or a fashion which never fully took hold and is now in sharp decline? I've encountered this, I think it's ludicrous. The existence of working programmers, who manage to buffalo their way into lots of technical scenarios because they're not intimidated by something complicated, and can just slam their head against a wall, and that generally seems to be working for them, but actually they don't have a solid grounding, this is what I see alot--programmers who specifically don't have much knowledge/use for relational databases, although they're generally currently working on a project that involves one.

I'm being a bit ungenerous, it's easier to see when somebody else is just in over his head. But I want to be clear, that being a programmer, being an experienced programmer with an ego to match, this does not in my experience entail knowing about relational database theory, or even knowing that you don't know. Probably, there is a larger point to make, that you can't fake it into the more abstract realms of mathematics--programming is not the same as math. I frankly think that mathematicians are in a, so to speak, far more scientific/advanced field, than programming.

I'd make an exception for the great names in computer science--the typical cubicle dweller is not an intellectual giant, and in fact, might be smarter than the pointy-headed boss, but probably doesn't know relational theory just like he doesn't do math for fun in his spare time. Lisp programmers, I expect are not averse to 'mathy' type stuff, if I may generalize. The mainstream, Java etc., type dude, one has to accept reality here. These are not extraordinary intellects. Relational is a different paradigm and it's I think normal for programmers to get through life w/out ever mastering it.

Anybody who has really been exposed to debates about OO vs. relational and thinks that OO can even be respected as an alternative, possibly knows something that I don't, but do relational experts ever think so?

What are the reasons why Relational Data is a failed experiment--any takers? I'll list some reasons that I've heard, that I don't consider to contain much of any substance at all.

The relational data model is too far from the machine--is this a flaw?

At least relational data is stored, and computers can store data! In memory or on the hard drive! So the relational model is still practical on hardware indeed, thankfully. OOP is also far away from the machine but it works fine. Purely functional programming doesn't find any use on machines because it doesn't do anything - it has no side effects LaughOutLoud. Computers do things, whereas functional programs don't do anything. They only do something if you put in side effects. Niklaus Wirth summarized this in his paper Through The Looking Glass: "To postulate a state-less model of computation on top of a machinery whose most eminent characteristic is state, seems to be an odd idea, to say the least. The gap between model and machinery is wide, and therefore costly to bridge. No hardware support feature can wash this fact aside: It remains a bad idea for practice."

The argument here, is that it might be a brilliant idea to abstract the machine away, but computers 'actually' store information serially--in blocks of memory which are serially contiguous to each other. So, with relational, you have hardware problems, and when you write stuff out to persistent media, guess what that's serial too. So the idea of moving data around completely freely, and relating it in any way, as in the relational data model, is too far from the reality of computational hardware for it to actually work.

I've seen this brought up on these pages, and maybe they should be linked (I may get a roundtuit). My reply, in brief, is fine, if you start with the notion that we're talking about a brilliant idea--is its brilliance truly appreciated? OO databases aren't really databases to me, how to explain this? Are we just talking about a performance issue, being a reason to reject having a 'real' database?

Another reason, and I'll just list one more, although I've heard a number, is the idea that putting information in lots and lots of little tables that all relate to each other is a bother (there is, again, the performance issue--it runs slowly because of joins).

I'll mention something of why I consider relational databases to be 'real'. Well, how about just one thing: being able to roll things backwards, and roll things forwards, is a huge luxury pragmatically, and an elegant idea. Perhaps there are those who will accuse me of not knowing how far OO databases have come--I might learn something?

I don't think anyone's considered relational databases -- in the form of SQL -- to be a "failed experiment" since the mid to late 1990s. At least, not since SQL databases have been effectively used to underpin (or at least play a vital role in) nearly every dynamic Web site, whether large or small, significant or insignificant, with two users or twenty million. These (the early 20teens) may well be the salad days of the relational database. At the same time, except in niches, OO databases haven't been seriously considered for general purposes in over a decade.

See also: ProgrammingWithoutRamDiskDichotomy
CategoryDatabase - relational DB discussion

View edit of March 10, 2012 or FindPage with title or text search