Relations As First Class Objects

RelationsAsFirstClassObjects is a newish approach to interfacing application languages to relational databases which offers an attractive alternative to ObjectRelationalMapping.

Rather than attempting to import data from relational databases into the application languages such that individual rows or entities look like first-class objects "transparently", RelationsAsFirstClassObjects makes relations composable, such that queries can be easily manipulated at runtime, and does comparitively little with the data itself, providing no object identity or caching mechanisms. The first-class relations can have the same interface as native collection data structures.

The best description I'm aware of is on AviBryant's blog, where he describes a system he developed for SmalltalkLanguage called Roe. An excerpt from http://www.cincomsmalltalk.com/userblogs/avi/blogView?showComments=true&entry=3246121322:

O/R mapping adds a huge amount of complexity to a project, and with dubious benefits: when you're not tracking down bugs in the mapping framework or obsessing about performance, you're chopping toes off your object model so you can shoehorn it into a relational schema. I've seen projects nearly fail because too many resources were thrown away trying to "solve the O/R problem" - which always means, solve it in this one very particular context, since there are so many inherent tradeoffs that any one framework is rarely optimal for two different scenarios.

I came out of one such project determined to try something different. Inspired somewhat by SchemeQL, I built a library that models the relational algebra in Squeak. The Relational Object Expressions library, or ROE, doesn't do any mapping, but it does put the world of tables and rows on an equal footing with objects. Relational expressions (ie, queries) are first class objects, which can be easily passed around, composed, and introsopected. They respond to the Collection protocol, which means that from the outside they look and feel like an ordered collection of Tuple objects, but they're lazy - composing them and filtering them is free, it's only once you start iterating over the data that a single SQL query is built and sent out to the database.

SchemeQL also takes this approach. It is described in this paper:

http://schematics.sourceforge.net/schemeunit-schemeql.ps

Which was presumably inspired by CommonSQL, at http://www.lispworks.com/reference/lw43/LWUG/html/lwuser-167.htm#pgfId-885977

KragenSitaker hacked up a public-domain spike of this in PythonLanguage one evening: http://lists.canonical.org/pipermail/kragen-hacks/2004-April/000394.html

Criteria, for RubyLanguage, also belongs to this class of solutions:

http://mephle.org/Criteria/

An example using Criteria which demonstrates query composition and combination:

	def get_employees()
		relation = SQLTable.new("employees")
		return relation
	end

def get_programmers() employees = get_employees() programmers = (employees.job == 'programmer') return programmers end

def get_wealthiest(relation, limit=5) relation.order_by = relation.salary relation.limit = limit relation.order = :DESC return relation end

def get_wealthiest_programmers(limit=5) relation = get_wealthiest(get_programmers(), limit) return relation end

# Relations can be accessed using the collection protocol. When # the program attempts to iterate over the relation or access a row, # SQL is generated and sent to the database. The following will generate # something like this: # # SELECT * FROM employees WHERE employees.job = 'programmer' LIMIT 15 ORDER BY employees.salary DESC

puts "The 15 wealthiest programmers are:" get_wealthiest_programmers(15).select.each do | tuple | puts tuple end

puts "The 10 wealthiest executives are:" execs = get_employees() execs.job == 'executive' get_wealthiest(execs, 10).select.each do | tuple | puts tuple end


How are these different from embedded SQL implementations?

If you use embedded SQL strings and you want to sort descending instead of ascending, or narrow the selection to a particular customer, you have to either use a different SQL query, or you're stuck munging the SQL expression string. Criteria and its ilk allow you to compose and combine expressions more naturally, allowing you to take apart, modify, and assemble queries as needed. It's like the difference between a language with numeric values and arithmetic operators vs. a language with only numeric values, if that makes any sense at all.

Do these techniques tend to spread table knowledge through out the application (i.e make it hard to encapsulate the schema in a data tier)?

The same strategies that people use to decouple databases from application logic with embedded SQL or O/R mapping work equally well with this. Also, it enables OnceAndOnlyOnce to a far greater degree than embedded SQL.

But doesn't the ObjectRelationalMapping problem arise from attempts to decouple application logic from the data base schema? How then does making the relations first class objects assist in solving the problem? Granted, this makes implementation of the mapping in the data tier easier, but it doesn't seem to replace it.

[I don't think so. I think most O/R mapping problems come from trying to hide and/or automate the abstraction.]


APIs that replace SQL tend to be at least as ugly as SQL. Perhaps if a different RelationalLanguage was used instead of SQL the results might be a bit more palatable, but the bigger problem is that "relational algebra" does not lend itself to be wrapped by an API very easily. Specifying even Boolean expressions ("x > 3 and (y = 2 or y < z)") using an API results in hard-to-read and/or verbose code. The SimplestThing is to just live with query strings. Wrap them in a method or function perhaps, but don't try to reinvent SQL the hard way.

Boolean expressions on first-class relation objects in the languages described above are nearly identical to the same expression in SQL. I.e., "x > 3 and (y = 2 or y < z)" would look like this using Criteria:

	query = (table.x > 3 & (table.y == 2 | y < z))

Does this execute at the database or in a specific language? Remember the goal here was to wrap SQL, not execute it. It generates an AST which gets translated to SQL when data is requested from 'query'. No, we're not executing queries in the application language. We merely composing them in the application language.

Is that really any worse than the equivalent SQL?

	SELECT * FROM table WHERE x > 3 and (y = 2 or y < z)

''In a language that doesn't allow that kind of abuse of OperatorOverloading, I can see your point, but I don't see that it holds with these toolkits."

Well, Criteria looks almost like Ruby, but that appearance comes with a lot of gotchas. & and && have different operator precedence, so Criteria users have to remember to use parens all the time. Also, "&" has a bunch of different meanings in Ruby but none of them are strictly boolean AND. I know many Rubyists who love Criteria, but I also know a few who think it's voodoo and won't touch the stuff. -- francis (see http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&th=bd95be9531dd8783&rnum=2 for more on this)

The SimplestThing is to just live with query strings.

No, not really. Not if it is your intention to use different databases with slight differences in their SQL syntax (i.e. LIMIT versus FIRST...SKIP, and so on). In that case, the simplest thing is to introduce another layer of abstraction (say an API) and all that API to be implemented for different variations on SQL but having an abstract implementation that uses the most standard syntax. -- RobertDiFalco

{I have never seen DatabaseVendorLock fixed by wrapping with APIs. The differences are too great and/or vendor A has as a feature that vendor B does not. It would be a huge undertalking to make a generic API to wrap SQL for all big vendors.}

Exactly. The SimplestThing is entirely a function of what you're trying to do. If you have hundreds of just-slightly-different queries, or if they're selected somewhat dynamically by user input or other factors, for instance, relational expressions could be a lifesaver. I have trouble seeing a downside to it. Yes, the expressions are another language which must be understood by developers and maintainers, but it's a more useful language than SQL because of composition, in my opinion.

I wasn't talking about DatabaseVendorLock (if I understand the term). I was talking about things (the great many things) that are common between various database vendors but have slightly different syntax. Tons of people do this including many JDO implementations. We put a layer on top of things like paged queries so that depending on the driver we may use LIMIT or FIRST...SKIP. It's pretty easy. We don't try to encapsulate specific vendor features that are not common across the high-level abstraction. -- RobertDiFalco

If you know of such a language or API, I would be happy to take a look. However, if such is possible/practical, then make *that* the query language instead of SQL.


EditHint: Some of this perhaps can be moved to DatabaseVendorLock


Readers of this page might also be interested in my (somewhat dated) NotesOnaCeePlusPlusRdbmsApi.


There is another 'programming language' coming from Microsoft, called Xen [see MicrosoftXen], which tries to embed XML/SQL into C# [http://www.extremetech.com/article2/0,1558,1441330,00.asp] -- VhIndukumar


The Ruby library above is just what I was looking for! I was thinking of writing something like this myself because I get so sick of having to stitch together SQL statements as strings in my database management software.

Does anyone know if anything like this exists for Python? If so I would be very grateful. Have seen things like Python Data Objects but they're more the object relational mapping thing which isn't what I'm looking for. I need something that can treat queries/relations/whatever as first class objects like the above...


CategoryDatabase

EditText of this page (last edited February 6, 2005) or FindPage with title or text search